Improper Statistical Analysis: A Cause of Poor Translation of New Biomarkers into Clinical Practice
While efforts to discover and validate new biomarkers are increasing, very few biomarkers are being implemented in clinical practice. A cause for concern, which can be linked to improper use of statistical methods, is that many published "biomarkers" do not perform well in a clinical setting: later studies reveal disappointing performance relative to the published results. Most statistical problems in the analysis of biomarker data can be traced to multiple hypothesis testing, model overfitting, and inadequate model validation. Through a series of simulated examples, we show that an improper analysis may result in the discovery of useless biomarkers or the publication of optimistic performance estimates for a predictive model. In addition to outlining the improper use of statistical methods, we present approaches to performing an appropriate analysis and demonstrate their utility. It is our opinion that as future physicians and scientists learn about, use, and promote proper statistical methods, the pursuit of biomarkers will more effectively yield those that can be used in clinical practice.
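The multiple hypothesis testing problem named in the abstract can be illustrated with a short simulation. The sketch below is not the authors' simulation. It assumes 1,000 hypothetical candidate markers that are pure noise, 50 cases and 50 controls per marker, and a two-sided test based on a normal approximation rather than an exact t-test. The function names (`two_sample_p`, `simulate_screen`) and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means.

    Assumption: uses a normal (z) approximation with unpooled variances,
    which is reasonable for the group sizes used here but is not an exact t-test.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_screen(n_markers=1000, n_per_group=50, alpha=0.05, seed=0):
    """Screen noise-only markers; count 'discoveries' with and without correction."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_markers):
        # Cases and controls come from the same distribution: no marker is real.
        cases = [rng.gauss(0, 1) for _ in range(n_per_group)]
        controls = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(two_sample_p(cases, controls))
    naive_hits = sum(p < alpha for p in p_values)
    # Bonferroni correction: divide the threshold by the number of tests.
    bonferroni_hits = sum(p < alpha / n_markers for p in p_values)
    return naive_hits, bonferroni_hits


naive_hits, bonferroni_hits = simulate_screen()
print(f"Unadjusted 'discoveries' among noise markers: {naive_hits}")
print(f"Discoveries after Bonferroni correction:      {bonferroni_hits}")
```

Under these assumptions, about 5% of the noise markers (around 50 of 1,000) would be expected to pass the unadjusted threshold by chance alone, while the Bonferroni-adjusted threshold should admit few or none. Bonferroni is only one of several possible corrections and is used here for simplicity.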
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in the Medical Student Press Journal.