
Not only does analyzing statistics count as research; the proper application of statistics is the cornerstone of credible research. I’ll use the example of biomedical research to illustrate the decisive role statistics plays in research.

ClinicalTrials.gov is a human clinical trial database run by the US National Library of Medicine. Currently, it lists 196,459 studies with locations in all 50 US states and in 190 countries. That’s a lot of trials.

  • How could we assess the value and validity of the data human clinical trials generate for a given disease?
  • If a particular trial concludes that a treatment approach is 90% effective, can we/should we take it at face value?

This is where statistics comes in, in particular a statistical methodology called meta-analysis. In a meta-analysis, data from numerous clinical trials are statistically analyzed in order to rigorously query the methodology those trials used and the data they generated, and to identify as-yet-unseen patterns in the data. Combining results from many studies in this manner yields more robust conclusions. How?

  • For one, when we aggregate data from different studies, we end up with larger data sets, i.e., a larger number of study patients in the case of human clinical trials. Obviously, in a hypothetical meta-analysis, aggregate data from 10,000 individuals is more applicable to the population at large than data from 1,000 individuals in a single trial.
  • For another, biases inherent to the researchers in any one trial get minimized (averaged out) when multiple studies are combined for analysis. And bias is inherent in all of us, even when we set up a study with the best of intentions to minimize bias in its design.
  • Further, subjecting multiple studies to such rigorous, impartial scrutiny reveals flaws, weaknesses, and deficiencies that serve as signposts to the larger research community, helping researchers get a better grasp of their own field by sifting out the relevant studies from the more questionable ones.
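The aggregation benefit described above can be sketched with a minimal fixed-effect meta-analysis using inverse-variance weighting, a standard pooling technique; the study estimates and standard errors below are hypothetical:

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooling of per-study effect estimates.

    Each study is weighted by 1/SE^2, so larger, more precise studies
    dominate the pooled estimate -- the aggregation benefit of combining
    many trials into one analysis.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect estimates (e.g., log odds ratios) from three trials
estimates = [0.40, 0.15, 0.30]
std_errors = [0.20, 0.10, 0.15]
pooled, se = fixed_effect_meta(estimates, std_errors)
print(f"pooled effect = {pooled:.3f} +/- {se:.3f}")
```

Note how the pooled standard error is smaller than any single study’s, reflecting the gain in precision from aggregation.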

Of course, meta-analyses have flaws of their own, such as their unavoidable reliance on published studies. Since negative results hardly ever get published, this can skew the pooled result, a problem known as publication bias. Nevertheless, in the aggregate, the benefits of meta-analyses outweigh their drawbacks. In fact, over the years a particular kind of clinical trial meta-analysis, the Cochrane Library, has become the gold standard for rigorous statistical assessment of clinical trial data, helping guide the development of evidence-based medicine.
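The publication-bias caveat can be demonstrated with a small simulation (all numbers hypothetical): even when a treatment has zero true effect, averaging only the studies that clear a "significance" bar yields a spuriously positive effect.

```python
import random
import statistics

random.seed(0)
true_effect = 0.0  # the treatment actually does nothing
all_studies, published = [], []
for _ in range(1000):
    # each "study": the mean of 20 noisy observations of a null effect
    sample = [random.gauss(true_effect, 1.0) for _ in range(20)]
    est = statistics.fmean(sample)
    all_studies.append(est)
    # journals tend to accept only "positive" results;
    # 0.37 ~ a one-sided p < 0.05 cutoff for n=20, sd=1
    if est > 0.37:
        published.append(est)

print(f"mean of all studies:       {statistics.fmean(all_studies):+.3f}")
print(f"mean of published studies: {statistics.fmean(published):+.3f}")
```

The full set of studies averages to roughly zero, as it should, while the "published" subset suggests a sizable effect that does not exist.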

I will even go so far as to say that the lack of rigorous statistics in basic biomedical research is a key reason for its current reproducibility crisis (1, 2, 3). My own journey in science taught me just how big a problem this is.

  • My Ph.D. project on mycobacteria addressed a question of relevance to the largest vaccine trial for tuberculosis (TB), the South Indian BCG vaccine trial.
  • Conducted in South India on ~360,000 people spread across 209 villages and 1 town, it showed that BCG didn’t protect against adult TB. Why?
  • While that question still hasn’t been answered conclusively, prior exposure to environmental mycobacteria emerged as a plausible factor.
  • What kinds of environmental mycobacteria, and in which environments? In a nutshell, my Ph.D. project had to answer these questions. But how could I, just one person, cover such a vast area?
  • My road map was an experiment design worked out with the help of a trained statistician.
  • Every step of my Ph.D. project was vetted and coordinated by this statistician using rigorous statistical tools.
  • And the statistical oversight didn’t stop with the experiment design; it worked its way through every step of the research process.
  • In fact, I still remember how I’d come back from the study area with my bottles of soil, water and dust samples, and hand over my sample list to the statistician. He’d come back with a list of randomly generated numbers. I’d turn my back, and he’d rub out all my labels and re-label my sample bottles with those random numbers. Samples would be decoded only after I’d generated all the data from each sample. That’s how rigorous the study was.
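The blinded relabeling step described above can be sketched in a few lines; the sample names and the 4-digit coding scheme are hypothetical stand-ins:

```python
import random

def blind_samples(labels, seed=None):
    """Assign each sample a random numeric code; keep the key aside.

    Mimics the relabeling step: the researcher sees only the codes,
    while the code->label key stays with the statistician and is used
    to decode results only after all measurements are in.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(1000, 10000), len(labels))  # unique codes
    key = dict(zip(codes, labels))  # the statistician keeps this
    return codes, key

# Hypothetical field samples
labels = ["soil-village-17", "water-village-17", "dust-town-01"]
codes, key = blind_samples(labels, seed=42)
# ... researcher measures samples identified only by `codes` ...
decoded = [key[c] for c in codes]  # decoding happens only at the end
```

Because the researcher never sees the key, knowledge of a sample’s origin cannot influence how it is measured, which is the whole point of the exercise.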

Then I came to the US and found, instead of equivalent rigor, a basic biomedical research process that struck me as arbitrary and capricious.

  • In basic biomedical research, scientists design experiments as they like, can and do drop animals from the final data set on a whim, and then use statistics rather loosely and arbitrarily at the back end to try to make sense of the data they generate.
  • In fact, “misuse of statistics” is a more accurate way of putting it, since statistics isn’t used to draw up the study design at the front end but only to analyze the data at the back end.
  • Even more shockingly, consulting trained statisticians anywhere in this process is a rarity, certainly not the norm.
  • Thus, increasingly over the last few decades, even as the mouse model became the de facto experimental model in both basic and applied biomedical research, an anything-goes, Wild West type of science culture accompanied its experiment design and data analysis.
  • Ironically, at the same time, the statistical science behind human clinical trials only became more, not less, rigorous.
  • We may finally be coming to the proverbial light at the end of the tunnel with the recent publication of the first randomized clinical trial in mice (4).
  • Applying the rigorous, statistically undergirded clinical trial model to basic biomedical research, i.e., preclinical research, makes the data generated from such models more robust and therefore more credible.
  • And it also increases opportunities for those trained in statistics, because their expertise is now needed and valued across a larger spectrum of biomedical research rather than remaining confined to human clinical trials alone.

This is why even though I’ve worked in basic research since I came to the US, I know that none of this work approaches the rigor of my Ph.D. study. After all, it was the only one where statistics were properly applied from beginning to end by a trained statistician, starting with the study design itself. So an emphatic yes, in my book, analyzing and organizing statistics counts as research of the highest order, as long as it’s done with guidance and input from trained statisticians rather than at the dictate of statistical novices, be they colleagues or bosses.


  1. Prinz, Florian, Thomas Schlange, and Khusru Asadullah. “Believe it or not: how much can we rely on published data on potential drug targets?” Nature Reviews Drug Discovery 10.9 (2011): 712.
  2. Begley, C. Glenn, and Lee M. Ellis. “Drug development: Raise standards for preclinical cancer research.” Nature 483.7391 (2012): 531-533.
  3. Landis, Story C., et al. “A call for transparent reporting to optimize the predictive value of preclinical research.” Nature 490.7419 (2012): 187-191.
  4. Llovera, Gemma, et al. “Results of a preclinical randomized controlled multicenter trial (pRCT): Anti-CD49d treatment for acute brain ischemia.” Science Translational Medicine 7.299 (2015): 299ra121.