‘Is it possible to diagnose infections by analyzing the cytokine spectrum?’
Rather than the cytokine spectrum alone, blood gene signatures covering not just cytokines but other genes as well are being mooted as diagnostic possibilities for infections. For this approach to become a practical reality, the initial promising reports from a few basic research labs need to be replicated by others using different, larger patient data-sets, and one or more such approaches need to be spun out to commercial entities with the resources to accelerate their translation to clinical medicine. Currently, this type of research is still in the earliest stages of discovery work.
The irony of posting this answer after the US SEC charged Theranos, Elizabeth Holmes and Sunny Balwani with fraud (, ) only serves to emphasize just how important and lucrative it’s becoming to rapidly and accurately diagnose not just infections but also other diseases using just a drop of blood.
The potential for profit isn’t the only reason, though. Increasingly irresponsible antibiotic usage over recent decades has made responsible antibiotic stewardship an urgent medical priority: diagnosing an infection as viral rather than bacterial as early as possible could help minimize unnecessary antibiotic use (, , ).
As an illustrative proof of principle, this answer focuses on the work being done in the lab of Purvesh Khatri, a computational immunologist at Stanford University who has in recent years published a few high-profile studies on this topic (). The approach, scouring large publicly available genetic data-sets to discern clinically important patterns, is something we’re going to see more of in the coming years.
Khatri’s group examines such data-sets, obtained from either whole blood or circulating blood cells (peripheral blood mononuclear cells, PBMC), from healthy individuals as well as from those with bacterial or viral infections or other types of inflammatory conditions. Specifically, the group searches public gene expression databases such as NCBI’s GEO (Gene Expression Omnibus) to see whether gene expression patterns map to specific infections.
Microarray analysis attempts to find differentially expressed genes between different sets of samples while examining thousands and even tens of thousands of genes simultaneously.
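To make the idea concrete, here is a minimal sketch of a per-gene comparison between two groups of samples. The gene names, the expression values and the choice of Welch's t-statistic are all assumptions made for illustration. Real microarray pipelines also normalize the data and correct for multiple testing, and this is not claimed to be the method used in the studies discussed in this answer.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr_a, expr_b):
    """Return (gene, log2 fold change, t) tuples sorted by |t|, largest first.

    expr_a and expr_b map a gene name to a list of log2 expression values,
    one value per sample in that group.
    """
    rows = []
    for gene in expr_a:
        # On log2-scale data, a difference of means is the log2 fold change.
        log_fc = mean(expr_a[gene]) - mean(expr_b[gene])
        rows.append((gene, log_fc, welch_t(expr_a[gene], expr_b[gene])))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Toy, made-up log2 expression values (hypothetical genes, not patient data).
viral = {"GENE_A": [8.1, 8.4, 7.9, 8.3], "GENE_B": [5.0, 5.2, 4.9, 5.1]}
bacterial = {"GENE_A": [6.0, 6.2, 5.9, 6.1], "GENE_B": [5.1, 4.9, 5.0, 5.2]}

ranked = rank_genes(viral, bacterial)
for gene, log_fc, t in ranked:
    print(f"{gene}: log2FC={log_fc:+.2f}, t={t:+.2f}")
```

With thousands of genes instead of two, the same loop produces a ranked list from which candidate signature genes would be drawn.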
In their 2015 study, Khatri’s group analyzed 205 samples in 3 data-sets that included healthy controls as well as patients with respiratory viral or bacterial infections (),
- They identified 396 genes as being differentially expressed in bacterial versus viral respiratory infections.
- They identified a unique signature for flu that set it apart from other viral infections.
- They could even separately identify asymptomatic flu patients who were shedding the flu virus as well as those with flu-like symptoms but who weren’t infected with it.
In their 2016 study (),
- They identified 7 genes they claimed could differentiate bacterial from viral infections.
- They validated these gene signatures in a group of 96 critically ill children.
- They identified a 3-gene signature they claim could distinguish between patients with active or latent tuberculosis, another infection, or no infection. Specifically, they claim this test is far better at identifying those without TB, unlike standard TB tests, which often miss the disease in patients who are unable to spit up a sufficient amount of sputum for diagnosis.
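As a rough illustration of how a small gene signature might be collapsed into a single number per patient, the sketch below scores a sample as the geometric mean of genes assumed to be higher in viral infection minus the geometric mean of genes assumed to be higher in bacterial infection. The gene names (UP1, UP2, DOWN1), the expression values and the threshold are hypothetical placeholders, and the scoring rule itself is an assumption rather than a description of what the studies above did.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive expression values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def signature_score(sample, up_genes, down_genes):
    """Geometric mean of the 'up' genes minus geometric mean of the 'down' genes."""
    up = geometric_mean([sample[g] for g in up_genes])
    down = geometric_mean([sample[g] for g in down_genes])
    return up - down


def classify(sample, up_genes, down_genes, threshold=0.0):
    """Label a sample 'viral' if its score exceeds the (hypothetical) threshold."""
    score = signature_score(sample, up_genes, down_genes)
    return "viral" if score > threshold else "bacterial"


# Hypothetical signature: UP1/UP2 assumed higher in viral infection,
# DOWN1 assumed higher in bacterial infection.
UP_GENES = ["UP1", "UP2"]
DOWN_GENES = ["DOWN1"]

# Made-up expression values for two imaginary patients.
patient_1 = {"UP1": 8.0, "UP2": 2.0, "DOWN1": 3.0}
patient_2 = {"UP1": 2.0, "UP2": 2.0, "DOWN1": 5.0}

for name, sample in [("patient_1", patient_1), ("patient_2", patient_2)]:
    score = signature_score(sample, UP_GENES, DOWN_GENES)
    print(name, round(score, 3), classify(sample, UP_GENES, DOWN_GENES))
```

In practice the threshold would have to be chosen on one set of patients and then checked on independent ones, which is exactly the kind of validation step described above.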
While such basic research data appears promising at first blush, using microarray data for clinical diagnosis comes with its own set of limitations that require mitigation in order to make such an approach practical.
- A robust approach is needed to distinguish true from false positives.
- Data need to be reproducible between data-sets. Since results for a given gene can, and often do, range all the way from significantly differentially expressed to borderline significant to not significant at all, effect sizes are of greater importance. All the more reason for biologists to wean themselves off the unfortunate tendency to focus on significance (p value) to the exclusion of more biologically meaningful measures.
- Analyses of some publicly available human microarray data-sets suggest there’s greater value not in single large studies for a given disease but rather in larger numbers of smaller studies that are moderately powered ( ) ( ).
- Not all human microarray studies make all their data publicly available. This is a matter of science policy rather than of research, and one that can only be remedied by funders and regulators.
- Researchers tend to focus their attention on better annotated genes, which are usually simply those that started being studied earlier, rather than on those that appear to have the strongest evidence supporting their role in a given disease. Such a stereotypical ‘looking under the lamp-post’ attitude stymies discovery of the most relevant genetic signatures associated with a particular disease (see below from ).
‘Collectively, our results provide an evidence of a strong research bias in literature that focuses on well-annotated genes instead of those with the most significant disease relationship in terms of both expression and genetic variation. We show that the inequality follows a “rich-getting-richer” pattern, where annotation growth is biased towards genes that were richly annotated in the initial versions of GO [Gene Ontology]. We believe this stems from the typical experimental design. To illustrate this, consider an omics experiment that generates a list of hundreds or thousands of interesting genes. To interpret these genes, researchers use GO and pathway analysis tools. The researchers then generate targeted hypotheses for validation by interpreting the list of significant GO terms, focusing the genes or proteins annotated with that GO term. The researchers learn more about those targeted genes, leading to additional GO annotations for the already annotated genes. In this process, the list of unannotated genes is simply ignored because pathway analysis tools cannot map them to any GO terms. Hence, the self-perpetuating cycle of inequality continues.
While focusing research on the best characterized genes may be natural because it is easy to formulate a mechanistic hypothesis of the gene’s function in disease, we propose that the researchers in the era of omics should instead allow data to drive their hypotheses. We have repeatedly shown that expanding research outside of the streetlight of well characterized genes identifies novel disease-gene relationships [35–37], identifies FDA-approved drugs that can be repurposed for other diseases [27], and identifies clinically translatable diagnostic and prognostic disease signatures [27,30–34,39]. For example, we have previously identified PTK7 as causally involved in non-small cell lung cancer [37]. At the time of publication, PTK7 was labelled as an orphan tyrosine kinase receptor. In a very short span, this finding was transformed into an antibody-drug conjugate targeting PTK7 that induced sustained tumor regression, outperformed standard-of-care chemotherapy, and reduced frequency of tumor-initiating cells in a preclinical study [45]. A Phase 1 clinical trial (NCT02222922) of PTK7 antibody drug conjugate, PF-06647020, has already completed with acceptable and manageable safety profile, and is now being considered for further clinical development. To enable researchers to pursue data-driven hypotheses, we have made our rigorously validated gene expression multicohort analysis data publicly available () where it may be explored based on either diseases or genes of interest [29,46]. Focusing on genes with the strongest molecular evidence instead of the most annotations would enable researchers to break the self-perpetuating annotation inequality cycle that results in research bias.’
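The earlier points about preferring effect sizes to p values, and about pooling several moderately sized studies, can be sketched in code. The example below computes Hedges' g for one gene in each of three cohorts and combines them with a DerSimonian-Laird random-effects model. The cohort values are invented, and the choice of these particular estimators is an assumption made for illustration; it is not claimed to reproduce the pipeline of any study cited here.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Return (g, var_g): bias-corrected standardized mean difference and its variance."""
    n1, n2 = len(cases), len(controls)
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    )
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def random_effects(effects):
    """Pool (g, var_g) pairs; return (pooled_g, standard_error, tau2)."""
    k = len(effects)
    w = [1 / v for _, v in effects]
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    # Cochran's Q measures how much the cohorts disagree with each other.
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


# Made-up log2 expression of one hypothetical gene: (infected, healthy) per cohort.
cohorts = [
    ([7.9, 8.3, 8.1, 8.6, 8.0], [7.0, 7.4, 7.1, 7.3, 6.9]),
    ([8.2, 7.8, 8.5, 8.1], [7.6, 7.2, 7.9, 7.5]),
    ([8.0, 8.4, 7.7, 8.2, 8.3, 7.9], [7.5, 7.1, 7.8, 7.3, 7.6, 7.2]),
]

effects = [hedges_g(cases, controls) for cases, controls in cohorts]
pooled_g, se, tau2 = random_effects(effects)
print(f"pooled g = {pooled_g:.2f} (SE {se:.2f}, tau^2 {tau2:.2f})")
```

A gene whose pooled effect stays large across independent cohorts is a more convincing candidate than one that reaches a small p value in a single study.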
4. MacDougall, Conan, and Ron E. Polk. “Antimicrobial stewardship programs in health care systems.” Clinical Microbiology Reviews 18.4 (2005): 638-656.
5. Morency-Potvin, Philippe, David N. Schwartz, and Robert A. Weinstein. “Antimicrobial stewardship: how the microbiology laboratory can right the ship.” Clinical Microbiology Reviews 30.1 (2017): 381-407.
6. Sohn, Emily. “Frontiers in blood testing.”
7. Andres-Terre, Marta, et al. “Integrated, multi-cohort analysis identifies conserved transcriptional signatures across multiple respiratory viruses.” Immunity 43.6 (2015): 1199-1211.
8. Sweeney, Timothy E., Hector R. Wong, and Purvesh Khatri. “Robust classification of bacterial and viral infections via integrated host gene expression diagnostics.” Science Translational Medicine 8.346 (2016): 346ra91.
9. Sweeney, Timothy E., et al. “Methods to increase reproducibility in differential gene expression via meta-analysis.” Nucleic Acids Research 45.1 (2016): e1.
10. Haynes, Winston A., Aurelie Tomczak, and Purvesh Khatri. “Gene annotation bias impedes biomedical research.” Scientific Reports 8.1 (2018): 1362.