Abstract: The eukaryotic transcriptome is diverse and complex. RNA sequencing (RNA-seq) can provide a global profile of the transcriptome, identify biomarkers associated with diseases, and support the diagnosis and treatment of complex diseases. The RNA-seq workflow comprises sample preparation, library preparation, sequencing, and data analysis. Technical errors introduced at any step of this workflow reduce the accuracy of RNA-seq. These errors can be understood and mitigated with RNA spike-ins, which serve as quality controls for RNA-seq and improve the reliability of RNA-seq data. RNA spike-ins and their applications are therefore introduced here, including normalization between samples, measurement of gene expression, and detection and quantification of diverse transcripts, to provide a reference for using RNA spike-ins effectively to improve the quality of RNA-seq.