Background: Viral metagenomic next-generation sequencing (mNGS) enables comprehensive characterization of viral communities in clinical samples. Despite its wide range of potential applications, several hurdles must be overcome before it can be implemented in clinical laboratories. In particular, reagent contamination (the "kitome") may critically impair the interpretation of results. A growing number of studies have developed methods to address this issue, but these methods have mainly been evaluated for the identification of bacterial contaminants. In this study, we aimed to compare various approaches for detecting viral contaminants in mNGS data.
Materials/methods: We used a data set of 236 plasma samples prospectively collected from patients with multiple myeloma. These samples were sequenced in 19 batches using a validated mNGS protocol. One no-template control (NTC) per batch was also processed through the complete workflow. Viral contaminants were detected using (1) batch effect analysis, (2) comparison of viral abundances between NTCs and clinical samples with DESeq2, and (3) the Decontam package, which includes two methods: one based on the relationship between read counts and biomass (frequency method), and one based on comparing prevalence between NTCs and patient samples (prevalence method).
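Decontam itself is an R package; the Python sketch below only illustrates the general idea behind a prevalence-based contaminant test and is not Decontam's implementation. It assumes that a taxon is "present" in a library when it has at least one read, applies a one-sided Fisher's exact test to the 2 x 2 presence/absence table (NTC vs clinical sample), and uses a hypothetical cutoff of 0.1. The function names `one_sided_fisher_p` and `flag_by_prevalence` are illustrative, not part of any library.

```python
from math import comb


def one_sided_fisher_p(ntc_present: int, n_ntc: int,
                       sample_present: int, n_samples: int) -> float:
    """One-sided Fisher's exact test: P(X >= ntc_present) under the
    hypergeometric null, i.e. the chance of seeing at least this many
    NTC detections if presence were unrelated to library type."""
    total = n_ntc + n_samples
    present = ntc_present + sample_present
    upper = min(present, n_ntc)
    # math.comb(n, k) returns 0 when k > n, so impossible tables contribute 0.
    hits = sum(
        comb(present, x) * comb(total - present, n_ntc - x)
        for x in range(ntc_present, upper + 1)
    )
    return hits / comb(total, n_ntc)


def flag_by_prevalence(counts: dict, is_ntc: list, threshold: float = 0.1) -> dict:
    """Flag taxa whose presence is enriched in NTCs (simplified sketch).

    counts: taxon name -> list of read counts, one per library
    is_ntc: booleans aligned with each count list (True = NTC library)
    threshold: hypothetical p-value cutoff below which a taxon is flagged
    Returns taxon name -> (p-value, flagged?).
    """
    n_ntc = sum(is_ntc)
    n_samples = len(is_ntc) - n_ntc
    flags = {}
    for taxon, reads in counts.items():
        # Presence is assumed to mean at least one read in the library.
        ntc_present = sum(1 for r, ntc in zip(reads, is_ntc) if ntc and r > 0)
        sample_present = sum(1 for r, ntc in zip(reads, is_ntc) if not ntc and r > 0)
        p = one_sided_fisher_p(ntc_present, n_ntc, sample_present, n_samples)
        flags[taxon] = (p, p < threshold)
    return flags
```

With toy data of 4 NTCs and 6 clinical samples, a taxon detected in all 4 NTCs but only 1 clinical sample gets a p-value of 5/210 (about 0.024) and is flagged, whereas a taxon never seen in NTCs gets a p-value of 1 and is kept.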
Results: We found that the batch effect approach and the methods of the Decontam package classified the contaminant status of viral families or genera no better than chance. Conversely, the differential approach comparing viral abundances between clinical samples and NTCs provided much better classification performance. This classification relied on three criteria: fold change, statistical significance, and background noise. Using these criteria, we generated a list of potential contaminants, mainly derived from bacteriophages or plant viruses. Interestingly, some of these contaminants were differentially abundant between patients and could have been misinterpreted as clinically relevant.
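The three-criteria rule can be sketched as a filter over per-taxon differential-abundance results. This is a minimal illustration under stated assumptions, not the authors' exact procedure: we assume the log2 fold change is expressed as NTC versus clinical samples (positive values mean enrichment in NTCs, which depends on the DESeq2 contrast chosen), that "background noise" is operationalized as a minimum mean normalized count in NTCs, and that the thresholds shown are hypothetical. `TaxonResult` and `classify_contaminants` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class TaxonResult:
    """Per-taxon summary of a differential-abundance test (e.g. DESeq2 output)."""
    name: str
    log2_fc: float   # assumed log2(NTC / clinical); positive = enriched in NTCs
    padj: float      # multiple-testing-adjusted p-value (may be NaN/None)
    ntc_mean: float  # mean normalized read count across NTCs


def classify_contaminants(results, min_log2_fc=1.0, max_padj=0.05, noise_floor=10.0):
    """Return names of taxa meeting all three (hypothetical) criteria:

    1. fold change: enriched in NTCs by at least min_log2_fc
    2. significance: adjusted p-value below max_padj
    3. background noise: mean NTC signal above noise_floor, so taxa seen
       only at trace levels are not called contaminants
    """
    flagged = []
    for r in results:
        # DESeq2 can report NA adjusted p-values; treat them as not significant.
        if r.padj is None or math.isnan(r.padj):
            continue
        if r.log2_fc >= min_log2_fc and r.padj < max_padj and r.ntc_mean > noise_floor:
            flagged.append(r.name)
    return flagged
```

Requiring all three criteria jointly is the design choice here: a large but non-significant fold change, or a significant enrichment built on a handful of NTC reads, is not enough on its own to label a taxon as a contaminant.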
Conclusions: To our knowledge, this is the first study assessing different computational methods for kitome identification in viral metagenomic data. Our data highlight that specific approaches are needed to detect viral contaminants and that these approaches should be systematically applied to avoid clinical misinterpretation.