I'm working with next-generation sequencing on a daily basis and therefore interpret a lot of coverage analysis reports to assess the quality of sequencing runs. I'm using Ion Torrent technology for targeted sequencing.
A coverage analysis report consists of:
- Mapped reads (in millions) - the number of reads mapped to the reference genome.
- Mean depth - a summary statistic of the read depth across the reads assigned to the targeted amplicons.
- On target (%) - the percentage of mapped reads that fall within any targeted region defined in the target regions file.
- Uniformity (%) - the percentage of bases in the targeted regions that are covered by at least 20% of the mean depth (sketched in code after this list).
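To make the uniformity definition concrete, here is a minimal sketch of how it could be computed from per-base depths (an illustration only, not the Ion Torrent implementation; the `depths` list is a made-up example):

```python
def uniformity(depths):
    """Percentage of targeted bases covered by at least 20% of the mean depth."""
    mean_depth = sum(depths) / len(depths)
    cutoff = 0.2 * mean_depth  # 20% of the mean depth
    covered = sum(1 for d in depths if d >= cutoff)
    return 100.0 * covered / len(depths)

# Toy per-base depth vector over the targeted regions (hypothetical values).
print(uniformity([950, 1020, 180, 1100, 40, 990]))  # ~83.3 for this toy vector
```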
The aim is to condense these metrics into a single score that allows a quick interpretation, so we can efficiently determine whether a sequencing run is of sufficient quality for downstream analysis.
The standard parameters for an accepted sequencing run in our lab are (also encoded in the sketch after this list):
- Mapped reads: 5,000,000
- Mean depth: 1000
- On target: 80%
- Uniformity: 80%
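To make the current all-or-nothing acceptance rule explicit, here is a minimal sketch that checks a run against these standard values (the dictionary keys and the example run are illustrative, not an existing API):

```python
# Lab standard values from the list above; a run passes this check only if
# every metric meets or exceeds its standard value.
STANDARDS = {
    "mapped_reads": 5_000_000,
    "mean_depth": 1000,
    "on_target_pct": 80.0,
    "uniformity_pct": 80.0,
}

def meets_all_standards(run):
    """Return True only if every metric reaches the lab's standard value."""
    return all(run[name] >= value for name, value in STANDARDS.items())

# Hypothetical run: good depth and uniformity, but a slightly low on-target rate.
example_run = {
    "mapped_reads": 6_000_000,
    "mean_depth": 1050,
    "on_target_pct": 78.0,
    "uniformity_pct": 85.0,
}
print(meets_all_standards(example_run))  # False: on_target_pct is below 80
```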
However, coverage reports can vary a lot, so a single score would be ideal for the assessment.
The equation so far:
Given the above-mentioned parameters, it would give a SeqScore of 0.090, meaning that a sequencing run with a SeqScore > 0.090 is of bad quality, while one with a SeqScore ≤ 0.090 is accepted (this cutoff rule is also sketched in code after the examples below).
Examples:
Sequencing 1:
- Mapped reads: 6,902,500
- Mean depth: 850
- On target: 70%
- Uniformity: 81%
SeqScore = 0.098 (Bad)
Sequencing 2:
- Mapped reads: 4,000,000
- Mean depth: 1100
- On target: 75%
- Uniformity: 87%
SeqScore = 0.082 (Good)
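The acceptance rule and the two examples above can also be written as a small check (a sketch only; the SeqScore values are taken from the report, since the equation itself is not reproduced here):

```python
SEQSCORE_CUTOFF = 0.090  # from the standard parameters above

def classify(seq_score):
    """A run with SeqScore <= 0.090 is accepted ('Good'); anything above is 'Bad'."""
    return "Good" if seq_score <= SEQSCORE_CUTOFF else "Bad"

print(classify(0.098))  # Sequencing 1 -> Bad
print(classify(0.082))  # Sequencing 2 -> Good
```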
I'm not sure whether this is a valid way of creating a score. Constructive criticism and suggestions for improving it are very welcome.
Thank you for your time.