We are checking the accuracy of the quality values that phred assigns to ABI 3700 chromatograms when it uses quality value lookup tables calibrated for ABI 373/377 chromatograms (Ewing & Green, 1998). We used phred version 990722 for the tests described here.
Our goal is to determine how accurate phred quality values are on 3700 data with the current lookup tables, and whether phred must be modified for these data.
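For reference, phred reports each base's estimated error probability $P$ as a quality value $Q$ on a logarithmic scale (Ewing & Green, 1998):

$$
Q = -10 \log_{10} P, \qquad P = 10^{-Q/10}
$$

so quality values 20, 30, and 40 correspond to estimated error rates of 1 in 100, 1 in 1,000, and 1 in 10,000. Checking accuracy means comparing these predicted error rates with the error rates actually observed when reads are aligned to finished consensus sequence.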
We obtained chromatograms and consensus sequences for finished human BAC projects from the Genome Sequencing Center at Washington University in Saint Louis, selecting projects in which at least 10% of the chromatograms came from the ABI 3700. These projects contain 3700 chromatograms generated with dye primer and dye terminator chemistries in the POP5 matrix and with dye terminator chemistry in the POP6 matrix. The data were generated under essentially standard run conditions, except that the dye terminator POP6 samples were run at 37 °C. Table 1 summarizes the numbers of aligned reads and bases used in this work.
Table 1. Aligned reads and bases
chemistry | matrix | number of projects | number of reads | number of bases |
---|---|---|---|---|
dye primer | POP5 | 9 | 5767 | 3477864 |
dye terminator | POP5 | 18 | 10177 | 6354554 |
dye terminator | POP6 | 12 | 8274 | 4354631 |
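A quality value accuracy plot of the kind discussed below can be built by grouping aligned bases by their phred quality value and converting each group's observed error rate back into a quality value. The sketch below assumes a hypothetical input format of `(phred_qv, is_error)` pairs, one per aligned base, where `is_error` is true when the read base disagrees with the finished consensus. It does not perform the alignment itself, and `observed_quality_values` is an illustrative name, not part of phred.

```python
import math
from collections import defaultdict


def observed_quality_values(calls):
    """Return {phred_qv: (n_bases, n_errors, observed_qv)} for an accuracy plot.

    `calls` is an iterable of (phred_qv, is_error) pairs, one per aligned base
    (hypothetical input format). is_error is True where the read base
    disagrees with the finished consensus sequence.
    """
    counts = defaultdict(lambda: [0, 0])  # phred_qv -> [bases, errors]
    for qv, is_error in calls:
        counts[qv][0] += 1
        if is_error:
            counts[qv][1] += 1

    result = {}
    for qv in sorted(counts):
        n, e = counts[qv]
        # Observed QV = -10 log10(observed error rate). With zero errors the
        # observed value is unbounded, so report None instead of a number.
        obs = -10.0 * math.log10(e / n) if e > 0 else None
        result[qv] = (n, e, obs)
    return result
```

Bins with no observed errors return `None`; at high quality values this happens often when only a modest number of bases is aligned, which is one reason the high end of such plots is noisy.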
Dye Primer POP5
The dye primer POP5 quality value accuracy plot shows good quality value accuracy up to about quality value 25. For larger phred quality values, the observed quality values are progressively lower, meaning that phred underestimates the error rates. We examined discrepancies with assigned quality values of 40 and higher and found that, compared with slab gel runs, these reads have a greater tendency to form compressions. Many of the additional compressions occur at stem/loop motifs that do not cause problems on slab gels. We consider the number of aligned bases used in this test to be marginal and hope to obtain additional data soon to improve our confidence in the result.
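The size of such a discrepancy can be expressed as a ratio of error rates. If phred assigns quality $Q_{\text{phred}}$ to a group of bases whose observed quality is $Q_{\text{obs}}$, the actual error rate exceeds the predicted one by

$$
\frac{P_{\text{obs}}}{P_{\text{phred}}} = 10^{(Q_{\text{phred}} - Q_{\text{obs}})/10}.
$$

As a purely hypothetical illustration (not a value read from the plot), an observed quality of 35 for bases assigned phred quality 40 would mean about $10^{0.5} \approx 3.2$ times as many errors as phred predicts.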
Dye Terminator POP5
The dye terminator POP5 quality value accuracy plot shows consistently good agreement between the phred and observed quality values up to quality value 30. For higher phred quality values, the observed quality values vary around the phred values without an apparent trend, suggesting that the variations are due to statistical fluctuations resulting from the relatively small number of aligned bases used for the test.
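Whether a deviation is plausibly a statistical fluctuation can be judged by putting an interval around each observed quality value. A minimal sketch, assuming a binomial error model and using the Wilson score interval (one standard choice among several; nothing here is taken from phred or from the original analysis, and the function names are illustrative):

```python
import math


def _to_qv(rate):
    """Convert an error rate to a quality value, Q = -10 log10(rate)."""
    return math.inf if rate <= 0 else -10.0 * math.log10(rate)


def observed_qv_interval(n_bases, n_errors, z=1.96):
    """Approximate 95% interval (z = 1.96) for one bin's observed quality value.

    Computes the Wilson score interval for the binomial error rate
    n_errors / n_bases, then converts each bound to a quality value.
    Returns (low_qv, high_qv); high_qv is math.inf when the lower
    error-rate bound is 0.
    """
    p = n_errors / n_bases
    z2 = z * z
    denom = 1.0 + z2 / n_bases
    centre = (p + z2 / (2 * n_bases)) / denom
    half = z * math.sqrt(p * (1 - p) / n_bases + z2 / (4 * n_bases**2)) / denom
    p_low = max(0.0, centre - half)
    p_high = min(1.0, centre + half)
    # A higher error rate means a lower quality value, so the bounds swap.
    return _to_qv(p_high), _to_qv(p_low)
```

For the same observed rate of 1 error in 10,000 bases (quality value 40), a bin of 10,000 bases with 1 error gives an interval of roughly 32 to 48, while 100,000 bases with 10 errors narrows it to roughly 37 to 43.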
Dye Terminator POP6
The dye terminator POP6 quality value accuracy plot shows consistently good agreement between the phred and observed quality values up to and slightly above quality value 30. For higher phred quality values, the observed quality values again vary around the phred values without a clear trend, suggesting that the variations are due to the relatively small number of aligned bases.
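The number of aligned bases needed for a stable estimate grows rapidly with quality value. A group of $N_Q$ bases assigned quality $Q$ is expected to contain

$$
E[\text{errors}] = N_Q \cdot 10^{-Q/10}
$$

errors, and because the error count is roughly Poisson, its relative standard error is about $1/\sqrt{E[\text{errors}]}$. Taking, as an illustrative target, 100 expected errors (a relative error of about 10%, or roughly $10 \log_{10} 1.1 \approx 0.4$ quality value units) requires $N_{40} = 100 \cdot 10^{4} = 10^{6}$ bases at quality value 40 alone, and ten times that at quality value 50.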
Based on these limited data sets, the current phred version (990722) appears to assign accurate quality values up to quality value 25 for all tested dye chemistry/matrix combinations. For dye primer chemistry run in the POP5 matrix, phred quality values above 25 progressively overestimate the quality. This trend appears to be due to a greater tendency of the strands to form compressions during capillary electrophoresis than in slab gel runs, suggesting that we will need to modify phred to recognize a greater range of stem/loop motifs, and possibly create a quality value lookup table specifically for dye primer POP5 data. For dye terminator chemistry run in the POP5 and POP6 matrices, phred quality values remain accurate up to about quality value 30. Between phred quality values 30 and 40, the observed quality values show modest, apparently random, variation around the phred quality values; the variation increases above quality value 40. This suggests that the phred quality values are generally valid for these dye terminator data, but we need additional data to improve our confidence in the tests.
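One stopgap for the dye primer POP5 result, pending a proper recalibration, would be a post hoc remapping from assigned to observed quality values. This is not how phred's lookup tables work: phred derives quality values from a table indexed by trace parameters (Ewing & Green, 1998), and a real fix would recalibrate that table. The sketch below is a hypothetical illustration only; `build_remap_table`, its input format, and its pseudocount are assumptions.

```python
import math


def build_remap_table(bins, pseudo=0.5):
    """Hypothetical recalibration: phred QV -> empirically observed QV.

    `bins` maps phred_qv -> (n_bases, n_errors). The pseudocount keeps bins
    with zero errors finite. A running minimum taken from the highest QV
    downward makes the table non-decreasing without raising any bin above
    its own estimate (apart from rounding).
    """
    estimate = {
        qv: -10.0 * math.log10((e + pseudo) / (n + 2 * pseudo))
        for qv, (n, e) in bins.items()
    }
    table, floor = {}, math.inf
    for qv in sorted(estimate, reverse=True):
        floor = min(floor, estimate[qv])
        table[qv] = round(floor)
    return dict(sorted(table.items()))
```

Taking the running minimum from the top is a conservative design choice: it can lower a bin's value to keep the table monotone, but it never reports a quality higher than the data support for that bin.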
Ewing, B. & Green, P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194 (1998).
This page was updated on 11 August 2000.