Bayesian estimation of sequence damage in ancient DNA

Ho SYW, Heupink TH, Rambaut A & and Shapiro B

(2007) Mol Biol Evol 24, 1416-1422.

DNA extracted from archaeological and paleontological remains is usually damaged by biochemical processes postmortem. Some of these processes lead to changes in the structure of the DNA molecule, which can result in the incorporation of incorrect nucleotides during polymerase chain reaction. These base misincorporations, or miscoding lesions, can lead to the inclusion of spurious additional mutations in ancient DNA (aDNA) data sets. This has the potential to affect the outcome of phylogenetic and population genetic analyses, including estimates of mutation rates and genetic diversity. We present a novel model, termed the delta model, which estimates the amount of damage in DNA data and accounts for its effects in a Bayesian phylogenetic framework. The ability of the delta model to estimate damage is first investigated using a simulation study. The model is then applied to 13 aDNA data sets. The amount of damage in these data sets is shown to be significant but low (about 1 damaged base per 750 nt), suggesting that precautions for limiting the influence of damaged sites, such as cloning and enzymatic treatment, are worthwhile. The results also suggest that relatively high rates of mutation previously estimated from aDNA data are not entirely an artifact of sequence damage and are likely to be due to other factors such as the persistence of transient polymorphisms. The delta model appears to be particularly useful for placing upper credibility limits on the amount of sequence damage in an alignment, and this capacity might be beneficial for future aDNA studies or for the estimation of sequencing errors in modern DNA.

Andrew Rambaut, 2007