Using genomics data to reconstruct transmission trees during disease outbreaks

Hall MD, Woolhouse MEJ & Rambaut A

(2016) Revue scientifique et technique-Office international des epizooties 35, 287-296.

Genetic sequence data from pathogens present a novel means to investigate the spread of infectious disease between infected hosts or infected premises, complementing traditional contact-tracing approaches, and much recent work has gone into developing methods for this purpose. The objective is to recover the epidemic transmission tree, which identifies who infected whom. This paper reviews the various approaches that have been taken. The first step is to define a measure of difference between sequences, which must take into account factors such as recombination and convergent evolution. Three broad categories of method exist, in order of increasing complexity: those that assume neither within-host genetic diversity nor mutation, those that assume no within-host diversity but allow mutation, and those that allow both. Until recently, it was usually assumed that every host in the epidemic could be identified, but this assumption is now being relaxed. Some methods are intended for sparsely sampled data and concentrate on identifying pairs of sequences that are likely to be the result of direct transmission, rather than on inferring the complete transmission tree. Many of the procedures described here are available to researchers as free software.
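As a rough illustration of the first step mentioned in the abstract, the sketch below computes a simple pairwise difference measure (a count of differing sites between aligned sequences) and lists pairs of hosts whose sequences fall within a threshold. It is not a method taken from the paper: the function names, the example sequences and the threshold are hypothetical, and a raw site count makes no allowance for recombination or convergent evolution, which the abstract says must be considered. The pairs it returns are undirected, so they say nothing about who infected whom.

```python
from itertools import combinations

# Unambiguous nucleotides; gaps and ambiguity codes are skipped (an assumption of this sketch).
VALID = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count sites at which two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID and y in VALID and x != y
    )


def candidate_pairs(seqs: dict[str, str], max_snps: int) -> list[tuple[str, str, int]]:
    """Return host pairs whose sequences differ at no more than max_snps sites."""
    pairs = []
    for (h1, s1), (h2, s2) in combinations(seqs.items(), 2):
        d = snp_distance(s1, s2)
        if d <= max_snps:
            pairs.append((h1, h2, d))
    # Closest pairs first; ties keep their input order.
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical toy alignment, one consensus sequence per host.
samples = {
    "host_A": "ACGTACGTAC",
    "host_B": "ACGTACGTAT",
    "host_C": "ACGAACGTAT",
    "host_D": "TTGAACCTAT",
}

for h1, h2, d in candidate_pairs(samples, max_snps=1):
    print(h1, h2, d)
```

Using one consensus sequence per host corresponds to the simpler categories in the abstract, which assume no within-host diversity. Methods that allow within-host diversity need more than a single distance per pair of hosts.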

 