Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.

Baele G, Lemey P, Rambaut A & Suchard MA

(2017) Bioinformatics 33, 1798–1805.

Motivation: Advances in sequencing technology continue to deliver increasingly large molecular sequence data sets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, both in the CPU and GPU processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses.

Results: We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically employ a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous data set, MCMC integration efficiency improves by over 14-fold.
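To illustrate what an adaptive multivariate transition kernel can look like, the following is a minimal sketch of a generic adaptive Metropolis sampler that proposes all parameters jointly from a multivariate normal whose covariance is learned from the chain's own history. It is not the BEAST implementation: the function names (`adaptive_metropolis`, `cholesky`, `log_target`), the fixed initial proposal variance of 0.1, the `adapt_start` threshold, the 2.38²/d scaling and the toy correlated target are all assumptions chosen for the example, and the per-partition parallel likelihood evaluation described in the abstract is not shown.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def adaptive_metropolis(log_target, x0, n_iter, adapt_start=500, eps=1e-6, seed=1):
    """Random-walk Metropolis with a joint normal proposal adapted to the chain history."""
    rng = random.Random(seed)
    d = len(x0)
    scale = 2.38 ** 2 / d  # commonly used scaling for a Gaussian random walk (assumed here)
    x, logp = list(x0), log_target(x0)
    mean = [0.0] * d                       # running mean of the visited states
    cov = [[0.0] * d for _ in range(d)]    # running (population) covariance
    samples, accepted = [], 0
    for t in range(1, n_iter + 1):
        if t <= adapt_start:
            # Fixed spherical proposal until enough history has accumulated.
            prop = [[0.1 if i == j else 0.0 for j in range(d)] for i in range(d)]
        else:
            # Learned covariance plus a small ridge to keep it positive definite.
            prop = [[scale * (cov[i][j] + (eps if i == j else 0.0))
                     for j in range(d)] for i in range(d)]
        low = cholesky(prop)  # refactored every step for clarity, not speed
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [x[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        logp_y = log_target(y)
        # Symmetric proposal, so the acceptance ratio is the target ratio.
        if math.log(1.0 - rng.random()) < logp_y - logp:
            x, logp = y, logp_y
            accepted += 1
        samples.append(list(x))
        # Recursive update of the running mean and covariance with the current state.
        delta = [x[i] - mean[i] for i in range(d)]
        mean = [mean[i] + delta[i] / t for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += (delta[i] * (x[j] - mean[j]) - cov[i][j]) / t
    return samples, accepted / n_iter


def log_target(x, rho=0.9):
    """Toy target: unnormalised log density of a bivariate normal with correlation rho."""
    return -0.5 * (x[0] ** 2 - 2.0 * rho * x[0] * x[1] + x[1] ** 2) / (1.0 - rho ** 2)


if __name__ == "__main__":
    draws, rate = adaptive_metropolis(log_target, [0.0, 0.0], 5000)
    print("acceptance rate:", rate)
```

Because the proposal covariance tracks the correlation between parameters, a single joint move can follow the long axis of the target, which is the intuition behind replacing a mixture of univariate kernels with one multivariate kernel.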

Availability: Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference.

Andrew Rambaut, 2007