SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building, by Gouy, M., Guindon, S., Gascuel, O.
Molecular Biology and Evolution 2010 27(2):221-224; doi:10.1093/molbev/msp259
We present SeaView version 4, a multiplatform program designed to facilitate multiple alignment and phylogenetic tree building from molecular sequence data through the use of a graphical user interface. SeaView version 4 combines all the functions of the widely used programs SeaView (in its previous versions) and Phylo_win, and expands them by adding network access to sequence databases, alignment with arbitrary algorithms, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. Given the wide range of tools and algorithms currently available for phylogenetic analyses, SeaView is especially useful for teaching and for occasional users of such software. SeaView is freely available at http://pbil.univ-lyon1.fr/software/seaview.
Rapid Likelihood Analysis on Large Phylogenies Using Partial Sampling of Substitution Histories, by de Koning, A. P. J., Gu, W., Pollock, D. D.
Molecular Biology and Evolution 2010 27(2):249-265; doi:10.1093/molbev/msp228
Likelihood-based approaches can reconstruct evolutionary processes in greater detail and with better precision from larger data sets. The extremely large comparative genomic data sets that are now being generated thus create new opportunities for understanding molecular evolution, but analysis of such large quantities of data poses escalating computational challenges. Recently developed Markov chain Monte Carlo methods that augment substitution histories are a promising approach to alleviate these computational costs. We analyzed the computational costs of several such approaches, considering how they scale with model and data set complexity. This provided a theoretical framework to understand the most important computational bottlenecks, leading us to combine novel variations of our conditional pathway integration approach with recent advances made by others. The resulting technique (“partial sampling” of substitution histories) is considerably faster than all other approaches we considered. It is accurate, simple to implement, and scales exceptionally well with dimensions of model complexity and data set size. In particular, the time complexity of sampling unobserved substitution histories using the new method is much lower than that of previously existing methods, and model parameter and branch length updates are independent of data set size. We compared the performance of methods on a 224-taxon set of mammalian cytochrome-b sequences. For a simple nucleotide substitution model, partial sampling was at least 10 times faster than the PhyloBayes program, which samples substitutions in continuous time, and about 100 times faster than when using fully integrated substitution histories. Under a general reversible model of amino acid substitution, the partial sampling method was 1,600 times faster than when using fully integrated substitution histories, confirming significantly improved scaling with model state-space complexity. 
Partial sampling of substitutions thus dramatically improves the utility of likelihood approaches for analyzing complex evolutionary processes on large data sets.
Phylogenetic Distributions and Histories of Proteins Involved in Anaerobic Pyruvate Metabolism in Eukaryotes, by Hug, L. A., Stechmann, A., Roger, A. J.
Molecular Biology and Evolution 2010 27(2):311-324; doi:10.1093/molbev/msp237
Protists that live in low oxygen conditions often oxidize pyruvate to acetate via anaerobic ATP-generating pathways. Key enzymes that commonly occur in these pathways are pyruvate:ferredoxin oxidoreductase (PFO) and [FeFe]-hydrogenase (H2ase) as well as the associated [FeFe]-H2ase maturase proteins HydE, HydF, and HydG. Determining the origins of these proteins in eukaryotes is of key importance to understanding the origins of anaerobic energy metabolism in microbial eukaryotes. We conducted a comprehensive search for genes encoding these proteins in available whole genomes and expressed sequence tag data from diverse eukaryotes. Our analyses of the presence/absence of eukaryotic PFO, [FeFe]-H2ase, and H2ase maturase sequences across eukaryotic diversity reveal orthologs of these proteins encoded in the genomes of a variety of protists previously not known to contain them. Our phylogenetic analyses revealed: 1) extensive lateral gene transfers of both PFO and [FeFe]-H2ase in eubacteria, 2) decreased support for the monophyly of eukaryote PFO domains, and 3) that eukaryotic [FeFe]-H2ases are not monophyletic. Although few eukaryote [FeFe]-H2ase maturase orthologs have been characterized, phylogenies of these proteins do recover eukaryote monophyly; however, a consistent eubacterial sister group for eukaryotic homologs could not be determined. An exhaustive search for these five genes in diverse genomes from two representative eubacterial groups, the Clostridiales and the α-proteobacteria, shows that although these enzymes are nearly universally present within the former group, they are very rare in the latter. No α-proteobacterial genome sequenced to date encodes all five proteins. Molecular phylogenies and the extremely restricted distribution of PFO, [FeFe]-H2ases, and their associated maturases within the α-proteobacteria do not support a mitochondrial origin for these enzymes in eukaryotes. 
However, the unexpected prevalence of PFO, pyruvate:NADP oxidoreductase, [FeFe]-H2ase, and the maturase proteins encoded in genomes of diverse eukaryotes indicates that these enzymes have an important role in the evolution of microbial eukaryote energy metabolism.
Infrequent Transitions between Saline and Fresh Waters in One of the Most Abundant Microbial Lineages (SAR11), by Logares, R., Brate, J., Heinrich, F., Shalchian-Tabrizi, K., Bertilsson, S.
Molecular Biology and Evolution 2010 27(2):347-357; doi:10.1093/molbev/msp239
The aquatic bacterial group SAR11 is one of the most abundant organisms on Earth, with an estimated global population size of 2.4 × 10^28 cells in the oceans. Members of SAR11 have also been detected in brackish and fresh waters, but the evolutionary relationships between the species present in the different environments have been ambiguous. In particular, it was not clear how frequently this lineage has crossed the saline–freshwater boundary during its evolutionary diversification. Due to the huge population size of SAR11 and the potential of microbes for long-distance dispersal, we hypothesized that environmental transitions could have occurred repeatedly during the evolutionary diversification of this group. Here, we have constructed extensive 16S rDNA–based molecular phylogenies and undertaken metagenomic data analyses to assess the frequency of saline–freshwater transitions in SAR11 and to investigate the evolutionary implications of this process. Our analyses indicated that very few saline–freshwater transitions occurred during the evolutionary diversification of SAR11, generating genetically distinct saline and freshwater lineages that do not appear to exchange genes extensively via horizontal gene transfer. In contrast to lineages from saline environments, extant freshwater taxa from diverse, and sometimes distant, geographic locations were very closely related. This points to a rapid diversification and dispersal in fresh waters or to slower evolutionary rates in fresh water SAR11 when compared with marine counterparts. In addition, the colonization of both saline and fresh waters appears to have occurred early in the evolution of SAR11. We conclude that the different biogeochemical conditions that prevail in saline and fresh waters have likely prevented the environmental transitions in SAR11, promoting the evolution of clearly distinct lineages in each environment.
A Dirichlet Process Covarion Mixture Model and Its Assessments Using Posterior Predictive Discrepancy Tests, by Zhou, Y., Brinkmann, H., Rodrigue, N., Lartillot, N., Philippe, H.
Molecular Biology and Evolution 2010 27(2):371-384; doi:10.1093/molbev/msp248
Heterotachy, the variation of substitution rate at a site across time, is a prevalent phenomenon in nucleotide and amino acid alignments, which may mislead probabilistic-based phylogenetic inferences. The covarion model is a special case of heterotachy, in which sites change between the “ON” state (allowing substitutions according to any particular model of sequence evolution) and the “OFF” state (prohibiting substitutions). In current implementations, the switch rates between ON and OFF states are homogeneous across sites, a hypothesis that has never been tested. In this study, we developed an infinite mixture model, called the covarion mixture (CM) model, which allows the covarion parameters to vary across sites, controlled by a Dirichlet process prior. Moreover, we combine the CM model with other approaches. We use a second independent Dirichlet process that models the heterogeneities of amino acid equilibrium frequencies across sites, known as the CAT model, and general rate-across-site heterogeneity is modeled by a gamma distribution. The application of the CM model to several large alignments demonstrates that the covarion parameters are significantly heterogeneous across sites. We describe posterior predictive discrepancy tests and use these to demonstrate the importance of these different elements of the models.
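The ON/OFF switching described in the abstract above can be illustrated with a small rate matrix. The following is a minimal sketch, not the authors' implementation: it assumes a Jukes-Cantor base model and a single pair of switch rates shared by all sites (the homogeneous case the abstract contrasts with), and the names covarion_rate_matrix, jukes_cantor, s_on, and s_off are hypothetical.

```python
def jukes_cantor(mu=1.0):
    """4 x 4 Jukes-Cantor rate matrix (assumed base model for this sketch)."""
    return [[(-mu if i == j else mu / 3.0) for j in range(4)] for i in range(4)]


def covarion_rate_matrix(base_q, s_on, s_off):
    """Build a 2n x 2n covarion rate matrix from an n x n substitution matrix.

    States 0..n-1 are the ON copies of each character and states n..2n-1
    are the OFF copies. s_on is the OFF -> ON switch rate and s_off is the
    ON -> OFF switch rate (both hypothetical parameter names).
    """
    n = len(base_q)
    size = 2 * n
    q = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base_q[i][j]  # substitutions happen only while ON
        q[i][n + i] = s_off  # ON -> OFF, character unchanged
        q[n + i][i] = s_on   # OFF -> ON, character unchanged
    # Set each diagonal entry so that every row sums to zero.
    for i in range(size):
        q[i][i] = -sum(q[i][j] for j in range(size) if j != i)
    return q


q = covarion_rate_matrix(jukes_cantor(), s_on=0.5, s_off=0.2)
for row in q:
    print(" ".join(f"{x:6.3f}" for x in row))
```

In a site-heterogeneous setting such as the one the abstract describes, each site (or each mixture component) would carry its own pair of switch rates, so a matrix like this would be built once per component rather than once for the whole alignment.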
Phylodynamics of HIV-1 from a Phase-III AIDS Vaccine Trial in North America, by Perez-Losada, M., Jobes, D. V., Sinangil, F., Crandall, K. A., Posada, D., Berman, P. W.
Molecular Biology and Evolution 2010 27(2):417-425; doi:10.1093/molbev/msp254
In 2003, a phase III placebo-controlled trial (VAX004) of a candidate HIV-1 vaccine (AIDSVAX B/B) was completed in 5,403 volunteers at high risk for HIV-1 infection from North America and the Netherlands. A total of 368 individuals became infected with HIV-1 during the trial. The envelope glycoprotein gene (gp120) from the HIV-1 subtype B viruses infecting 349 patients was sequenced from clinical samples taken as close as possible to the time of diagnosis, rendering a final data set of 1,047 sequences (1,032 from North America and 15 from the Netherlands). Here, we used these data in combination with other sequences available in public databases to assess HIV-1 variation as a function of vaccination treatment, geographic region, race, risk behavior, and viral load. Viral samples did not show any phylogenetic structure for any of these factors, but individuals with different viral loads showed significant differences (P = 0.009) in genetic diversity. The estimated time of emergence of HIV-1 subtype B was 1966–1970. Although the number of AIDS cases has decreased in North America since the early 1990s, HIV-1 genetic diversity seems to have remained almost constant over time. This study represents one of the largest molecular epidemiologic surveys of viruses responsible for new HIV-1 infections in North America and could help the selection of epidemiologically representative vaccine antigens to include in the next generation of candidate HIV-1 vaccines.