Max-Planck-Institut für Informatik

Metagenomics research

Metagenomics is a young research area that deals with the sequencing and study of whole communities of microorganisms, as opposed to the sequencing of genomes of individual organisms obtained in pure culture. Because an estimated less than 1% of all microorganisms can be cultured using standard techniques, current knowledge of prokaryote biology and the collection of sequenced genomes are strongly biased towards a few well-characterized phyla, while very little is known about the vast majority of the prokaryotic world [1]. Metagenomics has the potential to correct this imbalance and has already delivered an enormous gain in biological knowledge. This is also reflected in its recognition by Science as one of the top ten scientific breakthroughs of 2004.

Phylogenetic classification of variable-length DNA sequences

One of the major challenges in the analysis of metagenome datasets is to deconvolute the genomic sequence mixture into 'bins' representing the sampled organismal populations or phylogenetic clades. This problem becomes increasingly difficult for communities of higher organismal complexity and for populations that are sampled at low abundance by sequencing. A recent development is PhyloPythia, a composition-based classifier that uses a Support Vector Machine (SVM) multi-class classification framework for the phylogenetic characterization of genomic sequence fragments (joint work with I. Rigoutsos, IBM Research, NY and P. Hugenholtz, Joint Genome Institute, CA). Trained on a data set of 340 complete genomes, the method combines multi-class classifiers for clades from the rank of domain down to the genus level with sample-specific models, which can be generated for the dominant sample populations from as little as 100 kb of training sequence characterized by marker genes. PhyloPythia accurately assigns fragments as short as 1 kb across all considered ranks as well as to the dominant metagenome sample populations. This was shown in extensive evaluation experiments on both real and simulated metagenome sets [2, 3]. The method's merits were also confirmed in a separate evaluation study of commonly used metagenome processing tools [4]. In collaboration with experimental groups, we are working on phylogenetic models for metagenomes of several microbial communities, such as a microbial community in the hindgut of wood-degrading higher termites [6] and an active methylotroph community from Lake Washington [7].
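To illustrate what "composition-based" means in practice, the following minimal Python sketch turns a DNA fragment into a vector of normalized k-mer (oligonucleotide) frequencies and assigns it to the closest reference profile. It is not PhyloPythia's implementation: the word length k, the function names (kmer_profile, centroid, assign) and the toy sequences are all assumptions made for illustration, and a simple nearest-centroid rule stands in for the multi-class SVM described above.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return the normalized frequencies of all 4**k k-mers over ACGT in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing ambiguous bases (e.g. N) are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def centroid(profiles):
    """Average several k-mer profiles into one reference profile for a clade."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(fragment, centroids, k=4):
    """Return the label whose reference profile is closest to the fragment's profile.

    Nearest-centroid assignment is a stand-in for illustration only;
    it is NOT the SVM framework used by PhyloPythia.
    """
    profile = kmer_profile(fragment, k)
    return min(centroids, key=lambda label: distance(profile, centroids[label]))


# Toy usage with made-up sequences and hypothetical clade labels.
reference = {
    "clade_A": centroid([kmer_profile("ATATATTTAAATAT", k=2)]),
    "clade_B": centroid([kmer_profile("GCGCGGGCCCGCGC", k=2)]),
}
print(assign("ATTTATAA", reference, k=2))
```

In a realistic setting the reference profiles would be built from many fragments of sequenced genomes per clade, and short fragments would be expected to yield noisier profiles, which is one reason fragment length matters for classification accuracy.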

Process-level gene annotation

Metagenomic sequencing has retrieved large numbers of novel genes of biotechnological, medical or agricultural interest from uncultivable organisms. For instance, 1,700 novel protein families were discovered in a comprehensive sequencing study of the marine planktonic microbiota from oceanic surface waters [8]. However, the functions of these genes and the biological processes they take part in remain largely unknown, as the genes lack homology to experimentally characterized proteins of known function. We are working on a method that processes large sets of (meta-)genome annotations and uses a probabilistic model to capture the inherent functional relationships between gene products acting in concert in a biological process. Such a model could then be used for a process-level annotation of genes, for a detailed characterization of the biological processes encoded in a (meta-)genome, and to extend the current knowledge of biological processes.

Collaborators

References

  1. P. Hugenholtz
    Exploring prokaryotic diversity in the genomic era
    Genome Biology 2002, 3(2): REVIEWS0003

  2. A.C. McHardy, H. Garcia Martin, A. Tsirigos, P. Hugenholtz, I. Rigoutsos
    Accurate phylogenetic classification of variable-length DNA fragments
    Nature Methods 2007, 4(1):63-72

  3. H. Garcia Martin, N. Ivanova, V. Kunin, F. Warnecke, K. Barry, A.C. McHardy, C. Yeates, S. He, A. Salamov, E. Szeto, E. Dalin, N. Putnam, I. Rigoutsos, N.C. Kyrpides, L.L. Blackall, K.D. McMahon, P. Hugenholtz
    Metagenomic analysis of phosphorus removing sludge communities
    Nature Biotechnology 2006, 24(10):1263-9

  4. K. Mavromatis, N. Ivanova, K. Barry, H. Shapiro, E. Goltsman, A.C. McHardy, I. Rigoutsos, A. Salamov, F. Korzeniewski, M. Land, A. Lapidus, I. Grigoriev, P. Richardson, P. Hugenholtz, N.C. Kyrpides
    Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
    Nature Methods 2007, 4(6):495-500

  5. A.C. McHardy and I. Rigoutsos
    What's in the mix? Methods for the phylogenetic classification of metagenome sequence samples
    Current Opinion in Microbiology 2007, 10(5):499-503

  6. F. Warnecke, P. Luginbühl, N. Ivanova, M. Ghassemian, T.H. Richardson, J.T. Stege, M. Cayouette, A.C. McHardy, G. Djordjevic, N. Aboushadi, R. Sorek, S.G. Tringe, M. Podar, H.G. Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D. Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N.C. Kyrpides, E.G. Matson, E.A. Ottesen, X. Zhang, M. Hernández, C. Murillo, L.G. Acosta, I. Rigoutsos, G. Tamayo, B.D. Green, C. Chang, E.M. Rubin, E.J. Mathur, D.E. Robertson, P. Hugenholtz, J.R. Leadbetter
    Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite
    Nature 2007, 450(7169):560-565

  7. M.G. Kalyuzhnaya, A. Lapidus, N. Ivanova, A.C. Copeland, A.C. McHardy, E. Szeto, A. Salamov, I.V. Grigoriev, D. Suciu, S.R. Levine, V.M. Markowitz, I. Rigoutsos, S.G. Tringe, D.C. Bruce, P.M. Richardson, M.E. Lidstrom, L. Chistoserdova
    High-resolution metagenomics targets specific functional types in complex microbial communities
    Nature Biotechnology 2008, 26(9):1029-34

  8. S. Yooseph, G. Sutton, D.B. Rusch, A.L. Halpern, S.J. Williamson, et al.
    The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families
    PLoS Biology 2007, 5(3):e16
