Metagenomics is a young research area that deals with the sequencing and study of whole communities of microorganisms, as opposed to the sequencing of genomes of individual organisms obtained in pure culture. Because an estimated less than 1% of all microorganisms can be cultured using standard techniques, current knowledge of prokaryote biology and the collection of sequenced genomes are strongly biased towards a few well-characterized phyla, while very little is known about the vast majority of the prokaryotic world [1]. Metagenomics has the potential to correct this imbalance and has already delivered an enormous gain in biological knowledge. This is also reflected in its recognition by Science as one of the top ten scientific breakthroughs of 2004.
One of the major challenges in the analysis of metagenome datasets is to deconvolute the genomic sequence mixture into 'bins' representing the sampled organismal populations or phylogenetic clades. This problem becomes increasingly difficult for communities of higher organismal complexity and for populations sampled at low abundance by sequencing. A recent development is PhyloPythia, a composition-based classifier that uses a Support Vector Machine (SVM) multi-class classification framework for the phylogenetic characterization of genomic sequence fragments (joint work with I. Rigoutsos, IBM Research, NY and P. Hugenholtz, Joint Genome Institute, CA). Trained on a data set of 340 complete genomes, the method combines multi-class classifiers for clades from the rank of domain down to the genus level with sample-specific models, which can be generated for the dominant sample populations from as little as 100 kb of marker-gene characterized training sequence. PhyloPythia accurately assigns fragments as short as 1 kb across all considered ranks as well as to the dominant metagenome sample populations. This was shown in extensive evaluation experiments on both real and simulated metagenome sets [2, 3]. The method's merits were also confirmed in a separate evaluation study of commonly used metagenome processing tools [4]. In collaboration with experimental groups, we are working on phylogenetic models for metagenomes of several microbial communities, such as a microbial community in the hindgut of wood-degrading higher termites [6] and an active methylotroph community from Lake Washington [7].
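To make the idea of composition-based binning concrete, the following minimal sketch turns a DNA fragment into a vector of k-mer frequencies and assigns it to the reference with the most similar profile. It is an illustration under stated assumptions, not PhyloPythia itself: the choice of k-mer frequencies as features, the function names kmer_profile and nearest_clade, the toy reference sequences, and the nearest-profile rule (used here in place of an SVM) are all assumptions made for this example.

```python
from itertools import product


def kmer_profile(sequence, k=4):
    """Return the relative frequencies of all 4**k DNA k-mers in a sequence.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def nearest_clade(fragment, reference_profiles, k=4):
    """Assign a fragment to the label whose profile is closest.

    Uses squared Euclidean distance to one reference profile per label.
    This is a simple stand-in for a trained classifier, not an SVM.
    """
    profile = kmer_profile(fragment, k)

    def squared_distance(label):
        reference = reference_profiles[label]
        return sum((a - b) ** 2 for a, b in zip(profile, reference))

    return min(reference_profiles, key=squared_distance)


# Toy reference sequences (invented for this example, not real genomes).
references = {
    "AT_rich": kmer_profile("ATATTAATATTTAAAT", k=2),
    "GC_rich": kmer_profile("GCGCCGGCGGCCGCGC", k=2),
}
assigned = nearest_clade("TTATAATA", references, k=2)
```

In practice a classifier would be trained on profiles from many reference genomes rather than compared against a single profile per label; the sketch only shows how sequence composition alone can carry a phylogenetic signal.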
Metagenomic sequencing has retrieved large numbers of novel genes of biotechnological, medical or agricultural interest from uncultivable organisms. For instance, 1,700 novel protein families were discovered in a comprehensive sequencing study of the marine planktonic microbiota from oceanic surface waters [8]. However, the functions and corresponding biological processes of these genes remain largely unknown, as they lack homology to experimentally characterized proteins of known function. We are working on a method that processes large sets of (meta-)genome annotations and uses a probabilistic model to capture the inherent functional relationships between gene products acting in concert in a biological process. Such a model could then be used for process-level annotation of genes, for a detailed characterization of the biological processes encoded in a (meta-)genome, and for extending current knowledge of biological processes.
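As a rough illustration of how functional relationships might be read from sets of genome annotations, the sketch below scores pairs of annotation terms by how much more often they occur together in the same genome than expected by chance (pointwise mutual information). This is a much simpler co-occurrence statistic used as a stand-in, not the probabilistic model referred to above; the function name cooccurrence_pmi, the genome identifiers, and the term labels F1 to F4 are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cooccurrence_pmi(annotations):
    """Score term pairs by pointwise mutual information across genomes.

    annotations maps a genome identifier to the set of functional terms
    annotated in that genome. Only pairs seen together at least once
    receive a score.
    """
    n_genomes = len(annotations)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # Sort so that each pair is always stored in the same order.
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_genomes
        p_a = term_counts[a] / n_genomes
        p_b = term_counts[b] / n_genomes
        scores[(a, b)] = log2(p_ab / (p_a * p_b))
    return scores


# Toy annotation sets (invented for this example).
toy_annotations = {
    "genome1": {"F1", "F2", "F3"},
    "genome2": {"F1", "F2", "F3"},
    "genome3": {"F3"},
    "genome4": {"F3", "F4"},
}
pmi = cooccurrence_pmi(toy_annotations)
```

In this toy data F1 and F2 always occur together and so score above zero, while F3 occurs in every genome and therefore carries no information about F1. Terms that consistently co-occur in this way are candidates for belonging to the same process.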
Collaborators
Andreas Brune, Research Group Leader, Department of Biogeochemistry, Max-Planck Institute for Terrestrial Microbiology, Marburg, Germany
Mila Chistoserdova, Department of Chemical Engineering, University of Washington, Seattle, WA
Jeffrey Gordon and Peter Turnbaugh, Center for Genome Sciences, Washington University, USA
Phil Hugenholtz, Leader Microbial Ecology Program, U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA
Mark Morrison and Phillip Pope, Metagenomics, CSIRO Livestock Industries, Queensland, Australia
Isidore Rigoutsos, Manager Bioinformatics & Pattern Discovery Group, IBM T.J. Watson Research Center, Yorktown Heights, NY
References
P. Hugenholtz
Exploring prokaryotic diversity in the genomic era
Genome Biology 2002, 3(2): REVIEWS0003
A.C. McHardy, H. Garcia Martin, A. Tsirigos, P. Hugenholtz, I. Rigoutsos
Accurate phylogenetic classification of variable-length DNA fragments
Nature Methods 2007, 4(1):63-72
H. Garcia Martin, N. Ivanova, V. Kunin, F. Warnecke, K. Barry, A.C. McHardy, C. Yeates, S. He, A. Salamov, E. Szeto, E. Dalin, N. Putnam, I. Rigoutsos, N.C. Kyrpides, L.L. Blackall, K.D. McMahon, P. Hugenholtz
Metagenomic analysis of phosphorus removing sludge communities
Nature Biotechnology 2006, 24(10):1263-9
K. Mavromatis, N. Ivanova, K. Barry, H. Shapiro, E. Goltsman, A.C. McHardy, I. Rigoutsos, A. Salamov, F. Korzeniewski, M. Land, A. Lapidus, I. Grigoriev, P. Richardson, P. Hugenholtz, N.C. Kyrpides
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
Nature Methods 2007, 4(6):495-500
A.C. McHardy and I. Rigoutsos
What's in the mix? Methods for the phylogenetic classification of metagenome sequence samples
Current Opinion in Microbiology 2007, 10(5):499-503
F. Warnecke, P. Luginbühl, N. Ivanova, M. Ghassemian, T.H. Richardson, J.T. Stege, M. Cayouette, A.C. McHardy, G. Djordjevic, N. Aboushadi, R. Sorek, S.G. Tringe, M. Podar, H.G. Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D. Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N.C. Kyrpides, E.G. Matson, E.A. Ottesen, X. Zhang, M. Hernández, C. Murillo, L.G. Acosta, I. Rigoutsos, G. Tamayo, B.D. Green, C. Chang, E.M. Rubin, E.J. Mathur, D.E. Robertson, P. Hugenholtz, J.R. Leadbetter
Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite
Nature 2007, 450(7169):560-565
M.G. Kalyuzhnaya, A. Lapidus, N. Ivanova, A.C. Copeland, A.C. McHardy, E. Szeto, A. Salamov, I.V. Grigoriev, D. Suciu, S.R. Levine, V.M. Markowitz, I. Rigoutsos, S.G. Tringe, D.C. Bruce, P.M. Richardson, M.E. Lidstrom, L. Chistoserdova
High-resolution metagenomics targets specific functional types in complex microbial communities
Nature Biotechnology 2008, 26(9):1029-34
S. Yooseph, G. Sutton, D.B. Rusch, A.L. Halpern, S.J. Williamson, et al.
The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families
PLoS Biology 2007, 5(3):e16