The Brenner research lab has four key research interests involving computational and experimental genomics.
Gene regulation by alternative splicing and nonsense-mediated mRNA decay.
Nonsense-mediated mRNA decay (NMD) is a cellular RNA surveillance system that recognizes transcripts with premature termination codons and degrades them. Several years ago, we discovered large numbers of natural alternative splice forms that appear to be targets for NMD, and we speculated that this might be a mode of gene regulation, which we termed RUST (regulated unproductive splicing and translation). This seems to be confirmed by our finding that all conserved members of the SR family of splice regulators have an unproductive alternative mRNA isoform targeted for NMD. Strikingly, the splice pattern for each is conserved in mouse and always associated with an ultraconserved or highly conserved region of ~100 or more nucleotides of perfect identity between human and mouse. Remarkably, this seems to have evolved independently in every one of the genes, suggesting that this is a natural mode of regulation. We are using microarray data to explore the pervasiveness of NMD in humans and in Drosophila, in collaboration with Don Rio. As part of a modENCODE consortium, we plan to discover the repertoire of cis-regulatory sites for alternative splicing in insects. Future directions include detailing the regulators in the SR family and exploring the evolution of this gene-expression regulation mechanism.
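As a rough illustration of how a splice form might be flagged as a candidate NMD target, the sketch below applies the commonly cited "50-nucleotide rule": a stop codon lying more than about 50 nucleotides upstream of the last exon-exon junction is treated as premature. The rule, the threshold, the distance convention, and the function names (`find_stop`, `is_putative_nmd_target`) are assumptions made for this example; they are not taken from the lab's own pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop(mrna, cds_start):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_putative_nmd_target(mrna, cds_start, junctions, min_distance=50):
    """Apply a simple 50-nt rule to a spliced mRNA sequence.

    junctions: 0-based positions in the spliced mRNA at which each
    downstream exon begins (one entry per exon-exon junction).
    min_distance: assumed threshold; distance is measured from the end
    of the stop codon to the last junction (a convention chosen here).
    """
    stop = find_stop(mrna, cds_start)
    if stop is None or not junctions:
        return False  # no stop codon found, or an unspliced transcript
    last_junction = max(junctions)
    return last_junction - (stop + 3) > min_distance


# Hypothetical isoform: stop codon 60 nt upstream of the last junction.
unproductive = "ATG" + "GCT" * 2 + "TAA" + "A" * 60 + "GCC" * 5
# Hypothetical isoform: stop codon located in the last exon.
productive = "ATG" + "GCT" * 20 + "TAA" + "A" * 10

nmd_flag_unproductive = is_putative_nmd_target(unproductive, 0, [72])
nmd_flag_productive = is_putative_nmd_target(productive, 0, [30])
```

Real candidate selection would also need to handle reading-frame annotation, multiple isoforms per gene, and known exceptions to the rule, none of which this sketch attempts.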
Prediction of protein function using Bayesian phylogenomics.
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Unfortunately, these predictions have littered the databases with erroneous information, for a variety of reasons, including the propagation of errors and systematic flaws in BLAST and related methods. In collaboration with Michael Jordan's group, we have developed a statistical approach to predicting protein function that uses a protein family's phylogenetic tree as the natural structure for representing protein relationships. We overlay on this tree all known protein functions in the family. We then use a model of function evolution to infer the functions of all other proteins in the family. Even our initial implementations of this method, called SIFTER (statistical inference of function through evolutionary relationships), have performed better than other methods in widespread use. We are presently making numerous improvements to the underlying SIFTER algorithm and enhancing its ability to work on a wide range of data. We are collaborating with the Joint Genome Institute and numerous protein databases to improve annotation on a large scale. In collaboration with Jack Kirsch, we are also experimentally validating the function predictions, with a focus on the Nudix family.
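The general idea of overlaying known functions on a family tree and inferring the rest can be sketched with a toy model. The code below is not SIFTER and does not reproduce its model of function evolution; it assumes a single function per protein, a fixed per-branch probability of function change (`p_change`), a uniform prior at the root, and a small hypothetical tree, and it computes a posterior for one unannotated protein using Felsenstein-style pruning.

```python
def subtree_likelihood(tree, node, evidence, functions, p_change):
    """Return {f: P(evidence below node | node has function f)}."""
    children = tree.get(node, [])
    if not children:  # leaf: annotated leaves are clamped, others are free
        observed = evidence.get(node)
        return {f: 1.0 if observed in (None, f) else 0.0 for f in functions}
    k = len(functions)
    result = {f: 1.0 for f in functions}
    for child in children:
        child_like = subtree_likelihood(tree, child, evidence, functions, p_change)
        for f in functions:
            total = 0.0
            for g in functions:
                # Toy transition model: keep the function with 1 - p_change,
                # otherwise switch uniformly to one of the other functions.
                p = 1.0 - p_change if f == g else p_change / (k - 1)
                total += p * child_like[g]
            result[f] *= total
    return result


def predict_function(tree, root, known, query, functions, p_change=0.1):
    """Posterior over functions for one unannotated leaf (requires >= 2 functions)."""
    scores = {}
    for f in functions:
        evidence = dict(known)
        evidence[query] = f  # clamp the query leaf to candidate function f
        like = subtree_likelihood(tree, root, evidence, functions, p_change)
        scores[f] = sum(like.values()) / len(functions)  # uniform root prior
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()}


# Hypothetical family tree: internal nodes map to their children.
tree = {"root": ["A", "B"], "A": ["p1", "p2"], "B": ["p3", "p4"]}
known = {"p1": "hydrolase", "p3": "kinase", "p4": "kinase"}
functions = ["hydrolase", "kinase"]

posterior = predict_function(tree, "root", known, "p2", functions)
```

In this example the unannotated protein `p2` is a sister of an annotated hydrolase, so the posterior favors `hydrolase` over `kinase`. A realistic implementation would at least use branch lengths, allow multiple functions per protein, and weight annotations by the quality of their supporting evidence.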
Medical and environmental metagenomics; personal genomics.
The Sorcerer II global ocean sampling project revealed millions of new putative protein sequences, arguably doubling the known repertoire of proteins. We collaborated with the Venter Institute in the analysis of these proteins, understanding how they differ from those previously seen, and discovering ancient relationships amongst them. We are developing a new binning method that will help assign individual sequence reads and contigs to clades, and we are collaborating with Jill Banfield to apply this to the acid mine drainage community. Our initial medical metagenomics project is to understand the role of gut microbiota in Crohn's disease. Crohn's disease has long been known to be associated with microbial communities in the intestine, but the exact etiology has been unclear. By explicitly sampling these communities, we aim to better understand how they cause disease. In addition, by studying how gut flora change during the withdrawal of long-term antibiotics, we hope to gain insight into the action of these drugs on the intestinal microbiota. We also have a longstanding interest in personal genome interpretation and developing a genome commons.
Structural genomics and protein complexes.
Structural genomics ultimately aims to provide an experimental structure or a high-quality model for every protein. We are involved in maintaining the SCOP (Structural Classification of Proteins) and ASTRAL databases, which are key resources for accessing and understanding protein structure data. We therefore analyze structural genomics efforts and guide their future directions. Using kernel methods and selected features, we are building systems to recognize ancient protein evolutionary relationships. We are also involved in the Protein Complex Analysis Project, which uses mass spectrometry, electron microscopy, and electron tomography to understand protein complexes and their cellular distribution.
In the News
Computational Biologist Steven Brenner will be part of an ambitious effort to assess whether large-scale gene sequencing aimed at detecting disorders and conditions can and should become a routine part of newborn testing.
Thanks to initial funding from the India-based Tata Consultancy Services, the Center for Computational Biology has launched a pioneering initiative to develop a software platform to analyze differences in people’s genomes and bring closer the day when one’s personal genome will be a starting point for health and medical advice.