
Haiyan Huang

Professor of Statistics
Department of Statistics
hhuang@stat.berkeley.edu
(510) 642-6433

Research Expertise and Interest

bioinformatics, biological sequence analysis, distributional approximation, interaction networks, regulatory networks, sequence weighting methods, statistics

Description

My research interests concern the development and application of statistical and mathematical methods for problems associated with various biological systems and data. Recent projects include:

1) Regulatory sequence analysis. We present a p-value-based scoring scheme that uses probability generating functions to evaluate potential transcription factor binding sites (TFBSs). We introduce the local genomic context into the model so that candidate sites are evaluated both on their similarity to known binding sites and on their contrast against their respective local genomic contexts. We demonstrate that our approach is advantageous in predicting myogenin and MEF2 binding sites in the human genome. This method has been packaged for use (available upon request). Higher eukaryotic genomes present a particular challenge to the computational identification of TFBSs because of their long non-coding regions and large numbers of repeat elements. I am very attracted to the problems associated with binding site and motif discovery in higher organisms. An ongoing project is to describe target binding motifs using the evolutionary information of associated species under a tree-based model. We expect that the newly derived motifs will help us better understand the evolutionary development of binding sites.

2) Analysis of large-scale expression data. With Li Cai (Dana-Farber Cancer Institute) and others, we work on SAGE data and have developed a clustering algorithm that groups tags sharing similar count patterns under different conditions. The algorithm is designed around the nature of SAGE data and has proven advantageous in analyzing data from the developing and mature mouse retina (Cepko's lab, Harvard Medical School). An ongoing project is to develop methods for accurately estimating gene correlations across multiple dependent arrays. The resulting methods will be applied to a set of cross-platform microarray datasets to identify regulators of transcription modules (in collaboration with Zhou's lab at USC).

3) Mathematical and statistical theory. I am also interested in studying the mathematical and statistical theories underlying these computational methods.