The Mahony lab at Penn State University is part of the Department of Biochemistry & Molecular Biology and the Center for Eukaryotic Gene Regulation. We are computational biologists who develop machine-learning approaches for understanding gene regulation.

Our research aims to understand where transcription factors (TFs) bind in the genome, and what they do once they get there. Many forces can affect a TF’s choice of binding targets once it is introduced into the nucleus. The inherent DNA-binding preference of the protein specifies the sites that could potentially be bound, but the vast majority of high-affinity sequences are not in fact occupied by the TF in any given cell type. Binding selectivity is thus determined by the regulatory environment of the cell: chromatin accessibility, interactions with co-factors, DNA methylation, and histone post-translational modifications all play roles in specifying the TF’s binding sites. These forces are context-specific, which allows the same TF to target different binding sites in different cell types. However, a TF’s choice of binding targets is only part of the equation; many bound sites do not seem to directly affect gene expression. We also understand little about how enhancers regulate genes that are thousands, sometimes millions, of bases away in the genome.

Fortunately, high-throughput sequencing assays are giving us unprecedented insight into the regulatory environment of the cell. ChIP-seq and ChIP-exo allow us to profile TF and histone modification occupancy at high resolution over the entire genome. RNA-seq lets us profile global transcriptional activity. DNase-seq profiles the genome-wide accessibility landscape, while newer assays such as ChIA-PET and Hi-C are opening a window on the three-dimensional architecture of the nucleus. The challenge is to integrate these voluminous data types into a cohesive understanding of cellular activity. We believe that integrative machine-learning approaches that model the biological and experimental processes generating such data will help us to understand the context-specific activity of transcription factors.