🧬Computational Genomics Unit 6 – Regulatory Genomics & Epigenomics
Regulatory genomics and epigenomics explore how gene expression is controlled without changes to the DNA sequence itself. These fields study regulatory elements like enhancers and silencers, as well as chemical modifications like DNA methylation and histone modifications that affect gene expression.
Understanding these mechanisms is crucial for grasping how cells function and differentiate. This knowledge has applications in medicine, helping explain disease origins and develop new treatments targeting gene regulation processes.
Regulatory genomics studies how gene expression is controlled by various regulatory elements and mechanisms
Epigenomics is the genome-wide study of epigenetic marks: heritable changes in gene expression that do not involve alterations to the DNA sequence itself
Transcription factors (TFs) are proteins that bind to specific DNA sequences and regulate transcription of target genes
Enhancers are distal regulatory elements that positively regulate gene expression by interacting with promoters through DNA looping
Can be located upstream or downstream of the genes they regulate (e.g., the sonic hedgehog enhancer located ~1 Mb upstream of the SHH gene)
Silencers are regulatory elements that negatively regulate gene expression by recruiting repressive factors
Insulators are boundary elements that prevent inappropriate interactions between neighboring chromatin domains
Chromatin accessibility refers to the degree to which DNA is accessible to TFs and other regulatory proteins
Open chromatin regions are associated with active gene expression, while closed chromatin is associated with gene repression
Regulatory Elements in the Genome
Promoters are located near the transcription start site (TSS) and contain binding sites for RNA polymerase and general TFs
Proximal promoters are located immediately upstream of the TSS and contain core promoter elements (TATA box, initiator, downstream promoter element)
Distal promoter regions are located further upstream and contain additional TF binding sites (e.g., upstream activating sequences in yeast); many mammalian promoters also overlap CpG islands, which typically span the TSS itself
Enhancers can be located far from their target genes and interact with promoters through DNA looping mediated by cohesin and mediator complexes
Super-enhancers are clusters of enhancers that drive high levels of gene expression in a cell type-specific manner (e.g., the α-globin super-enhancer in erythroid cells)
Silencers can be located near or far from their target genes and recruit repressive factors (histone deacetylases, polycomb group proteins)
Insulators block inappropriate enhancer-promoter interactions and limit the spreading of repressive chromatin by forming loops and interacting with the nuclear lamina
CTCF is a key insulator-binding protein that mediates chromatin looping and TAD formation
Epigenetic Modifications and Mechanisms
DNA methylation involves the addition of methyl groups to cytosine residues, primarily at CpG dinucleotides
Methylation of promoter CpG islands is associated with gene silencing, while methylation of gene bodies is associated with active transcription
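Because CpG islands come up repeatedly in this unit, here is a minimal sketch of how a sequence window might be checked against commonly cited CpG island thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names and default thresholds are illustrative assumptions, not the interface of any particular tool.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content, and observed/expected CpG ratio."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the three thresholds to a single window (assumed defaults)."""
    s = cpg_stats(seq)
    return (s["length"] >= min_len
            and s["gc_content"] > min_gc
            and s["cpg_obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    print(cpg_stats("ACGT"))
    print(looks_like_cpg_island("CG" * 100))  # CpG-dense toy sequence
    print(looks_like_cpg_island("AT" * 100))  # no C or G at all
```

Real annotation pipelines slide such a window along the genome and merge adjacent passing windows; this sketch only scores one window at a time.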
Histone modifications include acetylation, methylation, phosphorylation, and ubiquitination of histone tails
H3K4me3 is associated with active promoters, while H3K27me3 is associated with repressed promoters and enhancers
H3K27ac is associated with active enhancers and distinguishes them from primed or poised enhancers, which carry H3K4me1 without H3K27ac (poised enhancers are often additionally marked by H3K27me3)
Chromatin remodeling involves the ATP-dependent alteration of nucleosome positioning and composition by remodeling complexes (SWI/SNF, ISWI, CHD, INO80)
Non-coding RNAs (ncRNAs) can regulate gene expression through various mechanisms
Long non-coding RNAs (lncRNAs) can recruit chromatin-modifying complexes, act as enhancer RNAs, or serve as scaffolds for protein complexes (e.g., XIST lncRNA in X chromosome inactivation)
microRNAs (miRNAs) can post-transcriptionally repress gene expression by targeting mRNAs for degradation or translational repression
Experimental Techniques in Regulatory Genomics
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) allows genome-wide mapping of protein-DNA interactions
Used to identify binding sites of TFs, histone modifications, and chromatin-associated proteins
Requires antibodies specific to the protein of interest and sufficient cell numbers for robust signal
DNase-seq and ATAC-seq identify regions of open chromatin: DNase-seq digests accessible DNA with DNase I, while ATAC-seq uses the Tn5 transposase to cut accessible DNA and insert sequencing adapters in a single step
Open chromatin regions are indicative of active regulatory elements (promoters, enhancers) and TF binding sites
Bisulfite sequencing determines DNA methylation patterns by converting unmethylated cytosines to uracil (read as thymine after PCR amplification) while leaving methylated cytosines unchanged
Whole-genome bisulfite sequencing (WGBS) provides single-base resolution methylation profiles, but is costly and requires high coverage
Reduced representation bisulfite sequencing (RRBS) focuses on CpG-rich regions and is more cost-effective
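The bisulfite logic described above can be sketched in a few lines: unmethylated cytosines read out as T, methylated cytosines stay C, and the methylation level at each CpG is the fraction of reads showing C. This is a toy illustration that assumes forward-strand reads already aligned to the full reference with no sequencing errors; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions) -> str:
    """Simulate bisulfite treatment plus PCR on the forward strand.

    Unmethylated C becomes T; C at a methylated position stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, reads) -> dict:
    """Return {CpG position: fraction of reads with C} for aligned reads.

    Assumes every read spans the whole reference, starting at position 0.
    """
    reference = reference.upper()
    cpg_positions = [i for i in range(len(reference) - 1)
                     if reference[i:i + 2] == "CG"]
    levels = {}
    for pos in cpg_positions:
        meth = sum(1 for r in reads if r[pos] == "C")
        unmeth = sum(1 for r in reads if r[pos] == "T")
        if meth + unmeth:
            levels[pos] = meth / (meth + unmeth)
    return levels


if __name__ == "__main__":
    ref = "ACGTTCGA"  # CpGs at positions 1 and 5 (0-based)
    reads = [
        bisulfite_convert(ref, {1}),
        bisulfite_convert(ref, {1, 5}),
        bisulfite_convert(ref, set()),
        bisulfite_convert(ref, {1}),
    ]
    print(call_cpg_methylation(ref, reads))  # {1: 0.75, 5: 0.25}
```

Production methylation callers also handle reverse-strand reads (where the conversion appears as G to A), incomplete conversion, and base quality, all of which are left out here.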
Hi-C provides genome-wide interaction maps at kilobase to megabase resolution, revealing topologically associating domains (TADs) and chromatin loops
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) identifies interactions mediated by specific proteins (e.g., CTCF, RNA polymerase II)
Computational Methods for Epigenomic Analysis
Peak calling identifies enriched regions in ChIP-seq and accessibility data using algorithms (MACS2, HOMER, F-Seq) that compare signal to background
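To make the "signal versus background" idea concrete, the sketch below tests each fixed-size bin against a Poisson background and merges adjacent significant bins into peaks. It is loosely inspired by the local-background idea used by Poisson-based peak callers, but it is a simplified illustration and not the algorithm of any specific tool; the binning, the background rule, and the p-value cutoff are all assumptions.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF(k-1)."""
    if k <= 0:
        return 1.0
    cdf, term = 0.0, math.exp(-lam)  # term starts at P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def call_peaks(treatment, control, p_cutoff: float = 1e-5):
    """Return peaks as (start_bin, end_bin) half-open intervals.

    treatment/control: read counts per bin, same length and binning.
    """
    if len(treatment) != len(control):
        raise ValueError("treatment and control must use the same bins")
    # Scale control to the treatment library size.
    scale = sum(treatment) / sum(control) if sum(control) else 1.0
    genome_lambda = sum(treatment) / len(treatment)
    significant = []
    for t, c in zip(treatment, control):
        # Conservative background: the larger of global and local estimates.
        lam = max(genome_lambda, c * scale)
        significant.append(poisson_sf(t, lam) < p_cutoff)
    # Merge runs of adjacent significant bins into peaks.
    peaks, start = [], None
    for i, sig in enumerate(significant):
        if sig and start is None:
            start = i
        elif not sig and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(significant)))
    return peaks


if __name__ == "__main__":
    chip = [2, 3, 1, 40, 45, 2, 3, 2, 1, 2]   # toy counts, not real data
    ctrl = [2, 2, 2, 3, 2, 2, 2, 2, 2, 2]
    print(call_peaks(chip, ctrl))
```

Computing the tail as one minus the CDF loses precision for extremely small p-values, which is acceptable for a threshold test but would need a more careful implementation (and multiple-testing correction) in practice.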
Differential analysis identifies regions with significant differences in signal intensity between conditions or cell types using tools like DiffBind and DESeq2
Chromatin state annotation segments the genome into distinct states (active promoter, strong enhancer, repressed, etc.) based on combinatorial histone modification patterns using hidden Markov models (ChromHMM, Segway)
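As a rough illustration of HMM-based segmentation, the sketch below decodes the most likely state path over genomic bins from binary presence/absence calls for two histone marks, using independent Bernoulli emissions per mark. All parameters are hand-set, hypothetical values chosen for the example; real tools learn them from data, and this sketch uses Viterbi decoding purely for brevity.

```python
import math

STATES = ["active", "repressed"]
MARKS = ["H3K4me3", "H3K27me3"]

# Hypothetical parameters, chosen by hand for illustration only.
START = {"active": 0.5, "repressed": 0.5}
TRANS = {"active": {"active": 0.9, "repressed": 0.1},
         "repressed": {"active": 0.1, "repressed": 0.9}}
# Probability that each mark is present in a bin, given the state.
EMIT = {"active": {"H3K4me3": 0.8, "H3K27me3": 0.1},
        "repressed": {"H3K4me3": 0.1, "H3K27me3": 0.7}}


def log_emission(state, obs):
    """Log-probability of a tuple of 0/1 mark calls under one state."""
    total = 0.0
    for mark, present in zip(MARKS, obs):
        p = EMIT[state][mark]
        total += math.log(p if present else 1.0 - p)
    return total


def viterbi(observations):
    """Return the most likely state for each bin."""
    if not observations:
        return []
    scores = {s: math.log(START[s]) + log_emission(s, observations[0])
              for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda r: scores[r] + math.log(TRANS[r][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(TRANS[best_prev][s])
                             + log_emission(s, obs))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Each tuple is (H3K4me3 present, H3K27me3 present) for one bin.
    bins = [(1, 0), (1, 0), (0, 0), (0, 1), (0, 1)]
    print(viterbi(bins))
```

The high self-transition probabilities are what smooth over isolated noisy bins: a single bin lacking both marks inside an otherwise active stretch is still labeled active.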
Motif analysis identifies enriched DNA sequence motifs in regulatory regions using de novo discovery tools (MEME, DREME) or known motif scanning (FIMO, HOMER)
Motif instances can be used to infer TF binding and construct regulatory networks
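Known-motif scanning can be sketched as building a log-odds position weight matrix (PWM) from aligned binding sites and sliding it along a sequence. The example below is a minimal illustration, not the scoring scheme of any particular scanner: the binding sites are made-up TATA-like strings, the background is assumed uniform, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount: float = 1.0, background: float = 0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total)
                         / background)
            for b in BASES
        })
    return pwm


def scan(seq: str, pwm, threshold: float):
    """Return (position, matched sequence, score) for windows >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Toy aligned sites, invented for this example.
    sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
    pwm = build_pwm(sites)
    print(scan("GGCTATAAAGGC", pwm, threshold=5.0))
```

A fuller scanner would also score the reverse complement and convert scores to p-values against a background model before reporting matches.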
Integrative analysis combines multiple data types (e.g., ChIP-seq, RNA-seq, Hi-C) to gain insights into regulatory mechanisms and predict functional effects of genetic variants
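Tools like GREAT (which links non-coding regions to nearby genes for functional enrichment analysis) and RegulomeDB (which scores variants by the regulatory evidence overlapping them) help interpret the likely impact of regulatory regions and variants on gene expression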
Machine learning approaches (e.g., deep learning) are increasingly used to predict regulatory elements, chromatin states, and gene expression from DNA sequence and epigenomic data
Regulatory Networks and Gene Expression
Gene regulatory networks (GRNs) describe the complex interactions between TFs and their target genes that control cell type-specific gene expression programs
Can be constructed using TF binding data, gene expression data, and computational inference methods (e.g., ARACNE, GENIE3)
Transcriptional regulation involves the interplay between TFs, co-factors, and chromatin state to control the initiation and rate of transcription
General TFs (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH) assemble at the promoter to form the pre-initiation complex and recruit RNA polymerase II
Sequence-specific TFs bind to enhancers and promoters to activate or repress transcription by recruiting co-activators or co-repressors
Post-transcriptional regulation modulates gene expression through mRNA processing, stability, and translation
Alternative splicing generates transcript isoforms with different functions or stability
miRNAs and RNA-binding proteins (RBPs) regulate mRNA stability and translation efficiency
Feedback loops and feed-forward loops are common motifs in GRNs that enable precise control of gene expression dynamics
Negative feedback loops confer robustness and homeostasis, while positive feedback loops amplify signals and generate switch-like responses
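The robustness claim for negative feedback can be illustrated with a one-gene model. The sketch below compares an unregulated gene, dx/dt = β − αx, with a negatively autoregulated gene, dx/dt = β / (1 + (x/K)^n) − αx, using simple Euler integration. The Hill-function form and all parameter values are assumptions chosen for illustration.

```python
def simulate(production, alpha: float = 1.0, x0: float = 0.0,
             dt: float = 0.01, t_end: float = 20.0) -> float:
    """Euler-integrate dx/dt = production(x) - alpha * x; return final x."""
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * (production(x) - alpha * x)
    return x


def unregulated(beta: float):
    """Constant production rate beta, independent of x."""
    return lambda x: beta


def negative_autoregulation(beta: float, K: float = 1.0, n: int = 2):
    """Production repressed by the gene's own product (Hill repression)."""
    return lambda x: beta / (1.0 + (x / K) ** n)


if __name__ == "__main__":
    for beta in (10.0, 20.0):
        plain = simulate(unregulated(beta))
        feedback = simulate(negative_autoregulation(beta))
        print(f"beta={beta}: unregulated={plain:.3f}, "
              f"negative feedback={feedback:.3f}")
```

With these values, doubling the maximal production rate doubles the unregulated steady state but raises the autoregulated steady state by only about 30%, which is one simple sense in which negative feedback buffers expression against perturbations.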
Applications in Health and Disease
Genome-wide association studies (GWAS) have identified numerous disease-associated variants, many of which lie in non-coding regulatory regions
Integrating GWAS data with epigenomic data can help prioritize causal variants and elucidate their functional effects on gene regulation
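A first step in this kind of integration is often a plain interval overlap: which variants fall inside annotated regulatory regions? The sketch below assumes BED-style 0-based, half-open region coordinates and 0-based variant positions; the variant names, coordinates, and labels are invented placeholders, not real loci.

```python
def annotate_variants(variants, regions):
    """Map each variant name to the labels of regions that contain it.

    variants: iterable of (name, chrom, pos) with 0-based positions.
    regions:  iterable of (chrom, start, end, label), half-open [start, end).
    """
    annotations = {}
    for name, chrom, pos in variants:
        annotations[name] = [
            label
            for r_chrom, start, end, label in regions
            if r_chrom == chrom and start <= pos < end
        ]
    return annotations


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    regions = [
        ("chr1", 1000, 2000, "enhancer_H3K27ac"),
        ("chr1", 1500, 1800, "open_chromatin"),
        ("chr2", 500, 900, "promoter_H3K4me3"),
    ]
    variants = [
        ("variant_A", "chr1", 1600),
        ("variant_B", "chr1", 2000),  # end coordinate is exclusive
        ("variant_C", "chr2", 700),
    ]
    for name, labels in annotate_variants(variants, regions).items():
        print(name, labels)
```

The linear scan is fine for small inputs; genome-scale analyses typically use sorted intervals or interval trees. Overlap alone does not establish causality, so it is usually combined with fine-mapping and expression data.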
Epigenetic alterations are implicated in various diseases, including cancer, neurodevelopmental disorders, and autoimmune diseases
DNA methylation changes and aberrant histone modifications can lead to altered gene expression and disease progression
Epigenetic drugs targeting DNA methyltransferases (DNMTs) and histone deacetylases (HDACs) are used in cancer treatment (e.g., azacitidine, vorinostat)
Personalized medicine approaches leverage epigenomic data to stratify patients, predict drug responses, and develop targeted therapies
Epigenetic biomarkers can be used for early detection, prognosis, and treatment selection in various diseases
Epigenetic inheritance and transgenerational effects are areas of active research
Some epigenetic modifications can be inherited across generations and may contribute to disease risk and evolutionary adaptation, although the extent of transgenerational inheritance in mammals remains debated
Environmental factors (diet, stress, toxins) can induce epigenetic changes that affect offspring health and development
Emerging Trends and Future Directions
Single-cell epigenomics technologies (e.g., scATAC-seq, scBS-seq), often paired with scRNA-seq, enable the study of epigenetic heterogeneity and cell type-specific regulatory landscapes
Help identify rare cell types, developmental trajectories, and epigenetic states associated with disease
Spatial epigenomics methods (e.g., spatial ATAC-seq, spatial-CUT&Tag), used alongside spatial transcriptomics, provide information on the spatial organization of regulatory activity and gene expression in tissues
CRISPR-based epigenome editing tools allow targeted manipulation of DNA methylation and histone modifications at specific loci
Used to dissect the functional roles of epigenetic modifications and regulatory elements in gene regulation and disease
Integration of multi-omics data (epigenomics, transcriptomics, proteomics, metabolomics) using systems biology approaches will provide a more comprehensive understanding of gene regulation and its impact on cellular phenotypes
Machine learning and artificial intelligence will play an increasingly important role in analyzing and interpreting large-scale epigenomic datasets
Deep learning models can predict epigenetic states, gene expression, and disease outcomes from DNA sequence and other features
Comparative epigenomics across species will shed light on the evolution of regulatory mechanisms and their role in adaptation and speciation
Epigenetic clocks based on DNA methylation patterns can predict biological age and are being developed as biomarkers of aging and disease risk