Computational Genomics Unit 6 – Regulatory Genomics & Epigenomics

Regulatory genomics and epigenomics explore how gene expression is controlled without altering the underlying DNA sequence. These fields study regulatory elements such as enhancers and silencers, as well as modifications such as DNA methylation and histone modifications that affect gene expression. Understanding these mechanisms is crucial for grasping how cells function and differentiate. This knowledge has applications in medicine, where it helps explain disease origins and supports the development of treatments that target gene regulation.

Key Concepts and Definitions

  • Regulatory genomics studies how gene expression is controlled by various regulatory elements and mechanisms
  • Epigenomics is the genome-wide study of epigenetic marks, which underlie heritable changes in gene expression that do not involve alterations to the DNA sequence itself
  • Transcription factors (TFs) are proteins that bind to specific DNA sequences and regulate transcription of target genes
  • Enhancers are distal regulatory elements that positively regulate gene expression by interacting with promoters through DNA looping
    • Can be located upstream or downstream of the genes they regulate (e.g., the sonic hedgehog enhancer located ~1 Mb upstream of the SHH gene)
  • Silencers are regulatory elements that negatively regulate gene expression by recruiting repressive factors
  • Insulators are boundary elements that prevent inappropriate interactions between neighboring chromatin domains
  • Chromatin accessibility refers to the degree to which DNA is accessible to TFs and other regulatory proteins
    • Open chromatin regions are associated with active gene expression, while closed chromatin is associated with gene repression

Regulatory Elements in the Genome

  • Promoters are located near the transcription start site (TSS) and contain binding sites for RNA polymerase and general TFs
  • The core promoter spans the TSS and contains the core promoter elements (TATA box, initiator, downstream promoter element), while the proximal promoter extends immediately upstream of it
  • Distal promoters are located further upstream and contain additional regulatory elements (CpG islands, upstream activating sequences)
  • Enhancers can be located far from their target genes and interact with promoters through DNA looping mediated by cohesin and mediator complexes
    • Super-enhancers are clusters of enhancers that drive high levels of gene expression in a cell type-specific manner (e.g., the α-globin super-enhancer in erythroid cells)
  • Silencers can be located near or far from their target genes and recruit repressive factors (histone deacetylases, polycomb group proteins)
  • Insulators block inappropriate enhancer-promoter interactions and the spread of repressive chromatin by forming loops and interacting with the nuclear lamina
    • CTCF is a key insulator-binding protein that mediates chromatin looping and TAD formation

Epigenetic Modifications and Mechanisms

  • DNA methylation involves the addition of methyl groups to cytosine residues, primarily at CpG dinucleotides
    • Methylation of promoter CpG islands is associated with gene silencing, while methylation of gene bodies is associated with active transcription
  • Histone modifications include acetylation, methylation, phosphorylation, and ubiquitination of histone tails
    • H3K4me3 is associated with active promoters, while H3K27me3 is associated with repressed promoters and enhancers
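    • H3K27ac is associated with active enhancers and distinguishes them from inactive enhancers that carry H3K4me1 without H3K27ac (often called primed when marked by H3K4me1 alone, or poised when H3K27me3 is also present)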
  • Chromatin remodeling involves the ATP-dependent alteration of nucleosome positioning and composition by remodeling complexes (SWI/SNF, ISWI, CHD, INO80)
  • Non-coding RNAs (ncRNAs) can regulate gene expression through various mechanisms
    • Long non-coding RNAs (lncRNAs) can recruit chromatin-modifying complexes, act as enhancer RNAs, or serve as scaffolds for protein complexes (e.g., XIST lncRNA in X chromosome inactivation)
    • microRNAs (miRNAs) can post-transcriptionally repress gene expression by targeting mRNAs for degradation or translational repression
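
The histone mark associations listed above can be read as a small lookup rule. The following is a minimal sketch of that reading, assuming each region has already been reduced to the set of marks detected there; the function name, the state labels, and the "ambiguous" fallback for conflicting marks are illustrative choices, not an established classification scheme.

```python
ACTIVE_MARKS = {"H3K4me3", "H3K27ac"}


def classify_region(marks):
    """Assign a coarse regulatory label from the histone marks present at a region."""
    marks = set(marks)
    if "H3K27me3" in marks:
        # Repressive mark; flag regions that also carry an active mark
        return "ambiguous" if marks & ACTIVE_MARKS else "repressed"
    if "H3K4me3" in marks:
        return "active promoter"
    if "H3K27ac" in marks:
        return "active enhancer"
    if "H3K4me1" in marks:
        return "enhancer without H3K27ac"
    return "unannotated"


# Hypothetical regions and mark calls, for illustration only
regions = {
    "region_1": ["H3K4me3", "H3K27ac"],
    "region_2": ["H3K4me1", "H3K27ac"],
    "region_3": ["H3K4me1"],
    "region_4": ["H3K27me3"],
}
for name, marks in regions.items():
    print(name, classify_region(marks))
```

Hard rules like these ignore signal strength and neighboring context, which is one reason probabilistic segmentation models are generally preferred for genome-wide annotation.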

Experimental Techniques in Regulatory Genomics

  • Chromatin immunoprecipitation followed by sequencing (ChIP-seq) allows genome-wide mapping of protein-DNA interactions
    • Used to identify binding sites of TFs, histone modifications, and chromatin-associated proteins
    • Requires antibodies specific to the protein of interest and sufficient cell numbers for robust signal
  • DNase-seq and ATAC-seq identify regions of open chromatin: DNase-seq digests accessible DNA with DNase I, while ATAC-seq uses a hyperactive Tn5 transposase to cut accessible DNA and insert sequencing adapters
    • Open chromatin regions are indicative of active regulatory elements (promoters, enhancers) and TF binding sites
  • Bisulfite sequencing determines DNA methylation patterns by converting unmethylated cytosines to uracil (read as thymine after amplification) while leaving methylated cytosines unchanged
    • Whole-genome bisulfite sequencing (WGBS) provides single-base resolution methylation profiles, but is costly and requires high coverage
    • Reduced representation bisulfite sequencing (RRBS) focuses on CpG-rich regions and is more cost-effective
  • Chromosome conformation capture (3C) techniques detect long-range chromatin interactions
    • Hi-C provides genome-wide interaction maps at kilobase to megabase resolution, revealing topologically associating domains (TADs) and chromatin loops
    • Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) identifies interactions mediated by specific proteins (e.g., CTCF, RNA polymerase II)

Computational Methods for Epigenomic Analysis

  • Peak calling identifies enriched regions in ChIP-seq and accessibility data using algorithms (MACS2, HOMER, F-Seq) that compare signal to background
  • Differential analysis identifies regions with significant differences in signal intensity between conditions or cell types using tools like DiffBind and DESeq2
  • Chromatin state annotation segments the genome into distinct states (active promoter, strong enhancer, repressed, etc.) based on combinatorial histone modification patterns using hidden Markov models (ChromHMM, Segway)
  • Motif analysis identifies enriched DNA sequence motifs in regulatory regions using de novo discovery tools (MEME, DREME) or known motif scanning (FIMO, HOMER)
    • Motif instances can be used to infer TF binding and construct regulatory networks
  • Integrative analysis combines multiple data types (e.g., ChIP-seq, RNA-seq, Hi-C) to gain insights into regulatory mechanisms and predict functional effects of genetic variants
    • Tools like GREAT (which links sets of genomic regions to the functions of nearby genes) and RegulomeDB (which scores variants by overlapping regulatory evidence) help interpret non-coding regions and variants
  • Machine learning approaches (e.g., deep learning) are increasingly used to predict regulatory elements, chromatin states, and gene expression from DNA sequence and epigenomic data
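
The idea of comparing signal to background in peak calling can be sketched with a Poisson test on fixed windows. This is a simplified illustration, not the algorithm of MACS2 or any other named tool: it assumes read counts have already been tallied per window for a ChIP sample and a control sample, and the function names, the depth scaling, and the default threshold are all assumptions made for the example.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for small toy counts only."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_enriched_windows(chip_counts, control_counts, alpha=1e-3):
    """Return indices of windows whose ChIP count exceeds the expected background."""
    # Scale control counts to the ChIP sequencing depth
    scale = sum(chip_counts) / max(sum(control_counts), 1)
    # Genome-wide floor on the background rate (mean scaled control count)
    genome_bg = scale * sum(control_counts) / len(control_counts)
    hits = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, control_counts)):
        lam = max(ctrl * scale, genome_bg)    # local background, never below the floor
        if poisson_sf(chip, lam) < alpha:
            hits.append(i)
    return hits


# Made-up window counts for illustration
chip = [5, 4, 6, 40, 5, 5, 4, 6]
control = [5, 5, 5, 6, 5, 5, 5, 4]
print(call_enriched_windows(chip, control))
```

Taking the larger of the local and genome-wide background rates guards against calling peaks in windows where the control happens to have very few reads. A production analysis would also correct for multiple testing across windows.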

Regulatory Networks and Gene Expression

  • Gene regulatory networks (GRNs) describe the complex interactions between TFs and their target genes that control cell type-specific gene expression programs
    • Can be constructed using TF binding data, gene expression data, and computational inference methods (e.g., ARACNE, GENIE3)
  • Transcriptional regulation involves the interplay between TFs, co-factors, and chromatin state to control the initiation and rate of transcription
    • General TFs (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH) assemble at the promoter to form the pre-initiation complex and recruit RNA polymerase II
    • Sequence-specific TFs bind to enhancers and promoters to activate or repress transcription by recruiting co-activators or co-repressors
  • Post-transcriptional regulation modulates gene expression through mRNA processing, stability, and translation
    • Alternative splicing generates transcript isoforms with different functions or stability
    • miRNAs and RNA-binding proteins (RBPs) regulate mRNA stability and translation efficiency
  • Feedback loops and feed-forward loops are common motifs in GRNs that enable precise control of gene expression dynamics
    • Negative feedback loops confer robustness and homeostasis, while positive feedback loops amplify signals and generate switch-like responses

Applications in Health and Disease

  • Genome-wide association studies (GWAS) have identified numerous disease-associated variants, many of which lie in non-coding regulatory regions
    • Integrating GWAS data with epigenomic data can help prioritize causal variants and elucidate their functional effects on gene regulation
  • Epigenetic alterations are implicated in various diseases, including cancer, neurodevelopmental disorders, and autoimmune diseases
    • DNA methylation changes and aberrant histone modifications can lead to altered gene expression and disease progression
    • Epigenetic drugs targeting DNA methyltransferases (DNMTs) and histone deacetylases (HDACs) are used in cancer treatment (e.g., azacitidine, vorinostat)
  • Personalized medicine approaches leverage epigenomic data to stratify patients, predict drug responses, and develop targeted therapies
    • Epigenetic biomarkers can be used for early detection, prognosis, and treatment selection in various diseases
  • Epigenetic inheritance and transgenerational effects are areas of active research
    • Some epigenetic modifications can be inherited across generations and may contribute to disease risk and evolutionary adaptation, although the extent of such inheritance in mammals is still debated
    • Environmental factors (diet, stress, toxins) can induce epigenetic changes that affect offspring health and development
  • Single-cell epigenomics technologies (scATAC-seq, scBS-seq), often paired with single-cell transcriptomics (scRNA-seq), enable the study of epigenetic heterogeneity and cell type-specific regulatory landscapes
    • Help identify rare cell types, developmental trajectories, and epigenetic states associated with disease
  • Spatial omics methods (e.g., spatially resolved chromatin profiling, spatial transcriptomics) provide information on how regulatory element activity and gene expression are organized across a tissue
  • CRISPR-based epigenome editing tools allow targeted manipulation of DNA methylation and histone modifications at specific loci
    • Used to dissect the functional roles of epigenetic modifications and regulatory elements in gene regulation and disease
  • Integration of multi-omics data (epigenomics, transcriptomics, proteomics, metabolomics) using systems biology approaches will provide a more comprehensive understanding of gene regulation and its impact on cellular phenotypes
  • Machine learning and artificial intelligence will play an increasingly important role in analyzing and interpreting large-scale epigenomic datasets
    • Deep learning models can predict epigenetic states, gene expression, and disease outcomes from DNA sequence and other features
  • Comparative epigenomics across species will shed light on the evolution of regulatory mechanisms and their role in adaptation and speciation
  • Epigenetic clocks based on DNA methylation patterns can predict biological age and are being developed as biomarkers of aging and disease risk
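
Integrating GWAS variants with epigenomic annotations, as described at the start of this section, often begins with a plain interval overlap: which variants fall inside an annotated regulatory element? The sketch below assumes 0-based half-open intervals that do not overlap within a chromosome; the variant identifiers, coordinates, and labels are made up, and the function name is hypothetical.

```python
from bisect import bisect_right


def annotate_variants(variants, elements):
    """Map each variant ID to the label of the element containing it, or None.

    variants: list of (variant_id, chrom, pos)
    elements: list of (chrom, start, end, label), half-open and non-overlapping
    """
    by_chrom = {}
    for chrom, start, end, label in elements:
        by_chrom.setdefault(chrom, []).append((start, end, label))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [iv[0] for iv in ivs] for c, ivs in by_chrom.items()}

    result = {}
    for variant_id, chrom, pos in variants:
        intervals = by_chrom.get(chrom, [])
        # Index of the last interval starting at or before pos
        i = bisect_right(starts.get(chrom, []), pos) - 1
        hit = None
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hit = intervals[i][2]
        result[variant_id] = hit
    return result


# Made-up annotations and variants
elements = [
    ("chr1", 100, 200, "enhancer"),
    ("chr1", 500, 600, "promoter"),
    ("chr2", 50, 80, "enhancer"),
]
variants = [("var1", "chr1", 150), ("var2", "chr1", 200),
            ("var3", "chr1", 550), ("var4", "chr3", 10)]
print(annotate_variants(variants, elements))
```

An overlap alone does not establish causality; it narrows the candidate list so that fine-mapping or functional assays can be focused on variants with regulatory support.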


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
