Computational Genomics

🧬 Computational Genomics Unit 1 – Genome Sequencing Technologies

Genome sequencing technologies have transformed our understanding of genetics and biology. From Sanger sequencing to next-generation methods, these tools let scientists read DNA with ever greater speed and accuracy, enabling breakthroughs in disease research, personalized medicine, and evolutionary studies. As sequencing becomes faster and cheaper, it is also reshaping applied fields such as clinical diagnostics and agriculture. Challenges remain in data analysis, interpretation, and ethics, and emerging technologies like long-read and single-cell sequencing promise to expand genomic knowledge and applications further.

Key Concepts and Terminology

  • Genome: the complete set of genetic material present in an organism
  • DNA sequencing: the process of determining the precise order of nucleotides within a DNA molecule
  • Sanger sequencing: a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication
  • Next-generation sequencing (NGS): a term for several modern high-throughput technologies that sequence large numbers of DNA molecules in parallel
    • Includes short-read technologies such as Illumina and Ion Torrent sequencing; the term is often extended to long-read platforms such as Pacific Biosciences, which are also called third-generation sequencing
  • Reads: the DNA sequences produced by a sequencing instrument; short-read platforms typically produce reads of 50 to 400 base pairs, while long-read platforms produce reads many kilobases long
  • Coverage (depth): the average number of reads that align to, or "cover," each base in the reference genome
  • Assembly: the process of aligning and merging overlapping sequencing reads to reconstruct the original DNA sequence
  • Variant calling: the process of identifying differences between the sequenced genome and a reference genome, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels)
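
Coverage can be estimated in advance from the number of reads N, the read length L, and the genome size G as C = N × L / G. The minimal sketch below assumes reads start at uniformly random positions (the Lander-Waterman model), under which the expected fraction of bases covered at least once is 1 - e^(-C); real data fall short of this because of biases such as GC content and repeats. The function names and example numbers are illustrative, not taken from any particular tool.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases
    covered by at least one read, assuming uniformly random read starts."""
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required to reach a target mean coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers: 600 million 150 bp reads, ~3.1 Gb genome
    c = mean_coverage(600_000_000, 150, 3_100_000_000)
    print(f"mean coverage: {c:.1f}x")
    print(f"fraction covered at least once: {fraction_covered(c):.6f}")
    print(f"reads needed for 30x: {reads_needed(30, 150, 3_100_000_000):,}")
```

With these example numbers the mean coverage is about 29x, which is why roughly 30x is a common target for whole human genomes.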

Historical Context and Evolution

  • DNA structure first described by James Watson and Francis Crick in 1953, based on X-ray crystallography data collected by Rosalind Franklin
  • Sanger sequencing developed by Frederick Sanger in 1977, which became the primary method for DNA sequencing for several decades
    • Sanger sequencing relies on the use of labeled chain-terminating dideoxynucleotides (ddNTPs) to generate DNA fragments of varying lengths
    • These fragments are then separated by size using gel electrophoresis, allowing the DNA sequence to be read
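  • Automation and refinement of Sanger sequencing made the Human Genome Project possible; it produced the first reference sequence of the human genome and was declared essentially complete in 2003 (the remaining gaps were closed in 2022 by the Telomere-to-Telomere consortium using long-read sequencing)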
  • Development of next-generation sequencing (NGS) technologies in the mid-2000s revolutionized the field by enabling high-throughput, parallel sequencing of DNA molecules
  • Continuous improvements in NGS technologies have led to increased sequencing speed, accuracy, and affordability, making large-scale genomic studies more feasible

DNA Sequencing Methods

  • Sanger sequencing: the traditional chain-termination method, which uses labeled dideoxynucleotides (ddNTPs) to generate DNA fragments of varying lengths
    • In the classic protocol, the DNA sample is divided into four separate reactions; each contains all four normal nucleotides (dNTPs) plus a small amount of one ddNTP (ddATP, ddCTP, ddGTP, or ddTTP). Automated versions run a single reaction with four differently colored fluorescent ddNTPs
    • DNA polymerase occasionally incorporates a ddNTP during in vitro DNA replication; because a ddNTP lacks the 3'-hydroxyl group needed to attach the next nucleotide, strand elongation stops at that base
    • The resulting fragments are separated by size using gel electrophoresis (capillary electrophoresis in automated instruments), and reading the terminal base of each fragment from shortest to longest gives the sequence
  • Maxam-Gilbert sequencing: an early DNA sequencing method based on chemical modification and cleavage of DNA
    • The DNA is radiolabeled at one end and then partially cleaved at specific bases using base-specific chemical treatments
    • The resulting DNA fragments are separated by size using gel electrophoresis, allowing the DNA sequence to be read
  • Pyrosequencing: a sequencing-by-synthesis method that detects the pyrophosphate released when a nucleotide is incorporated
    • Nucleotides are supplied one type at a time in a fixed order; when the supplied nucleotide matches the next template base, it is incorporated and pyrophosphate is released
    • An enzyme cascade converts the pyrophosphate to ATP (ATP sulfurylase), which drives light production by luciferase; the light signal, proportional to the number of nucleotides incorporated, is detected in real time
  • Chain termination methods: a class of DNA sequencing methods that use labeled chain-terminating nucleotides to generate DNA fragments of varying lengths (includes Sanger sequencing)
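
To see why sorting terminated fragments by length spells out the sequence, here is a minimal simulation of an idealized chain-termination reaction. It assumes exactly one fragment ends at every position and ignores band intensity and noise; the template is written 3' to 5' so that the newly synthesized strand reads 5' to 3' in the same index order. All function names are hypothetical.

```python
import random

# Watson-Crick base pairing
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def chain_termination_fragments(template_3to5: str, rng: random.Random) -> list[tuple[int, str]]:
    """Idealized Sanger reaction: the new strand is the complement of the
    template, and a ddNTP terminates one copy at every position, giving
    one (length, terminal_base) fragment per position."""
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    fragments = [(i + 1, new_strand[i]) for i in range(len(new_strand))]
    rng.shuffle(fragments)  # fragments leave the reaction in no particular order
    return fragments


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Electrophoresis step: order fragments from shortest to longest and
    read the labeled ddNTP at the end of each one."""
    return "".join(base for _, base in sorted(fragments))


def read_four_lanes(fragments: list[tuple[int, str]]) -> str:
    """Classic four-lane gel: each lane holds the fragments ending in one
    ddNTP; reading bands from the bottom (shortest) up across all four
    lanes gives the same sequence."""
    lanes = {b: [n for n, t in fragments if t == b] for b in "ACGT"}
    bands = sorted((n, b) for b, lengths in lanes.items() for n in lengths)
    return "".join(b for _, b in bands)


if __name__ == "__main__":
    frags = chain_termination_fragments("TACGGA", random.Random(0))
    print(read_by_size(frags))     # ATGCCT
    print(read_four_lanes(frags))  # ATGCCT
```

The two readers return the same answer; the four-lane version simply mirrors how the original protocol split the reaction by ddNTP.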

Next-Generation Sequencing Technologies

  • Illumina sequencing: a widely used NGS platform based on fluorescently labeled reversible terminator nucleotides
    • The DNA sample is fragmented and adapters are ligated to the ends of the fragments
    • The adapter-ligated fragments bind to a solid surface (flow cell), where bridge amplification copies each one into a cluster of identical molecules
    • In each cycle, all four labeled nucleotides are supplied, a single base is incorporated in each cluster, the flow cell is imaged to identify that base, and the dye and terminator are removed before the next cycle
  • Ion Torrent sequencing: an NGS platform that detects the hydrogen ions released during DNA synthesis
    • DNA fragments are amplified on beads that sit in the wells of a semiconductor chip, and unlabeled nucleotides are flowed over the chip one type at a time
    • Each incorporation releases a hydrogen ion, which the chip detects as a change in pH; a run of identical bases (a homopolymer) gives a proportionally larger signal, which makes homopolymer lengths a common source of error
  • Pacific Biosciences sequencing: a single-molecule, real-time (SMRT) platform that observes DNA synthesis by an individual DNA polymerase molecule
    • Synthesis takes place with fluorescently labeled nucleotides inside a zero-mode waveguide (ZMW), a nanoscale well that limits observation to the tiny volume around the polymerase
    • Each incorporation produces a fluorescent pulse that is detected in real time, allowing the DNA sequence to be determined
  • Oxford Nanopore sequencing: a platform that detects changes in electrical current as DNA molecules pass through a protein nanopore
    • Nanopores are embedded in a membrane across which a voltage is applied, driving an ionic current through each pore; a motor protein controls the rate at which a DNA strand threads through
    • As the DNA passes through the nanopore, the bases occupying the pore alter the current in characteristic ways, and these signals are decoded into the DNA sequence
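
Flow-based platforms (Ion Torrent, and the pyrosequencing method described earlier) do not read one base per step: each flow supplies one nucleotide type, and the signal scales with how many identical bases are incorporated. The minimal sketch below contrasts this with a reversible-terminator cycle, where exactly one base is added per cycle. The flow order TACG, the one-channel-per-base readout, the simple rounding rule, and the function names are all illustrative assumptions; production base callers model noise and signal decay far more carefully.

```python
def simulate_flowgram(sequence: str, flow_order: str, num_flows: int) -> list[float]:
    """Noise-free flowgram: for each flow, count how many consecutive bases
    of the sequence match the nucleotide being flowed (0 if none)."""
    signals = []
    pos = 0
    for i in range(num_flows):
        nucleotide = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append(float(count))
    return signals


def call_flowgram(signals: list[float], flow_order: str) -> str:
    """Flow-based base calling: round each signal to the nearest whole number
    of incorporations and emit that many copies of the flowed nucleotide."""
    called = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        called.append(nucleotide * max(0, round(signal)))
    return "".join(called)


def call_sbs_cycle(intensities: dict[str, float]) -> str:
    """Reversible-terminator cycle (Illumina-style, assuming one channel per
    base): exactly one base is added, so report the brightest channel."""
    return max(intensities, key=intensities.get)


if __name__ == "__main__":
    flows = simulate_flowgram("AATCGGG", "TACG", 8)
    print(flows)                          # [0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 3.0]
    print(call_flowgram(flows, "TACG"))   # AATCGGG
```

A noisy signal such as 2.6 for a true run of three G's still rounds correctly here, but as homopolymers get longer the gap between adjacent counts shrinks relative to the noise, which is why homopolymer length errors dominate on flow-based platforms.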

Bioinformatics Tools for Sequence Analysis

  • Quality control tools: software that assesses the quality of sequencing data (FastQC) and trims or removes low-quality reads and bases (Trimmomatic)
  • Alignment tools: software that aligns sequencing reads to a reference genome or to each other (BWA, Bowtie, HISAT)
    • Alignment determines where each read comes from in the genome, which is needed to identify differences between the sequenced genome and the reference genome
  • Variant calling tools: software that identifies differences between the sequenced genome and a reference genome, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) (GATK, SAMtools/BCFtools, FreeBayes)
  • Genome assembly tools: software that merges overlapping sequencing reads to reconstruct the original DNA sequence (SPAdes, Velvet, SOAPdenovo)
    • De novo assembly is necessary when a reference genome is not available or when the goal is to identify novel sequences or structural variations
  • Annotation tools: software that identifies functional elements within a genome, such as genes, regulatory regions, and non-coding RNAs, and assigns them biological meaning (MAKER, Augustus, Prokka)
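
Quality scores in FASTQ files use the Phred scale, Q = -10 log10(P), where P is the probability that a base call is wrong, so Q20 means a 1 in 100 chance of error and Q30 a 1 in 1,000 chance. The sketch below decodes Phred+33 quality strings (the encoding used by current Illumina output) and applies a simple sliding-window trim in the spirit of Trimmomatic's SLIDINGWINDOW step. It is a simplified illustration, not Trimmomatic's exact algorithm, and the function names, window size, and threshold defaults are assumptions.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) to integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 20.0, offset: int = 33) -> tuple[str, str]:
    """Scan from the 5' end and cut the read at the first window whose mean
    quality falls below the threshold, keeping only the bases before it."""
    scores = phred_scores(qual, offset)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual  # no low-quality window found (or read shorter than window)


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33
    print(phred_scores("I#"))
    print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))
```

Because this version cuts at the start of the first failing window, it can discard a few good bases just before the low-quality tail; real trimmers refine the cut point.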
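
Many read aligners follow a seed-and-extend strategy: short exact matches (seeds) propose candidate locations, which are then scored over the full read. BWA and Bowtie find seeds with a compressed Burrows-Wheeler (FM) index; the minimal sketch below swaps in a plain hash table of k-mers and scores candidates by counting mismatches without gaps, so it is a teaching simplification rather than how those tools work internally. Function names and defaults are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 2):
    """Seed-and-extend: exact k-mer hits propose candidate start positions,
    then each candidate is scored by counting mismatches (no gaps).
    Returns (position, mismatches) for the best hit, or None."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


if __name__ == "__main__":
    ref = "ACGTTGCAAGTCCGATAGCTTAGC"
    idx = build_kmer_index(ref, 4)
    print(align_read("AGTACGAT", ref, idx, 4))  # (8, 1): one mismatch at position 11
```

Because seeds are taken from every offset in the read, a single sequencing error or SNP still leaves other seeds that hit the correct location.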
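
At its simplest, variant calling stacks the aligned reads at each reference position (a pileup) and asks whether a non-reference base appears often enough to be real rather than a sequencing error. The naive count-and-threshold caller below illustrates that idea for single-nucleotide variants only; tools such as GATK, BCFtools, and FreeBayes instead compute genotype likelihoods that weigh base and mapping qualities. The input format (ungapped alignments given as start position and sequence), the thresholds, and the function names are assumptions.

```python
from collections import Counter


def pileup(reference: str, alignments: list[tuple[int, str]]) -> list[Counter]:
    """Count observed bases at each reference position from ungapped
    alignments given as (start, read_sequence) pairs."""
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def call_snvs(reference: str, alignments: list[tuple[int, str]],
              min_depth: int = 4, min_alt_fraction: float = 0.2):
    """Report (position, ref, alt, depth, alt_fraction) wherever the most
    common non-reference base passes both the depth and fraction thresholds."""
    calls = []
    for pos, counts in enumerate(pileup(reference, alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        alts = [(n, b) for b, n in counts.items() if b != ref_base]
        if not alts:
            continue
        n_alt, alt_base = max(alts)
        fraction = n_alt / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, round(fraction, 2)))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACGTACGT"), (0, "ACGTACGT"), (0, "ACGTAGGT"), (0, "ACGTAGGT")]
    print(call_snvs(ref, reads))  # [(5, 'C', 'G', 4, 0.5)]
```

An alternative allele fraction near 0.5, as in this example, is what a heterozygous variant would look like in a diploid sample.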
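
SPAdes, Velvet, and SOAPdenovo are de Bruijn graph assemblers: reads are broken into k-mers, each k-mer becomes an edge between its two overlapping (k-1)-mers, and the genome is recovered as a path that uses every edge. The minimal sketch below builds such a graph from error-free reads and walks it with Hierholzer's algorithm for Eulerian paths. It ignores sequencing errors, reverse complements, k-mer multiplicity, and repeats longer than k, all of which real assemblers must handle; the function names are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds an edge
    from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any
    start = next((n for n in nodes if len(graph.get(n, [])) - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    remaining = {n: list(v) for n, v in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell the contig: first node, then the last base of each later node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGCGG", "GCGGTAC", "GTACCA"], 4))  # ATGCGGTACCA
```

The choice of k is a trade-off: larger k resolves more repeats, but requires higher coverage so that every k-mer of the genome is actually observed.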
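
A first step in annotating a genome is finding candidate protein-coding regions. The minimal sketch below scans the three forward reading frames for open reading frames (an ATG start codon followed in frame by a TAA, TAG, or TGA stop). Real gene finders are far more sophisticated; Augustus, for example, uses a hidden Markov model trained on known genes, and Prokka relies on Prodigal for prokaryotic genes, so treat this as an illustration of the concept only. The minimum-length cutoff and function name are assumptions, and the reverse strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop open reading frames on the
    forward strand; end is exclusive and includes the stop codon, and
    min_codons counts codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCGGGTAACC"))  # [(2, 17, 2)]
```

Short random stretches of DNA often contain spurious ORFs, which is why practical tools combine length cutoffs with statistical models of coding sequence.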

Applications in Research and Medicine

  • Disease gene discovery: sequencing can be used to identify genetic variants associated with inherited disorders or complex diseases (Alzheimer's disease, cancer)
  • Personalized medicine: sequencing can be used to guide treatment decisions based on an individual's genetic profile (pharmacogenomics, cancer treatment)
  • Microbial genomics: sequencing can be used to study the genomes of bacteria, viruses, and other microorganisms (pathogen identification, antibiotic resistance)
    • This can aid in the development of new antibiotics, vaccines, and diagnostic tests
  • Agricultural genomics: sequencing can be used to study the genomes of crops and livestock to improve traits such as yield, disease resistance, and nutritional content
  • Evolutionary studies: sequencing can be used to study the evolutionary relationships between species and to identify regions of the genome that have undergone selection

Challenges and Limitations

  • High cost: sequencing technologies can be expensive, particularly for large-scale studies or clinical applications
  • Data storage and management: the large amounts of data generated by sequencing require significant computational resources for storage and analysis
    • This can be a challenge for smaller research groups or institutions with limited resources
  • Interpretation of variants: determining the biological significance of genetic variants can be difficult, particularly for rare or novel variants
    • This requires the integration of multiple lines of evidence, including functional studies and population-level data
  • Ethical considerations: sequencing can raise ethical concerns related to privacy, informed consent, and the potential for genetic discrimination
    • There are also concerns about the use of sequencing data for non-medical purposes, such as forensic investigations or ancestry testing
  • Technical limitations: current sequencing technologies are limited in read length, accuracy, and the ability to sequence certain regions of the genome (repetitive regions, structural variations)

Future Directions and Emerging Technologies

  • Long-read sequencing: technologies that generate reads of several kilobases or even megabases in length, improving genome assembly and the identification of structural variations (Pacific Biosciences, Oxford Nanopore)
  • Single-cell sequencing: technologies that sequence individual cells, enabling the study of cellular heterogeneity and rare cell types
  • Spatial transcriptomics: technologies that map gene expression within tissues, providing insights into the relationship between cellular function and spatial organization
  • Epigenomic sequencing: technologies that map epigenetic modifications, such as DNA methylation and histone modifications, which play important roles in gene regulation and development
  • Integration of sequencing with other omics technologies, such as proteomics and metabolomics, to provide a more comprehensive view of biological systems
  • Continued development of bioinformatics tools and databases to facilitate the analysis and interpretation of sequencing data
  • Increased use of sequencing in clinical settings for diagnosis, prognosis, and treatment selection, particularly in the areas of cancer and rare genetic disorders


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
