1.5 Sequencing strategies (whole-genome, exome, targeted)

8 min read • August 20, 2024

Sequencing strategies are crucial tools in genomics, offering different approaches to unravel genetic information. Whole-genome sequencing provides a comprehensive view, while exome sequencing focuses on protein-coding regions. Targeted sequencing zeros in on specific areas of interest.

Each strategy has its strengths and limitations. Whole-genome sequencing offers the most complete picture but is costly. Exome sequencing is more affordable and covers the coding regions where most known disease-causing variants lie. Targeted sequencing allows for deep coverage of selected regions, making it ideal for specific research questions.

Whole-genome sequencing

Whole-genome sequencing (WGS) involves determining the complete DNA sequence of an organism's genome
Provides the most comprehensive view of an individual's genetic makeup compared to other sequencing strategies
Enables the identification of all types of genetic variations, including single nucleotide variants (SNVs), insertions/deletions (indels), and structural variations (SVs)

Overview of WGS

WGS typically involves fragmenting genomic DNA into small pieces, sequencing these fragments, and then either aligning the reads to a reference genome or assembling them de novo to reconstruct the genome
Requires high-throughput sequencing technologies, such as Illumina sequencing platforms (HiSeq, NovaSeq)
Generates large amounts of data, on the order of 100 gigabases per sample for a human genome sequenced at 30x coverage
Requires significant computational resources for data storage, processing, and analysis
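The fragment-and-sequence idea can be illustrated with a toy simulation. This is a minimal sketch, not a real pipeline: the function name simulate_shotgun_depth, the 10 kb toy genome, the 100 bp read length, and the uniform random placement of reads are all assumptions made for illustration.

```python
import random

def simulate_shotgun_depth(genome_length, read_length, n_reads, seed=0):
    """Place n_reads reads uniformly at random and return per-base depth."""
    rng = random.Random(seed)
    depth = [0] * genome_length
    for _ in range(n_reads):
        # Choose a start position so the whole read fits inside the toy genome.
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            depth[pos] += 1
    return depth

# Toy parameters (assumed): 3,000 reads of 100 bp over a 10,000 bp genome.
depth = simulate_shotgun_depth(genome_length=10_000, read_length=100, n_reads=3_000)
mean_depth = sum(depth) / len(depth)
uncovered = sum(1 for d in depth if d == 0)
print(f"mean depth: {mean_depth:.1f}x, uncovered bases: {uncovered}")
```

Because every read lies fully inside the toy genome, the mean depth equals reads x read length / genome length (30x here), while the depth at any single position varies around that mean.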

Advantages vs limitations

Advantages include the ability to detect all types of genetic variations, unbiased coverage of the genome, and the potential for novel discoveries
Limitations include high cost, large data storage requirements, and the need for extensive computational resources and expertise for data analysis
Interpretation of variants of unknown significance (VUS) can be challenging due to the vast amount of data generated

Applications in research

WGS has been extensively used in population genetics studies to understand human genetic diversity and evolution (1000 Genomes Project)
Enables the identification of disease-associated genes and variants in complex genetic disorders (autism spectrum disorders, schizophrenia)
Facilitates the study of cancer genomes to identify somatic mutations, structural variations, and copy number alterations (The Cancer Genome Atlas)
Allows for the exploration of the role of non-coding regions in gene regulation and disease pathogenesis

Clinical applications of WGS

WGS is increasingly being used in clinical settings for the diagnosis of rare genetic disorders, particularly in cases where other genetic tests have been inconclusive
Enables the identification of causative variants in patients with undiagnosed diseases (Undiagnosed Diseases Network)
Facilitates the implementation of precision medicine approaches by providing a comprehensive view of an individual's genetic makeup
Challenges include the interpretation of VUS, the need for extensive genetic counseling, and ethical considerations regarding incidental findings

Exome sequencing

Exome sequencing focuses on the protein-coding regions of the genome, which constitute approximately 1-2% of the total genome
Targets the exons of known genes, where the majority of disease-causing variants are thought to reside

Exome vs whole-genome sequencing

Exome sequencing is less expensive than WGS due to the reduced sequencing target size
Generates smaller datasets compared to WGS, making data storage and analysis more manageable
May miss important variants in non-coding regions that could be captured by WGS

Advantages of exome sequencing

Cost-effective approach for identifying disease-causing variants in known genes
Requires far less total sequencing per sample than WGS because the target is much smaller, allowing for higher sample throughput even at greater depth
Generates more manageable datasets, facilitating data analysis and interpretation
Has been successful in identifying causal variants for many Mendelian disorders

Limitations of exome sequencing

Limited to the protein-coding regions of the genome, potentially missing important variants in non-coding regions
Relies on accurate exon capture and annotation, which may be incomplete or biased
May not effectively capture structural variations, such as copy number variations (CNVs) or large insertions/deletions
Challenges in interpreting variants of unknown significance (VUS) remain, although less pronounced than in WGS

Applications in disease research

Exome sequencing has been widely used to identify disease-causing genes and variants in Mendelian disorders (Miller syndrome, Kabuki syndrome)
Enables the discovery of novel disease-associated genes through family-based studies and case-control designs
Facilitates the study of genetic heterogeneity in complex diseases, such as autism spectrum disorders and intellectual disability
Allows for the identification of rare variants with large effect sizes that may be missed by genome-wide association studies (GWAS)

Clinical exome sequencing

Exome sequencing is routinely used in clinical settings for the diagnosis of rare genetic disorders
Offers a cost-effective alternative to WGS for the identification of disease-causing variants in known genes
Enables the diagnosis of patients with atypical presentations or unclear clinical phenotypes
Challenges include the interpretation of VUS, the need for regular re-analysis as new disease-gene associations are discovered, and the potential for incidental findings

Targeted sequencing

Targeted sequencing focuses on specific regions of interest in the genome, such as a set of candidate genes or disease-associated loci
Allows for the deep sequencing of selected regions, enabling the detection of low-frequency variants and mosaic mutations

Overview of targeted sequencing

Targeted sequencing typically involves the design of custom capture probes or amplicons to enrich for the desired regions of interest
Commonly used technologies include hybridization-based capture (Agilent SureSelect, Roche NimbleGen) and amplicon-based approaches (Illumina TruSeq Custom Amplicon, Thermo Fisher Ion AmpliSeq)
Requires less total sequencing per sample than WGS or exome sequencing, allowing for higher sample throughput and reduced costs even at very high depth

Advantages vs whole-genome sequencing

Targeted sequencing is more cost-effective than WGS for studying specific regions of interest
Enables deep sequencing of selected regions, facilitating the detection of low-frequency variants and mosaic mutations
Generates smaller datasets, making data storage and analysis more manageable
Allows for the multiplexing of a larger number of samples, increasing sample throughput
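Why depth matters for low-frequency and mosaic variants can be sketched with a simple binomial model. The model is an assumption, not a description of any particular variant caller: reads are treated as independent draws, sequencing errors are ignored, and the 2% allele fraction and the 5-read threshold are illustrative values.

```python
from math import comb

def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant-supporting reads) under a binomial model."""
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer

# Compare typical WGS, exome, and targeted depths for a variant at 2% allele fraction.
for depth in (30, 100, 1000):
    p = detection_probability(depth, allele_fraction=0.02, min_alt_reads=5)
    print(f"{depth:>5}x: P(>=5 variant reads) = {p:.3f}")
```

Under these assumptions the chance of collecting enough supporting reads is close to zero at 30x and close to one at 1000x, which is the intuition behind using deep targeted sequencing for mosaic mutations.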

Limitations of targeted sequencing

Limited to the pre-defined regions of interest, potentially missing important variants in other parts of the genome
Requires prior knowledge of disease-associated genes or loci for effective target selection
May not capture structural variations or copy number alterations effectively
Challenges in designing optimal capture probes or amplicons, particularly for regions with high GC content or repetitive sequences
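A first-pass screen for GC-extreme probes can be sketched as below. The function names, the example probe sequences, and the 35% and 65% cutoffs are hypothetical choices for illustration; real probe design tools use their own criteria and also account for repeats and uniqueness.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)

def flag_probe(sequence, low=0.35, high=0.65):
    """Label a candidate probe by GC content; the cutoffs are illustrative assumptions."""
    gc = gc_fraction(sequence)
    if gc < low:
        return "low_gc"
    if gc > high:
        return "high_gc"
    return "ok"

# Made-up 20-mers standing in for candidate probes.
probes = {
    "probe_a": "ATGCATGCATGCATGCATGC",  # balanced GC
    "probe_b": "GCGCGGCCGCGGGCCGCGCC",  # GC only
    "probe_c": "ATATTTAAATATTTAAATAT",  # AT only
}
for name, seq in probes.items():
    print(name, f"{gc_fraction(seq):.2f}", flag_probe(seq))
```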

Applications in research

Targeted sequencing is used to study candidate genes or disease-associated loci identified through GWAS or linkage studies
Enables the fine-mapping of disease-associated regions to identify causal variants
Facilitates the study of genetic heterogeneity and the identification of rare variants in complex diseases
Allows for the validation and functional characterization of putative disease-causing variants

Clinical targeted sequencing panels

Targeted sequencing panels are widely used in clinical settings for the diagnosis of specific genetic disorders (hereditary cancer syndromes, cardiovascular disorders)
Enable the simultaneous sequencing of multiple disease-associated genes, improving diagnostic yield and reducing time to diagnosis
Offer a cost-effective alternative to exome sequencing for disorders with well-defined gene panels
Challenges include the need for regular updates of gene panels as new disease-gene associations are discovered, and the interpretation of VUS

Comparison of sequencing strategies

The choice of sequencing strategy depends on the specific research or clinical question, available resources, and desired level of genomic resolution

Whole-genome vs exome vs targeted

WGS provides the most comprehensive view of the genome but is the most expensive and computationally demanding
Exome sequencing focuses on the protein-coding regions and is more cost-effective than WGS, but may miss important non-coding variants
Targeted sequencing is the most cost-effective approach for studying specific regions of interest but is limited to pre-defined targets

Factors influencing strategy selection

Research or clinical question: WGS for novel discoveries, exome sequencing for Mendelian disorders, targeted sequencing for specific gene panels
Sample size and available resources: WGS for smaller cohorts, exome or targeted sequencing for larger sample sizes
Desired level of genomic resolution: WGS for the most comprehensive view, exome sequencing for a focus on coding regions, targeted sequencing for deep coverage of specific regions

Cost considerations

WGS is the most expensive approach, followed by exome sequencing and targeted sequencing
Cost includes library preparation, sequencing, data storage, and analysis
Prices for sequencing have decreased significantly over the years, making WGS and exome sequencing more accessible

Coverage and depth

WGS typically aims for 30-40x coverage of the genome, while exome sequencing may aim for 50-100x coverage of coding regions
Targeted sequencing can achieve very high coverage (>1000x) of selected regions, enabling the detection of low-frequency variants and mosaic mutations
Higher coverage increases the sensitivity for variant detection but also increases sequencing costs
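The relationship between coverage, read count, and cost follows from mean coverage = (number of reads x read length) / target size. The sketch below applies that formula and adds a Poisson approximation for how many bases fall below a minimum depth. The 3.1 Gb genome size and 150 bp read length are assumed inputs, and real data are less even than a Poisson model suggests (GC bias, capture efficiency), so the second function is optimistic.

```python
from math import ceil, exp, factorial

def reads_needed(target_size_bp, read_length_bp, mean_coverage):
    """Number of reads N such that N * L / G reaches the desired mean coverage."""
    return ceil(mean_coverage * target_size_bp / read_length_bp)

def fraction_below_depth(mean_coverage, min_depth):
    """Poisson estimate of the fraction of bases covered by fewer than min_depth reads."""
    return sum(
        exp(-mean_coverage) * mean_coverage**k / factorial(k)
        for k in range(min_depth)
    )

# Assumed inputs: 3.1 Gb human genome, 150 bp reads, 30x mean coverage.
n = reads_needed(3_100_000_000, 150, 30)
print(f"reads for 30x human WGS with 150 bp reads: {n:,}")

for c in (10, 30, 100):
    print(f"mean {c}x: fraction of bases below 10x = {fraction_below_depth(c, 10):.2e}")
```

Doubling the mean coverage doubles the number of reads (and roughly the sequencing cost), while the fraction of poorly covered bases shrinks much faster than linearly.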

Data storage and analysis requirements

WGS generates the largest datasets (100-200 GB per sample), followed by exome sequencing (5-10 GB per sample) and targeted sequencing (1-5 GB per sample)
Data storage and analysis requirements increase with the size of the dataset
Bioinformatics expertise and computational infrastructure are essential for the analysis of sequencing data
Cloud computing platforms (Amazon Web Services, Google Cloud) have emerged as scalable solutions for data storage and analysis
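The ranking of dataset sizes can be checked with rough arithmetic: sequenced bases per sample are approximately target size x mean depth. The target sizes below (3.1 Gb genome, a 35 Mb exome design, a 1 Mb panel) are assumed illustrative values, the depths are taken from the typical ranges quoted for each strategy, and the result is on-target bases rather than file size, which also depends on format, compression, and off-target reads.

```python
STRATEGIES = {
    # name: (assumed target size in bp, assumed mean depth)
    "whole-genome": (3_100_000_000, 30),
    "exome": (35_000_000, 100),           # hypothetical capture design size
    "targeted panel": (1_000_000, 1000),  # hypothetical 1 Mb gene panel
}

def sequenced_gigabases(target_size_bp, mean_depth):
    """Approximate on-target sequenced bases per sample, in gigabases."""
    return target_size_bp * mean_depth / 1e9

for name, (size_bp, depth) in STRATEGIES.items():
    gb = sequenced_gigabases(size_bp, depth)
    print(f"{name:>15}: {size_bp / 1e6:>8.1f} Mb target at {depth:>4}x = {gb:6.1f} Gb")
```

Even at much higher depth, the exome and panel examples need only a small fraction of the bases required for a 30x genome, which is why they are cheaper to store and analyze.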

Emerging sequencing technologies

Advances in sequencing technologies continue to shape the field of genomics, offering new opportunities for research and clinical applications

Long-read sequencing

Technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore enable the sequencing of long DNA fragments (>10 kb)
Facilitate the assembly of complex genomic regions, such as repetitive sequences and structural variations
Enable the phasing of genetic variants and the study of alternative splicing events
Limitations include historically higher per-read error rates compared to short-read sequencing and higher sequencing costs per base

Single-cell sequencing

Allows for the profiling of individual cells, revealing cellular heterogeneity and enabling the study of rare cell types
Commonly used techniques include plate-based (Smart-seq2) and droplet-based (10x Genomics) approaches
Enables the study of cell lineage, developmental trajectories, and the identification of novel cell types
Challenges include technical noise, limited capture efficiency, and the need for specialized data analysis tools

Spatial transcriptomics

Techniques such as FISSEQ (fluorescent in situ sequencing) and Slide-seq enable the spatial mapping of gene expression in tissue sections
Provide insights into the spatial organization of cell types and the microenvironment in complex tissues
Facilitate the study of tissue heterogeneity and the identification of spatial biomarkers
Limitations include lower throughput compared to traditional RNA-seq and the need for specialized tissue handling and data analysis approaches

Applications of emerging technologies

Long-read sequencing is being used to improve genome assemblies, study structural variations, and resolve complex genomic regions (telomeres, centromeres)
Single-cell sequencing is revolutionizing the study of cellular heterogeneity in cancer, neuroscience, and developmental biology
Spatial transcriptomics is enabling the study of tissue organization and the identification of spatial biomarkers in diseases such as cancer and Alzheimer's disease
Integration of emerging technologies with traditional sequencing approaches is expected to provide a more comprehensive understanding of genome structure and function

Key Terms to Review (19)

Bioinformatics: Bioinformatics is an interdisciplinary field that combines computer science, statistics, and biology to analyze and interpret biological data, particularly in genomics and molecular biology. This field plays a crucial role in managing and analyzing large datasets from various sources, including sequencing technologies, enabling researchers to derive meaningful insights into genetic information, gene expression, and molecular interactions.

Bowtie: Bowtie is a widely used software tool for aligning short DNA sequence reads to a reference genome. It is highly efficient at handling the massive amounts of data generated by next-generation sequencing technologies, enabling researchers to map reads accurately and analyze genomic variations.

Clinical genomics: Clinical genomics is the application of genomic information and technology to clinical practice, aiming to improve patient care and outcomes through personalized medicine. It integrates genetic testing, data analysis, and patient management to tailor healthcare strategies based on individual genetic profiles, paving the way for targeted therapies and interventions. This approach relies heavily on various sequencing strategies and genomic data interpretation to inform diagnosis, prognosis, and treatment plans.

Confirmation Sequencing: Confirmation sequencing is a process used in genomics to verify the accuracy of DNA sequences obtained through various sequencing methods. This step is crucial as it ensures that the data generated from sequencing, whether whole-genome, exome, or targeted approaches, is reliable and represents the true genetic information of the sample being analyzed. By confirming sequences, researchers can avoid errors that might arise from initial sequencing technologies and can confidently draw conclusions based on the accurate data.

Coverage: Coverage refers to the number of times a particular nucleotide in a genome is sequenced during a sequencing experiment. It is a crucial metric that affects the accuracy and completeness of the resulting genomic data, influencing aspects like sequencing strategies, assembly algorithms, functional annotations, and metagenome analyses. High coverage improves the reliability of variant calls, while low coverage may lead to missing data or incorrect interpretations in genomic studies.

Discovery Sequencing: Discovery sequencing refers to a comprehensive approach to genome sequencing that aims to identify all genetic variations, including novel mutations, across a given genome. This method is crucial in uncovering the genetic basis of diseases and understanding the full genetic landscape of organisms, which can be applied in various sequencing strategies such as whole-genome, exome, and targeted sequencing.

Exome Sequencing: Exome sequencing is a genomic technique that focuses on sequencing all the protein-coding regions of genes in a genome, known as the exome. This approach allows researchers to identify genetic variants that may be responsible for diseases, providing insights into genetic disorders and guiding personalized medicine. By targeting only the exonic regions, exome sequencing reduces the amount of data generated compared to whole-genome sequencing, making it a cost-effective alternative while still providing valuable information.

GATK: The Genome Analysis Toolkit (GATK) is a software package developed by the Broad Institute for analyzing high-throughput sequencing data, primarily focusing on variant discovery in genomic datasets. It plays a central role in processing next-generation sequencing (NGS) data and is used across sequencing strategies, including whole-genome, exome, and targeted approaches. Its HaplotypeCaller tool locally re-assembles reads in regions that show evidence of variation, which improves the detection of single nucleotide variants and short insertions/deletions (indels).

Insertions and Deletions (indels): Insertions and deletions, commonly referred to as indels, are types of genetic mutations where nucleotides are added (insertions) or removed (deletions) from a DNA sequence. These changes can lead to significant effects on the resulting protein, potentially disrupting normal functions or leading to disease. Indels can affect gene function and are often studied in the context of sequencing techniques and their implications in population genetics, disease associations, and evolutionary studies.

Next-generation sequencing: Next-generation sequencing (NGS) refers to a set of advanced DNA sequencing technologies that allow for the rapid and cost-effective sequencing of large amounts of genetic material. This technology has revolutionized genomics by enabling whole-genome sequencing, exome sequencing, and targeted sequencing, allowing researchers to analyze complex genomes and understand genetic variations more thoroughly.

Population Genomics: Population genomics is the study of genetic variation within and between populations, using genomic data to understand evolutionary processes and population dynamics. This field integrates large-scale sequencing techniques to analyze the genetic composition of populations, facilitating insights into natural selection, migration patterns, and genetic drift, among other evolutionary mechanisms.

Read Depth: Read depth, also known as sequencing depth or coverage, refers to the number of times a particular nucleotide is sequenced during a genomic sequencing process. It is a critical measure that influences the accuracy and reliability of variant calling and the detection of low-frequency mutations. High read depth can enhance the confidence in variant detection, while lower read depths may miss rare variants or lead to ambiguous results.

Sanger sequencing: Sanger sequencing is a method of determining the nucleotide sequence of DNA based on the selective incorporation of chain-terminating dideoxynucleotides during DNA replication. This technique is foundational in genomics and contrasts with newer methods that have drastically increased sequencing speed and efficiency, leading to the development of next-generation sequencing technologies.

Sequencing bias: Sequencing bias refers to systematic errors in the data obtained from sequencing technologies, leading to uneven representation of different regions of the genome or variations within a population. This can occur due to various factors, such as the inherent limitations of certain sequencing methods or sample preparation processes. Understanding sequencing bias is crucial for accurately interpreting genomic data and ensuring reliable biological conclusions.

Single Nucleotide Polymorphism (SNP): A single nucleotide polymorphism (SNP) is a variation at a single position in a DNA sequence among individuals, where a nucleotide in the genome is replaced with another nucleotide. These variations can influence how genes function and are associated with different traits or diseases. SNPs serve as important genetic markers in various applications, such as determining genetic diversity, studying population genetics, and conducting genome-wide association studies (GWAS).

Targeted sequencing: Targeted sequencing is a method that focuses on specific areas of the genome to obtain detailed information about particular genes or regions of interest. This approach is efficient and cost-effective, as it selectively sequences predefined genomic regions rather than the entire genome or exome. By concentrating on specific targets, researchers can gather relevant data that may be associated with certain diseases or traits, making it a valuable tool in both clinical and research settings.

Technical Artifacts: Technical artifacts are spurious signals introduced by the experimental or computational workflow rather than by the biology of the sample. In genomic studies, these artifacts often arise during library preparation, sequencing, or read alignment (for example PCR duplicates or systematic base-calling errors) and can reduce the accuracy and reliability of the results. Understanding these artifacts is crucial as they can affect data interpretation, lead to misidentification of variants, and ultimately impact clinical decisions.

Variant calling: Variant calling is the process of identifying variations in the DNA sequence of an organism compared to a reference genome. This step is crucial in genomic studies as it helps to detect single nucleotide variants, small insertions and deletions, and larger structural variants that can have significant implications for genetic research, disease studies, and personalized medicine.

Whole-genome sequencing: Whole-genome sequencing (WGS) is a comprehensive method for determining the complete DNA sequence of an organism's genome, including both coding and non-coding regions. This technique provides a high-resolution view of genetic variations, enabling researchers to identify mutations, understand genetic diseases, and explore evolutionary relationships. WGS can be contrasted with other sequencing strategies like exome sequencing, which focuses only on protein-coding regions, and targeted sequencing, which looks at specific areas of interest within the genome.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.
