Next generation sequencing: new possibilities in medicine
Attila Kereszt
Institute of Biochemistry
12th April, 2017
„Practice-oriented, student-friendly modernization of the biomedical education for strengthening the international competitiveness of the rural Hungarian universities” TÁMOP-4.1.1.C-13/1/KONV-2014-0001
DNA SEQUENCING
DNA sequencing is the process of reading nucleotide bases in a DNA molecule
GENOME SEQUENCING
Sequencing the whole genetic material of an organism
1953: Structure of DNA
1973: First sequence of 24 bp published (lac operator)
1977: Phage ΦX-174
1995: Haemophilus influenzae
1996: Methanococcus jannaschii
1996: Saccharomyces cerevisiae
1997: Escherichia coli
1998: Caenorhabditis elegans
2000: Drosophila melanogaster
2000: Arabidopsis thaliana
2001: Homo sapiens (draft)
2002: Mus musculus
2006: Homo sapiens (complete)
2009: 1000th prokaryotic genome (complete)
Sanger (First generation) sequencing
Need for library preparation in a host
• labour- and time-intensive, expensive
• toxic regions are not represented
• host genome contamination
Low throughput
• strand synthesis and base determination are separated
• need for an electrophoretic step
• high unit cost (cost/bp)
NGS (Next Generation Sequencing)
No need for library preparation in a host
• immobilized template fragments, PCR methods
• labour-, time- and cost-effective
High throughput
• several million to billions of sequencing reactions per run
• synthesis and sequencing are not separated
No competition, but complementation
THE EVOLUTION OF GENOME SEQUENCING
Long read, low coverage
Short read, huge coverage (Second generation)
Very long read, large coverage (Third generation)
FIRST GENERATION DNA SEQUENCING: LIBRARY CONSTRUCTION IN A HOST
FIRST GENERATION DNA SEQUENCING: SEQUENCE ASSEMBLY STRATEGY
FIRST GENERATION DNA SEQUENCING TECHNOLOGY IS BASED ON THE ACTIVITY OF DNA POLYMERASE
AND THE USE OF DIDEOXY NUCLEOTIDES (SANGER)
[Figure: chemical structures of dATP and ddATP]
In ddNTPs the 3′ hydroxyl is replaced by a hydrogen. This terminates the DNA chain because no phosphodiester bond can form at the 3′ position.
CHAIN TERMINATION
[Figure: chain termination, lanes A, C, G, T]
FIRST GENERATION DNA SEQUENCING TECHNOLOGY:
• 4 reactions: each contains 4 dNTPs and 1 ddNTP
• the primer or one dNTP is radioactively labeled
• DNA molecules of different length are separated by gel electrophoresis
FIRST GENERATION DNA SEQUENCING TECHNOLOGY:
• 4 reactions: each contains 4 dNTPs and 1 ddNTP
• primers in the 4 reactions are labeled with different fluorescent dyes
• DNA molecules of different length are separated by gel electrophoresis
[Figure: lanes A, C, G, T]
FIRST GENERATION DNA SEQUENCING TECHNOLOGY:
• 1 reaction: contains 4 dNTPs and 4 ddNTPs that are labeled with 4 different fluorescent dyes
• DNA molecules of different length are separated by gel/capillary electrophoresis
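The chain-termination read-out can be mimicked in a few lines of Python. This is only a minimal sketch under idealized assumptions (every possible termination product is present, fragments resolve at single-base resolution); the function names `terminated_lengths` and `read_gel` and the example sequence are hypothetical.

```python
def terminated_lengths(new_strand, base):
    """Lengths of all fragments that end with a ddNTP of the given base.

    new_strand is the strand synthesized by the polymerase (5'->3').
    A fragment of length i + 1 arises when the ddNTP is incorporated at index i.
    """
    return [i + 1 for i, b in enumerate(new_strand) if b == base]


def read_gel(lanes):
    """Read the sequence from the shortest to the longest fragment.

    lanes maps each base (A, C, G, T) to the fragment lengths seen in its lane.
    """
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


new_strand = "ACCAGTTG"  # hypothetical example sequence
lanes = {base: terminated_lengths(new_strand, base) for base in "ACGT"}
print(lanes)
print(read_gel(lanes))
```

Each lane holds only the fragments that end in its base, so ordering all fragments by length recovers the synthesized strand.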
CYCLE SEQUENCING
SECOND GENERATION DNA SEQUENCING: LIBRARY CONSTRUCTION WITHOUT A HOST
ISOLATE DNA/RNA TO BE SEQUENCED
LIBRARY PREPARATION
• FRAGMENTATION OF DNA
• LIGATION OF ADAPTORS, PRIMERS (BARCODE)
• SIZE SELECTION
CLONAL AMPLIFICATION OF THE LIBRARY
• EMULSION PCR
• SOLID-PHASE PCR
PYROSEQUENCING: 454 (Roche)
SEMICONDUCTOR SEQUENCING: Ion Torrent (Life Technologies)
SEQUENCING BY LIGATION: SOLiD (Life Technologies)
REVERSIBLE TERMINATOR SEQUENCING: Illumina (Solexa)
sequencing in picowells / sequencing on solid surface
sequencing by synthesis / sequencing by ligation
LIBRARY PREPARATION
• ISOLATE DNA/RNA
• RANDOM FRAGMENTATION
• END REPAIR
• LIGATION (ADAPTORS, PRIMERS)
• SIZE SELECTION
• CLONAL AMPLIFICATION
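The library preparation steps can be illustrated in silico. The sketch below is a toy model, not a laboratory protocol: the random "genome", the adaptor strings `LEFT` and `RIGHT`, the size window and all function names are hypothetical, and a fixed seed is used so the run is reproducible.

```python
import random

LEFT = "ACGTACGT"   # hypothetical adaptor, not a real adaptor sequence
RIGHT = "TGCATGCA"  # hypothetical adaptor, not a real adaptor sequence


def fragment(dna, mean_size, rng):
    """Cut dna at random positions, giving fragments of roughly mean_size on average."""
    n_cuts = max(1, len(dna) // mean_size - 1)
    cuts = sorted(rng.sample(range(1, len(dna)), n_cuts))
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments inside the chosen size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def add_adaptors(frag, left=LEFT, right=RIGHT):
    """Ligate one adaptor to each end of the fragment."""
    return left + frag + right


rng = random.Random(0)  # fixed seed for reproducibility
genome = "".join(rng.choice("ACGT") for _ in range(2000))
fragments = fragment(genome, mean_size=200, rng=rng)
selected = size_select(fragments, 100, 400)
library = [add_adaptors(f) for f in selected]
print(len(fragments), "fragments,", len(library), "library molecules after size selection")
```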
CLONAL AMPLIFICATION: EMULSION PCR
CLONAL AMPLIFICATION: SOLID-PHASE PCR
cluster generation for Illumina sequencing
[Figure labels: original template, new strand]
CYCLE SEQUENCING: 35-1000 cycles
4 nucleotides flow sequentially
• step 1: add A, detect A, wash away A
• step 2: add C, detect C, wash away C
• step 3: add G, detect G, wash away G
• step 4: add T, detect T, wash away T
PYROSEQUENCING (ROCHE 454)
[Figure: the 4 nucleotides (C, T, A, G) flow sequentially; incorporation is detected as light (hν)]
PYROSEQUENCING (ROCHE 454)
SIGNAL DETECTION (PYROGRAM)
Parameters       Roche GS Junior   Roche GS FLX
Read length      700 nt            700-1000 nt
Reads per run    100 000           1 000 000
Throughput       70 Mbp            700 Mbp
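How a pyrogram arises from the sequential nucleotide flows can be sketched as follows. This is an idealized illustration: it assumes a noise-free signal that is exactly proportional to the number of bases incorporated in a flow, a fixed flow order A, C, G, T, and hypothetical function names (`pyrogram`, `decode`).

```python
def pyrogram(new_strand, flow_order="ACGT", max_flows=100):
    """Return one (nucleotide, signal) pair per flow.

    The signal is the number of bases incorporated in that flow, so a
    homopolymer such as CC gives a signal of 2 in a single C flow.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        nt = flow_order[flow % len(flow_order)]
        run = 0
        # the polymerase extends as long as the next base matches the flowed nucleotide
        while pos < len(new_strand) and new_strand[pos] == nt:
            run += 1
            pos += 1
        signals.append((nt, run))
        flow += 1
    return signals


def decode(signals):
    """Translate flow signals back into a base sequence."""
    return "".join(nt * n for nt, n in signals)


signals = pyrogram("ACCAGTTG")  # hypothetical example sequence
print(signals)
print(decode(signals))
```

Flows with a signal of 0 are those in which the flowed nucleotide did not match the next template position.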
SEMICONDUCTOR SEQUENCING (Life Technologies Ion Torrent)
No camera, just a pH sensor in each well
SEMICONDUCTOR SEQUENCING: ION TORRENT
                 Personal Genome Machine (PGM)                                      Proton
Parameters       Ion 314 Chip        Ion 316 Chip            Ion 318 Chip           PI Chip
Read length      200 nt/400 nt       200 nt/400 nt           200 nt/400 nt          200 nt
Reads per run    400-550 thousand    2-3 million             4-5.5 million          60-80 million
Throughput       30-50/60-100 Mbp    300-600/600-1 000 Mbp   0.6-1/1.2-2 Gbp        up to 10 Gbp
Illumina Solexa: REVERSIBLE TERMINATOR SEQUENCING
Library amplification on solid surface
ILLUMINA SEQUENCING: reversible terminator sequencing
fluorescently labeled, 3′-blocked reversible terminators
• incorporation of a single nucleotide
• detection of the fluorescence
• cleavage of fluorophore and blocker
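The three-step cycle (incorporation, detection, cleavage) can be sketched as a loop over cycles and clusters. This is a simplified model with hypothetical names: it assumes error-free incorporation, one template per cluster, and templates written in the order in which the polymerase copies them.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_clusters(templates, n_cycles):
    """Simulate reversible-terminator cycles for several clusters in parallel.

    templates: template strands, written in the order the polymerase copies them.
    Returns one read per cluster, at most n_cycles bases long.
    """
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # this template is already fully copied
            # 1) incorporation: the 3'-block allows exactly one nucleotide per cycle
            nt = COMPLEMENT[template[cycle]]
            # 2) detection: the fluorescent label identifies the incorporated base
            reads[i] += nt
            # 3) cleavage of fluorophore and blocker is implicit here:
            #    the next cycle can extend the strand by one more base
    return reads


print(sequence_clusters(["TGGTC", "AAAC"], n_cycles=4))
```

Because exactly one base is added per cycle, the read length equals the number of cycles, and homopolymers are read one base at a time.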
ILLUMINA SEQUENCING: data processing
ILLUMINA SEQUENCING: systems with different throughputs
Parameters         MiSeq            NextSeq 500      HiSeq 2500       HiSeq 4000       HiSeq X
Read length        up to 2x300 nt   up to 2x150 nt   up to 2x125 nt   up to 2x150 nt   up to 2x150 nt
Clusters per run   22-25 million    400 million      4 billion        4.3-5 billion    5.3-6 billion
Throughput         13-15 Gbp        100-120 Gbp      0.9-1 Tbp        1.3-1.5 Tbp      1.6-1.8 Tbp
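The throughput row follows from the other two rows: clusters per run, times reads per cluster, times read length. A minimal sketch of this arithmetic is given below; the function name `throughput_gbp` is hypothetical, and the calculation assumes that every cluster yields full-length reads, so it gives an upper bound.

```python
def throughput_gbp(clusters, read_length_nt, reads_per_cluster=1):
    """Throughput in Gbp = clusters x reads per cluster x read length."""
    return clusters * reads_per_cluster * read_length_nt / 1e9


# paired-end runs (2 x N nt) produce two reads per cluster
print(throughput_gbp(25e6, 300, reads_per_cluster=2))   # MiSeq, 25 million clusters, 2x300 nt
print(throughput_gbp(400e6, 150, reads_per_cluster=2))  # NextSeq 500, 400 million clusters, 2x150 nt
print(throughput_gbp(6e9, 150, reads_per_cluster=2))    # HiSeq X, 6 billion clusters, 2x150 nt
```

With the upper cluster counts from the table this gives 15 Gbp, 120 Gbp and 1800 Gbp (1.8 Tbp), matching the upper throughput values listed.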
SEQUENCING BY LIGATION: Life Technologies (SOLiD)
ACCAGTTG
A CC AG TT G
A TC GG CT A
Parameters       SOLiD 5500xl
Read length      1x75 or 2x50 nt
Reads per run    3.2 billion
Throughput       160-320 Gbp
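The slides show the example sequence ACCAGTTG being interrogated as overlapping dinucleotides. They do not spell out the colour code, so the sketch below is an assumption: it uses the commonly described two-base (colour-space) encoding, in which each of four colours stands for four dinucleotides. The function names and the numeric colour labels 0-3 are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def to_colors(seq):
    """Encode each overlapping dinucleotide as a colour 0-3 (XOR of the 2-bit base codes)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base, colors):
    """Decode colours back to bases; the first base must be known in advance."""
    seq = first_base
    for c in colors:
        seq += BASE[CODE[seq[-1]] ^ c]
    return seq


colors = to_colors("ACCAGTTG")
print(colors)
print(from_colors("A", colors))
```

Because every base contributes to two neighbouring colours, a single wrong colour call changes all downstream bases on decoding, which is why colour-space data are usually compared with a reference in colour space.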
THIRD GENERATION DNA SEQUENCING: SINGLE MOLECULE SEQUENCING
Pacific Biosciences has developed the Single Molecule Real Time (SMRT™) DNA sequencing technology.
Oxford Nanopore has developed 'strand sequencing', a technique that passes intact DNA polymers through a protein nanopore and sequences them in real time as the DNA translocates through the pore.
3rd GENERATION SEQUENCING: Pacific Biosciences
SMRT Cell: contains arrays of thousands of zero-mode waveguides (ZMWs).
A ZMW is a cylindrical hole, tens of nanometers in diameter, fabricated with semiconductor manufacturing technologies in a 100 nm metal film deposited on a transparent silicon dioxide substrate.
Each ZMW becomes a nanophotonic visualization chamber: light is blocked from penetrating past just a few nanometers because of waveguide cutoff, a phenomenon well known in microwave engineering. This provides a detection volume of just ~100 zeptoliters (1 zeptoliter = 10^-21 liters).
[Figure labels: limit of detection zone; ZMW with 1 DNA polymerase attached to the bottom]
4 dNTPs labeled with different, phospholinked fluorophores
• background signal from fluorescent dNTPs above the detection zone
• DNA polymerase binds a fluorescent dNTP, which results in a light burst
• DNA polymerase cleaves off the phospholinked fluorophore
Simple workflow: library preparation then sequencing (no amplification)
DNA ISOLATION
DNA FRAGMENTATION
END REPAIR, HAIRPIN LIGATION
Key advantages of SMRTbell templates:
• structurally linear
• topologically circular
• structural homogeneity of templates
• provides sequences of both forward and reverse strands in the same trace
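Because a circular template yields forward and reverse passes in the same trace, the passes can be combined into a consensus. The sketch below is a strongly simplified illustration, not the vendor's algorithm: it assumes the trace has already been split into passes at the hairpin adaptors, that all passes have equal length, and that only substitution errors occur. All names and sequences are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(passes):
    """Majority vote over the passes of one circular template.

    passes: sub-reads in the order they appear in the trace; passes with an
    odd index (1, 3, ...) are assumed to come from the reverse strand.
    """
    oriented = [p if i % 2 == 0 else reverse_complement(p) for i, p in enumerate(passes)]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*oriented))


# hypothetical trace: the third pass carries one substitution error (A read as T)
passes = ["ACCAGTTG", "CAACTGGT", "ACCTGTTG", "CAACTGGT"]
print(consensus(passes))
```

Real long-read data contain insertions and deletions, so an actual consensus step would need alignment of the passes before voting.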
3rd GENERATION SEQUENCING: Oxford Nanopore
Nanopore sensing
• Ionic current flows through the pore
• Introduce the analyte of interest into the pore
• Identify the target analyte by the characteristic disruption of the electrical current
Adding a hairpin to one fragment end allows sequencing of both strands
NEXT GENERATION SEQUENCING APPLICATIONS: discovery phase
I. GENOME SEQUENCING:
1. de novo genome sequencing
• no previous sequence information available
• usually combining long and short reads
2. genome re-sequencing
• sequencing the whole genome and comparing it to a reference sequence
3. targeted re-sequencing
• sequencing selected parts of the genome and comparing them to a reference sequence
• hybridization- and PCR-based methods
4. determination of DNA modifications (epigenetics)
• bisulphite sequencing (identification of 5-methyl-cytosine sites)
• PacBio sequencing: nucleotide modifications are recognized based on polymerase kinetics
identification of mutations, structural variants: 2., 3.
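The principle of bisulphite sequencing in point 4 can be shown with a toy example: bisulphite treatment converts unmethylated cytosine so that it is read as T, while 5-methyl-cytosine is still read as C. The sketch assumes complete conversion, no sequencing errors and a read from the same strand as the reference; the function names and the sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C is read as T after conversion; 5-methyl-cytosine stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Positions where a reference C is still read as C are called methylated."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACCAGTCG"  # hypothetical sequence with C at positions 1, 2 and 6
read = bisulfite_convert(reference, methylated_positions={6})
print(read)
print(call_methylation(reference, read))
```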
targeted re-sequencing: an example
SureSelect target enrichment (Agilent)
• fragmented DNA and labeled probes (baits)
• hybridization
• target capture, unbound fraction discarded
• target recovery
• sequencing
II. TRANSCRIPTOME SEQUENCING:
RNA sequencing
• sequencing the RNA pools of cells, tissues, organs
micro/small RNA sequencing
• sequencing the small RNA pools of cells, tissues, organs
Deep-SAGE/CAGE sequencing
• SAGE: Serial Analysis of Gene Expression, CAGE: Cap Analysis of Gene Expression
• sequencing tags from the 3′ or the 5′ ends of the mRNA pools of cells, tissues, organs
Ribosome profiling
• sequencing of ribosome-protected mRNA fragments
investigation of the expressed genome
• genome-wide or targeted comparison of gene expression profiles between different cells, tissues, organs, conditions...
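A comparison of expression profiles starts from read counts per gene. The sketch below shows one simple way to compare two samples, counts per million (CPM) followed by a log2 fold change. It is an illustration only: the gene names and counts are invented, the pseudocount is an arbitrary choice, and real analyses use dedicated statistical methods with replicates.

```python
import math


def cpm(counts):
    """Counts per million: scale raw read counts by the library size."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(sample_a, sample_b, pseudocount=1.0):
    """log2 of the CPM ratio (sample_b / sample_a) for every gene in sample_a."""
    a, b = cpm(sample_a), cpm(sample_b)
    return {g: math.log2((b[g] + pseudocount) / (a[g] + pseudocount)) for g in a}


# hypothetical read counts for two genes in two tissues
tissue_1 = {"geneA": 100, "geneB": 900}
tissue_2 = {"geneA": 400, "geneB": 1600}
print(cpm(tissue_1))
print(log2_fold_change(tissue_1, tissue_2))
```

Scaling by library size matters here: tissue_2 has twice as many reads in total, so the raw fourfold increase of geneA corresponds to roughly a twofold change in relative expression.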
III. DNA-PROTEIN INTERACTIONS:
1. Chromatin immunoprecipitation sequencing (ChIP-seq)
• after protein-DNA crosslinking and DNA fragmentation, protein-bound fragments are isolated with the help of specific antibodies
• histone modifications (acetylation, methylation, phosphorylation, ubiquitination)
• DNA-binding proteins/transcription factors
2. MNase sequencing
• Micrococcal nuclease (MNase) digests naked DNA
• nucleosome-associated DNA is protected, enriched and sequenced
3. ATAC sequencing
• assay for transposase-accessible chromatin using sequencing (ATAC-seq)
• captures open chromatin sites
• based on direct in vitro transposition of sequencing adaptors into native chromatin
ATAC-SEQ
NEXT GENERATION SEQUENCING APPLICATIONS: in clinical practice
INHERITED DISEASES
Over 6000 monogenic inherited diseases
CFTR: Cystic fibrosis transmembrane conductance regulator
• high occurrence of carriers
• FDA-approved NGS detection method for 139 clinically relevant CFTR variants
TruSight One Sequencing Panel (not for diagnostic purposes but it can help...)
• 4813 clinically relevant genes associated with a clinical phenotype
Ion AmpliSeq Inherited Disease Panel (not for diagnostic purposes but it can help...)
• broad survey of significant genetic disease genes with an extensive 300+ gene panel
Human leukocyte antigen (HLA) typing
• HLAs play an important role in the distinction of self and non-self cells (infectious diseases, graft rejection during transplantation, autoimmunity)
• difficult to type due to the high levels of sequence homology
• but NGS provides accurate, unambiguous, phase-resolved HLA typing in a single assay
Many more to come (autism, cardiomyopathy, sudden cardiac arrest ...)
IN VITRO FERTILIZATION AND NONINVASIVE PRENATAL DIAGNOSTICS
problem of aneuploidy (abnormal number of a chromosome)
Preimplantation Genetic Screening (PGS) for In Vitro Fertilization (IVF)
• Chromosome aneuploidy (abnormal number of chromosomes) is a major cause of in vitro fertilization (IVF) failure, pregnancy loss, and, in rare cases, abnormal pregnancy.
• The VeriSeq PGS Kit uses NGS on the Illumina MiSeq System to screen all 24 chromosomes for aneuploidy in a single assay. The assay can be used on a single cell or a few cells from an embryo.
Non-invasive prenatal testing (NIPT)
• During the early stages of pregnancy, fetal cfDNA represents approximately 3% of the genomic content found within maternal plasma DNA.
• Through the power of NGS, this fetal DNA can be analyzed to identify potential chromosomal aberrations (T21 Down syndrome, T18 Edwards syndrome, T13 Patau syndrome, Monosomy X) and the gender of the fetus (XX, XY).
• sequencing 28 million tags (1x25 bp) per sample
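Why tens of millions of tags are needed can be estimated with simple counting arithmetic. In a trisomic fetus the affected chromosome is present in 3 instead of 2 copies, so with a fetal fraction f the tag count for that chromosome rises by about f/2. The sketch below uses the 28 million tags and the 3% fetal fraction from the slide; the chromosome 21 share of 1.5% of all tags is an assumed round figure, the noise model is pure Poisson counting (real assays have additional variance), and the function name is hypothetical.

```python
import math


def trisomy_signal(total_tags, chrom_fraction, fetal_fraction):
    """Return (expected euploid tags, excess tags in trisomy, z-score)."""
    expected = total_tags * chrom_fraction       # tags on the chromosome, euploid case
    excess = expected * fetal_fraction / 2       # extra tags from the third fetal copy
    z = excess / math.sqrt(expected)             # excess relative to Poisson counting noise
    return expected, excess, z


expected, excess, z = trisomy_signal(
    total_tags=28e6,        # from the slide
    chrom_fraction=0.015,   # assumption: share of tags mapping to chromosome 21
    fetal_fraction=0.03,    # from the slide
)
print(round(expected), round(excess), round(z, 1))
```

Under these assumptions the excess is only about 1.5% of the expected count, which is detectable only because the absolute number of tags on the chromosome is in the hundreds of thousands.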
CANCER GENOMICS
cancer: disease of the genome
• certain mutations predispose to cancer
• mutations affect the progression of the disease and the prognosis of the patient
there are targeted therapies with >200 drugs
• certain mutant oncogene proteins are targeted by given drugs
• certain mutations cause the inefficacy of given drugs
Cancer panels
• to detect germline mutations
• to detect somatic mutations
• to detect mutations associated with certain cancer types (for example, colon, lung, myeloid, ...)
Thank you for your attention!
This work is supported by the European Union, co-financed by the European Social Fund, within the framework of the "Practice-oriented, student-friendly modernization of the biomedical education for strengthening the international competitiveness of the rural Hungarian universities" TÁMOP-4.1.1.C-13/1/KONV-2014-0001 project.