Introduction to Illumina NGS technology, Library
preparation and Mars-seq
Hadas Keren-Shaul Advanced Sequencing Technologies (Sandbox), LSCF
Introduction to Deep Sequencing Analysis course June 2018
Sandbox: A Playground for Genomic Research and Science Innovation
• Open 24/7
• Access to Weizmann trained users
Sandbox lab, Levine building, Room 202
Genomic technologies are greatly advancing Biology and Medicine
How do we get access to these technologies?
How do we share the technological developments in Weizmann?
The Interface between cutting edge technologies developed in individual labs to the entire community of Weizmann
• Standardizing custom genomic protocols
• Affordable, accessible to many users
• Hands-on workshops
• Quality assured equipment and consumables
• Troubleshooting and guidance
Bringing advanced genomic technologies to Weizmann scientists
Sandbox Vision
[Diagram: the Sandbox connecting Weizmann Genomic Innovation with Weizmann scientists]
NGS – Next Generation Sequencing
The research tool to study biological systems with unprecedented throughput, scalability, and speed
Broad range of applications: Sequence whole genomes
Zoom in to deeply sequence target regions
Utilize RNA sequencing to discover RNA variants and splice sites, or quantify mRNAs for gene expression analysis
Analyze genome-wide methylation or DNA-protein interactions
Study microbial diversity in humans or in the environment
NGS rapidly advances over the years
Cost for genome sequencing has drastically declined
Timeline and Comparison of Commercial HTS Instruments
Jason A. Reuter, Damek V. Spacek, Michael P. Snyder Molecular Cell Volume 58, Issue 4, Pages 586-597 (May 2015) DOI: 10.1016/j.molcel.2015.05.004
Developments in high throughput sequencing
PacBio Sequel
The NGS Workhorse - Illumina Sequencing By Synthesis
A. Library preparation
B. Cluster Amplification
C. Sequencing
D. Alignment and Data Analysis
https://youtu.be/fCd6B5HRaZ8
Illumina library construction
Illumina sequencing libraries
[Figure: Illumina library structure. Labels: P5, i5, Read 1, insert to sequence, Read 2, i7, P7]
Paired End (PE) vs. Single End (SE)
Single end (SE) – for each cDNA fragment only one end is read
Paired end (PE) – the cDNA fragment is read from both ends
Paired end sequencing
• Enables both ends of the DNA fragment to be sequenced
• Facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts
• Better alignment of the reads, especially across difficult-to-sequence, repetitive regions of the genome
What should I know before sequencing?
• Library quality and quantity
• Type of library prep method used
• PE or SR?
• How many bp to read:
rd1, rd2, index1 (i7), index2 (i5)
• Run definitions can be set up on the Illumina website (BaseSpace)
Illumina sequencers
Production scale sequencers
Benchtop sequencers
In the Sandbox - NextSeq, Illumina
• Desktop sequencing machine, fast, flexible, high-throughput
• Independent, easy and accessible sequencing 24/7
• NextSeq training
• Detailed run protocols
• A downstream analysis pipeline, generated by LSCF bioinformatics, for immediate demultiplexing of samples
• Used daily by many Weizmann labs
RNA-Seq - Method of choice to study Gene Expression
Identification of novel transcripts
Less background noise
Greater dynamic range for detection
How to perform RNA-seq?
Most RNA-seq experiments are run on instruments that sequence DNA molecules, so the RNA is first converted:
Capture RNA molecules
Convert RNA to cDNA with defined size range
Add adapter sequences on the cDNA ends for amplification and sequencing
How to perform RNA-seq?
Many different methods for library preparation
Strand-specific RNA-seq methods – preserve which DNA strand corresponds to the sense strand of the RNA
Wiley Interdisciplinary Reviews: RNA Volume 8, Issue 1, 19 MAY 2016 DOI: 10.1002/wrna.1364
Selection of polyA+ transcripts
The most common application of RNA-seq
In Eukaryotic organisms, most protein coding RNAs (mRNA) contain a poly(A) tail
Poly(A)+ transcripts (1-5% of total cellular RNA) are technically convenient to enrich
Beads coated with polyT or oligo-dT priming for Reverse transcription (possible 3’ bias)
For non-polyadenylated RNAs, such as prokaryotic mRNAs, or fragmented samples from FFPE, it is possible to do rRNA depletion instead
Fragmentation of template to ‘fit’ sequencing platform
Fragmentation of RNA before RT – in alkaline solutions or with enzymes
Fragmentation of cDNA – acoustic shearing, DNase, or transposon-based tagmentation by Tn5, which fragments the cDNA and adds adapter sequences at the same time
Tagmentation requires optimization of a precise enzyme:DNA ratio
Adapters and directionality
Naïve protocols - obtain reads from cDNA fragment. BUT the link with the sense or antisense strand is broken.
Stranded protocols - generate reads from one strand, corresponding to the sense or antisense strand (depending on the protocol)
Wiley Interdisciplinary Reviews: RNA Volume 8, Issue 1, 19 MAY 2016 DOI: 10.1002/wrna.1364
Adding adapters directly to the 5’ or 3’ of the RNA
Incorporating dUTP in the second strand of cDNA
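To make the stranded case concrete, here is a minimal Python sketch of how an aligned read maps back to the strand of the original RNA. It assumes the usual behaviour of dUTP second-strand marking, where read 1 aligns antisense to the RNA. The function name and the protocol labels ('unstranded', 'forward', 'reverse') are hypothetical and chosen for illustration only; they are not taken from the slides or from any specific tool.

```python
def transcript_strand(read_strand, protocol, is_read1=True):
    """Infer the strand of the originating RNA from an aligned read.

    read_strand: '+' or '-', the genomic strand the read aligned to.
    protocol:    'unstranded' (naive protocol),
                 'forward'    (read 1 is sense to the RNA),
                 'reverse'    (read 1 is antisense, e.g. dUTP marking).
    is_read1:    False for read 2 of a paired-end fragment.
    All labels are illustrative assumptions.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    if protocol == "unstranded":
        return None  # the link to the sense/antisense strand is lost
    if protocol not in ("forward", "reverse"):
        raise ValueError("unknown protocol label")

    flip = {"+": "-", "-": "+"}
    # Read 2 always lies on the opposite strand from read 1,
    # so it inverts the relationship defined by the protocol.
    same_as_rna = (protocol == "forward") == is_read1
    return read_strand if same_as_rna else flip[read_strand]


if __name__ == "__main__":
    # A dUTP-style library: read 1 on '+' implies a transcript on '-'.
    print(transcript_strand("+", "reverse"))
```

With a naive (unstranded) protocol the function returns None, which mirrors the point above: the read alone cannot tell sense from antisense.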
How to extract my RNA?
Step 1: Sample collection and protection
Step 2: RNA preparation
Step 3: QC of isolated RNA
Step 4: Storage of isolated RNA
RNA extraction method needs to be calibrated per project!
Step 1: Sample collection and protection
Lytic agent or denaturant must be in contact with cellular contents when cells are disrupted – problematic if:
Tissues or cells are hard (bones, roots)
Contain capsules or walls (yeast, spores)
It is not possible to process sample immediately after collection - transport from another site, many samples in parallel
RNA stabilization solution, freezing in dry ice/liquid nitrogen
Step 2: RNA preparation
Origin of the sample: Tissues high in nucleases, fatty tissues, samples with high amounts of inhibitors
The amount of sample that can be obtained: A cell line (millions of cells), rare FACS sorted population (few thousands of cells)
The amount of RNA required: Depends on the method of choice for RNA-seq
RNA Extraction methods
Organic extractions
• ‘Gold standard’ for RNA preparation
• Sample is homogenized in a phenol-containing solution (Trizol, Qiazol – for fatty tissues) and centrifuged
• Sample is separated into three phases
• The upper aqueous phase is recovered and RNA is collected by alcohol precipitation and rehydration
Organic Extractions
Benefits –
• Rapid denaturation of nucleases and stabilization of RNA
• Scalable format
• Cheap, simple
Drawbacks –
• The use and associated waste of organic reagents
• Laborious and manually intensive processing
• Difficult to automate
• Requires a large amount of input sample
Filter-based, Spin Basket format
• Utilize membranes seated at the bottom of a small plastic basket
• Samples are lysed in a buffer that contains RNAse inhibitors (usually guanidine salts), and nucleic acids are bound to the membrane by passing the lysate through the membrane using centrifugal force
• Samples are then washed and eluted
• Hybrid methods combine organic extraction with purification by spin basket
Filter-based, Spin Basket format
Benefits –
• Convenient and easy to use
• Scalable format, ability to automate
Drawbacks –
• Propensity to clog with particulate material
• Retention of large nucleic acids such as gDNA
• Fixed binding capacity
• Carryover of salts when using a sub-optimal sample input
Magnetic particle methods
• Small (0.5-1 µm) magnetic particles that bind DNA/RNA
• Samples are lysed in a buffer that contains RNase inhibitors and allowed to bind to the magnetic particles
• Following magnetization, samples are washed and RNA is eluted from the magnetic particles
Magnetic particle methods
Benefits –
• No risk of filter clogging
• Solution-based binding kinetics increase target capture
• Rapid, easy to use, ability to automate
Drawbacks –
• Potential carry-through of magnetic particles in the eluted sample
• Slow migration of magnetic particles in viscous solution
• Capture/release of particles can be laborious
Direct lysis methods
• Perform sample preparation, not purification
• Utilizes lysis buffer formulations that disrupt samples, stabilize nucleic acids, and are compatible with downstream analysis
• A sample is mixed with lysis agent, incubated under specific conditions and used directly for downstream analysis
• By eliminating the need to bind and elute from solid surfaces, it avoids bias and recovery efficiency effects
Direct lysis methods
Benefits –
• Fast and easy
• Highest potential for accurate RNA representation
• Can work well with very small samples
• Scalable, possible to automate
Drawbacks –
• Impossible to perform traditional analytical methods for RNA yield
• Dilution based (most useful)
• Potential for sub-optimal performance, requires optimization for downstream processes
Step 3: QC of isolated RNA
RNA quantity: Nanodrop, Qubit, Real Time PCR
RNA quality and purity: Nanodrop
RNA integrity: Electrophoresis gel, Bioanalyzer, TapeStation
Nanodrop – for RNA quantity and purity
• Measures UV absorption
• RNA has a maximum absorption at 260 nm
• UV absorbance depends on pH of RNA solution
A260/280 – level of protein contamination
Pure RNA = 2.1
Acceptable: 1.8-2.0
A260/230 – level of salt / organic compounds contamination (guanidine salts and phenols, used in RNA isolation protocols)
Acceptable: >1.5
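The purity ratios above can be checked with a few lines of Python. This is a minimal sketch: the function name and the returned field names are hypothetical, the cutoffs are the ones quoted on the slide (A260/280 of at least 1.8, A260/230 above 1.5), and the concentration estimate assumes the common convention that one A260 unit corresponds to roughly 40 ng/µl of RNA at a 1 cm path length, which is not stated in the slides.

```python
def assess_nanodrop(a260, a280, a230, dilution_factor=1.0):
    """Summarise a UV absorbance reading for an RNA sample.

    Assumption (not from the slides): 1 A260 unit ~ 40 ng/ul RNA
    at a 1 cm path length.
    """
    if a280 <= 0 or a230 <= 0:
        raise ValueError("A280 and A230 must be positive")

    ratio_280 = a260 / a280  # protein contamination indicator
    ratio_230 = a260 / a230  # salt / organic compound indicator
    return {
        "conc_ng_per_ul": a260 * 40.0 * dilution_factor,
        "a260_280": ratio_280,
        "a260_230": ratio_230,
        "protein_ok": ratio_280 >= 1.8,       # slide: acceptable 1.8-2.0
        "salt_organic_ok": ratio_230 > 1.5,   # slide: acceptable >1.5
    }


if __name__ == "__main__":
    report = assess_nanodrop(a260=1.0, a280=0.5, a230=0.5)
    for key, value in report.items():
        print(key, value)
```

Because UV absorbance depends on the pH of the RNA solution, readings taken in different buffers are not directly comparable, and the flags above are only a first screen.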
Qubit – measures low input samples
• Fluorometer, provides an accurate and selective method for the quantitation of low-abundance RNA samples.
• Can measure RNA or DNA depending on the kit used
• High sensitivity
RNA integrity – Gel electrophoresis
Degraded RNA will not perform well in downstream applications!
Run the RNA on a 1% agarose gel –
The 28S rRNA band should be ~2-fold more intense than the 18S band
Equal intensity indicates some degradation
[Gel image labels: mRNA, rRNA]
Higher molecular weight bands – can indicate DNA contamination
Smearing below rRNA indicates poor RNA quality
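The 28S:18S rule of thumb can be written as a small Python helper. This is a sketch only: the band intensities are assumed to come from whatever gel-quantification software is in use, the function names are hypothetical, and the cutoff of 1.8 is an arbitrary illustrative value, not a threshold given in the slides.

```python
def rrna_band_ratio(intensity_28s, intensity_18s):
    """Return the 28S:18S band intensity ratio."""
    if intensity_18s <= 0:
        raise ValueError("18S intensity must be positive")
    return intensity_28s / intensity_18s


def interpret_ratio(ratio, intact_cutoff=1.8):
    """Classify a 28S:18S ratio.

    intact_cutoff is an illustrative assumption; the slides only say
    that ~2-fold is expected and equal intensity suggests degradation.
    """
    if ratio >= intact_cutoff:
        return "consistent with intact RNA"
    return "possible degradation"


if __name__ == "__main__":
    ratio = rrna_band_ratio(200.0, 100.0)
    print(ratio, interpret_ratio(ratio))
```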
RNA integrity – Bioanalyzer / TapeStation
A miniaturized version of agarose and acrylamide gels
RIN – RNA Integrity Number – quality score for total RNA
10 – maximum RNA integrity
Step 4: Storage of isolated RNA
• RNase free environment
• Storage at -80 °C
• Small aliquots – avoid freeze thaw cycles
• RNA storage solution (10 mM Tris-HCl, pH 7.5)
Mars-seq – Scalable and sensitive RNA-seq
• Library generation for 3’ RNA seq
• Developed in the lab of Ido Amit
• Low input material (1 ng of RNA)
• Stable, suitable for inexperienced users
• Suitable for a wide variety of species and applications (sorted cells, frozen tissues, etc.)
• Ultra low cost due to custom made reactions
• Simple and efficient due to early pooling of samples
• RNA data in < 1 week
• A detailed quality control scheme for library evaluation at different steps prior to sequencing
Jaitin, Kenigsberg, Keren-Shaul et al., Massively Parallel single cell RNA-seq. Science 2014
Library construction- Day 1
Step 1: Reverse transcription
Step 2: Sample pooling
Step 3: Exonuclease I
Step 4: Second strand synthesis
Step 5: In Vitro Transcription
RT primer: 5' –T7 promoter-Illumina sequences XXXXXXX NNNNNNNN TTTTTTTTTTTTTTTTTTTTN 3'
XXXXXXX = BC-7bp, NNNNNNNN = UMI-8bp
[Figure labels: poly(A) RNA; T20-UMI-barcode-partial rd2-T7 promoter. Legend: RNA, cDNA, 2nd strand, aRNA]
Library construction- Day 2
Step 6: DNaseI
Step 7: RNA Fragmentation
Step 8: RNA/ssDNA ligation
Step 9: Reverse transcription
Step 10: Amplification + Illumina primers addition by nested PCR (P5_rd1 forward primer, P7_rd2 reverse primer)
[Figure labels: UMI-barcode-partial rd2rev; partial rd1rev; partial rd1 primer; partial rd1; partial rd2; P5; P7]
Library ready for Illumina sequencing
Is my library good?
• QC1, QC2, QC3
• Qubit value – ~1-10 ng/µl
• TapeStation profile
Sequencing of Mars-seq libraries
• NextSeq® 500 High Output v2 Kit (75 cycles)
• 75 rd1, 15 rd2, no index
• Pooling up to 80 samples in one run
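As a rough planning aid, the run definition above can be turned into arithmetic. The sketch below is hypothetical: the function name is invented for illustration, and the default of 400 million reads is an assumption based on the nominal upper figure usually quoted for a NextSeq 500 High Output run, not a number from the slides. Actual yield varies with library quality and loading.

```python
def run_summary(read1_cycles, read2_cycles, index1_cycles=0,
                index2_cycles=0, total_reads=400_000_000, n_samples=80):
    """Return (total sequencing cycles, approximate reads per sample).

    total_reads is an assumed nominal run output, and samples are
    assumed to be pooled in equal proportions.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total_cycles = (read1_cycles + read2_cycles
                    + index1_cycles + index2_cycles)
    return total_cycles, total_reads // n_samples


if __name__ == "__main__":
    # Mars-seq settings from the slide: 75 rd1, 15 rd2, no index.
    cycles, per_sample = run_summary(75, 15)
    print(cycles, per_sample)
```

Under these assumptions, 80 equally pooled samples would each receive about 5 million reads; pooling fewer samples raises the depth per sample proportionally.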
[Figure: library structure. Labels: P5, Read 1, insert to sequence, Read 2, P7]
Read 1 to align to genome
Read 2 to obtain cell and molecule barcodes
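The role of read 2 can be illustrated with a short Python sketch that splits each 15 bp read into a sample/cell barcode and a UMI, then counts unique molecules. It assumes the layout implied by the RT primer (7 bp barcode first, then 8 bp UMI) and that each read 1 has already been assigned to a gene by an aligner. The function names and the input format are hypothetical, and this is not the LSCF pipeline.

```python
from collections import defaultdict

BARCODE_LEN = 7  # BC-7bp, from the RT primer design
UMI_LEN = 8      # UMI-8bp, from the RT primer design


def split_read2(seq):
    """Split read 2 into (barcode, UMI); assumes barcode comes first."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 2 is shorter than barcode + UMI")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_molecules(records):
    """Count unique UMIs per (barcode, gene).

    records: iterable of (read2_sequence, gene) pairs, where gene is
    the assignment of the matching read 1 (assumed precomputed).
    Reads sharing barcode, gene and UMI are treated as PCR duplicates.
    """
    umis = defaultdict(set)
    for read2, gene in records:
        barcode, umi = split_read2(read2)
        umis[(barcode, gene)].add(umi)
    return {key: len(value) for key, value in umis.items()}


if __name__ == "__main__":
    reads = [
        ("A" * 7 + "C" * 8, "geneX"),
        ("A" * 7 + "C" * 8, "geneX"),  # duplicate of the same molecule
        ("A" * 7 + "G" * 8, "geneX"),  # second molecule, same sample
        ("T" * 7 + "C" * 8, "geneX"),  # different sample barcode
    ]
    print(count_molecules(reads))
```

Exact-match UMI counting is the simplest possible choice; real pipelines typically also correct for sequencing errors in barcodes and UMIs, which this sketch ignores.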
Mars-seq Workshop in the Sandbox
• 3-day hands-on workshop
• Standard RNA material
• 1 representative per lab
Thank you for listening
Happy Sequencing