Automated Workflow for the Generation of Mitogenome Reference Data
Joseph Ring, Kimberly Sturk-Andreaggi, Charla Marshall
QIAGEN Investigator Forum
May 3, 2018
Disclaimers
• The opinions or assertions presented hereafter are the private views of the speaker(s) and should not be construed as official or as reflecting the views of the Department of Defense, its branches, the Defense Health Agency, the U.S. Army Medical Research and Materiel Command or the Armed Forces Medical Examiner System.
• Mention of commercial products does not constitute a recommendation or endorsement by the speakers and/or their associated organization/institute. Commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible; this does not imply that any of the commercial products identified are necessarily the best available for the purpose.
INTRODUCTION
Mitogenome Databasing
• From 2011 to 2014, AFDIL undertook an NIJ grant-funded project for Sanger full mitogenome databasing
• 588 samples were sequenced and analyzed over that timespan
  – Lab processing performed on a Hamilton STARplus and STARlet
  – Redundant review involving at least 2 individual scientists
Mitogenome Benefits
• Increased power of discrimination
  – Less common types
[Figure: proportion of unique (vs. shared) haplotypes by region sequenced: HV1 84%, HV1/HV2 87%, CR 91%, mtG 98%]
Impact of the Region
• 283 mitogenomes generated with NGS
  – U.S. populations (Caucasians, Hispanics, African Americans)
  – Compared HV1/HV2 and mitogenome haplogroups
    • Assigned with HaploGrep2
[Figure: HV1/HV2 vs. mitogenome haplogroup comparison (percent of samples): HV1/HV2 identical, less precise, same clade, different clade]
Incorrect Maternal Ancestry
• 21 samples predicted a different maternal ancestry with HV1/HV2 than with the mitogenome
Ancestry (HV1/HV2-mtG) | Less Precise | Same Clade | Different Clade
European-African | 0 | 0 | 2
Asian-African | 0 | 0 | 2
Asian-European | 0 | 0 | 1
Asian-NA | 11 | 3 | 2
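As a quick consistency check on the ancestry table, the sketch below sums its cells. It assumes each discordant sample is counted in exactly one ancestry row and one category column; the dictionary is transcribed from the table, and the function names are illustrative only.

```python
# Counts transcribed from the "Incorrect Maternal Ancestry" table:
# (less precise, same clade, different clade) per ancestry pair (HV1/HV2-mtG).
discordant = {
    "European-African": (0, 0, 2),
    "Asian-African": (0, 0, 2),
    "Asian-European": (0, 0, 1),
    "Asian-NA": (11, 3, 2),
}

def total_discordant(table: dict[str, tuple[int, int, int]]) -> int:
    """Sum every cell; assumes each sample appears in exactly one cell."""
    return sum(sum(row) for row in table.values())

def column_totals(table: dict[str, tuple[int, int, int]]) -> tuple[int, ...]:
    """Totals per haplogroup-comparison category across all ancestry pairs."""
    return tuple(sum(col) for col in zip(*table.values()))

print(total_discordant(discordant))  # 21
print(column_totals(discordant))     # (11, 3, 7)
```

The row sums reproduce the 21 samples stated in the bullet, and most of them (16) fall in the Asian-NA row.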
Next Generation Sequencing
• Increases mitogenome sequencing feasibility
  – High throughput
• Increased sensitivity
  – Mixtures
  – PHPs
• More quantitative than Sanger sequencing
  – Heteroplasmy
• Automated bioinformatic analysis
  – Eases analysis burden
Platforms pictured: Illumina MiSeq, Thermo Fisher Ion S5
Reference Database
• Implementation of mitogenome sequencing requires more reference data for statistics
  – Around 1,000 “forensic-quality” mitogenomes will be available to search in EMPOP
• AFDIL was awarded an NIJ grant to sequence 5,000 high-quality mitogenomes
  – Nearly all were previously processed in an NIJ-funded control region (CR) databasing project
    • Can be used as QC check
  – Proposed processing and analysis time of 3 years
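To put the 5,000-sample, 3-year goal in batch terms, here is a back-of-the-envelope sketch. It assumes one 96-sample plate per post-amplification run and a hypothetical 48 working weeks per year, and it ignores controls, blanks, and reprocessing, all of which would raise the real plate count.

```python
import math

TOTAL_MITOGENOMES = 5000   # NIJ grant goal
PROJECT_YEARS = 3          # proposed processing and analysis time
PLATE_SIZE = 96            # samples per post-amplification batch
WORK_WEEKS_PER_YEAR = 48   # hypothetical; adjust for holidays and downtime

def batches_needed(samples: int, plate_size: int = PLATE_SIZE) -> int:
    """Full or partial plates needed to process every sample once."""
    return math.ceil(samples / plate_size)

plates = batches_needed(TOTAL_MITOGENOMES)
per_week = TOTAL_MITOGENOMES / (PROJECT_YEARS * WORK_WEEKS_PER_YEAR)
print(plates)              # 53
print(round(per_week, 1))  # 34.7
```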
PROCESSING OVERVIEW
Pre-Amplification
Extraction
• QIAamp DNA Blood Mini
Amplification
• Two overlapping ~8.5 kb amplicons (Fendt et al. 2009)
Image source: https://www.qiagen.com/us/shop/sample-technologies/dna/genomic-dna/qiaamp-dna-blood-mini-kit/#orderinginformation
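The two overlapping amplicons only work if together they span the whole circular molecule. A minimal sketch, assuming the 16,569 bp rCRS length and made-up start coordinates (the actual primer positions are not given here), checks full coverage and counts the shared bases. With two 8,500 bp amplicons that cover the genome, the total overlap is 2 × 8,500 − 16,569 = 431 bp regardless of where the primers sit.

```python
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (bp)

def covered_positions(start: int, length: int, genome_len: int = RCRS_LENGTH) -> set[int]:
    """1-based positions covered by an amplicon; wraps past genome_len (circular)."""
    return {((start - 1 + i) % genome_len) + 1 for i in range(length)}

def overlap_report(amplicons: list[tuple[int, int]], genome_len: int = RCRS_LENGTH):
    """Return (whole genome covered?, number of positions shared by the two amplicons)."""
    sets = [covered_positions(s, n, genome_len) for s, n in amplicons]
    union = set().union(*sets)
    shared = sets[0] & sets[1]
    return len(union) == genome_len, len(shared)

# Hypothetical coordinates: (start, length); the second amplicon wraps through position 1.
amps = [(1, 8500), (8300, 8500)]
print(overlap_report(amps))  # (True, 431)
```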
Amplicon Quantitation
• Fragment Analyzer
• Combine amps in equal concentration
Pooled Amplicon Purification
• AMPure XP
• Quant pooled product with Varioskan LUX
Library Preparation
• Roche KAPA HyperPlus
• Equal sample input
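The pooling and library-input steps are simple normalization arithmetic. A sketch, assuming concentrations in ng/µL (for example from the Fragment Analyzer or Varioskan LUX) and hypothetical target masses and volumes. Because both amplicons are about the same size (~8.5 kb), pooling equal masses is also close to equimolar.

```python
def pool_volumes(conc_a: float, conc_b: float, mass_each_ng: float) -> tuple[float, float]:
    """uL of amplicon A and amplicon B that each contribute mass_each_ng to the pool."""
    if conc_a <= 0 or conc_b <= 0:
        raise ValueError("concentrations must be positive")
    return mass_each_ng / conc_a, mass_each_ng / conc_b

def normalize_input(conc: float, target_ng: float, final_ul: float) -> tuple[float, float]:
    """uL of sample and uL of diluent that give target_ng in final_ul."""
    sample_ul = target_ng / conc
    if sample_ul > final_ul:
        raise ValueError("sample too dilute for the requested input")
    return sample_ul, final_ul - sample_ul

# Hypothetical values: amp A at 20 ng/uL, amp B at 50 ng/uL, 100 ng of each.
print(pool_volumes(20.0, 50.0, 100.0))    # (5.0, 2.0)
# Hypothetical: pooled product at 8 ng/uL, 100 ng library input in 25 uL.
print(normalize_input(8.0, 100.0, 25.0))  # (12.5, 12.5)
```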
Post-Amplification
Instruments pictured: Fragment Analyzer, Varioskan LUX
Roche’s KAPA HyperPlus Library Prep
• HyperPlus utilizes three incubation steps with an optional library PCR step
• Accommodates a wider range of DNA input than Nextera XT
  – DNA input range of 1–1000 ng
  – Nextera validated at AFDIL for HQ samples
  – Produces more uniform sequencing coverage as well
• Using half-volume reactions
Workflow: Fragmentation → End Repair → A-Tailing → Adapter Ligation → Library PCR (optional)
Post-Amplification
Library Prep
• Optional library amplification
Library Purification
• AMPure XP
• Quant with Fragment Analyzer
Library Pooling
• Quant pool with Roche KAPA qPCR
Sequencing
• MiSeq 1x150 cycle* (~17 hr run)
*May switch to 2x300 cycle (~65 hr run): longer run times, but longer reads and higher read coverage
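A rough way to compare the 1x150 and 2x300 options is the theoretical mean depth, reads × bases per read ÷ genome length. The sketch assumes a hypothetical 200,000 clusters per sample and treats the result as an upper bound: real depth is lower after trimming, unmapped reads, and uneven coverage.

```python
RCRS_LENGTH = 16569  # rCRS length in bp

def expected_mean_coverage(reads_per_sample: float, bases_per_read: int,
                           genome_len: int = RCRS_LENGTH) -> float:
    """Theoretical mean depth if every sequenced base mapped to the mitogenome."""
    return reads_per_sample * bases_per_read / genome_len

READS = 200_000  # hypothetical clusters per sample
single_150 = expected_mean_coverage(READS, 150)      # one 150-bp read per cluster
paired_300 = expected_mean_coverage(READS, 2 * 300)  # two 300-bp reads per cluster
print(round(single_150), round(paired_300))  # 1811 7242
```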
Illumina MiSeq (pictured)
Post-Amplification: 96 Samples
[Figure: manual processing time (hours) per step (amplicon quantitation, pooled amp purification, library preparation, library purification, library pooling, sequencing setup), split into manual processing time and hands-off time]
• All post-amplification steps are now fully automated on a Hamilton STARplus liquid handler robot
Hands On Time – 9 hrs; Hands Off Time – 5.5 hrs; Total Time – 14.5 hrs
AUTOMATED METHOD
Hamilton STARplus
Deck Layout
• On Deck Thermal Cycler (ODTC)
• Comfort Lid
• Plate Shaker
• Chilled Plate and Tube Carriers
• Magnet Plate
Additional Features
• Barcoding and worklisting
  – Method can read plate and tube barcodes for sample tracking
  – Automated worklist handling for easy cherry picking of samples
• All pipetting information is logged in the run report
Image source: https://www.hamiltoncompany.com/~/media/Images/Robotics/Products/Accessories/Barcode-reader.ashx
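The worklist idea can be illustrated with a small cherry-picking sketch: read a barcode-to-well table, keep the requested samples, and assign them to consecutive destination wells in column order. The CSV layout (`barcode`, `well` columns), the column-major fill order, and the function names are hypothetical; this is not the Hamilton worklist format.

```python
import csv
import io

ROWS = "ABCDEFGH"

def well_name(index: int) -> str:
    """0-based index -> 96-well position in column-major order (A1, B1, ..., H1, A2, ...)."""
    if not 0 <= index < 96:
        raise ValueError("a 96-well plate has indices 0-95")
    col, row = divmod(index, 8)
    return f"{ROWS[row]}{col + 1}"

def build_worklist(csv_text: str, selected: set[str]) -> list[dict[str, str]]:
    """Cherry-pick barcoded samples into consecutive destination wells."""
    picks = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["barcode"] in selected]
    return [
        {"barcode": r["barcode"], "source_well": r["well"], "dest_well": well_name(i)}
        for i, r in enumerate(picks)
    ]

source = "barcode,well\nS001,A1\nS002,B1\nS003,C1\nS004,D1\n"
for entry in build_worklist(source, {"S002", "S004"}):
    print(entry)
```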
Additional Features
• Total Aspiration and Dispense Monitoring (TADM)
  – Allows for complete monitoring of all pipetting steps
  – TADM export file can help determine if failure was due to pipetting error
[Figure: TADM pressure curves for step AFDIL_Kapa_50uLF96_CleanSample_SurfaceEmpty (25.0 uL, channels 1-96); one curve shows a channel that dispensed some air rather than all of the sample]
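The TADM principle can be sketched as comparing each channel's pressure curve against lower and upper tolerance bands. A channel that dispenses air instead of liquid shows up as a curve that leaves the band. The pressure values, bands, and sign convention below are hypothetical, and the code does not read the actual TADM export format.

```python
from typing import Sequence

def outside_band(curve: Sequence[float], lower: Sequence[float],
                 upper: Sequence[float]) -> list[int]:
    """Indices where a pressure reading falls outside its tolerance band."""
    if not (len(curve) == len(lower) == len(upper)):
        raise ValueError("curve and bands must have the same length")
    return [i for i, (p, lo, hi) in enumerate(zip(curve, lower, upper))
            if not lo <= p <= hi]

def flag_channels(curves: dict[int, Sequence[float]], lower: Sequence[float],
                  upper: Sequence[float]) -> dict[int, list[int]]:
    """Map channel number -> out-of-band indices, reporting only failing channels."""
    result = {}
    for channel, curve in curves.items():
        bad = outside_band(curve, lower, upper)
        if bad:
            result[channel] = bad
    return result

# Hypothetical dispense curves (arbitrary pressure units, sampled at 4 time points).
lower = [-50, -1500, -1500, -50]
upper = [50, -900, -900, 50]
curves = {1: [0, -1200, -1100, 0], 2: [0, -600, -500, 0]}  # channel 2 pushed air
print(flag_channels(curves, lower, upper))  # {2: [1, 2]}
```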
Library Prep Run Parameters
• HyperPlus or Hyper Prep
  – Lower quality samples
• Can choose to process 1–96 samples
• Adapters can be pipetted from a plate or tubes
• Fragmentation time, AMPure ratios, and library amplification cycles are customizable
  – Optimization work or different sample types
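The customizable run options map naturally onto a validated parameter object. A minimal sketch; the field names, the default values (10 min fragmentation, 0.8x AMPure), and the checks are hypothetical placeholders, not the Hamilton method's actual settings. The one chemistry-specific rule encoded is that Hyper Prep, unlike HyperPlus, has no enzymatic fragmentation step.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LibraryPrepRun:
    """Hypothetical run settings mirroring the listed options (not Hamilton's format)."""
    chemistry: str = "HyperPlus"               # or "Hyper Prep"
    n_samples: int = 96                        # the method accepts 1-96 samples
    adapter_source: str = "plate"              # or "tubes"
    fragmentation_min: Optional[float] = 10.0  # placeholder; ignored for Hyper Prep
    ampure_ratios: tuple = (0.8, 0.8)          # placeholder bead:sample ratio per cleanup
    pcr_cycles: int = 0                        # 0 = skip the optional library PCR

    def __post_init__(self) -> None:
        if self.chemistry not in {"HyperPlus", "Hyper Prep"}:
            raise ValueError("chemistry must be 'HyperPlus' or 'Hyper Prep'")
        if not 1 <= self.n_samples <= 96:
            raise ValueError("n_samples must be between 1 and 96")
        if self.adapter_source not in {"plate", "tubes"}:
            raise ValueError("adapter_source must be 'plate' or 'tubes'")
        if self.pcr_cycles < 0:
            raise ValueError("pcr_cycles cannot be negative")
        if any(r <= 0 for r in self.ampure_ratios):
            raise ValueError("AMPure ratios must be positive")
        if self.chemistry == "Hyper Prep":
            # Hyper Prep has no enzymatic fragmentation step
            self.fragmentation_min = None

run = LibraryPrepRun(chemistry="Hyper Prep", n_samples=24,
                     adapter_source="tubes", pcr_cycles=6)
print(run.fragmentation_min)  # None
```

Validating in `__post_init__` means an out-of-range setting is rejected before any liquid is moved, rather than discovered mid-run.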
Side Methods
• All post-amplification steps are now fully automated
• Saves considerable hands-on processing time
Automated Processing Time
[Figure: total processing time (hours) per post-amplification step (amplicon quantitation, pooled amp purification, library preparation, library purification, library pooling, sequencing setup), manual vs. automated, split into hands-on and hands-off time]
Manual: Hands On Time – 9 hrs; Hands Off Time – 5.5 hrs; Total Time – 14.5 hrs
Automated: Hands On Time – 2 hrs; Hands Off Time – 11 hrs; Total Time – 13 hrs
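The headline savings follow directly from the reported manual and automated times for 96 samples; the helper names below are illustrative only.

```python
# Post-amplification times for 96 samples, from the slide (hours).
manual = {"hands_on": 9.0, "hands_off": 5.5}
automated = {"hands_on": 2.0, "hands_off": 11.0}

def total(times: dict[str, float]) -> float:
    """Total time = hands-on + hands-off."""
    return times["hands_on"] + times["hands_off"]

def percent_reduction(before: float, after: float) -> float:
    return 100.0 * (before - after) / before

print(total(manual), total(automated))  # 14.5 13.0
print(round(percent_reduction(manual["hands_on"], automated["hands_on"]), 1))  # 77.8
print(round(percent_reduction(total(manual), total(automated)), 1))            # 10.3
```

Total time drops only about 10%, but hands-on time drops by about 78%, which is where the analyst benefit lies.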
INITIAL TESTING
Poor Fragmentation
• Initial testing showed poor fragmentation
• Conditions based on full-volume method that Hamilton had already developed
• Cause: insufficient mixing of the fragmentation master mix
[Figure: representative sample library, manual vs. automated]
Fix: Increased Mixing Volume
Full Plate Testing
[Figure: representative sample library, manual vs. automated; average library concentration (nM) from Fragment Analyzer smear analysis (125-700 bp), manual vs. automated]
Data Comparison
MiSeq Run Metrics
Run | Cluster Density (K/mm²) | Clusters Passing Filter (%) | Reads ≥ Q30 (%) | Reads Passing Filter (M)
Manual | 1272 | 90.38 | 90.33 | 27.42
Hamilton | 1034 | 93.09 | 82.13 | 23.21
Sample Mapping Metrics – Average of All Samples
Condition | Mapped Reads | Mean Coverage | Minimum Coverage
Manual | 205,637 | 1178 | 412
Hamilton | 169,053 | 978 | 339
Manual (Normalized) | 174,064 | 997 | 349
• All sample variant calls were concordant between processing conditions
  – One sample appears to have been mixed at the extract level
  – Five samples completely failed or had partial sequencing coverage
    • Consistent with amplification results and between conditions
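The "Manual (Normalized)" row appears to be the manual values scaled by the ratio of reads passing filter between the two runs (23.21 M ÷ 27.42 M). The sketch below reproduces all three normalized numbers under that assumption, which puts the manual and Hamilton conditions on an equal read budget; the function name is illustrative.

```python
HAMILTON_PF_M = 23.21  # reads passing filter (M), Hamilton run
MANUAL_PF_M = 27.42    # reads passing filter (M), manual run

manual = {"mapped_reads": 205_637, "mean_coverage": 1178, "min_coverage": 412}

def normalize(metrics: dict[str, float], target_pf: float, source_pf: float) -> dict[str, int]:
    """Scale per-sample metrics as if the source run had target_pf reads passing filter."""
    factor = target_pf / source_pf
    return {k: round(v * factor) for k, v in metrics.items()}

print(normalize(manual, HAMILTON_PF_M, MANUAL_PF_M))
# {'mapped_reads': 174064, 'mean_coverage': 997, 'min_coverage': 349}
```

After normalization the manual and Hamilton libraries differ by only a few percent in mapped reads and coverage, suggesting that most of the raw gap comes from the lower run yield rather than the libraries themselves.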
DATA ANALYSIS
QIAGEN CLC Genomics Workbench
• Tentative workflow:
  – Trim sequences
  – Map to rCRS
  – Realign indels
  – Variant calling
• 100X min. read coverage
• 5X min. variant count
• 5% min. variant frequency
• Utilizes custom AQME tool for forensic profile generation and haplogrouping
Workflow takes ~1 hour to analyze 96 samples
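The three variant-calling thresholds can be read as a single per-call filter. The sketch assumes all three are applied together to each call (total depth, supporting reads, and variant frequency). The `Call` type and example positions are hypothetical, and CLC Genomics Workbench's internal handling of these thresholds is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Call:
    position: int
    alt: str
    depth: int      # total reads covering the position
    alt_count: int  # reads supporting the variant

MIN_COVERAGE = 100      # 100X min. read coverage
MIN_VARIANT_COUNT = 5   # 5X min. variant count
MIN_FREQUENCY = 0.05    # 5% min. variant frequency

def passes(call: Call) -> bool:
    """True if the call meets all three thresholds."""
    if call.depth < MIN_COVERAGE:
        return False
    freq = call.alt_count / call.depth
    return call.alt_count >= MIN_VARIANT_COUNT and freq >= MIN_FREQUENCY

calls = [
    Call(263, "G", 1200, 1195),   # hypothetical homoplasmic call -> kept
    Call(16189, "C", 900, 90),    # hypothetical 10% heteroplasmy -> kept
    Call(8000, "T", 80, 80),      # below 100X coverage -> dropped
    Call(12000, "A", 1000, 30),   # 3% frequency -> dropped
]
print([c.position for c in calls if passes(c)])  # [263, 16189]
```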
AQME Toolbox (AFDIL-QIAGEN mtDNA Expert)
[Diagram: CLC input (Read Mapping) → AQME tools (Realign Variants, Create Mitochondrial Variant Table, Mitochondrial Haplogrouper) → AQME output (Variant Track, Realigned Variant Track, mtDNA Table, Profile Report, Coverage Table, Haplogroup Report, History) → Export (Excel spreadsheet, PDF document, Text file, XML file)]
mtDNA Table
Original Data / Forensic Profile
Haplogroup Assignment
Hg I4a
Phylogenetic Nomenclature
Hg M1a1d
Phylogenetic Nomenclature
Hg M1a1d
Artificial Recombination
• Sample 1 (H7a) = LR Amp A
  – 7 J1c3 SNPs in LR Amp A
    • 2 found
    • 5 missing
    • 1 private mutation
• Sample 2 (J1c3) = LR Amp B
  – 20 J1c3 SNPs in LR Amp B
    • 19 found
    • 1 “uncertain” missing
    • 2 private mutations (1 PHP)
[Figure: J1c3-defining SNPs marked as missing, found, or private/ignored across LR amplicons A and B; Hg H7 vs. Hg J1c3]
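One way to flag this kind of artificial recombination is to compare, per long-range amplicon, the fraction of haplogroup-defining SNPs that were found. The (found, missing) counts come from the example (the "uncertain" call is counted as missing); the 0.5 gap threshold and the function names are hypothetical.

```python
def fraction_found(found: int, missing: int) -> float:
    """Share of the haplogroup-defining SNPs in an amplicon that were observed."""
    return found / (found + missing)

def looks_recombinant(per_amp: dict[str, tuple[int, int]], gap: float = 0.5) -> bool:
    """Flag when one amplicon supports the haplogroup far better than another."""
    fractions = [fraction_found(f, m) for f, m in per_amp.values()]
    return max(fractions) - min(fractions) > gap

# J1c3-defining SNPs (found, missing) per LR amplicon, from the example above.
j1c3 = {"A": (2, 5), "B": (19, 1)}
print({k: round(fraction_found(*v), 2) for k, v in j1c3.items()})  # {'A': 0.29, 'B': 0.95}
print(looks_recombinant(j1c3))  # True
```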
SAMPLE REPROCESSING
Sample Reprocessing
• Many of these samples are 10+ years old
  – May not be high enough quality for LR amplification
• Use an alternate amp strategy amenable to lower quality samples
  – Also used on a subset of the samples for QC purposes
    • Including heteroplasmy confirmation
Thermo Fisher’s Precision ID mtDNA Whole Genome Panel
QIAGEN’s QIAseq Targeted DNA Human Mitochondrial Panel
Precision ID
• Utilizes 162 primer pairs and 283 degenerate primers across two multiplex reactions
• Average amplicon size of ~160 bp
• Not designed for Illumina sequencing
  – Roche’s KAPA Hyper Prep kit
  – QIAseq 1-Step Amplicon Library Prep kit
[Diagram: library prep steps (Fragmentation → End Repair → A-Tailing → Adapter Ligation → Library PCR (optional)) compared between Hyper Prep and QIAseq 1-Step]
QIAseq Targeted mtDNA Panel
• Adapter ligation
  – UMI (MB)
  – P7 adapter with N7 index
[Diagram: traditional library (P7, i7, sequencing read, i5, P5) vs. QIAseq library (P7, i7, UMI, sequencing read)]
Image source: https://www.qiagen.com/us/shop/sample-technologies/dna/genomic-dna/qiaseq-targeted-dna-panels/#productdetails
QIAseq Molecular Indices
• Unique Molecular Indices (UMIs) are 12 bp of randomized bases
• Attaches a unique sequence (4¹² ≈ 16.8 million possibilities) to each DNA fragment
• Can distinguish false variants due to PCR and/or sequencing error
Source: QIAGEN, QIAseq Targeted DNA Panel Handbook (R2), 2017.
Sequencing Reads (one UMI family):
AGTCTTTCCCA-GTCAGTCG
AGTCTTTTCCA-GTCAGTCG
AGTCTTTCCCA-GTCAGTCG
AGTCTTTCCCAAGTCAGTCG
AGTCTTTCCCA-GTCAGTCG
Bioinformatic “Super Read”: AGTCTTTCCCAGTCAGTCG
UMI Benefit
• Each UMI represents a single molecule
  – Multiple sequences generated on the MiSeq
  – May contain PCR and sequencing errors
• Bioinformatically combine all UMI sequences
  – Represented by consensus sequence
  – Map the “super reads”
  – Eliminate PCR and sequencing errors introduced after UMI tagging
    • Remaining variants reflect damage or mutation authentic to the sample DNA
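The super-read idea is a per-UMI consensus. A minimal sketch, assuming reads are already grouped by UMI and pre-aligned to equal length with '-' for gaps; it uses a simple column-wise majority vote, which is not necessarily the algorithm used in QIAGEN's software. The UMI sequence is made up; the reads are the five from the example.

```python
from collections import Counter, defaultdict

def consensus(reads: list[str]) -> str:
    """Column-wise majority vote over reads pre-aligned to equal length ('-' = gap)."""
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be pre-aligned to the same length")
    called = "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
    return called.replace("-", "")  # drop gap columns to form the "super read"

def super_reads(tagged: list[tuple[str, str]]) -> dict[str, str]:
    """Group (umi, aligned_read) pairs by UMI and collapse each family to one consensus."""
    families: dict[str, list[str]] = defaultdict(list)
    for umi, read in tagged:
        families[umi].append(read)
    return {umi: consensus(reads) for umi, reads in families.items()}

reads = [
    "AGTCTTTCCCA-GTCAGTCG",
    "AGTCTTTTCCA-GTCAGTCG",  # substitution error
    "AGTCTTTCCCA-GTCAGTCG",
    "AGTCTTTCCCAAGTCAGTCG",  # inserted base
    "AGTCTTTCCCA-GTCAGTCG",
]
umi = "ACGTACGTACGT"  # hypothetical 12-bp UMI
print(super_reads([(umi, r) for r in reads]))
# {'ACGTACGTACGT': 'AGTCTTTCCCAGTCAGTCG'}
```

Both the substitution and the insertion are outvoted, which is why errors that arise after tagging disappear from the super read, while a change present in the original molecule would be shared by every read in the family and survive.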
QIAseq Procedure
• Target enrichment by Single Primer Extension
  – 222 mtDNA-specific primers with tail
Image source: https://www.qiagen.com/us/shop/sample-technologies/dna/genomic-dna/qiaseq-targeted-dna-panels/#productdetails
Sample Testing
• Tested Precision ID and QIAseq on 15 whole blood samples
  – Ring et al. In Review. Electrophoresis
• High profile concordance between kits
  – Except low-level variants
• Many NUMT-associated variants (NAVs)
• Number of NAVs differed between kits due to differences in kit chemistries
• Most NAVs were removed bioinformatically
  – Consensus sequence mapping and NAV filtering
[Figure: number of NAVs* per method: LR 0, Precision ID 828, QIAseq 62]
*Above 5% variant frequency
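NAV filtering can be sketched as dropping low-level calls that match a list of known NUMT-associated alleles while keeping the 5% frequency floor. The NAV list, the 50% ceiling below which a listed allele is treated as a NAV, and the example calls are all hypothetical; keying on position plus allele keeps a genuine high-frequency variant at a listed position from being discarded.

```python
def filter_navs(calls: dict[tuple[int, str], float],
                nav_alleles: set[tuple[int, str]],
                max_nav_freq: float = 0.5,
                min_freq: float = 0.05) -> dict[tuple[int, str], float]:
    """Keep calls at or above min_freq, dropping low-level calls that match a listed NAV."""
    kept = {}
    for key, freq in calls.items():
        if freq < min_freq:
            continue  # below the 5% reporting threshold
        if key in nav_alleles and freq < max_nav_freq:
            continue  # low-level call at a known NUMT-associated allele
        kept[key] = freq
    return kept

# Hypothetical calls: (position, allele) -> variant frequency.
calls = {(73, "G"): 1.0, (5000, "T"): 0.07, (9000, "C"): 0.12,
         (9000, "A"): 0.20, (12500, "A"): 0.03}
navs = {(5000, "T"), (9000, "C")}  # hypothetical NAV list
print(filter_navs(calls, navs))  # {(73, 'G'): 1.0, (9000, 'A'): 0.2}
```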
TIME & COST
Estimated Time
Extract in, profile out, for 96 samples
[Figure: estimated days, split into sample prep, instrument, and analysis* time: NGS Mitogenome 10 days; Sanger Mitogenome 88 days; Sanger Control Region 25 days]
*First and second analyses required; analyses performed by 2 different analysts
Estimated Cost
Estimated Cost
Extract to profile, for 96 samples
[Figure: cost per sample, split into reagents/consumables and analysis (analyst time)*: NGS mtGenome $67.29; Sanger mtGenome $524.58; Sanger Control Region $90.63]
*Analyst salary assumed to be $40/hour; estimate for illustration purposes only
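The per-sample estimates scale to a 96-sample batch as below. Pairing the costs and days with methods follows the order of the chart labels, and `sample_cost` simply restates the cost model (consumables plus analyst hours at $40/hour) with hypothetical inputs; like the original estimate, this is for illustration only.

```python
ANALYST_RATE = 40.0  # $/hour, the stated salary assumption
BATCH = 96

per_sample_cost = {"NGS mtGenome": 67.29, "Sanger mtGenome": 524.58,
                   "Sanger Control Region": 90.63}
days_per_batch = {"NGS mtGenome": 10, "Sanger mtGenome": 88,
                  "Sanger Control Region": 25}

def sample_cost(reagents: float, analyst_hours: float, rate: float = ANALYST_RATE) -> float:
    """Per-sample cost = reagents/consumables + analyst time at a fixed hourly rate."""
    return reagents + analyst_hours * rate

for method, cost in per_sample_cost.items():
    print(f"{method}: ${cost * BATCH:,.2f} per {BATCH} samples, {days_per_batch[method]} days")

# Hypothetical breakdown: $30 of consumables plus 30 minutes of analyst time.
print(sample_cost(30.0, 0.5))  # 50.0
```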
FUTURE PLANS
Library Prep Method
• Further optimization needed
  – Determine optimal amplicon input and adapter concentration for the method
    • Still have adapter dimer left in libraries
  – Contamination assessment
• Validation of the method for casework use
  – Family reference samples
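Adapter dimer generally reflects excess adapter relative to insert, so optimizing adapter concentration usually means choosing an adapter:insert molar ratio. The sketch converts insert mass to picomoles using the common approximation of 660 g/mol per base pair and then computes the adapter volume; the 300 bp fragment size, 15 µM stock, and ratios are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (a common approximation)

def pmol_from_ng(ng: float, mean_fragment_bp: float) -> float:
    """ng of dsDNA -> pmol, assuming an average mass per base pair."""
    return ng * 1000.0 / (mean_fragment_bp * AVG_BP_MASS)

def adapter_volume_ul(insert_ng: float, mean_fragment_bp: float,
                      molar_ratio: float, adapter_uM: float) -> float:
    """uL of adapter stock giving `molar_ratio` adapters per insert molecule."""
    adapter_pmol = pmol_from_ng(insert_ng, mean_fragment_bp) * molar_ratio
    return adapter_pmol / adapter_uM  # 1 uM = 1 pmol/uL

# Hypothetical inputs: 100 ng of ~300 bp fragments, 15 uM adapter stock.
for ratio in (50, 100, 200):
    print(ratio, round(adapter_volume_ul(100, 300, ratio, 15), 2))
```

Titrating the ratio downward while watching the adapter-dimer peak on the Fragment Analyzer is one way such an optimization could be structured.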
Pre-Amplification Automation
• Modify current CR automated amp setup for LR amp
  – Hamilton STARlet
• Development of an automated extraction method on the STARlet
  – Hamilton [MPE]2 positive pressure module
Source: [MPE]2 Positive Pressure Extraction & Evaporation Module Brochure. Lit. No. F0032 v1.3. 2015.
Acknowledgments
• QIAGEN for the invitation to speak
• AFMES-AFDIL Emerging Technology Section
  – Cassie Taylor
  – Erin Gorden
  – Jennifer Higginbotham
• Hamilton Company
  – Brandon Bare for method development
• AFMES-DoD DNA Operations
  – Dr. Timothy McMahon and Lt Col Laura Garner
• Armed Forces Medical Examiner System
  – Col Finelli and Lt Col Alice Briones
Questions?