Post on 25-May-2015
Plant DNA Barcoding using matK: some work on new primer sets
Dr. Alan Forrest
Prof. Pete Hollingsworth
Royal Botanic Garden Edinburgh
Damon Little, New York Botanic Garden
Aron Fazekas, University of Guelph
Gao Lian-Ming, Kunming Institute of Botany
Sean Graham, University of British Columbia
Mehrdad Hajibabaei, CCDB, University of Guelph
Maria Kuzmina, CCDB, University of Guelph
Hollingsworth, Graham, Little (2011). "Choosing and using a plant DNA barcode." PLoS ONE 6: e19254.
Angiosperms: matK baseline
How good are the current “best” matK primers?
Ca. 10K PCR & sequencing attempts from 5 labs:
Kim 1R+3F = 72% success (N=9424)
2-step protocol: Kim 1R+3F and 390F+1326R: 80% success
Poorly performing orders include Malpighiales, Piperales, Poales, and Myrtales (especially Melastomataceae)
*ACDB African Centre for DNA Barcoding, University of Johannesburg, South Africa
*CCDB Canadian Centre for DNA Barcoding, University of Guelph, Canada
*KIB Kunming Institute of Botany, Chinese Academy of Sciences, China
*NYBG New York Botanic Garden, USA
*UBC University of British Columbia, Canada
Angiosperms: 3 approaches to improve matK retrieval
1) ePCR of existing published primers against ca. 10K matK sequences
Genetic algorithms to search for new primers
2) CODEHOP: COnsensus DEgenerate Hybrid Oligonucleotide Primer
Primer cocktails with a degenerate ‘core’ coupled with variant 3’ triplets for all known exact matches in GenBank
3) New primers/combinations tested alongside existing primers:
1R+3F KJ Kim, unpublished
390F+1326R Cuenoud et al (2002) Am J Bot 89
472F+1248R Yu et al (2011) J Syst Evol 49, 1-6
xF+MALPR1 New combination (Ford et al 2009; Dunning & Savolainen 2010)
398Fb4+1311R CODEHOP; this study
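The ePCR idea in approach 1 can be sketched as follows: slide each primer along a template, allow degenerate IUPAC codes and a limited number of mismatches, and report the predicted amplicon length. This is a minimal illustration only. The template and both primers are made up for the example (they are not the matK primers listed above), and the function names and the `max_mismatch` parameter are assumptions, not the tool used in this study.

```python
# Minimal in-silico PCR (ePCR) sketch. The primers and template below are
# invented for illustration; they are NOT the matK primers named above.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement that also maps IUPAC degenerate codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_sites(primer, template, max_mismatch=1):
    """Start positions where the primer matches with at most max_mismatch mismatches."""
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if mismatches(primer, template[i:i + k]) <= max_mismatch]


def epcr(forward, reverse, template, max_mismatch=1):
    """Predicted amplicon lengths for one primer pair on one template strand."""
    fwd_sites = find_sites(forward, template, max_mismatch)
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement on the given strand.
    rev_sites = find_sites(revcomp(reverse), template, max_mismatch)
    k = len(reverse)
    return [r + k - f for f in fwd_sites for r in rev_sites if r > f]


template = "TTGACGTACCGATAGGCTTAACGGATCCATGCAAGTCTTGA"  # hypothetical
forward = "ACGTRCCGAT"                                   # R matches A or G
reverse = revcomp("CATGCAAGTC")                          # hypothetical
amplicons = epcr(forward, reverse, template, max_mismatch=0)
print(amplicons)
```

Running the same search over every sequence in an alignment, and counting the templates that return an amplicon, gives a per-primer-pair hit rate that can be compared across orders.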
matK primer location
Angiosperms: the test sample
5 plates of samples
• Wide taxonomic sample: N=470
• 52/61 orders and 172 families sensu APG3
• All samples previously sequenced for rbcL
• DNA extractions standardized, concentration equilibrated
A 188 samples from accessions that worked previously for 1R+3F (retain current success rates)
B 188 samples from accessions that failed previously for 1R+3F (improve on current success rates)
C 94 samples from 5 orders that performed particularly poorly (check that the nightmare groups are fixed)
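A quick check of how the three sample sets add up. The even split of samples over the 5 plates is an assumption; the slides give only the set sizes and the plate count.

```python
# Sample sets as listed above.
sets = {"A": 188, "B": 188, "C": 94}
plates = 5

total = sum(sets.values())
# Assumption: samples are spread evenly over the plates.
per_plate = total // plates
plates_per_set = {name: n / per_plate for name, n in sets.items()}

print(total, per_plate, plates_per_set)
```

Under that assumption each plate holds 94 samples, so sets A and B fill two plates each and set C fills one.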
Angiosperms: testing different protocols
PCR: different additives (acetamide, betaine, BSA, DMSO, DTT, formamide, glycerol, sulfolane, trehalose, 2-pyrrolidone, CES solution), primer and magnesium concentrations, annealing time and temperature
Best results: Platinum Taq polymerase, 1M betaine, 0.2M trehalose
PCR clean-up: nothing, Qiagen columns, ExoSAP-IT (neat and dilute)
No clean-up = poor sequence quality
Best results: ExoSAP-IT (dilute 1:10)
Sequencing PCR: different additives tested (nothing, betaine, DMSO, trehalose, BDX64)
Best results: 0.2M trehalose increased read length by up to 150bp
Full details of tests available from Alan Forrest, to be posted on Connect
Angiosperms: PCR results from different primer pairs
                        Total   A (worked before)   B (failed before)   C (bad clades)
Collaborating labs:
rbcL                    100%    100%                100%                100%
matK 1R+3F              40%     100%                0%                  0%
Test lab:
rbcL                    99%     99%                 98%                 97%
matK 390F+1326R         71%     79%                 63%                 71%
matK 1R+3F              85%     97%                 85%                 63%
matK 398Fb4+1311R       86%     87%                 87%                 83%
matK 472F+1248R         88%     93%                 92%                 71%
matK xF+MALPR1          91%     94%                 92%                 85%
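The test-lab results can be held as data and compared in two ways: by recomputing the Total column from the set sizes (A=188, B=188, C=94), and by ranking primer pairs on their weakest set. The sketch below is a reading aid, not part of the original analysis; ranking by worst set is an illustrative choice.

```python
# Test-lab matK PCR success (%), transcribed from the table above.
# Tuple order: Total, A (worked before), B (failed before), C (bad clades).
results = {
    "390F+1326R":   (71, 79, 63, 71),
    "1R+3F":        (85, 97, 85, 63),
    "398Fb4+1311R": (86, 87, 87, 83),
    "472F+1248R":   (88, 93, 92, 71),
    "xF+MALPR1":    (91, 94, 92, 85),
}
sizes = (188, 188, 94)  # samples in sets A, B, C


def weighted_total(a, b, c):
    """Success over all 470 samples, weighting each set by its size."""
    return (a * sizes[0] + b * sizes[1] + c * sizes[2]) / sum(sizes)


recomputed = {pair: round(weighted_total(*vals[1:])) for pair, vals in results.items()}
worst_set = {pair: min(vals[1:]) for pair, vals in results.items()}
most_robust = max(worst_set, key=worst_set.get)

for pair, vals in results.items():
    print(f"{pair:13s} total={vals[0]}% recomputed={recomputed[pair]}% worst set={worst_set[pair]}%")
print("Highest worst-set success:", most_robust)
```

The recomputed totals agree with the Total column for all five primer pairs, which confirms that Total is the size-weighted mean of sets A, B and C.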
Angiosperms: 2-step matK PCR amplification
1st Round       2nd Round       Samples amplified
1R+3F           390F+1326R      91%
390F+1326R      398Fb4+1311R    90%
xF+MALPR1       390F+1326R      93%
xF+MALPR1       398Fb4+1311R    94%
1R+3F           398Fb4+1311R    95%
472F+1248R      1R+3F           95%
472F+1248R      390F+1326R      95%
xF+MALPR1       1R+3F           96%
472F+1248R      398Fb4+1311R    97%
xF+MALPR1       472F+1248R      98%
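One way to read the 2-step figures is to compare them with what would be expected if the two primer pairs failed independently, 1 - (1 - p1)(1 - p2), using the single-pair totals from the test lab. Independence is an assumption made here for illustration only; the slides do not claim it.

```python
# Single-pair PCR success in the test lab (Total column), as fractions.
single = {
    "390F+1326R": 0.71, "1R+3F": 0.85, "398Fb4+1311R": 0.86,
    "472F+1248R": 0.88, "xF+MALPR1": 0.91,
}
# Observed 2-step amplification success: (1st round, 2nd round) -> fraction.
observed = {
    ("1R+3F", "390F+1326R"): 0.91,
    ("390F+1326R", "398Fb4+1311R"): 0.90,
    ("xF+MALPR1", "390F+1326R"): 0.93,
    ("xF+MALPR1", "398Fb4+1311R"): 0.94,
    ("1R+3F", "398Fb4+1311R"): 0.95,
    ("472F+1248R", "1R+3F"): 0.95,
    ("472F+1248R", "390F+1326R"): 0.95,
    ("xF+MALPR1", "1R+3F"): 0.96,
    ("472F+1248R", "398Fb4+1311R"): 0.97,
    ("xF+MALPR1", "472F+1248R"): 0.98,
}


def expected_if_independent(p1, p2):
    """Chance that at least one of two independent attempts succeeds."""
    return 1 - (1 - p1) * (1 - p2)


expected = {pair: expected_if_independent(single[pair[0]], single[pair[1]])
            for pair in observed}

for pair, obs in observed.items():
    print(f"{pair[0]:13s} then {pair[1]:13s} observed={obs:.0%} "
          f"independent={expected[pair]:.1%}")
```

Every observed value is at or below the independence estimate, which is consistent with some samples being difficult for more than one primer pair. The recommended combination (xF+MALPR1 then 472F+1248R) comes closest to its estimate.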
Angiosperms: 2-step protocol results: xF+MALPR1 & 472F+1248R
470 samples sequenced
High quality bi-directional reads obtained for 94% of samples (96% inc. single reads, 97% inc. Phusion recoveries)
Complete failures: 3 (all failed for rbcL)
Sequence failures: 17 (low quality, unable to contig)
Of these failures, 10 subsequently recovered with Phusion Taq, but 3 were potentially pseudogenes
Single reads: 9
Contaminants/mix-ups: 15
Of these, 7 are contaminants when sequenced with rbcL as well; 8 are matK problems, but OK for rbcL
Contaminants as fails: success 91% (92% inc. Phusion recoveries)
Contaminants as missing: success 96% (97% inc. Phusion recoveries)
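The percentages above can be traced back to the counts. The accounting below is one reading that happens to reproduce the quoted figures; the slides do not state it explicitly. Two assumptions are made: contaminated samples still count as bi-directional reads in the headline figure, and only the 7 Phusion recoveries that are not potential pseudogenes are counted. The "contaminants as missing" figures are not attempted because their exact numerator and denominator are not given.

```python
total = 470
complete_failures = 3
sequence_failures = 17
single_reads = 9
contaminants = 15
phusion_recovered = 10
possible_pseudogenes = 3


def pct(n, denom=total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * n / denom)


# Assumption: everything that is not a failure or a single read is bi-directional.
bidirectional = total - complete_failures - sequence_failures - single_reads
# Assumption: potential pseudogenes are not counted as recoveries.
usable_recoveries = phusion_recovered - possible_pseudogenes

headline = pct(bidirectional)
with_single = pct(bidirectional + single_reads)
with_phusion = pct(bidirectional + single_reads + usable_recoveries)

# "Contaminants as fails": remove contaminated samples from the successes.
clean = bidirectional - contaminants
contam_fail = pct(clean)
contam_fail_phusion = pct(clean + usable_recoveries)

print(headline, with_single, with_phusion, contam_fail, contam_fail_phusion)
```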
Angiosperms: recommended work flow
Acquire samples and extract DNA
Dilute DNA 1:10
PCR and SEQUENCE rbcL
1st ROUND: all samples
PCR matK primers xF+MALPR1 (1M betaine, 0.2M trehalose, Platinum Taq)
Clean successful PCR products
Sequence clean PCR products (0.2M trehalose)
2nd ROUND: all PCR and SEQ failures
PCR matK primers 3F+1R or 472F+1248R (1M betaine, 0.2M trehalose, Platinum Taq)
Clean successful PCR products
Sequence clean PCR products (0.2M trehalose)
>95% matK sequence success rate
ALL poor quality sequences/mononucleotide motifs
PCR and sequence matK primers xF+ERIR (1M betaine, 0.2M trehalose, Phusion Taq)
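The work flow can be expressed as a small decision function that returns the next matK reaction for a sample. The class and function names are hypothetical, and the logic simplifies the flowchart: any poor-quality read goes straight to the xF+ERIR/Phusion rescue, and a sample that fails both rounds is given up.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    GOOD = "good sequence"
    FAILED = "PCR or sequencing failure"
    POOR = "poor quality / mononucleotide motif"


@dataclass(frozen=True)
class Step:
    primers: str
    polymerase: str
    additives: str = "1M betaine, 0.2M trehalose"


FIRST_ROUND = Step("xF+MALPR1", "Platinum Taq")
SECOND_ROUND = Step("3F+1R or 472F+1248R", "Platinum Taq")
MOTIF_RESCUE = Step("xF+ERIR", "Phusion Taq")


def next_step(round_done, outcome):
    """Next matK reaction for a sample, or None when nothing more is planned.

    round_done is 1 or 2: the round that has just been completed.
    """
    if outcome is Outcome.GOOD:
        return None                # finished
    if outcome is Outcome.POOR:
        return MOTIF_RESCUE        # poor reads / mononucleotide motifs
    if round_done == 1:
        return SECOND_ROUND        # PCR or SEQ failure after the 1st round
    return None                    # failed both rounds


print(next_step(1, Outcome.FAILED))
print(next_step(1, Outcome.POOR))
```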
Angiosperms: recommendations and protocols
• PCR using a good quality thermostable Taq polymerase; fewer amplicons obtained with cheaper alternatives
• Clean up amplicons and sequence using 0.2M trehalose
• Poor sequences due to mononucleotide motifs can be sequenced using Phusion Taq and primers xF+ERIR
Online resources:
• matK barcoding protocols will be made available on Connect
• Ordinal alignments available for specific primer design for problematic taxa
• Statistics on primer mismatch and mononucleotide motifs available, sorted by taxon
Angiosperms: matK barcode summary
The 2-step protocol recommended here allowed >90% of samples from a wide taxonomic range to be sequenced for matK
Need to assess whether this is robust to different laboratory environments and plant groups
The Guardian, 17th November, 2007
Gymnosperms: matK barcodes
Gymnosperms include ca. 1100 species
Many economically/ecologically important and/or rare taxa
Full length matK alignment for primer design: >800 accessions representing all genera downloaded from GenBank
Gymnosperm matK quite conserved: conserved priming sites can be located, but divergent in Gnetales
Sample set: all 86/86 genera (N=119) including Ginkgo, sensu Christenhusz et al (2011) Phytotaxa 19, 55-70
Gymnosperms: matK barcodes
All gymnosperms:     Conifers (N=95)   Cycads (N=16)   Gnetophytes (N=8)
rbcL                 89%               100%            100%
A   GYMF1A+R1A       86%               100%            38%
B1  GYM-F+GYM-R      86%               100%            25%
B2  GNE-F+GNE-R      na                na              88%
matK A+B             95%               100%            100%
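An overall gymnosperm success rate is not given in the table, but it can be estimated by weighting each group by its sample size. Because the inputs are rounded percentages, the result is approximate.

```python
# Group sizes and success (%) transcribed from the table above.
group_sizes = {"Conifers": 95, "Cycads": 16, "Gnetophytes": 8}
success = {
    "rbcL":     {"Conifers": 89, "Cycads": 100, "Gnetophytes": 100},
    "matK A+B": {"Conifers": 95, "Cycads": 100, "Gnetophytes": 100},
}

n_total = sum(group_sizes.values())


def overall(marker):
    """Size-weighted mean success (%) across the three groups."""
    weighted = sum(group_sizes[g] * success[marker][g] for g in group_sizes)
    return weighted / n_total


overall_rbcl = overall("rbcL")
overall_matk = overall("matK A+B")
print(n_total, round(overall_rbcl), round(overall_matk))
```

The group sizes sum to the stated N=119, and the weighted estimates come out at roughly 91% for rbcL and 96% for matK A+B.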
7 conifer samples that failed for matK also failed for rbcL, suggesting that primer mismatch is not the reason for failure
Recommendation: 1st round PCR and SEQ with GYM-F1A+R1A; 2nd round PCR and SEQ using GYM-F+GYM-R for conifers and cycads, and GNE-F+GNE-R for gnetophytes
Ferns & allies: matK barcodes
Ferns and allies include ca. 10,000 species
ca. 90% of these are Polypodiales
Full length matK alignment for primer design: 159 accessions representing all major groups derived from several published and unpublished sources
Fern matK very variable: difficult to locate conserved sites for primer design
Variability means potentially useful barcode: recent publication* supports use of rbcL + matK as the core fern barcode, but further empirical utility tests required
Sample set: 14/14 orders and 44/48 families (N=95) sensu Christenhusz et al (2011) Phytotaxa 19, 7-54
*Li et al (2011) PLoS ONE 6, e26597
Ferns & allies: matK barcodes
ePCR and manual examination of alignment failed to locate any universal priming sites; primers therefore designed at the ordinal level
Cyatheales: single primer pair amplifies 100% (8/8 accessions)
Polypodiales: 81% successfully sequenced
Single primer pair amplifies 43/57 accessions, with 2nd primer pair adding 3 accessions
5/15 failures also failed for rbcL
Primers for lycophytes and earlier diverging orders designed but as yet untested
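The Polypodiales figure follows directly from the accession counts given above; a short check, with variable names chosen for the example:

```python
polypodiales_total = 57
first_pair_hits = 43
second_pair_extra = 3

first_pair_rate = 100 * first_pair_hits / polypodiales_total
two_pair_rate = 100 * (first_pair_hits + second_pair_extra) / polypodiales_total

print(f"1 primer pair: {first_pair_rate:.1f}%  2 primer pairs: {two_pair_rate:.1f}%")
```

The first primer pair alone recovers about 75% of accessions, and the second pair brings this to the quoted 81%.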
Liverworts: matK barcodes
Liverworts include ca. 5000 known species
ca. 90% of these are leafy liverworts
Full length matK alignment for primer design: 56 accessions representing all major groups including many de novo sequences
Liverwort matK very variable: difficult to locate conserved sites for primer design
Variability means potentially useful barcode
Sample set: 15/15 orders and 74/82 families (N=94) sensu Crandall-Stotler et al (2009) Edin J Bot 66, 1-44
Liverworts: matK barcodes
Two-step approach:
A Best single primer pair gives 72%
B Four primer pairs representing major clades used separately on failures from step 1: complex thalloids (400 spp.), simple thalloids 1 (200 spp.), simple thalloids 2 (150 spp.), leafy (4300 spp.)
Using these 4 primer pairs as a cocktail gave lower PCR success
rbcL 100% success
matK A plus B results in 90% success
Failures include early diverging Treubiales and Calobryales (only ca. 20 spp.)
Full length matK sequences are the rate limiting step
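The liverwort two-step figures (72% after step A, 90% after A plus B) imply how many step-A failures the clade-specific primers rescue. The sketch below works that out from the rounded percentages, and also sums the clade species counts quoted for step B.

```python
step_a = 0.72          # best single primer pair
after_a_and_b = 0.90   # after clade-specific primers on the step-A failures

# Fraction of step-A failures rescued by step B.
rescued_fraction = (after_a_and_b - step_a) / (1 - step_a)

# Approximate species counts per clade, as quoted for step B.
clade_species = {
    "complex thalloids": 400,
    "simple thalloids 1": 200,
    "simple thalloids 2": 150,
    "leafy": 4300,
}
species_covered = sum(clade_species.values())
leafy_share = clade_species["leafy"] / species_covered

print(f"rescued: {rescued_fraction:.0%}, species covered: {species_covered}, "
      f"leafy share: {leafy_share:.0%}")
```

Step B therefore rescues roughly 64% of the step-A failures, and the four clades together account for about 5050 species, in line with the ca. 5000 known liverwort species.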
Mosses: matK barcodes
Mosses include ca. 12,800 species
Greatest numbers and diversity in Hypnales
Full length matK alignment for primer design: 66 accessions representing all major groups including many de novo sequences
Moss matK quite conserved compared to ferns and liverworts: conserved priming sites located and range of primer pairs tested
matK barcode utility unknown: lack of moss matK primers has precluded any meaningful comparisons with other markers
Sample set: 29/30 orders and 92/111 families (N=107) sensu Goffinet & Shaw
Mosses: matK barcodes
rbcL 100% PCR success
matK: 4 primer pairs tested
Best primer pair sequences 82% (Best 2-step = 94%, all 4 primers = 98%)
However, all mosses except Sphagnum contain a mononucleotide motif in the centre of the barcode region, which is difficult to sequence across.
Phusion Taq polymerase alleviates the problem, but PCR is more difficult to optimize
Best primer pair sequences 62%
Best 2-step = 75%, best 3-step = 82% (Hypnales = 85%)
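Mononucleotide motifs of the kind described above can be flagged in a reference sequence before sequencing. A minimal sketch: the example sequence is invented, and the 8-base threshold is an assumption, since the slides do not say how long a run must be to cause problems.

```python
import re


def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for each run of a single base >= min_len."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Hypothetical sequence with an 11-base T run; not a real matK sequence.
seq = "ATGGCA" + "T" * 11 + "CGGATC" + "A" * 4 + "GGC"
runs = homopolymer_runs(seq)
print(runs)
```

Samples whose reference sequence contains such a run near the centre of the amplicon are candidates for the Phusion-based protocol rather than the standard one.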
2-step protocol = >95%
2-step protocol = >95%
2-step protocol = ca. 80% Polypodiales
1-step protocol = 100% Cyatheales
Lycophyte and early-diverging lineage primers require testing
1-step protocol = >80%
3-step protocol = >80%
Further primer optimization required
2-step protocol = ca. 90%
Further primer optimization required
Acknowledgements
Collaborating laboratories:
Damon Little, New York Botanic Garden
Sean Graham, University of British Columbia
Gao Lian-Ming, Li De-Zhu, Kunming Institute of Botany
Maria Kuzmina, Mehrdad Hajibabaei, CCDB, University of Guelph
Aron Fazekas, University of Guelph
Suppliers of data and samples:
Olivier Maurin, Michelle van der Bank, ACDB, University of Johannesburg
Harald Schneider, Natural History Museum, London
Dietmar Quandt, Susann Wicke, Nees Institute, University of Bonn
Fay Wei Li, ChunNeng Wang, others, National Taiwan University
Paul Wolf, Utah State University
Juan Carlos Villarreal, University of Connecticut