Finding and Calling Genome Variants (barc.wi.mit.edu/education/hot_topics/GenomeVariants_Jul...)
![Page 1: Finding and Calling Genome Variantsbarc.wi.mit.edu/education/hot_topics/GenomeVariants_Jul... · 2017. 7. 20. · 1000 Genomes Project • Extension of the HapMap in 2008 to catalogue](https://reader035.fdocuments.us/reader035/viewer/2022062403/6038b6d697b58c7b587172f3/html5/thumbnails/1.jpg)
Finding and Calling Genome Variants

Outline
• Genome variants overview
• Mining variants from databases
  – dbSNP
  – HapMap
  – 1000 Genomes
  – Disease/clinical variants databases
• Calling variants using your own data
  – GATK best practices
  – Samtools (mpileup/bcftools)

Genomic Variation
• Population genetics
  – Measure/explain diversity/heritability
• Disease susceptibility
  – GWAS
  – Biomarkers
• Variants may cause a particular trait
  – Regulatory element (e.g. promoter, enhancer, 3' UTR)
  – Protein coding sequence (e.g. silent, missense, or nonsense mutation)
Palstra, RJ. et al. (2012); http://evolution.berkeley.edu/evolibrary/article/mutations_06

Genomic Variation: Sequence vs Structural Variation
http://www.ensembl.org/info/genome/variation

• Sequence variants

| Type | Description | Example (Reference/Alternative) |
| --- | --- | --- |
| SNP | Single nucleotide polymorphism | Ref: ...TTGACGTA... Alt: ...TTGGCGTA... |
| Insertion | Insertion of one or several nucleotides | Ref: ...TTGACGTA... Alt: ...TTGATGCGTA... |
| Deletion | Deletion of one or several nucleotides | Ref: ...TTGACGTA... Alt: ...TTGGTA... |
| Substitution | A sequence alteration where the length of the change in the variant is the same as that of the reference | Ref: ...TTGACGTA... Alt: ...TTGTAGTA... |

• Structural variants (more than 50 bases)

| Type | Description | Example (Reference/Alternative) |
| --- | --- | --- |
| CNV | Copy number variation: increases or decreases the copy number of a given region | "Gain" of one copy; "loss" of one copy |
| Inversion | A continuous nucleotide sequence is inverted in the same position | |
| Translocation | A region of nucleotide sequence that has translocated to a new position (e.g. BCR-ABL fusion gene) | |
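The sequence variant types in the table above can be told apart mechanically by comparing the reference and alternative sequences. The following is a minimal sketch, not taken from the slides: the function name `classify_variant` is hypothetical, and it assumes the two sequences cover the same locus so that shared flanking bases can be trimmed first.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a sequence variant after trimming shared flanking bases."""
    # Trim the shared prefix.
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]

    # Trim the shared suffix of what remains.
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    if j:
        ref, alt = ref[:-j], alt[:-j]

    if not ref and not alt:
        return "identical"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    if len(ref) == len(alt):
        return "substitution"
    return "indel"  # lengths differ and both alleles are non-empty


# The four examples from the table.
for alt in ("TTGGCGTA", "TTGATGCGTA", "TTGGTA", "TTGTAGTA"):
    print(alt, classify_variant("TTGACGTA", alt))
```

Structural variants (CNVs, inversions, translocations) cannot be recognised this way from two short strings; they need read-depth or read-pair evidence.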

Genome Variation: Individual and Population
• Single nucleotide polymorphisms (SNPs)
  – MAF* > 1%: common SNP
  – MAF* < 1%: rare SNP
  – Some definitions use 5% as the threshold
• On average, one variant every 1200 bases (based on HapMap)
*Minor allele frequency
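As a small illustration of the MAF thresholds above, the sketch below computes a minor allele frequency from a list of observed alleles and labels the SNP as common or rare. The function names are hypothetical, and the minor allele is taken to be the second most frequent allele at the site, which is an assumption the slide does not spell out.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if the site is monomorphic)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    return counts.most_common(2)[1][1] / total


def classify_snp(maf, threshold=0.01):
    """Label a SNP using the 1% cut-off; pass threshold=0.05 for the stricter definition."""
    return "common" if maf > threshold else "rare"


# 100 chromosomes sampled at one site: 90 carry A, 10 carry G.
maf = minor_allele_frequency(["A"] * 90 + ["G"] * 10)
print(maf, classify_snp(maf))
```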

Genome Variation: Reference

| Organism | Description/Strain | Assembly* |
| --- | --- | --- |
| Human | DNA isolated from WBC of 4 anonymous individuals (2 males and 2 females). However, the majority of the sequence came from one of the male donors | GRCh37/GRCh38 |
| Mouse | C57BL/6J | GRCm37/GRCm38 |
| C. elegans | N2 | WormBase v. WS220 |
| Fruit fly | ISO1 | BDGP Release 5 |
| Yeast | S288C | SGD Feb 2011 |
| A. thaliana | Col ecotype | TAIR10 |

*Available in /nfs/genomes

Describing/Annotating Variants
• General guidelines*
  – No position 0
  – Range indicated by "_" (e.g. 586_591)
• DNA
  – g.957A>T (to include the chromosome, use chr9:g.957A>T)
  – g.413delG
  – g.451_452insT
  – In CDS:
    – c.23G>C
    – +1 is the A of ATG (start codon); -1 is the previous/upstream nucleotide
    – "*" marks positions 3' of the stop codon (e.g. *1 is the first nucleotide after the stop codon)
• RNA
  – r.957a>u
• Protein (three/one letter aa)
  – p.His78Gln
*For the complete list/guidelines see hgvs.org

| Chr | Position | Ref | Alt | Source | g.change:rsID:Depth=AvgSampleReadDepth:FunctionGVS:hgvsProteinVariant |
| --- | --- | --- | --- | --- | --- |
| 16 | 89824989 | G | T | EVS | g.89824989G>T:rs140823801:Depth=141:missense:p.Q993K |
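To make the genomic notation concrete, here is a minimal sketch that formats and parses simple `g.` substitutions such as chr9:g.957A>T. It handles only single-base substitutions; the function names and the regular expression are illustrative assumptions, not an implementation of the full HGVS recommendations.

```python
import re

# Optional "chrN:" prefix, then g.<position><ref>><alt> for a single-base change.
SUBSTITUTION = re.compile(
    r"^(?:(?P<chrom>chr\w+):)?g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def format_g_substitution(pos, ref, alt, chrom=None):
    """Build a description like 'g.957A>T' or 'chr9:g.957A>T'."""
    if pos == 0:
        raise ValueError("there is no position 0 in this numbering")
    core = f"g.{pos}{ref}>{alt}"
    return f"{chrom}:{core}" if chrom else core


def parse_g_substitution(description):
    """Split a simple genomic substitution into its parts."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple g. substitution: {description!r}")
    return {
        "chrom": match.group("chrom"),
        "pos": int(match.group("pos")),
        "ref": match.group("ref"),
        "alt": match.group("alt"),
    }


print(format_g_substitution(89824989, "G", "T"))
print(parse_g_substitution("chr9:g.957A>T"))
```

Deletions, insertions, ranges, and the c./r./p. levels would each need their own patterns.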

Genome Variation Databases: dbSNP
• Repository for SNPs and short sequence variation (<50 bases)
• Current build: dbSNP 150 (Feb 2017)
  – Approx. 135M validated rs#'s for human
    – Mostly germline mutations (smaller subset of somatic)
    – Contains rare variants as well
  – Various organisms (support for non-human organisms ending Sept 1st)
• Each SNP, or record, is identified by an rs# and includes
  – Summary attributes
  – NCBI resources (linked to ClinVar, GenBank, etc.)
  – External resources (linked to OMIM and NHGRI GWAS)
• Submissions are made by public laboratories and private organizations (ss#'s), and identical records are clustered into a single record (rs#'s).
• The rs ID is the same across assemblies (e.g. GRCh37/38), but chromosomal coordinates may differ!

Hands-on: dbSNP
• Mining variants from databases
• Finding SNPs for your favorite gene in dbSNP

Genome Variation Databases: 1000 Genomes Project
• Extension of the HapMap in 2008 to catalogue genetic variation by sequencing at least 1000 participants
• Discover population-level human genetic variation
• Initially consisted of low-coverage whole-genome (4X) and high-coverage exome (20X) sequencing
• VCF format was developed, and initially maintained, for the project
• Phase 3 contains WGS data for 2504 individuals across 26 populations
http://www.internationalgenome.org/

Mining Disease/Clinical Variants

| Database | Link |
| --- | --- |
| Catalog of Published GWAS (NHGRI) | https://www.ebi.ac.uk/gwas/ |
| GWAS Central | gwascentral.org |
| ClinVar (NCBI) | ncbi.nlm.nih.gov/clinvar |
| PheGenI (NCBI) | ncbi.nlm.nih.gov/gap/phegeni |
| SNPedia | snpedia.com |

Mining Disease/Clinical Variants in Cancer: COSMIC
• http://cancer.sanger.ac.uk/cosmic
• Catalog of Somatic Mutations in Cancer (COSMIC), created in 2005
• v70 (Aug 2014) had ~2M coding point mutations
• Datasets are curated from published literature and other databases (e.g. TCGA, ICGC)
• Available in both GRCh37 and GRCh38 coordinates
• Tools/Features
  – Cancer Gene Census (currently 572 genes)
  – Browser: Cancer/Cell Line
  – COSMICMart (similar to BioMart)

Calling variants from sequence data requires 3 broad steps
1) Prepare data: QC, align, SAM -> BAM, sort, remove PCR duplicates
2) Call variants: base quality score recalibration, variant call, quality filtering
3) Annotate for function: snpEff, HaploReg, GTEx

Workflow
1) Prepare data
  – QC reads & align to reference
2) Call variants
3) Annotate for function

Check read quality with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
Align reads to reference genome
• Use a sensitive (gapped) aligner to account for large indels (BWA, http://bio-bwa.sourceforge.net/)*
*See BaRC SOPs for usage instructions.

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants
3) Annotate for function

Convert SAM -> BAM and sort reads by coordinates (https://broadinstitute.github.io/picard/)
• Picard Tools: AddOrReplaceReadGroups
  – SO=coordinate <- sorts mapped reads by coordinate
• Picard Tools: MarkDuplicates
  – This command flags all duplicate reads in the file.
  – This flag is recognized by samtools mpileup and GATK HaplotypeCaller.
  – By default, reads with this flag will be ignored.
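The duplicate flag mentioned above is one bit in the FLAG column of each SAM/BAM record: the SAM specification reserves 0x400 (decimal 1024) for "PCR or optical duplicate". The sketch below shows how a downstream tool could test that bit on plain-text SAM lines. The helper names are hypothetical and the two alignment lines are made-up examples, not output from the workshop data.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Yield header lines and every alignment whose duplicate bit is not set."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines have no FLAG column
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not is_duplicate(flag):
            yield line


# Made-up records: the second one carries the duplicate bit (99 + 1024).
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t8M\t=\t200\t108\tTTGACGTA\tIIIIIIII",
    "read2\t1123\tchr1\t100\t60\t8M\t=\t200\t108\tTTGACGTA\tIIIIIIII",
]
for kept in drop_duplicates(records):
    print(kept.split("\t")[0])
```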

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
3) Annotate for function

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
  – Assess quality: samtools mpileup | base quality score recalibration (BQSR)
3) Annotate for function

Samtools mpileup
• The mpileup command scans every position supported by an aligned read and records the possible genotypes.
• Moreover, every time a mapped read has a mismatch to the reference genome, it incorporates information such as the number of reads that share the mismatch, the quality of the base at that position, and the expected sequencing error rates.
• It then computes the probability that each of these genotypes is truly present in the sample.
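The idea of scoring candidate genotypes from the bases piled up at one position can be sketched with a deliberately simplified model. The code below is not the model samtools uses; it assumes a diploid site with one reference and one alternative allele, independent reads, and an error probability taken directly from each base's Phred quality. All names are hypothetical.

```python
import math


def genotype_log_likelihoods(bases, quals, ref, alt):
    """Log-likelihood of the observed bases under three diploid genotypes."""
    log_lik = {"ref/ref": 0.0, "ref/alt": 0.0, "alt/alt": 0.0}
    for base, qual in zip(bases, quals):
        error = 10 ** (-qual / 10)  # Phred score -> error probability
        # Probability of seeing this base if the read came from each allele.
        p_from_ref = 1 - error if base == ref else error / 3
        p_from_alt = 1 - error if base == alt else error / 3
        log_lik["ref/ref"] += math.log(p_from_ref)
        log_lik["alt/alt"] += math.log(p_from_alt)
        # A heterozygote contributes each allele with probability 1/2.
        log_lik["ref/alt"] += math.log(0.5 * p_from_ref + 0.5 * p_from_alt)
    return log_lik


# Eight reads at one site, half supporting A (reference) and half supporting G.
scores = genotype_log_likelihoods("AAAAGGGG", [30] * 8, ref="A", alt="G")
print(max(scores, key=scores.get))
```

With an even split of high-quality bases the heterozygous genotype scores highest; with all reads matching the reference, the homozygous reference genotype does.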

Base quality score recalibration* (BQSR)
• Quality scores produced by sequencers are subject to systematic technical error, which may lead to over- or under-estimated base quality scores.
• BQSR is a process that applies machine learning to model these errors empirically and adjust the quality scores accordingly.
• For example, suppose that for a given run, whenever two A nucleotides in a row are called, the next base called has a 1% higher rate of error. Then any base call that comes after AA in a read should have its quality score reduced by 1%.
*https://gatkforums.broadinstitute.org/gatk/discussion/44/base-quality-score-recalibration-bqsr
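Base quality scores are Phred-scaled, Q = -10 * log10(p), where p is the probability that the base call is wrong. The sketch below converts between the two and shows what happens to a score if the error probability is raised by one percentage point, which is one possible reading of the "1% higher rate of error" example. It only illustrates the arithmetic; GATK's recalibration model is more involved, and the function names are hypothetical.

```python
import math


def phred_to_error(quality: float) -> float:
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def error_to_phred(error: float) -> float:
    """Error probability -> Phred quality score."""
    return -10 * math.log10(error)


def adjust_quality(quality: float, extra_error: float) -> float:
    """Re-score a base after adding extra_error to its error probability."""
    new_error = min(1.0, phred_to_error(quality) + extra_error)
    return error_to_phred(new_error)


# A reported Q30 base (error 0.001) whose context adds 0.01 to the error rate.
print(round(adjust_quality(30, 0.01), 1))  # 19.6
```

Under this reading, a small systematic increase in error rate moves a Q30 call to below Q20.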

Calling Variants

Calling Variants: Questionable Calls

Calling Variants: Questionable Calls (cont'd)

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
  – Assess quality: samtools mpileup | base quality score recalibration (BQSR)
  – Call variants: bcftools call | HaplotypeCaller
3) Annotate for function

bcftools call
• The bcftools call command uses the genotype likelihoods generated by samtools mpileup to call variants, and outputs all identified variants in variant call format (VCF).

GATK HaplotypeCaller
• When HaplotypeCaller encounters a read-mapped region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region.
• This allows HaplotypeCaller to be more accurate when calling regions that are traditionally difficult to call, for example when they contain different types of variants close to each other.
• For each region, it performs a pairwise alignment of each read against each haplotype. This produces a matrix of likelihoods of haplotypes. The most likely allele for each position is assigned.
• HaplotypeCaller is able to correctly handle the splice junctions that make RNA-seq a challenge for most variant callers.

VCF Format
www.1000genomes.org
• Variant Call Format (VCF); BCF is the binary version of VCF
• Text file format with meta-information and header lines, followed by data lines, each containing information about a position in the genome.
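A VCF data line is tab-separated, and its first eight columns are fixed: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO. The sketch below splits one such line into a dictionary. The function name is hypothetical, and the example line is invented for illustration (it reuses the position and rs ID from the HGVS example earlier, with a made-up QUAL value).

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value if value else True

    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),                # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Invented example line; QUAL=50 is not from real data.
record = parse_vcf_line("16\t89824989\trs140823801\tG\tT\t50\tPASS\tDP=141;DB")
print(record["pos"], record["alt"], record["info"])
```

Lines starting with "##" (meta-information) or "#CHROM" (header) should be skipped before calling a parser like this one.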

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
  – Assess quality: samtools mpileup | base quality score recalibration (BQSR)
  – Call variants: bcftools call | HaplotypeCaller
  – Filter variants: vcftools vcf-annotate | variant quality score recalibration (VQSR)
3) Annotate for function

Vcftools vcf-annotate
• Vcftools vcf-annotate is a way to hard filter your called variants using "standard" quality thresholds or user-specified thresholds.
  – vcf-annotate -f + myFile.vcf > myFile_annot.vcf
  – "+" applies several filters with default values, e.g.
    – Qual INT: minimum value of the QUAL field [10]
    – MinDP INT: minimum read depth [2]
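Hard filtering simply compares each call against fixed thresholds. The sketch below applies the two default thresholds listed above (QUAL of at least 10, read depth of at least 2) to already-parsed records. It is an illustration of the idea, not a reimplementation of vcf-annotate: the record layout, the function name, and the exact comparison at the boundary are assumptions.

```python
def hard_filter(records, min_qual=10.0, min_depth=2):
    """Return (record, label) pairs; label is 'PASS' or the names of failed filters."""
    results = []
    for rec in records:
        failed = []
        if rec["qual"] < min_qual:
            failed.append("Qual")
        if rec["depth"] < min_depth:
            failed.append("MinDP")
        results.append((rec, ";".join(failed) if failed else "PASS"))
    return results


# Invented calls for illustration.
calls = [
    {"pos": 100, "qual": 55.0, "depth": 20},
    {"pos": 250, "qual": 4.0, "depth": 12},
    {"pos": 300, "qual": 3.0, "depth": 1},
]
for rec, label in hard_filter(calls):
    print(rec["pos"], label)
```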

GATK Variant quality score recalibrator (VQSR)
• VQSR assigns a well-calibrated probability to each variant call in a call set, which can be used to filter for high-quality variants.
• VQSR achieves this by taking a reference set it assumes to be "true" variants (e.g. HapMap) and building a distribution of their quality metrics. This is used to build a model of what a "true" variant should look like.
• This model then assigns a recalibrated quality score to your variants. The higher this score, the greater its fit to the "true" model.
• The tool allows for the setting of "tranches", or thresholds, that theoretically allow you to recover 100%, 99%, 90%, etc. of the true variants in the training set. You can filter your results on this metric to achieve greater/reduced specificity/sensitivity.
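The tranche idea can be sketched as a quantile calculation: given the recalibrated scores of the truth-set variants, find the lowest score that still retains the requested fraction of them, then keep every call scoring at or above it. This is a simplified illustration with made-up scores and hypothetical function names; it does not reproduce GATK's model or its tranche files.

```python
import math


def tranche_threshold(truth_scores, sensitivity):
    """Lowest score cut-off that still keeps `sensitivity` of the truth variants."""
    ranked = sorted(truth_scores, reverse=True)
    keep = max(1, math.ceil(sensitivity * len(ranked)))
    return ranked[keep - 1]


def apply_tranche(call_scores, threshold):
    """Keep calls whose score is at or above the tranche threshold."""
    return [score for score in call_scores if score >= threshold]


# Made-up scores for ten truth-set variants and five new calls.
truth = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
calls = [8.5, 5.5, 4.5, 0.5, -2.0]

cutoff = tranche_threshold(truth, 0.5)
print(cutoff, apply_tranche(calls, cutoff))
```

Lowering the requested sensitivity raises the cut-off, which keeps fewer calls but makes the surviving set more specific.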

GATK Variant quality score recalibrator (VQSR) cont'd
• Caveats:
  – This procedure must be performed for SNPs and INDELs separately.
  – It does not work for organisms for which no "true/training" data sets are available.
  – The power of this method depends on the number of reads. Exome and/or low-coverage experiments may produce many low-quality variant calls.
• See the GATK best practices for more information on applying this method
  – https://software.broadinstitute.org/gatk/documentation/article.php?id=2805

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
  – Assess quality: samtools mpileup | base quality score recalibration (BQSR)
  – Call variants: bcftools call | HaplotypeCaller
  – Filter variants: vcftools vcf-annotate | variant quality score recalibration (VQSR)
  – Assess for rare/common variants
3) Annotate for function

Annotate common SNPs in your data
• Bedtools intersect can be used to annotate variants from your call set that overlap with variants found in dbSNP.
  – intersectBed -wao -split -a A_reads.bt2.sorted_unique.raw.vcf -b SNP146.bed > A_reads.bt2.sorted_unique.annotated.vcf
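The overlap test behind this step can be sketched in a few lines. One detail worth making explicit is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based, so a VCF position p falls in a BED interval when start < p <= end. The code below is an illustration with invented records and hypothetical function names, not a replacement for bedtools.

```python
def load_known_sites(bed_lines):
    """Collect every 1-based (chrom, pos) covered by BED intervals (0-based, half-open)."""
    known = set()
    for line in bed_lines:
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        for pos in range(int(start) + 1, int(end) + 1):
            known.add((chrom, pos))
    return known


def flag_known_variants(calls, known):
    """Pair each (chrom, pos) call with True if it overlaps a known site."""
    return [(chrom, pos, (chrom, pos) in known) for chrom, pos in calls]


# Invented example: one known SNP at 1-based position chr1:100.
known_sites = load_known_sites(["chr1\t99\t100\trs_example"])
print(flag_known_variants([("chr1", 100), ("chr1", 101)], known_sites))
```

Chromosome names must use the same convention in both files ("chr1" versus "1"), otherwise nothing will overlap.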

Workflow
1) Prepare data
  – QC reads & align to reference
  – Picard Tools
2) Call variants (Samtools | GATK)
  – Assess quality: samtools mpileup | base quality score recalibration (BQSR)
  – Call variants: bcftools call | HaplotypeCaller
  – Filter variants: vcftools vcf-annotate | variant quality score recalibration (VQSR)
  – Assess for rare/common variants
3) Annotate for function
  – Variant annotations

Calling Variants: Annotation
• Annotate variants with (functional) consequence, e.g. chr12:g.25232372A>G is a missense variant
• Popular tools include snpEff and Variant Effect Predictor (VEP) from Ensembl
• Choice of gene annotation may affect variant annotation
  – RefSeq
  – Ensembl
  – GENCODE
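For coding variants, the basic consequence call (silent, missense, nonsense) comes down to translating the reference and the altered codon and comparing the amino acids. The sketch below does this with the standard genetic code. It is a toy illustration with hypothetical names; tools such as snpEff and VEP also need a transcript model to know which codon and reading frame a genomic position falls in, which is why the choice of gene annotation matters.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def translate(codon: str) -> str:
    """One-letter amino acid (or '*' for stop) for a DNA codon."""
    first, second, third = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * first + 4 * second + third]


def consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by its effect on the protein."""
    ref_aa, alt_aa = translate(ref_codon), translate(alt_codon)
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


print(consequence("CAC", "CAA"))  # His -> Gln, as in p.His78Gln
print(consequence("TAC", "TAA"))  # Tyr -> stop
print(consequence("CTT", "CTC"))  # Leu -> Leu
```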

Annotation of non-coding variation
• HaploReg: http://archive.broadinstitute.org/mammals/haploreg/haploreg.php
• SNPs can be visualized with
  – Chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE projects.
  – Sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies.

Hands-on: HaploReg
• Identifying the potential function of non-coding variants.

Hands-on: Samtools: Examine called variants
• Analyze called variants in IGV.

BaRC SOP
• Variant calling using Samtools and GATK. Manipulating/interpreting VCF files.
  – http://barcwiki/wiki/SOPs under "Variant calling and analysis"

Resources For Mining Variants

| Database | Link |
| --- | --- |
| dbSNP | www.ncbi.nlm.nih.gov/SNP |
| HapMap | hapmap.ncbi.nlm.nih.gov |
| 1000 Genomes | 1000genomes.org |
| UK10K | uk10k.org |
| Exome Variant Server (EVS) | evs.gs.washington.edu/EVS |
| Personal Genome Project (Harvard) | personalgenomes.org |
| ExAC Browser (Broad) | exac.broadinstitute.org |

Resources For Mining Variants: Cancer

| Database | Link |
| --- | --- |
| International Cancer Genome Consortium (ICGC) | icgc.org |
| Catalogue of Somatic Mutations in Cancer (COSMIC) | cancer.sanger.ac.uk |
| cBioPortal for Cancer Genomics | cbioportal.org |
| Cancer Cell Line Encyclopedia (CCLE) | broadinstitute.org/ccle |

Resources For Mining Variants: Plants
• 1001 Genomes (A. thaliana, 1001 strains)
  – 1001genomes.org
• 1000 Plants (1KP; large-scale gene sequencing of at least 1000 plant species)
  – www.onekp.com

Variant Calling workflow
• Please see our Variant Calling walkthrough exercise here:
  – http://jura.wi.mit.edu/bio/education/hot_topics/GenomeVariants_Jul2017/Genome_Variant_calling_walkthrough.txt
• Within, you will find the commands required for calling variants with both samtools and GATK.