BLUPf90 & PreGS and Quality Control
Source: nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=pregsqc_sa.pdf


  • BLUPf90 & PreGS and Quality Control

  • PreGSf90

    •  Interface program to the genomic module to process the genomic information for the BLUPF90 family of programs

    •  Efficient methods

      –  Creation of the genomic relationship matrix and the relationship matrix based on pedigree

      –  Inverse of relationship matrices

    •  Performs Quality Control of SNP information

  • BLUPF90 programs using Genomic

    •  Genomic programs are controlled by adding OPTION commands to the parameter file

      –  OPTION SNP_file marker.geno.clean

      –  Reads 2 files:

        •  marker.geno.clean

        •  marker.geno.clean_XrefID

  • Output Files

    •  GimA22i

      –  Stores the content of inv(G) - inv(A22)

      –  Only written when running preGSf90, not in application programs

    •  freqdata.count

      –  Contains the estimated allele frequency before QC

    •  freqdata.count.after.clean

      –  Contains allele frequencies as used in calculations, and the remove code

      –  For removed SNP these will be zero

    •  Gen_call_rate

      –  List of animals removed by low call rate

    •  Gen_conflicts

      –  Report of animals with Mendelian conflicts

  • Quality control: by default exclude

    •  MAF

      –  SNP with MAF < 0.05

    •  Call rate

      –  SNP with call rate < 0.90

      –  Individuals with call rate < 0.90

    •  Monomorphic

      –  Exclude monomorphic SNP. ONLY when MAF0
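The default MAF and call-rate filters above can be illustrated with a minimal sketch. It assumes genotypes coded 0/1/2 with 5 as a no-call, and it computes every statistic once on the raw data; the order in which preGSf90 applies its filters is not stated in the slides, and all function names here are hypothetical.

```python
MISSING = 5  # no-call code in the genotype file

def snp_stats(genotypes, snp):
    """Return (call_rate, maf) for one SNP column of a list of genotype rows."""
    calls = [row[snp] for row in genotypes if row[snp] != MISSING]
    call_rate = len(calls) / len(genotypes)
    if not calls:
        return call_rate, 0.0
    # Codes 0/1/2 count copies of one allele, so the mean divided by 2 is its frequency.
    freq = sum(calls) / (2 * len(calls))
    return call_rate, min(freq, 1 - freq)

def animal_call_rate(row):
    """Fraction of SNP with a called genotype for one animal."""
    return sum(1 for code in row if code != MISSING) / len(row)

def default_qc(genotypes, min_maf=0.05, min_call_snp=0.90, min_call_animal=0.90):
    """Return (indices of SNP to exclude, indices of animals to exclude)."""
    bad_snps = []
    for snp in range(len(genotypes[0])):
        call_rate, maf = snp_stats(genotypes, snp)
        if call_rate < min_call_snp or maf < min_maf:
            bad_snps.append(snp)
    bad_animals = [i for i, row in enumerate(genotypes)
                   if animal_call_rate(row) < min_call_animal]
    return bad_snps, bad_animals

# Toy data: 4 animals x 4 SNP (illustrative values only).
genotypes = [
    [0, 1, 2, 5],
    [0, 2, 1, 5],
    [0, 1, 0, 2],
    [0, 0, 2, 1],
]
bad_snps, bad_animals = default_qc(genotypes)
print(bad_snps, bad_animals)
```

In this toy data the first SNP is monomorphic (MAF of zero) and the last SNP has a call rate of 0.5, so both are flagged, together with the two animals that carry a no-call.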

  • Quality control: by default exclude (cont.)

    •  Parent-progeny conflicts (SNP & Individuals)

      –  Exclusion -> opposite homozygous

      –  For SNP: >10% of parent-progeny exclusions from the total of pairs evaluated

      –  For Individuals: >1% of parent-progeny exclusions from the total number of SNP
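A small sketch of how the two exclusion rates above could be counted. It assumes genotype codes 0/1/2 with 5 as a no-call, skips SNP where either animal is missing, and uses the number of evaluated SNP as the per-pair denominator; these choices and all names are assumptions for illustration, not the program's documented behaviour.

```python
MISSING = 5

def is_opposite_homozygous(a, b):
    """True when parent and progeny are homozygous for different alleles."""
    return (a == 0 and b == 2) or (a == 2 and b == 0)

def conflict_rates(pairs, genotypes):
    """pairs: list of (parent, progeny) ids; genotypes: dict id -> list of codes.
    Returns (exclusion rate per SNP, exclusion rate per parent-progeny pair)."""
    n_snp = len(next(iter(genotypes.values())))
    snp_conflicts = [0] * n_snp
    snp_evaluated = [0] * n_snp
    pair_rate = {}
    for parent, progeny in pairs:
        conflicts = evaluated = 0
        for s in range(n_snp):
            a, b = genotypes[parent][s], genotypes[progeny][s]
            if a == MISSING or b == MISSING:
                continue  # a missing call cannot be evaluated
            evaluated += 1
            snp_evaluated[s] += 1
            if is_opposite_homozygous(a, b):
                conflicts += 1
                snp_conflicts[s] += 1
        pair_rate[(parent, progeny)] = conflicts / evaluated if evaluated else 0.0
    snp_rate = [c / e if e else 0.0 for c, e in zip(snp_conflicts, snp_evaluated)]
    return snp_rate, pair_rate

genotypes = {
    "sire":  [0, 2, 1, 2, 0],
    "calf1": [0, 1, 1, 2, 1],
    "calf2": [2, 0, 1, 5, 0],
}
pairs = [("sire", "calf1"), ("sire", "calf2")]
snp_rate, pair_rate = conflict_rates(pairs, genotypes)

# Apply the default limits quoted above: >10% for SNP, >1% for individuals.
snp_flagged = [s for s, r in enumerate(snp_rate) if r > 0.10]
pairs_flagged = [p for p, r in pair_rate.items() if r > 0.01]
print(snp_flagged, pairs_flagged)
```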

  • Control default values

    •  For MAF

      –  OPTION minfreq x

    •  Call rate

      –  OPTION callrate x

      –  OPTION callrateAnim x

    •  Mendelian conflicts

      –  OPTION exclusion_threshold x

      –  OPTION exclusion_threshold_snp x
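As an illustration of how these thresholds end up in a parameter file, the sketch below formats OPTION lines from a dictionary. The helper name is hypothetical, the option names are the ones listed above, and the values shown simply restate the defaults quoted in the slides; the Mendelian-conflict options are left out because the slides do not say in which units x is expressed.

```python
def qc_option_lines(settings):
    """Format option name -> value pairs as 'OPTION name value' lines."""
    return [f"OPTION {name} {value}" for name, value in settings.items()]

# Values restate the defaults quoted in the slides (MAF 0.05, call rates 0.90).
lines = qc_option_lines({
    "minfreq": 0.05,
    "callrate": 0.90,
    "callrateAnim": 0.90,
})
print("\n".join(lines))
```

The printed lines would be appended to the parameter file alongside the OPTION SNP_file line.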

  • Parent-progenyconflicts

    •  Presence of these conflicts results in a negative (not positive-definite) H matrix!!!

    •  Problems in estimation of variance components by REML, programs do not converge, etc.

    •  Solution:

      –  Report all conflicts, with counts for each individual as parent or progeny, to trace the conflicts

      –  Remove progeny genotype

        •  Maybe not the best option

        •  But results in a positive-definite H matrix!!!

  • Parent-progenyconflicts

    •  OPTION verify_parentage x

      –  0: no action

      –  1: only detect

      –  2: detect and search for an alternate parent; no change to any file. Not implemented

        •  Implemented in the seekparentf90 program

      –  3: detect and eliminate progenies with conflicts (default)

  • Other Options

    •  Exclusion of selected chromosomes:

      –  OPTION excludeCHR n1 n2 n3 ...

    •  Inclusion of selected chromosomes:

      –  OPTION includeCHR n1 n2 n3 ...

    •  Exclude samples from analyses

      –  OPTION excludeSample n1 n2 n3

    •  Inform which are sex chromosomes:

      –  OPTION sex_chr n

      –  Chromosome # > n will be excluded only for HWE and parent-progeny checks, but not in calculations

  • SNP map file

    •  OPTION chrinfo

    •  For some genomic analyses (GWAS) or QC

    •  Format:

      –  SNP number

        •  Index number of SNP in the sorted map by chromosome and position

      –  Chromosome number

      –  Position

      –  SNP name (Optional)

    •  First row corresponds to first column SNP in genotype file!!!

  • Saving 'clean' files

    •  SNP excluded by QC are set as missing (i.e. Code=5)

    •  Excluded individuals are treated as unrelated in G and A22

      –  For individual i: G[i,:]=0; G[:,i]=0; G[i,i]=1; same for A22, so G-A22 will cancel out

    •  OPTION saveCleanSNPs

    •  Saves clean genotype data, with excluded SNP and individuals removed

      –  For example, for a SNP_file gt

      –  Clean files will be:

        •  gt_clean

        •  gt_clean_XrefID

      –  Removed SNP and animals will be output in files:

        •  gt_SNPs_removed

        •  gt_Animals_removed
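The rule for excluded individuals (row and column set to zero, diagonal set to one, in both G and A22) can be sketched as follows. The matrices are small made-up examples stored as nested lists, and the function name is hypothetical.

```python
def mark_unrelated(matrix, i):
    """Zero row i and column i of a square matrix and set element (i, i) to 1, in place."""
    n = len(matrix)
    for k in range(n):
        matrix[i][k] = 0.0
        matrix[k][i] = 0.0
    matrix[i][i] = 1.0

# Illustrative 3 x 3 matrices; individual 1 failed QC.
G = [[1.05, 0.48, 0.10],
     [0.48, 0.98, 0.22],
     [0.10, 0.22, 1.02]]
A22 = [[1.0, 0.5, 0.125],
       [0.5, 1.0, 0.25],
       [0.125, 0.25, 1.0]]

mark_unrelated(G, 1)
mark_unrelated(A22, 1)

# After the change, the row of G - A22 for the excluded individual is all zeros.
diff_row = [g - a for g, a in zip(G[1], A22[1])]
print(diff_row)
```

Because the excluded individual forms its own 1 x 1 identity block in both matrices, its contribution cancels in the difference while the other individuals keep their original relationships.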

  • Inspection of Diagonal of G

    •  High diagonal elements from G

      –  Mislabeled samples, individuals from other populations/lines

      –  Problems with sample, low call rate

      –  By default, values > 1.6 are excluded from analysis. The threshold can be changed with:

        OPTION threshold_diagonal_g x

    Simeone et al., 2011 JABG
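A minimal sketch of the diagonal check, assuming G is held as a nested list and using the default limit of 1.6 quoted above. The function name and the matrix values are illustrative.

```python
def high_diagonal(G, threshold=1.6):
    """Return indices of individuals whose diagonal element of G exceeds the threshold."""
    return [i for i in range(len(G)) if G[i][i] > threshold]

G = [[1.02, 0.10, 0.05],
     [0.10, 1.75, 0.08],
     [0.05, 0.08, 0.97]]
flagged = high_diagonal(G)
print(flagged)
```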

  • Potential duplicate samples

    •  All samples are checked against each other using values from the genomic relationship matrix

      –  x = G(i,j) / sqrt(G(i,i) * G(j,j))

      –  Values of x > 0.90 are printed in the output

    •  Threshold to identify potential duplicates

      –  OPTION threshold_duplicate_samples x

    •  Exclude specific samples

      –  OPTION excludeSample n1 n2 ...
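The duplicate check above amounts to a correlation-like scaling of each off-diagonal element of G by the two matching diagonals. A short sketch, with a hypothetical function name and a made-up matrix in which samples 0 and 1 are near-identical:

```python
import math

def potential_duplicates(G, threshold=0.90):
    """Return (i, j, x) for every pair with x = G[i][j] / sqrt(G[i][i] * G[j][j]) above threshold."""
    found = []
    n = len(G)
    for i in range(n):
        for j in range(i + 1, n):
            x = G[i][j] / math.sqrt(G[i][i] * G[j][j])
            if x > threshold:
                found.append((i, j, x))
    return found

G = [[1.00, 0.98, 0.25],
     [0.98, 1.00, 0.24],
     [0.25, 0.24, 1.00]]
dups = potential_duplicates(G)
print(dups)
```

Pairs reported this way are candidates for OPTION excludeSample once the cause (true duplicate, twin, or mislabelled sample) has been traced.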

  • Correlation off-diagonal G vs A

    •  Compute correlation for all elements of A > 0.02

    •  Potential problems with matching genotype file and pedigree file

    •  For low values: print a warning!!!!

    •  For low values: program stops!!!

    •  If you still want to go on...

      –  OPTION thrStopCorAG -1
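The check above can be sketched as a Pearson correlation between matching off-diagonal elements of G and the pedigree matrix, restricted to pairs whose pedigree relationship exceeds 0.02. The matrices are made up, the function name is hypothetical, and the warning and stop limits are not reproduced because the slides do not give them.

```python
import math

def offdiag_correlation(G, A, min_a=0.02):
    """Pearson correlation of G[i][j] and A[i][j] over pairs i < j with A[i][j] > min_a."""
    g_vals, a_vals = [], []
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] > min_a:
                g_vals.append(G[i][j])
                a_vals.append(A[i][j])
    if len(g_vals) < 2:
        return None  # not enough related pairs to compute a correlation
    mean_g = sum(g_vals) / len(g_vals)
    mean_a = sum(a_vals) / len(a_vals)
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(g_vals, a_vals))
    var_g = sum((g - mean_g) ** 2 for g in g_vals)
    var_a = sum((a - mean_a) ** 2 for a in a_vals)
    if var_g == 0 or var_a == 0:
        return None
    return cov / math.sqrt(var_g * var_a)

A = [[1.0, 0.5, 0.25, 0.0],
     [0.5, 1.0, 0.25, 0.0],
     [0.25, 0.25, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
G = [[1.02, 0.52, 0.23, 0.01],
     [0.52, 0.99, 0.27, -0.02],
     [0.23, 0.27, 1.01, 0.48],
     [0.01, -0.02, 0.48, 0.97]]
r = offdiag_correlation(G, A)
print(round(r, 3))
```

A correlation close to 1, as in this toy case, suggests the genotype file and pedigree agree; a low value points to a mismatch between animal identifiers in the two files.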

  • Low off-diagonal correlation

    [Figure panels: half-sibs contemporary group; opposite homozygous; off-diagonal G vs A22]

  • Looking for stratification in populations

    •  OPTION plotpca

      –  (only preGSf90, not in application programs)

      –  Plot the first 2 PC

    •  OPTION extra_info_pca filename col

      –  File with variables (alphanumeric) to plot PC with different colors for different classes

      –  Same order as genotype file

  • LD calculation and options

  • No Quality control

    •  ONLY use:

      –  If QC was performed in a previous run

      –  and the "clean" genotype file is used

    •  OPTION no_quality_control

  • Options for Blending G and A

    •  OPTION AlphaBeta alpha beta

      –  G = alpha*Gr + beta*A

    •  OPTION tunedG

      –  0: no adjustment

      –  1: mean(diag(G))=1, mean(offdiag(G))=0

      –  2: mean(diag(G))=mean(diag(A)), mean(offdiag(G))=mean(offdiag(A)) (default)

      –  3: mean(G)=mean(A)

      –  4: Use Fst adjustment. Powell et al. (2010) & Vitezica et al. (2011)
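A sketch of the blending formula and of one linear rescaling that satisfies the two conditions listed for tunedG 2. The rescaling G* = a + b*G is an assumption chosen because it meets both mean constraints; the slides do not state how preGSf90 computes the adjustment or whether it is applied before or after blending. The alpha and beta values and all names are illustrative.

```python
def blend(Gr, A, alpha, beta):
    """Element-wise G = alpha*Gr + beta*A."""
    n = len(Gr)
    return [[alpha * Gr[i][j] + beta * A[i][j] for j in range(n)] for i in range(n)]

def mean_diag(M):
    return sum(M[i][i] for i in range(len(M))) / len(M)

def mean_offdiag(M):
    n = len(M)
    total = sum(M[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def tune_to_A(G, A):
    """Rescale G as a + b*G so its mean diagonal and mean off-diagonal match those of A.
    Assumes the mean diagonal and mean off-diagonal of G differ."""
    b = (mean_diag(A) - mean_offdiag(A)) / (mean_diag(G) - mean_offdiag(G))
    a = mean_offdiag(A) - b * mean_offdiag(G)
    n = len(G)
    return [[a + b * G[i][j] for j in range(n)] for i in range(n)]

Gr = [[1.10, 0.30],
      [0.30, 0.90]]
A = [[1.0, 0.25],
     [0.25, 1.0]]

tuned = tune_to_A(Gr, A)
blended = blend(tuned, A, alpha=0.95, beta=0.05)  # illustrative weights
print(mean_diag(tuned), mean_offdiag(tuned))
```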

  • Use in application programs

    •  Use renumf90 for proper renumbering and creation of the cross-reference id and parameter file

    •  If large number of genotypes

      –  Precompute inv(G)-inv(A22) (PreGSF90)

      –  Modify parameter file to read GimA22i

      –  BLUPF90, AIREMLF90, GIBBSxF90 ...

    •  Generally all steps can be in a script file to facilitate running programs

  • PreGSf90 wiki

  • seekparentf90

    •  Program to check and assign paternity using genomic information

    •  Detect parent-offspring incompatibilities based on counts of conflicts (opposite homozygous, OHM)

      –  Hayes 2010 JAS

      –  Wiggans et al. 2010 JDS

  • Paternity check

    131701210202201111012220220220100101211210102110112211110112200210210022000222010012001110111211110108489 0205101100112211110000201110120221110002205212101010111120221210012020202012122112111202110120006305 2211211200122211211020021100020212102202011210220010211101200211021000202111011121221201112210106310 121020121021111222021022020001021010121221120111111121221021522015111021115100210150055021221020

    Genotype codes: 0 = homozygous, 1 = heterozygous, 2 = homozygous, 5 = no call

    Missing dam genotype; 3 putative sires. Assumption that 1 conflict is a genotyping error.

    Sire Candidates

    Progeny
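The paternity check illustrated above can be sketched by counting opposite-homozygous SNP between the progeny and each candidate sire, and accepting candidates with at most one conflict (the slide's allowance for a genotyping error). Genotypes are short made-up strings in the 0/1/2/5 coding, not the ones from the slide, and the function names are hypothetical rather than seekparentf90's own logic.

```python
def count_conflicts(parent, progeny):
    """Count SNP where parent and progeny are opposite homozygotes (0 vs 2).
    A no-call (5) in either genotype can never form such a pair, so it is ignored."""
    return sum(1 for a, b in zip(parent, progeny)
               if (a, b) in (("0", "2"), ("2", "0")))

def rank_sires(progeny, candidates, max_conflicts=1):
    """Return (conflict counts per candidate, accepted candidates sorted by conflicts)."""
    counts = {name: count_conflicts(geno, progeny) for name, geno in candidates.items()}
    accepted = sorted((name for name, c in counts.items() if c <= max_conflicts),
                      key=lambda name: counts[name])
    return counts, accepted

progeny = "0122105"
candidates = {
    "sireA": "0212201",
    "sireB": "2010001",
    "sireC": "2000012",
}
counts, accepted = rank_sires(progeny, candidates)
print(counts, accepted)
```

With the dam's genotype missing, only sire-progeny conflicts can be used, which is why several candidates may remain compatible when few SNP are available.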

  • seekparentf90

  • seekparentf90

    •  Multiple SNP Chips

      --chips

    •  SNP File

  • seekparentf90

    •  Exclusion/inclusion SNP

      –  List of SNP names to be excluded or included in parentage testing

      --exclude_snp

      --include_snp

    •  Call rate

      --thr_call_rate

  • Output files

    Parent_Conflicts_Stat

    Check_Parent_Pedigree.txt

    Seek_Sire.txt