Last updated November 8, 2016

AMP T2D Knowledge Portal Submitter and Analysis Guide for Data at the DCC

Contents

Executive Summary
  Summary of Milestones for Data Submission to the Data Coordinating Center
  Contacting the DCC
Introduction
AMP and the AMP T2D Knowledge Portal Overview
  Types of Data Requested for the AMP T2D Knowledge Portal
  Overview of Data Aggregation and Analysis Process
  Policies and Data Use
  Submitting Data that Cannot Enter the United States
  Data Transfer Agreement
Preparing for Data Submission to the Portal
  Required and Requested Files
    1. AMP DCC Data Intake Form
    2. Analysis result files
    3. Primary Genotype Data File Types
    4. Intensity files for SNP array data
    5. Read files for sequencing data
    6. Phenotype Data
Overview of the Data Intake, Analysis, and Deposition Process
  Data Transfer
  Description of Project/Cohort
  Summary Statistics Only
  Data QC and Analysis at DCC
    QC Process at the DCC
    Association Analysis Process at the DCC
  Data Deposition and Release
Publication Policy
Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal
Appendix B: AMP DCC Data Intake Form
Appendix C: Phenotype Submission
Appendix D: Detailed Overview of QC Process at the DCC
  Quality Control Process at the DCC
  Initial Data Review
  Ancestry Inference, Clustering, and Outlier Detection
  Sample Metric Outlier Detection
  Pedigree Reconstruction
  QC Report

Executive Summary

The AMP T2D Knowledge Portal is a web-based portal for the type 2 diabetes scientific community that is transforming the way research communities share and visualize genetic data and is facilitating new disease discoveries. To enable scientists to use these new tools on their datasets and to increase the power of the data on the knowledge portal, the AMP T2D Data Coordinating Center (DCC) is bringing in new datasets for deposition into the AMP T2D Knowledge Portal. For every dataset submitted to the DCC, the results of the analysis performed at the DCC are uploaded to the knowledge portal, where portal users can view them. Individual level data will not be shared on the knowledge portal. This document is a guide for studies that are interested in depositing their array, whole exome sequencing, or whole genome sequencing data into the portal. Other data types will be accepted in the future. Please see Figure 1 below for a brief overview of the submission process. Note that the association analysis will be done only on datasets with individual level data.

Submitting your data to the DCC will be an interactive process between your analyst/PI and our analysis team. The data intake team at the DCC will review the QC and association analyses with the submitter before any data is uploaded to the portal, and will work with the data submitter to resolve any issues found with the data. The analysis process is intended to be iterative, and the data submitter and DCC will decide together the order and timeline for the association analysis.

Once an analysis is ready for submission to the knowledge base, it will go live in the portal on the next release. Once the data is live on the portal, our publication policy comes into effect. The data will initially enter Early Access Period 1 for months 0-3 and Early Access Period 2 for months 3-6. During the first 6 months on the portal, data will be flagged as Early Access and will fall under the guidelines of the Fort Lauderdale principles. After the data has been on the portal for 6 months, the open access period will start for the data.

Figure 1. Overview of AMP T2D Knowledge Portal Data Submission at DCC

Summary of Milestones for Data Submission to the Data Coordinating Center

1. Signed DTA executed by both the submitter’s institution and the Broad Institute, serving as DCC.
2. Submitter has prepared necessary files for transfer to the DCC.
3. Genetic data uploaded to secure transfer site. (Individual level data only)
4. Initial phenotype data uploaded to secure transfer site. (Individual level data only)
5. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual level data submission)
6. Project info shared with submitter before being loaded to the portal.
7. Submitter and DCC approve project description.
8. Results of compliance check and analysis that will be shown on portal shared with submitter. (Summary statistics only)
9. Submitter and DCC approve deposition of summary level analyses on the portal. (Summary statistics only)
10. Analysis results shared with submitter. (Individual level data only)
11. Project information that will be loaded on the portal shared with submitter.
12. Submitter and DCC approve QC’ed data. (Individual level data only)
13. Submitter and DCC approve association analysis. (Individual level data only)
14. Submitter and DCC approve project information.
15. Analysis goes live on portal.

Contacting the DCC

To get started on this process, please reach out to the Data Coordinating Center at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. Please tell us about the dataset you’d like to submit and any concerns you have about depositing your data. A member of the data intake team will reply with additional information and guide you through the submission process.

Introduction

Welcome to the AMP T2D Submission Guideline! Bringing new data into the knowledge portal increases the value of the AMP T2D Knowledge Portal for the type 2 diabetes research community and allows the submitter to see their data in the context of hundreds of thousands of type 2 diabetes and control samples gathered from around the world. If you haven’t yet, please check out the portal here: http://www.type2diabetesgenetics.org/. All you need to get started is a Google login.

This document outlines the process of submitting data to the AMP T2D Data Coordinating Center (DCC) at the Broad Institute in Cambridge, MA, USA and will serve as a guide to submitters throughout the process. The process mapped out below begins with getting your Data Transfer Agreement signed and ends with the deposition of your data in the portal. The document reviews the process, roles, and responsibilities of the DCC and the data submitter. In addition to reviewing the information below, we encourage you to reach out to your project manager with any issues or questions you encounter during your submission process. Each process has defined milestones that highlight significant progress in getting the data ready for deposition.

If you haven’t started the process yet and are interested in depositing your data into the AMP T2D knowledge portal, please contact the DCC at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. If you are unable to send your data to the USA for any reason, you have the option of submitting your data through a federated node. Please contact the DCC for more information.

AMP and the AMP T2D Knowledge Portal Overview

The Accelerating Medicines Partnership (AMP) effort is a public-private partnership between the National Institutes of Health (NIH), 10 pharmaceutical companies, and multiple non-profit organizations that joined together to transform the way researchers identify and validate therapeutic targets for several diseases, including type 2 diabetes. To read more about the AMP initiative and to see who’s involved, please visit: https://www.nih.gov/research-training/accelerating-medicines-partnership-amp/type-2-diabetes

The AMP type 2 diabetes (AMP T2D) consortium is a collaboration of a number of AMP funded investigators from around the world, including the Broad Institute, University of Oxford, and University of Michigan. The goal of the AMP T2D consortium is to create a knowledge portal using genetic and phenotypic data generated from type 2 diabetics and controls across multiple populations, in order to bring forth discoveries in the genetic architecture of type 2 diabetes and to facilitate the development of new therapeutic targets for treating this disease. Because the genetic data collected from researchers around the world sits in an interactive web portal environment, researchers are able to ask questions of the data and see summary level results. You can also search for your gene, variant, or region of interest and see whether any of the AMP T2D knowledge portal datasets show an association with type 2 diabetes or related traits.

The AMP T2D knowledge portal will continue to work towards improving the value of the portal for the type 2 diabetes community. To this end, the AMP T2D consortium will be working to add new datasets to the portal and improve the web-based tools used for analysis within the portal. We will update the community on our progress through our mailing list and Twitter feed; you can sign up on the home page of the portal: http://www.type2diabetesgenetics.org/home/portalHome.

Types of Data Requested for the AMP T2D Knowledge Portal

The Data Coordinating Center is currently able to accept array data, whole exome sequencing data, and whole genome sequencing data that can be transferred to the United States. We are building the capacity to accept other data types, such as gene expression, metabolomics, and epigenetic data.

Overview of Data Aggregation and Analysis Process

We know that, as a submitter to the knowledge portal, it’s important for you to understand how your data will be handled once it is at the DCC. Only analytical results, and not individual level data, will be accessible through the portal. We anticipate that multiple versions of results, of increasing detail and harmonization with other datasets, will be released to the portal over time.

Each cohort/project being submitted to the knowledge portal will have the appropriate analytical results identified and harmonized with existing analyses in the portal through a collaborative process between an analyst at the DCC and an analyst at your institution. The analytical results that are prioritized will depend on the phenotype data available, the value the analysis adds to the knowledge portal, and any special requests made by the data submitter.

The analysis will be released to the portal in 3 stages: Early Access Phase 1, Early Access Phase 2, and Open Access Period. Early Access Phase 1 gets the data analysis uploaded and available on the portal with limited QC. The subsequent revisions of the results will occur in Early Access Phase 2, which will aim to (a) address any inconsistencies identified by the initial harmonization process, (b) apply more uniform QC across all datasets in the portal, (c) compute additional statistics desired in the portal but not available in the initial version, and (d) enable on-demand interactive analyses of your data. For these revisions we will require the original genotype and phenotype data. We will also require data in as unprocessed a format as possible, in order to facilitate harmonization and quality control. Once the data is QC’ed and complete, the Open Access Period will begin for your data. We expect the time from the start of Early Access Phase 1 to the beginning of the Open Access Period to be 6 months.

Policies and Data Use

We are committed to ensuring that collaborators submitting genetic data to the AMP T2D knowledge portal understand how the data will be used after transfer to the DCC at the Broad Institute. By sending your data to the Broad Institute for upload into the knowledge portal, the data submitter and DCC are agreeing to the following:

1. Throughout this process, the Broad is committed to protecting your data, both in transit and while the data is on our servers.
2. We will only be able to receive de-identified data that can be transferred to and stored at the DCC. We will have options available for those who cannot submit data to the United States.
3. Individual data will be stored on our secure servers and only accessed for QC and analysis purposes related to the AMP T2D knowledge portal.
4. Individual data will never be posted directly to the portal. Only summary level metrics are available to portal users.
5. Summary level analysis of the submitted data will be posted to the knowledge portal and available to users. This includes p-values, odds ratio, minor allele frequency, effect, direction of effect, allele frequencies across ethnicities, and other analyses that are deemed appropriate by the AMP T2D Knowledge Portal team and AMP T2D consortium.
6. Users of the portal will be able to create custom queries and view summary level results for those queries. This will include displaying results for specific projects/cohorts.
7. The Broad will QC and analyze your data for T2D and related traits in partnership with the submitter. This is a collaborative process, so the submitter will get to view the analysis before it is uploaded to the portal.
8. The Broad may send genotype data submitted to the portal to the Michigan Imputation Server for imputation. This is a free service hosted by the University of Michigan and allows us to use the Haplotype Reference Consortium panel for imputation. The University of Michigan is a key member of the AMP T2D consortium that is funded by the NIH to develop the AMP T2D knowledge portal. For additional information on the imputation server, please visit: https://imputationserver.sph.umich.edu/index.html.

The policies related to the data in the AMP T2D knowledge portal, including data use for the knowledge portal users, can be found here: http://www.type2diabetesgenetics.org/informational/policies#.

Submitting Data that Cannot Enter the United States

Our AMP T2D funded collaborators at the University of Oxford are currently building a capability to ingest, QC, and harmonize data at EBI. If your data can’t leave Europe or enter the United States, you can still submit it to the knowledge portal through this route. EBI will perform the same functions as the DCC and will work with you to submit your data to the knowledge portal.

Data Transfer Agreement

Before we begin transferring data, we need a signed and executed Data Transfer Agreement (DTA). You will receive the DTA via email from the DCC project manager, or you can find it on the knowledge portal here: http://www.type2diabetesgenetics.org/informational/policies. This document should be reviewed by your institution’s legal counsel before signing, and any edits made will need to be signed off by the legal counsel at the Broad. The document states that, as a data contributor to the AMP T2D Portal, you agree to transfer your data to the DCC (Broad Institute) and that you have the approval to do so. Although not covered in this document, a similar DTA will be necessary to transfer data to a Federated Node in cases where the data cannot enter the United States.

Milestone:

1. Signed DTA executed by both the submitter’s institution and the Broad Institute, serving as DCC.

Preparing for Data Submission to the Portal

While we work towards getting a DTA in place, the data submitter can begin preparing their files for data submission to the DCC. The sections below outline the information we need to get your data uploaded to the portal. For a summary table of the information needed, please see Table 1 below.

If your data is unable to leave your site or come to the Broad Institute, located in Cambridge, MA, USA, then depositing your data in a Federated Node will still allow you to contribute your data to the knowledge portal. Please contact the DCC for more information.

Required and Requested Files

Below are guidelines for the types and desired formats of datasets transferred to the DCC. As a general rule, we encourage you to submit as much data and as many results as possible, and to annotate your files with as much information as is feasible. This information will be extremely helpful as our analysts start the QC and analysis process on your data. We understand that different sites will have different data types and different abilities to transform among data formats, and we are happy to work with you to facilitate this process on a case-by-case basis.

1. AMP DCC Data Intake Form. This form is required in order to submit your data to the DCC. The form will be sent via email; please contact your project manager or amp-dcc-data-submission@broadinstitute.com if you have not received it. For additional details on the type of information needed, please refer to Appendix B.
2. Analysis result files. These files are optional, but any analytical results that you transfer will help us expedite and verify our analysis. Any number of files can be provided. For each file, the following is required:
   • A tab- or comma-delimited file, with a header row followed by one row for every variant in the results file. The header row can have as many columns as needed. Mandatory columns include the chromosome, position, effect allele (with respect to which any phenotypic effect is measured), and non-effect allele. All alleles should be aligned to the forward strand of the genome, the version of which should be specified in the annotation data (see below). Additional desired columns include minor allele frequency, p-value of association with one or more traits, estimated odds ratio or effect size, case/control counts, and number of analyzed samples. If multiple statistics are available across multiple traits (e.g. T2D vs. glucose) or across multiple sample groupings (e.g. all samples vs. only samples of a given ancestry), they can be included in a single file or split across multiple files. The set of variants need not be identical across different result files.
   • Annotation data describing the meaning of each column are required. These should be human readable. The annotations can be embedded in the results file or provided as a separate document.
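
As an illustration of the mandatory columns described above, the following minimal sketch checks the header row of a results file before upload. The column names (`chromosome`, `position`, `effect_allele`, `non_effect_allele`) are assumptions made for this example; the guide does not prescribe exact header spellings, so adapt them to match your own annotation data.

```python
import csv
import io

# Hypothetical header names for the four mandatory columns; the guide
# names the concepts but does not fix the exact spelling.
REQUIRED = ("chromosome", "position", "effect_allele", "non_effect_allele")


def missing_columns(text: str) -> list[str]:
    """Return the mandatory columns absent from the header row of `text`."""
    first_line = text.splitlines()[0] if text else ""
    # The guide allows tab- or comma-delimited files; prefer tab if present.
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [column for column in REQUIRED if column not in present]


example = "chromosome,position,p_value\n1,100,0.5\n"
print(missing_columns(example))
```

An empty list means that all four mandatory columns were found under the assumed names.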
3. Primary Genotype Data File Types. In order to ensure the continued use of your data in the portal as demand for additional statistics and analyses grows, we request the following files encoding the genotypes of each sample. These genotype files will be used to compute statistics that are unavailable from the analysis files, which will be added to the portal in subsequent data versions.
   • Genotype files in VCF or PLINK format are required. We will accept either format, provided that strand information is clearly annotated. Note that VCF files have a clear distinction between reference and alternate alleles, while alleles can be flipped by some PLINK analyses.
      a. The VCF file format is available at XXX.
      b. Information about the PLINK file format is available at XXX. We recommend transferring bed/bim/fam files, which can be created by PLINK.
   • Lists of QC+ samples and variants that were advanced to your final analysis are optional. Providing these will ensure that we can recompute statistics concordant with those that you produced in your analysis. If you do not provide them, we will perform our own QC, which will likely be similar (but not identical) to yours.
   • Documentation of your original analysis plan is also optional. Any human readable document describing the motivations of your analysis, the statistical methods employed, and any parameter settings will help us to replicate your analysis. A methods section of a paper, if sufficiently detailed, will also suffice.
4. Intensity files for SNP array data. Ultimately, it may be necessary for us to have access to the raw data used to call genotypes. This will assist with quality control (for example, examining evidence that a rare variant has accurate genotypes), as well as harmonization (for example, ensuring that all variants are called using similar procedures). Thus, although not essential for the first version of your data to appear on the portal, the following files are required to complete the data transfer process in its entirety.
   • Raw intensity files (idat or the equivalent). For SNP array genotyping data, any file format that lists normalized X/Y intensity values for each sample is acceptable. When submitting IDAT, please remember to send both of the intensity files for each sample.
      a. Example file formats accepted by the Sanger for a similar project are at XXX.
      b. A guide for the file formats used by zCall (a clustering algorithm for exome chip) is available at XXX.
   • Cluster and manifest files to accompany the raw intensity files. For Illumina IDAT files, these two files are required for the necessary downstream analysis. They should be available from and familiar to the platform that produced your original genotype calls. The manifest file describes the samples that were genotyped; the cluster file records any information that was used to better cluster the intensities for each SNP.
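
Because both intensity files are needed for each IDAT sample, a quick completeness check before upload can save a round trip. The sketch below assumes the common Illumina naming convention in which the two channels of a sample are stored as `<prefix>_Grn.idat` and `<prefix>_Red.idat`; if your platform names its files differently, adjust the suffixes accordingly. The function name is illustrative only.

```python
from collections import defaultdict

# Assumed Illumina-style channel suffixes; not specified by this guide.
CHANNEL_SUFFIXES = (("_Grn.idat", "Grn"), ("_Red.idat", "Red"))


def incomplete_idat_samples(filenames):
    """Return sample prefixes that have only one of the two channel files."""
    channels = defaultdict(set)
    for name in filenames:
        for suffix, channel in CHANNEL_SUFFIXES:
            if name.endswith(suffix):
                channels[name[: -len(suffix)]].add(channel)
    return sorted(
        prefix for prefix, seen in channels.items() if seen != {"Grn", "Red"}
    )


files = ["s1_Grn.idat", "s1_Red.idat", "s2_Grn.idat"]
print(incomplete_idat_samples(files))
```

Files that match neither suffix are ignored, so the check can be run over a directory listing that also contains cluster and manifest files.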
5. Read files for sequencing data. Similar to intensity files for SNP array data, read files are required for sequencing data. We will use these to run “joint variant calling” across all samples at the DCC, for maximum sensitivity and accuracy of variant calls. Since variant call sets from sequence data include novel variants and alleles, re-processing raw data is even more important in sequencing experiments than in SNP array experiments; the ExAC paper (available at XXX) outlines some of the rationale for this.
   • BAM or CRAM files for each sample are required for sequencing experiments. These files are the standard format for storing read data and should be produced by your sequencing platform. We would prefer raw, unaligned BAM files.
      a. Information on the BAM file format is available at XXX.
      b. BAM files can be created from FASTQ files, as described at XXX.
6. Phenotype Data. This is required alongside submission of genotype and/or sequencing data. The official document with full instructions will be emailed to you. For an idea of the variables requested, please see Appendix C. If you have a specific variable that is not in this list but is relevant to type 2 diabetes or related conditions, let us know and we can include it for your submission.

For a summary view of what is needed for your data submission, please see Table 1 below.

Table 1. Summary of files accepted for data submission into the AMP T2D Portal

File Type                               Genotyping Submission   Sequencing Submission
AMP DCC Data Intake Form                Required                Required
Analysis Results                        Optional                Optional
Annotation Data                         Optional                Optional
Genotype Files (VCF or PLINK)           Required                Required (VCF)
List of QC+ samples and variants        Optional                Optional
Analysis Plan Documentation             Optional                Optional
Raw Intensity Files                     Required                N/A
Cluster File                            Required                N/A
Manifest File                           Required                N/A
Sequencing Read Files (BAM or CRAM)     N/A                     Required
Phenotype Data                          Required                Required
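
The requirements in Table 1 can be turned into a simple pre-submission checklist. The sketch below transcribes the table into a dictionary and reports which required file types are still missing for a given submission type. The dictionary keys and the function name are illustrative identifiers chosen for this example, not official DCC names.

```python
# Requirement per file type, transcribed from Table 1.
# Keys are illustrative identifiers, not official DCC names.
REQUIREMENTS = {
    "data_intake_form":      {"genotyping": "required", "sequencing": "required"},
    "analysis_results":      {"genotyping": "optional", "sequencing": "optional"},
    "annotation_data":       {"genotyping": "optional", "sequencing": "optional"},
    "genotype_files":        {"genotyping": "required", "sequencing": "required"},
    "qc_pass_lists":         {"genotyping": "optional", "sequencing": "optional"},
    "analysis_plan":         {"genotyping": "optional", "sequencing": "optional"},
    "raw_intensity_files":   {"genotyping": "required", "sequencing": "n/a"},
    "cluster_file":          {"genotyping": "required", "sequencing": "n/a"},
    "manifest_file":         {"genotyping": "required", "sequencing": "n/a"},
    "sequencing_read_files": {"genotyping": "n/a", "sequencing": "required"},
    "phenotype_data":        {"genotyping": "required", "sequencing": "required"},
}


def missing_required(submission_type, provided):
    """List required file types (sorted) that are not in `provided`."""
    return sorted(
        name
        for name, requirement in REQUIREMENTS.items()
        if requirement[submission_type] == "required" and name not in provided
    )


ready = {"data_intake_form", "genotype_files", "phenotype_data"}
print(missing_required("sequencing", ready))
```

Note that Table 1 lists VCF as the genotype format for sequencing submissions; this sketch tracks only whether a file type is present, not its format.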

Milestone:

1. Submitter has prepared necessary files for transfer to the DCC.

Overview of the Data Intake, Analysis, and Deposition Process

When you are ready to start submitting files to the DCC for deposition into the AMP T2D portal, email amp_dcc_data_submission@broadinstitute.org and we will set up a secure transfer portal for you to upload your files. We will be using an ASPERA site, which will come with detailed instructions on how to upload the files. Once the ASPERA site is created, data must be uploaded within 30 days, before the site expires. If it becomes necessary to extend that timeline, please let us know so we can extend the life of the ASPERA site.

The data transfer process is outlined step by step below. For a full picture of the path from data intake to deposition, please see Appendix A.

Data Transfer

The data transfer process starts once the submitter and the DCC at the Broad Institute have all necessary documentation in place and are ready to begin physically transferring the data to the DCC. During these steps, if individual level data is being provided, the data submitter will transfer the phenotypic and genetic data to the DCC. This includes the raw data and any available precomputed analyses. For sites from which we are receiving summary statistics only, we ask for the precomputed analyses to be sent to the DCC. Please see Figure 2 below for an overview of the data intake process at the DCC.

Regardless of which type of submission is being sent, we ask that each site complete a data intake form, as noted above. The purpose of the form is to inform the DCC of the data being submitted and to help us create a project/cohort description for this data on the portal.

Figure 2. Data Transfer Process at DCC

Milestones:

1. Genetic data uploaded to secure transfer site. (Individual level data only)
2. Initial phenotype data uploaded to secure transfer site. (Individual level data only)
3. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual level data submission)

Description of Project/Cohort

Each project and cohort with data included in the AMP T2D portal will have a description of the project and/or cohort that is submitting data. This description will be created by the Content Manager at the DCC using the project information provided by the submitter on the Data Intake Form. During this process, the submitter will have the opportunity to provide feedback on the description of their study.

Figure 3. Description of Project and Cohort Information Submission Process at DCC

Milestones:

1. Project info shared with submitter before being loaded to the portal.
2. Submitter and DCC approve project description.

Summary Statistics Only

If you are not able to share raw data with the DCC, the portal can accept summary level statistics for posting. In this case, the DCC will take the summary level information that you have generated, securely store the data, and perform a data compliance check. Once the compliance check has been completed, the DCC will share the results with the submitter and confirm that we can proceed with depositing the data to the portal.

Figure 4. Summary Statistics Only Data Submission Process at DCC

Milestones:

1. Results of compliance check and analysis that will be shown on portal shared with submitter.
2. Submitter and DCC approve deposition of summary level analyses on the portal.

Data QC and Analysis at DCC

Datasets with individual level data being submitted to the DCC will undergo secure data storage, QC, and association analysis at the DCC. During this process the DCC will work with the data submitter to create an analysis plan that will be used to drive the future analyses and to create datasets within the project/cohort that will be deposited into the portal. A dataset in this context refers to a specific set of samples paired with specific phenotype(s). We expect each data submission to contain a number of datasets, and we will work with the data submitters to prioritize the datasets for submission to the AMP T2D portal. Once the analysis is completed, the DCC will reach out to the submitter and review the results of the analysis. Results will not be uploaded to the portal until both the data submitter and the DCC affirm that the data is ready to share.

Figure 5. Data QC and Analysis Process for incoming data to the DCC

The DCC has compiled a list of standard single variant association analyses that will be used as a guide for creating the analysis plan with the submitter. Each analysis plan will be unique to each site, depending on the phenotype variables that are available and the value each analysis adds to the portal.

Milestones:

1. Analysis results shared with submitter.
2. Project information that will be loaded on the portal shared with submitter.
3. Submitter and DCC approve QC’ed data.
4. Submitter and DCC approve association analysis.
5. Submitter and DCC approve project information.

QC Process at the DCC

The QC process at the DCC is vital to harmonizing the data being added to the AMP T2D Knowledge Portal. The goal of our QC is to identify artifacts, ensure statistics can be computed consistently, and help the DCC understand the data being submitted. This process is undertaken on individual level data provided to the DCC. It is carried out by an analyst who works with all data being loaded into the knowledge portal and who performs the QC using an automated and consistent process.

The analyst at the DCC will compute metrics adjusted for ancestry and other confounders and then exclude outlier samples, which are potentially artifacts. The QC completed for data destined for the knowledge portal tends to be conservative, since we are aiming to ensure high quality data for users. Once this QC has been completed, we will provide a report to share with the data submitters. An example QC report can be found in the AMP T2D Knowledge Portal: CAMP QC Report. For full details on the QC process please see Appendix D.

Association Analysis Process at the DCC

Association analysis at the DCC is an interactive process between the analyst at the DCC and the analyst at the submitting site. The initial analysis performed will consist of a set number of traits decided upon by the DCC and the data submitter. As a guide for our submitters, the DCC recommends focusing the initial analysis on traits of relevance to type 2 diabetes. Please see Table 2 for a list of recommended traits.

Table 2. Standard T2D traits for possible DCC association analysis

Categories of traits      Example related phenotype variables
Type 2 Diabetes status    T2D status, T2D age of diagnosis
Cardiometabolic           Systolic blood pressure, Diastolic blood pressure, Hypertension status
Lipids                    HDL cholesterol, LDL cholesterol, Triglycerides, Total cholesterol
Glycemic                  Insulin, glucose (2hr, fasting, and/or random), HbA1C
Anthropometric            BMI, age, weight, waist hip ratio
Kidney Function           Creatinine, Urinary albumin

Once an initial analysis has been completed at the DCC, we will send the data submitter an analysis report for their review and comments. An example report can be found on the AMP T2D Knowledge Portal: CAMP Analysis Report.

Data Deposition and Release

Once the analyses and the project/cohort description have been reviewed and approved by both the data submitter and the DCC, the data will be deposited into the knowledge base before going live on the portal.

Figure 6. Data Deposition Process at DCC

The data will go live with the next quarterly portal release. Releases occur in February, May, August, and November.
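
To estimate when an approved analysis is likely to go live, the quarterly schedule above can be sketched as a small lookup. The guide gives release months but not release days or cut-off dates, so this example works at month granularity and assumes that an analysis approved during a release month can still make that month's release; the function name is illustrative.

```python
# Release months stated in this guide: February, May, August, November.
RELEASE_MONTHS = (2, 5, 8, 11)


def next_release(year, month):
    """Return (year, month) of the first release at or after the given month.

    Assumption: approval during a release month still makes that release.
    """
    for release_month in RELEASE_MONTHS:
        if release_month >= month:
            return (year, release_month)
    # Past November: the next release is February of the following year.
    return (year + 1, RELEASE_MONTHS[0])


print(next_release(2016, 12))
```

If the actual cut-off falls before the release month, the result should be treated as the earliest possible release rather than a commitment.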
Milestone:

1. Analysis goes live on portal.

Publication Policy

Once your data is live on the knowledge portal, submitters are protected by a 6 month early access period that is subject to the guidelines of the Ft. Lauderdale principles. This 6 month period is broken down into a 3 month Early Access Phase 1, during which data is live on the portal with limited QC, and a 3 month Early Access Phase 2, during which the data is fully integrated into the portal. All data in either of the Early Access Periods will be flagged to knowledge portal users who are viewing the data. Please see Figure 7 for the scheduled data releases.

For more information on the Ft. Lauderdale policies, please visit: https://www.genome.gov/pages/research/wellcomereport0303.pdf.

Submitted data will be made live on the knowledge portal over a number of dataset freezes. Since we will be running association analyses over time and adding them to the portal, each dataset will be defined as a set of genetic data associated with specific traits. Any analysis of additional traits, samples, or data will be considered a new dataset and will start again in Early Access Period 1. For example, if for the initial analysis the data submitter and DCC chose to run an association analysis on 3,000 exome chip samples using type 2 diabetes status, that would count as one dataset and would start the Early Access Period on the next scheduled release of the portal. If the same 3,000 exome chip samples were later analyzed for BMI, fasting glucose, and fasting insulin, that would create a new dataset that would start in the Early Access Period, even if the type 2 diabetes analysis of the same samples is already in the open access period.
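
The access timeline described above can be sketched as a function of how long a dataset has been live. The guide describes the phases as months 0-3 and 3-6 without saying which phase the boundary months belong to, so this example assumes half-open intervals (Phase 1 for fewer than 3 whole months, Phase 2 for 3 up to but not including 6). Each dataset is tracked from its own go-live date, since a new analysis of the same samples starts a fresh early access period.

```python
from datetime import date


def months_elapsed(start: date, today: date) -> int:
    """Whole calendar months between `start` and `today` (never negative)."""
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1  # the current month is not yet complete
    return max(months, 0)


def access_phase(go_live: date, today: date) -> str:
    """Classify a dataset by time on the portal (boundaries are assumed)."""
    elapsed = months_elapsed(go_live, today)
    if elapsed < 3:
        return "Early Access Phase 1"
    if elapsed < 6:
        return "Early Access Phase 2"
    return "Open Access"


print(access_phase(date(2016, 11, 8), date(2017, 2, 8)))
```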

Figure 7. Scheduled AMP T2D Portal Releases for Data Submission

Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal

Figure 8 is a complete flowchart outlining the DCC’s data intake process, starting at the point where the data is legally and physically prepared to be transferred by the submitter to the DCC and ending with the data being live in the portal. The flowchart contains 5 subsections of work that are grouped together to create the larger process. The subsections are discussed in more detail in the main document.

Figure 8. Flowchart of AMP DCC Data Intake to Data Deposition in AMP T2D Knowledge Portal
Appendix B: AMP DCC Data Intake Form
For a copy of the AMP DCC Data Intake Form to complete, please contact your project manager or amp-dcc-data-submission@broadinstitute.com. To give you an idea of the type of information needed, we have included screenshots of the form below. Information is entered for the project on the first tab (see Figure 9) and by cohort on the following tabs (see Figure 10).
In the first tab, we ask for general information about the file types you are submitting, along with project information, including a project description, any study-specific covariates that were used during your analysis, special analysis requests, and other places the data can be found (e.g., dbGaP, EGA, or a project website).
The following tabs allow the data submitter to give additional details on the cohorts that make up the project. We expect that some projects will have one cohort, while others will have several.
Figure 9. AMP DCC Data Intake Form Project Information
Figure 10. AMP DCC Data Intake Form Cohort Information
Appendix C: Phenotype Submission
The AMP T2D consortium has identified a number of traits that will be useful for understanding your data and performing relevant analyses that can be shared on the Knowledge Portal. We ask that you submit any of these variables that are available for your data, and please let us know if you have a unique variable that we should be including. This list is meant for informational purposes only. Please see the AMP Phenotype Variable Info sheet emailed to you for additional instructions and information.
Table 3. AMP T2D Knowledge Portal Phenotype Variables
Category | Variable | Format
ID variables | Study ID | character
ID variables | Sample ID used in genotype dataset (if different) | character
ID variables | dbGaP sample ID (if existing) | character
ID variables | Study ID of father | character
ID variables | Study ID of mother | character
Demographics | Race | character
Demographics | Race - open text description | character
Demographics | Ethnicity | character
Demographics | Sex | Please code values as "Male" and "Female"
Demographics | Year of birth | 4-digit integer
Type 2 Diabetes (T2D) status variables | T2D status based on self-report (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on history of healthcare provider diagnosis (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D medication status (1=Yes; 0=No) | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D status based on fasting glucose level (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on HbA1c (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | Glucose tolerance status based on oral glucose tolerance test (OGTT) | character
Type 2 Diabetes (T2D) status variables | T2D status defined in a way other than one of the approaches above (1=T2D; 0=not T2D) - e.g. a combination of the above that can't be separated into individual variables | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status with unknown definition (1=T2D; 0=not T2D) - e.g. where a T2D status variable is available but there is no documentation on how it was defined | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D treatment with insulin or analogs | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D treatment with non-insulin medication | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D age of diagnosis (for those that are affected) (years) | nn.nnn
Type 2 Diabetes (T2D) status variables | Time interval, in years, between diagnosis of diabetes and beginning of treatment with insulin | integer
Type 2 Diabetes (T2D) status variables | This is an open text variable to indicate types of diabetes other than Type 2 (or unclear if Type 2). Examples include: "Type 1 diabetes", "MODY", "LADA", "Gestational diabetes", "Diabetes known to be caused by other processes such as cystic fibrosis, hemochromatosis or pancreatic surgery", "Diabetes status only available during pregnancy". | text
Blood biomarkers | Fasting plasma glucose (mmol/l) | nn.nn
Blood biomarkers | Fasting insulin (mU/l) | nnn.n
Blood biomarkers | OGTT 2-hour fasting glucose (mmol/l) | nn.nn
Blood biomarkers | OGTT 2-hour fasting Insulin (mU/l) | nnn.n
Blood biomarkers | Random glucose (i.e. not fasting or unknown fasting) (mmol/l) | nn.nn
Blood biomarkers | Fasting C-peptide (nmol/l) | nn.nn
Blood biomarkers | HbA1c (fraction, %) | nnn.n
Blood biomarkers | HbA1c (mmol/mol) | nn.nn
Blood biomarkers | Glutamic Acid Decarboxylase Autoantibodies (GAD Ab) | integer (1=positive; 0=negative)
Blood biomarkers | Islet Cell Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Anti-insulin Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | ZNT8 Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Serum creatinine (umol/L) | nnn.n
Blood biomarkers | Adiponectin (ug/ml) | nn.nn
Blood biomarkers | Leptin (ng/ml) | nnn.n
Blood biomarkers | Total cholesterol (mmol/l) | nn.nn
Blood biomarkers | LDL cholesterol (mmol/l) (if measured directly, missing if not) | nn.nn
Blood biomarkers | Calculated LDL cholesterol (mmol/l) (using Friedewald equation) | nn.nn
Blood biomarkers | HDL cholesterol (mmol/l) | nn.nn
Blood biomarkers | Triglycerides (mmol/l) | nn.nn
Blood biomarkers | Any lipid lowering medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood biomarkers | Statin medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Anthropometry | Height (centimeters) | nnn.n
Anthropometry | Weight (kg) | nnn.n
Anthropometry | Hip circumference (centimeters) | nnn.n
Anthropometry | Waist circumference (centimeters) | nnn.n
Blood pressure and hypertension | Systolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Diastolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Hypertension status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood pressure and hypertension | Hypertension medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Urine measures | Urinary creatinine (mg/dL) | nn.nn
Urine measures | Urinary albumin (mg/dL) | nn.nn
Urine measures | Urinary albumin to creatinine ratio (mg/g) | nn.nn
Smoking status | Current smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Smoking status | Ever smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Menopausal status | character
Reproductive and exogenous hormone use | Current use of any female hormones (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Current use of, specifically, peri- or post-menopausal hormone use (i.e. not including contraceptives) (1=yes, 0=no) | integer (1=yes; 0=no)
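Submitters may find it useful to check coded values against the formats in Table 3 before sending a phenotype file. The sketch below is one possible pre-submission check, not a DCC tool. It assumes each row has been read as a dictionary of strings (for example with `csv.DictReader`) and that missing values are empty strings. The column names are hypothetical placeholders, not the names used in the AMP Phenotype Variable Info sheet.

```python
# Hypothetical column names for two of the 0/1 coded variables in Table 3.
BINARY_COLUMNS = ["t2d_self_report", "hypertension_status"]


def check_row(row: dict) -> list:
    """Return a list of human-readable problems found in one phenotype row."""
    problems = []

    # Sex must be coded as "Male" or "Female" (empty is treated as missing).
    sex = row.get("sex", "")
    if sex not in ("", "Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    # Binary status variables must be coded 1 or 0.
    for column in BINARY_COLUMNS:
        value = row.get(column, "")
        if value not in ("", "0", "1"):
            problems.append(f"{column} must be 0 or 1, got {value!r}")

    # Year of birth must be a 4-digit integer.
    year = row.get("year_of_birth", "")
    if year != "" and not (year.isdigit() and len(year) == 4):
        problems.append(f"year_of_birth must be 4 digits, got {year!r}")

    return problems


good = {"sex": "Male", "t2d_self_report": "1",
        "hypertension_status": "0", "year_of_birth": "1960"}
bad = {"sex": "M", "t2d_self_report": "2", "year_of_birth": "60"}

print(check_row(good))
for problem in check_row(bad):
    print(problem)
```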
Appendix D: Detailed Overview of QC Process at the DCC
Quality Control Process at the DCC
All data submitted to the DCC will be processed through comprehensive sample and variant quality control algorithms to promote harmonization with existing data on the portal. Since genotype data is likely to exhibit unique patterns of ancestry and classes of variants, we have developed algorithms for detecting major lines of ancestry and for identifying outliers among various sample metrics. Sample QC will be performed using bi-allelic variants only.
Initial Data Review
When the DCC receives your data, it will first be checked for duplicates and any cryptic relatedness that may result from contamination or data collection errors. Duplicates and cryptic relatedness will be identified using a combination of pairwise identity by descent and a robust algorithm for calculating pairwise kinship in the presence of population stratification. Should any concerns arise, the submitter may be contacted to investigate possible causes and issues that might be corrected before continuing with sample QC.
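To illustrate how pairwise kinship estimates can be turned into duplicate and relatedness flags, the sketch below bins a kinship coefficient into relationship classes. The cutoffs are those commonly quoted for kinship coefficients (for example in the KING software documentation) and are an assumption here; this guide does not state which software or thresholds the DCC uses. The sample IDs and kinship values are made up.

```python
def classify_pair(kinship: float) -> str:
    """Map a pairwise kinship coefficient to a relationship class.

    Cutoffs are commonly used inference thresholds, assumed for illustration.
    """
    if kinship > 0.354:
        return "duplicate or monozygotic twin"
    if kinship > 0.177:
        return "first-degree"
    if kinship > 0.0884:
        return "second-degree"
    if kinship > 0.0442:
        return "third-degree"
    return "unrelated"


# Made-up kinship estimates for three sample pairs.
pairs = {("S1", "S2"): 0.49, ("S1", "S3"): 0.25, ("S2", "S4"): 0.01}

# Keep only the pairs that would need follow-up with the submitter.
flagged = {
    pair: classify_pair(k)
    for pair, k in pairs.items()
    if classify_pair(k) != "unrelated"
}
print(flagged)
```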
Ancestry Inference, Clustering, and Outlier Detection
After an agreement to proceed with QC, we will infer major lines of ancestry. Our approach consists of projecting your data onto principal components (PCs) derived from a collection of common ancestry-informative variants in 1000 Genomes Project data. The PCs are then used as features in a Gaussian Mixture Modeling algorithm to cluster the samples according to their ancestry. Any samples that cannot be included in any of the resulting subsets, due to their unique ancestry or bad genotyping, are flagged as outliers.
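The projection step can be sketched as follows. This is a simplified illustration, assuming the reference PC loadings and per-variant reference means have already been computed from a reference panel. It centers genotypes only (no variance scaling), and all numbers are made up. The Gaussian mixture clustering that follows the projection is not shown.

```python
def project(genotypes, ref_means, loadings):
    """Project one sample's genotype vector onto reference principal components.

    genotypes: allele counts (0, 1, 2) per variant for the sample
    ref_means: mean allele count per variant in the reference panel
    loadings:  one list of per-variant weights for each reference PC
    """
    centered = [g - m for g, m in zip(genotypes, ref_means)]
    return [sum(c * w for c, w in zip(centered, pc)) for pc in loadings]


# Made-up reference quantities for three variants and two PCs.
ref_means = [1.0, 0.5, 1.5]
loadings = [
    [0.6, 0.0, 0.8],  # PC1 weights
    [0.0, 1.0, 0.0],  # PC2 weights
]

sample = [2, 0, 1]
scores = project(sample, ref_means, loadings)
print(scores)
```

Because the submitted samples are projected onto axes defined by the reference data, their PC scores are directly comparable to those of reference samples with known ancestry.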
Sample Metric Outlier Detection
During clustering, metrics for each sample will be calculated. The metrics calculated will vary depending on the type of data received. Some of the more recognizable metrics are transition/transversion rate, call rate, and the number of singletons called. For each sample metric, we will calculate the residuals that result from regressing the metric on principal components of ancestry. We will then calculate principal components on those adjusted metrics. Gaussian Mixture Modeling is employed again at this stage, both on the principal components of the adjusted metrics and on each of the individual adjusted metrics. Any samples that do not cluster using these two approaches will be flagged as outliers.
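The adjustment step (taking residuals of a metric after regressing on ancestry PCs) can be sketched with ordinary least squares. For brevity this example regresses on a single PC, whereas the process described above uses multiple PCs; the values are made up, and the later PCA and Gaussian mixture steps are not shown.

```python
def residuals(metric, pc):
    """Residuals from a least-squares fit of metric ~ intercept + slope * pc."""
    n = len(metric)
    mean_x = sum(pc) / n
    mean_y = sum(metric) / n
    sxx = sum((x - mean_x) ** 2 for x in pc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pc, metric))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(pc, metric)]


# Made-up PC1 scores and call rates for four samples.
pc1 = [-1.0, 0.0, 1.0, 2.0]
call_rate = [0.97, 0.98, 0.99, 0.90]

# Ancestry-adjusted call rate: what remains after removing the PC1 trend.
adjusted = residuals(call_rate, pc1)
print(adjusted)
```

Working with residuals rather than raw metrics means a sample is judged against what is expected for its ancestry, so ancestry-related differences in a metric are less likely to be mistaken for quality problems.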
Pedigree Reconstruction
If your data is found to have pairs of related samples, pedigree reconstruction will be performed.
QC Report
Upon completion of sample QC, a report will be provided to the submitter to facilitate the creation of a suitable analysis plan.
Figure 11. AMP T2D Quality Control Process