2014 agbt giab_progress update



In 2012, NIST convened the Genome in a Bottle Consortium to develop the metrology infrastructure needed to enable confidence in human whole genome variant calls.

Consortium products will include:
• Well-characterized whole genome and synthetic DNA Reference Materials (RMs)
• Reference data associated with the RMs
• Reference methods (comparison tools, documentary standards)

These Genome in a Bottle products will enable translation of whole genome sequencing to clinical applications. Expected use cases of these products include:
• Enable regulated applications
• Validation, QC, proficiency testing
• Identify and quantify sources of bias & variability
• Optimize measurement technologies
• Resolve structural variants
• Improve reference assembly
• Integrate data from multiple platforms

New participants are welcome to join:

www.genomeinabottle.org

Overview

Reference Material Selection and Design Group
• Personal Genome Project samples – consent for commercialization
  • Ashkenazi Jewish trio
  • East Asian trio
  • Looking for additional large family
• Supporting interlaboratory analysis of potential commercial reference materials, new participants welcome
  • COLO-829/COLO 829BL cancer/normal cell line
  • Artificial structures as spike-ins for point mutations or more complex structural variants
  • FFPE samples based on RM cell lines

www.personalgenomes.org

Genome in a Bottle Consortium: Update on a Public-Private-Academic Consortium Developing a Standards Infrastructure for Human Genome Sequencing

Marc Salit (1,2), Sarah A. Munro (1,2), Justin Zook (1), Genome in a Bottle Consortium

(1) National Institute of Standards and Technology, Gaithersburg, MD 20899; (2) National Institute of Standards and Technology Advances in Biomedical Measurement Sciences Program at Stanford University, Stanford, CA 94035

Measurements for Reference Material Characterization Group

Bioinformatics, Data Integration, and Data Representation Group

Performance Metrics Group

Milestones

Worked with Coriell to develop ~8300 vials of pilot RM from NA12878 cell line; samples received by NIST April 2013

NCBI-led team developed the Genome in a Bottle FTP site containing curated data associated with RMs (ftp://ftp-trace.ncbi.nih.gov/giab/ftp)

NIST-led team prepared Data Integration Manuscript; preprint available on arXiv (http://arxiv.org/abs/1307.4661), in press in Nature Biotechnology.

Selected next 2 families for genome RMs from PGP collection; DNA samples at NIST February 2014

Decided on governance policies including formation of a steering committee and a data release policy based on Fort Lauderdale Principles, August 2013

Pilot RM release planned for May 2014; simultaneous release of pilot RM genotype calls

• Initiated experiments to characterize pilot RM (NA12878)

• 6 institutions have signed Material Transfer Agreements with NIST for pilot RM

• Other institutions are welcome to contact NIST to help with sequencing PGP trio RMs

• Cell lines are also available from Coriell

• Integration of software from partners:
  • GCAT (Genome Comparison & Analytic Testing) tool enables mapping and variant call comparisons in exome; NIST analysis results are publicly available
  • NCBI/CDC GeT-RM browser supports visualization of variant calls; includes NIST highly confident genotypes track
  • RTG tool vcfeval for complex variants
  • VCFComparator
  • HSPH bcbio.variation tools

[Figure: Point Mutation Control Plasmids, from M. Williams et al., Frederick National Laboratory for Cancer Research. Labels: Mutation of Interest; Alien Barcode]

Vial of ~10 µg of NA12878 cell line genomic DNA

Multiple measurement technologies and modes, e.g.
• Illumina
• Life Technologies
• Complete Genomics
• Pacific Biosciences
• BioNano Genomics

Real Time Genomics pedigree-based method courtesy of Francisco De La Vega

• Developed data integration methods and genotype calls for NA12878
  • Multi-platform method
    • NIST-led team preprint on arXiv, accepted by Nature Biotechnology
  • Pedigree methods
    • Real Time Genomics (RTG)
    • Illumina Platinum Genomes
• Established data release and QC policies, FTP site with curated data
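The pedigree methods listed above rely on the fact that genotypes in a family must follow inheritance rules. As a much simpler illustration of that idea (not the RTG or Illumina Platinum Genomes method, which work with phased haplotypes), the sketch below checks whether a child's genotype at a single site could have been inherited from the two parents. The function names are hypothetical, and the sketch assumes fully called diploid autosomal genotypes written in VCF style such as "0/1" or "1|0".

```python
from itertools import product
from typing import List


def alleles(gt: str) -> List[str]:
    """Split a VCF-style genotype ("0/1" or "0|1") into its two alleles."""
    return gt.replace("|", "/").split("/")


def is_mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """Return True if the child genotype can be formed by taking one allele
    from the mother and one from the father (diploid autosomal sites only)."""
    child_alleles = sorted(alleles(child))
    return any(
        sorted((m, f)) == child_alleles
        for m, f in product(alleles(mother), alleles(father))
    )


# A heterozygous child of two opposite homozygotes is consistent.
consistent = is_mendelian_consistent("0/1", "0/0", "1/1")
# A child carrying an allele seen in neither parent is flagged.
inconsistent = is_mendelian_consistent("0/1", "0/0", "0/0")
```

A site flagged as inconsistent may be a genotyping error in any of the three samples or a genuine de novo mutation, so a check like this can only point at sites that need closer inspection.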

How you can get involved:
• Sequencing/analyzing the new Personal Genome Project trios
• Help with Structural Variant calls
• Help with analyzing data from long-read technologies
• Attend our biannual workshops (January in CA, August in MD)
• Help develop methods to measure performance using our well-characterized genomes
• Use our integrated SNP/indel genotypes for NA12878 and give us feedback
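One simple way to measure performance against a well-characterized genome is to compare a laboratory's calls with the reference genotypes and count agreements. The following is a minimal sketch under stated assumptions, not a consortium tool: the names `compare_calls` and `normalize_gt` are hypothetical, variants are matched by exact (chromosome, position, ref, alt) key, and both call sets are assumed to be already restricted to the regions where the reference genotypes are considered confident.

```python
from typing import Dict, Tuple

# A variant is keyed by (chromosome, position, ref allele, alt allele);
# the value is a VCF-style genotype string such as "0/1" or "1/1".
VariantKey = Tuple[str, int, str, str]
Calls = Dict[VariantKey, str]


def normalize_gt(gt: str) -> str:
    """Ignore phasing and allele order so that "1|0" equals "0/1"."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def compare_calls(truth: Calls, query: Calls) -> Dict[str, float]:
    """Count matching genotypes and derive sensitivity and precision.

    A site called with the wrong genotype counts as both a false
    positive and a false negative in this sketch.
    """
    tp = sum(
        1
        for key, gt in query.items()
        if key in truth and normalize_gt(truth[key]) == normalize_gt(gt)
    )
    fp = len(query) - tp
    fn = len(truth) - tp
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Made-up example data, for illustration only.
truth_calls = {("1", 100, "A", "G"): "0/1", ("1", 200, "C", "T"): "1/1",
               ("2", 300, "G", "A"): "0/1"}
query_calls = {("1", 100, "A", "G"): "1|0", ("1", 200, "C", "T"): "1/1",
               ("2", 400, "T", "C"): "0/1"}
metrics = compare_calls(truth_calls, query_calls)
```

Exact-key matching undercounts agreement when the same complex variant is written in two different ways, which is the problem that dedicated comparison tools for complex variants are meant to address.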

[Figure labels: Reference Materials; Sample Preparation; Sequencing; Bioinformatics; Variant List, Performance metrics]

[Figure labels: Check calls vs haplotype framework; Connect haplotype islands; Phase contiguity extension; Identify crossovers]