Explaining the assembly model

Posted on 27-Nov-2014

GRC Workshop at Churchill College on Sep. 21, 2014. This is Valerie Schneider's talk describing the assembly model.

Valerie Schneider, NCBI

21 September 2014

Dilthey et al.; Paten et al.

Scientific Models

Outline

• Differences between the reference genome assembly and other assemblies
• Features of the current reference assembly model and their relationship to genomic analyses and tools
• The changing reference genome assembly

Sequences from haplotype 1
Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

GRC Assembly Model

[Diagram: Assembly (e.g. GRCh38) containing the Primary Assembly Unit and a non-nuclear assembly unit (e.g. MT), with the PAR and the genomic regions MHC, UGT2B17, and MAPT marked.]

Church et al., PLoS Biol. 2011 Jul;9(7):e1001091

The human reference genome assembly is not a haploid model

[Diagram: alternate loci units ALT 1 through ALT 7.]

Alternate loci are not synonymous with haplotypes

[Diagram: Assembly (e.g. GRCh38.p1) containing the Primary Assembly Unit, a non-nuclear assembly unit (e.g. MT), alternate loci units ALT 1 through ALT 7, and Patches. Marked features: the PAR and the genomic regions MHC, UGT2B17, and MAPT; under Patches, the genomic regions ABO, FOXO6, and FCGBP.]

Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
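The hierarchy in the diagram can be sketched as a small data structure. This is only an illustration of the slide, not NCBI's or the GRC's actual schema: the class names, the `kind` labels, and the grouping of regions into "alt" and "patch" lists are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssemblyUnit:
    name: str
    kind: str  # hypothetical labels: "primary", "non-nuclear", "alt-loci", "patches"


@dataclass
class Assembly:
    name: str
    units: List[AssemblyUnit] = field(default_factory=list)
    # Genomic regions named on the slide; splitting them into two lists
    # is this sketch's reading of the diagram layout.
    alt_regions: List[str] = field(default_factory=list)
    patch_regions: List[str] = field(default_factory=list)

    def units_of_kind(self, kind: str) -> List[AssemblyUnit]:
        """Return every assembly unit with the given kind label."""
        return [u for u in self.units if u.kind == kind]


def build_toy_assembly() -> Assembly:
    """Build an assembly shaped like the GRCh38.p1 diagram (labels only, no sequence)."""
    units = [
        AssemblyUnit("Primary Assembly", "primary"),
        AssemblyUnit("MT", "non-nuclear"),
    ]
    # Seven alternate loci units, ALT 1 .. ALT 7, as drawn on the slide.
    units += [AssemblyUnit(f"ALT {i}", "alt-loci") for i in range(1, 8)]
    units.append(AssemblyUnit("Patches", "patches"))
    return Assembly(
        name="GRCh38.p1",
        units=units,
        alt_regions=["MHC", "UGT2B17", "MAPT"],
        patch_regions=["ABO", "FOXO6", "FCGBP"],
    )


toy = build_toy_assembly()
print(len(toy.units_of_kind("alt-loci")))
```

Keeping the primary unit and the alternate loci units as siblings under one assembly mirrors the point of the slide: the alternates are part of the assembly itself, not an add-on to it.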

Patches

Patch type    Scaffold status at next major assembly release
FIX           -- (integrated)
NOVEL         ALT LOCI

1q32 1q21 1p21

Dennis et al., 2012

Fix patches are different than novel patches
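The difference stated in the FIX/NOVEL table above can be written as a small lookup. This is a minimal sketch: the enum, the function name, and the returned strings are hypothetical labels chosen for the example, not identifiers from any GRC or NCBI tool.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "fix"
    NOVEL = "novel"


def status_at_next_major_release(patch_type: PatchType) -> str:
    """Return what happens to a patch scaffold at the next major assembly release."""
    if patch_type is PatchType.FIX:
        # The fix is integrated, so no separate scaffold remains.
        return "integrated"
    if patch_type is PatchType.NOVEL:
        # The scaffold is carried forward as an alternate locus.
        return "alt locus"
    raise ValueError(f"unknown patch type: {patch_type!r}")


for p in PatchType:
    print(p.name, "->", status_at_next_major_release(p))
```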

The alignments of the alternate loci scaffolds to the chromosomes are part of the assembly
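One practical consequence is that a position on an alt scaffold can be projected onto the chromosome wherever the alignment covers it. The sketch below assumes a simplified alignment made of ungapped, same-strand blocks; the `Block` type, the function name, and all coordinates are made up for illustration and do not reflect the format in which the GRC distributes its alignments.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    alt_start: int  # 0-based start on the alt scaffold
    chr_start: int  # 0-based start on the chromosome
    length: int     # length of the ungapped aligned block


def alt_to_chr(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based alt scaffold position to a chromosome position.

    Returns None when the position lies in sequence with no alignment,
    i.e. sequence present only on the alt scaffold.
    """
    for b in blocks:
        if b.alt_start <= pos < b.alt_start + b.length:
            return b.chr_start + (pos - b.alt_start)
    return None


# Made-up example: alt positions 100-149 have no chromosome counterpart.
example_blocks = [Block(0, 1000, 100), Block(150, 1100, 50)]

print(alt_to_chr(10, example_blocks))   # inside the first block
print(alt_to_chr(120, example_blocks))  # in the unaligned gap
```

Returning `None` for unaligned positions keeps novel alt sequence distinct from sequence that has a chromosome equivalent, which is the distinction the alignment legend draws.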

Anatomy of an alt

Alignment Legend: no alignment, mismatch, deletion

[Figure: CHR. 19 and ALT LOCI tracks with components AC012314.8, CU151838.1, and AC245052.3; AC012314.8 is labeled twice.]

Alternate loci contain some sequence that is redundant to the primary assembly unit

Alt Loci: Informatics Challenges

Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly

Simulated Reads

Mask1: mask chr for fix patches, scaffold for novel/alts. Mask2: mask only on scaffolds.
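The two masks can be read as a rule for which copy of a redundant sequence gets hidden from the aligner. The sketch below encodes that reading and applies a mask by replacing bases with `N`. Both the interpretation of "mask chr" and "scaffold" as the copy being masked and the use of hard-masking are assumptions; the slide does not say how the masks were built, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def masked_side(scaffold_type: str, mask: str) -> str:
    """Return which copy of the shared sequence is masked: 'chromosome' or 'scaffold'.

    scaffold_type is one of 'fix', 'novel', 'alt'; mask is 'mask1' or 'mask2'.
    """
    if scaffold_type not in {"fix", "novel", "alt"}:
        raise ValueError(f"unknown scaffold type: {scaffold_type}")
    if mask == "mask1":
        # Mask1: mask the chromosome for fix patches, the scaffold otherwise.
        return "chromosome" if scaffold_type == "fix" else "scaffold"
    if mask == "mask2":
        # Mask2: always mask on the scaffold.
        return "scaffold"
    raise ValueError(f"unknown mask: {mask}")


def apply_mask(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Hard-mask 0-based, half-open intervals of seq with 'N'."""
    chars = list(seq)
    for start, end in intervals:
        start, end = max(start, 0), min(end, len(chars))
        if start < end:
            chars[start:end] = ["N"] * (end - start)
    return "".join(chars)


print(masked_side("fix", "mask1"))
print(apply_mask("ACGTACGT", [(2, 5)]))
```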

GRCh38: Alt Loci

GRCh38

• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)

• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
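A quick back-of-the-envelope check puts these figures in proportion. The inputs are the numbers on the slide; the derived values are approximations only, since the 2% figure is rounded, and the variable names are chosen for this example.

```python
# Figures taken from the slide.
regions_with_alts = 178
region_span_mb = 61.9        # chromosome sequence covered by those regions
region_fraction = 0.02       # stated as 2% (rounded)
alt_loci = 261
novel_sequence_mb = 3.6      # novel sequence relative to the chromosomes

# Derived, approximate values.
implied_chromosome_mb = region_span_mb / region_fraction
mean_novel_kb_per_alt = novel_sequence_mb * 1000 / alt_loci
mean_alts_per_region = alt_loci / regions_with_alts

print(f"implied chromosome sequence: ~{implied_chromosome_mb:.0f} Mb")
print(f"mean novel sequence per alt locus: ~{mean_novel_kb_per_alt:.1f} kb")
print(f"mean alt loci per region: ~{mean_alts_per_region:.2f}")
```

These are averages across all alt loci; the slide gives no per-locus breakdown, so individual loci may differ widely.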

[Figure: reads aligned to chromosome and alt/patch sequences, showing the on-target alignment and off-target alignments (n=122,922).]

The Changing Reference

GRC Credits

Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes

GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Matthew Hurles
• Richard Gibbs

Source/Recruitment of DNA Donors for Library Construction

Another implication of the fact that 99.9% of the human DNA sequence is shared by any two individuals is that the backgrounds of the individuals who donate DNA for the first human sequence will make no scientific difference in terms of the usefulness and applicability of the information that results from sequencing the human genome. At the same time, there will undoubtedly be some sensitivity about the choice of DNA sources. There are no scientific reasons why DNA donors should not be selected from diverse pools of potential donors.

http://www.genome.gov/10000921 (August 17, 1996)

Reference Composition

Today’s reference assembly does not represent:
1. The most common allele
2. The longest allele
3. The ancestral allele

Roles for the reference

• Getting the sequence
• Cataloging genes (and other features)
• Establishing a coordinate system
• Humans vs. other organisms