Explaining the assembly model
Uploaded by genome-reference-consortium. Category: Science
Valerie Schneider, NCBI
21 September 2014
Scientific Models
Dilthey et al.; Paten et al.
Outline
• Differences between the reference genome assembly and other assemblies
• Features of the current reference assembly model and their relationship to genomic analyses and tools
• The changing reference genome assembly
[Figure: sequences from haplotype 1 and sequences from haplotype 2]
Old assembly model: compress into a consensus
New assembly model: represent both haplotypes
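The cost of compressing two haplotypes into one sequence can be shown with a toy example. This is a minimal sketch, assuming two short hypothetical haplotype sequences and a clone tiling path that switches haplotypes at a clone boundary; it is an illustration, not GRC code. The single compressed sequence can end up as a mosaic that matches neither haplotype, whereas the new model keeps both.

```python
# Toy sequences (hypothetical): the same 16-bp region on two haplotypes.
HAP1 = "ACGTACGTAAGGCCTT"
HAP2 = "ACGAACGTATGGCCTA"


def tiling_path(segments):
    """Join (sequence, start, end) segments into one sequence, as a
    clone-by-clone tiling path does when adjacent clones come from
    different haplotypes."""
    return "".join(seq[start:end] for seq, start, end in segments)


# Old model sketch: clone 1 from haplotype 1, clone 2 from haplotype 2,
# compressed into a single sequence.
mosaic = tiling_path([(HAP1, 0, 8), (HAP2, 8, 16)])

# New model sketch: each haplotype is kept as its own sequence.
new_model = {"chromosome": HAP1, "alt_locus": HAP2}

if __name__ == "__main__":
    print(mosaic)                          # ACGTACGTATGGCCTA
    print(mosaic == HAP1, mosaic == HAP2)  # False False
```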
GRC Assembly Model
[Figure: Assembly (e.g. GRCh38) containing the Primary Assembly Unit and a Non-nuclear assembly unit (e.g. MT), with labeled features PAR, Genomic Region (MHC), Genomic Region (UGT2B17) and Genomic Region (MAPT)]
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
GRC Assembly Model
The human reference genome assembly is not a haploid model
[Figure: alternate loci units ALT 1 to ALT 7]
Alternate loci are not synonymous with haplotypes
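The hierarchy in the figures (an assembly made of assembly units, each holding sequences) can be sketched as plain data structures. This is a minimal sketch: the class and method names are hypothetical, the unit names follow GRC conventions, and the alt scaffold names are made up. It shows two points from the slides: a region with alt loci is represented by more than one sequence, so the assembly is not haploid, and in this toy one ALT unit groups alts for unrelated regions, so a unit is not a haplotype.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Sequence:
    name: str
    chromosome: str               # chromosome the sequence is, or is placed on
    region: Optional[str] = None  # named region represented (alt loci only)


@dataclass
class AssemblyUnit:
    name: str
    sequences: List[Sequence] = field(default_factory=list)


@dataclass
class Assembly:
    name: str
    units: List[AssemblyUnit] = field(default_factory=list)

    def copies_of_region(self, chromosome: str, region: str) -> List[str]:
        """All sequences representing `region`: the chromosome itself plus
        any alt locus placed on that chromosome for that region."""
        hits = []
        for unit in self.units:
            for seq in unit.sequences:
                if seq.chromosome == chromosome and seq.region in (None, region):
                    hits.append(f"{unit.name}:{seq.name}")
        return hits

    def regions_in_unit(self, unit_name: str) -> Set[str]:
        """Named regions covered by the alt loci grouped in one unit."""
        unit = next(u for u in self.units if u.name == unit_name)
        return {s.region for s in unit.sequences if s.region is not None}


# Toy subset: at most one alt per region in each ALT unit.
toy = Assembly("GRCh38 (toy subset)", [
    AssemblyUnit("Primary Assembly", [
        Sequence("chr4", "4"), Sequence("chr6", "6"), Sequence("chr17", "17")]),
    AssemblyUnit("non-nuclear", [Sequence("MT", "MT")]),
    AssemblyUnit("ALT_REF_LOCI_1", [
        Sequence("alt_MHC_a", "6", "MHC"), Sequence("alt_MAPT_a", "17", "MAPT")]),
    AssemblyUnit("ALT_REF_LOCI_2", [
        Sequence("alt_MHC_b", "6", "MHC"), Sequence("alt_UGT2B17_a", "4", "UGT2B17")]),
])

if __name__ == "__main__":
    print(toy.copies_of_region("6", "MHC"))       # three sequences: not haploid
    print(toy.regions_in_unit("ALT_REF_LOCI_1"))  # unrelated regions in one unit
```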
[Figure: Assembly (e.g. GRCh38.p1) containing the Primary Assembly Unit, alternate loci units ALT 1 to ALT 7, Patches and a Non-nuclear assembly unit (e.g. MT), with labeled features PAR, Genomic Region (MHC), Genomic Region (UGT2B17), Genomic Region (MAPT), Genomic Region (ABO), Genomic Region (FOXO6) and Genomic Region (FCGBP)]
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
GRC Assembly Model
Patches: scaffold status at next major assembly release
FIX: -- (integrated into the chromosome)
NOVEL: ALT LOCI (retained as an alternate locus)
[Figure: regions 1q32, 1q21 and 1p21 (Dennis et al., 2012)]
GRC Assembly Model
Fix patches are different from novel patches
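The patch table reduces to a simple rule. The enum and function names below are hypothetical; the rule itself, that fix patches are folded into the chromosome and novel patches become alt loci at the next major release, is what the table describes.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "fix"      # corrects an error in existing chromosome sequence
    NOVEL = "novel"  # adds a new alternate sequence for a region


def status_at_next_major_release(patch: PatchType) -> str:
    """Fate of a patch scaffold when the next major assembly is built."""
    if patch is PatchType.FIX:
        return "integrated"  # the fix replaces the chromosome sequence
    return "alt locus"       # the novel sequence is kept as an alternate locus


if __name__ == "__main__":
    for p in PatchType:
        print(p.name, "->", status_at_next_major_release(p))
```

Patches are released as separate scaffolds so that minor releases (for example GRCh38.p1) do not change chromosome coordinates; for a fix patch, the change to the chromosome itself waits for the next major release.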
The alignments of the alternate loci scaffolds to the chromosomes are part of the assembly
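Because the alt-to-chromosome alignments are distributed with the assembly, a position on an alt locus can be projected onto chromosome coordinates. This is a minimal sketch, assuming the alignment has been reduced to hypothetical gapless, same-strand blocks with 0-based coordinates; real alignments also carry gaps and may be on the opposite strand.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    alt_start: int  # 0-based start on the alt scaffold
    chr_start: int  # 0-based start on the chromosome
    length: int     # length of the gapless aligned block


def project_to_chromosome(blocks: List[Block], alt_pos: int) -> Optional[int]:
    """Map a 0-based alt-scaffold position to the chromosome.

    Assumes blocks are sorted by alt_start, do not overlap, and are on the
    same strand. Returns None when the position falls in unaligned sequence.
    """
    starts = [b.alt_start for b in blocks]
    i = bisect_right(starts, alt_pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    if alt_pos < block.alt_start + block.length:
        return block.chr_start + (alt_pos - block.alt_start)
    return None


# Hypothetical alignment: alt bases 500-799 have no counterpart on the chromosome.
BLOCKS = [Block(0, 1_000_000, 500), Block(800, 1_000_500, 200)]

if __name__ == "__main__":
    for pos in (10, 600, 850):
        print(pos, project_to_chromosome(BLOCKS, pos))
```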
Anatomy of an alt
Alignment legend: no alignment, mismatch, deletion
Anatomy of an alt
[Figure: alignment of an alt locus to chromosome 19; ALT LOCI components AC012314.8 and CU151838.1; CHR. 19 components AC012314.8 and AC245052.3]
Alternate loci contain some sequence that is redundant to the primary assembly unit
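Since part of each alt is redundant with the primary assembly unit, the same alignment can split an alt into redundant and novel bases. This is a sketch under the assumption that the aligned parts of the alt are given as half-open (start, end) intervals on the alt scaffold; the function name and the interval values are hypothetical.

```python
from typing import List, Tuple


def redundant_and_novel(alt_length: int, aligned: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Split an alt scaffold's length into bases that align to the chromosome
    (redundant) and bases that do not (novel).

    `aligned` holds half-open (start, end) intervals on the alt scaffold;
    overlapping intervals are merged so no base is counted twice.
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(aligned):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)     # overlaps or touches: extend
            continue
        if cur_end is not None:
            covered += cur_end - cur_start  # close the previous merged interval
        cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered, alt_length - covered


if __name__ == "__main__":
    # Hypothetical 1,200-bp alt with three (partly overlapping) aligned intervals.
    print(redundant_and_novel(1200, [(0, 500), (800, 1000), (450, 520)]))  # (720, 480)
```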
Alt Loci: Informatics Challenges
Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly
Mask 1: mask the chromosome for fix patches and the scaffold for novel patches and alt loci. Mask 2: mask only the scaffolds.
[Figure: simulated reads]
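The two masking strategies can be written as rules over the patch and alt scaffolds. This is a minimal sketch, assuming that masking means hard-masking with N and that a masked scaffold is masked over its full length (the slide does not say whether only the redundant, aligned part is masked); class names, sequence names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (sequence name, start, end), 0-based half-open


@dataclass
class PatchOrAlt:
    name: str
    kind: str         # "fix", "novel" or "alt"
    length: int
    chr_span: Region  # chromosome span the scaffold aligns to


def mask1(scaffolds: List[PatchOrAlt]) -> List[Region]:
    """Mask the chromosome span for fix patches; mask the scaffold for novel patches and alts."""
    return [s.chr_span if s.kind == "fix" else (s.name, 0, s.length) for s in scaffolds]


def mask2(scaffolds: List[PatchOrAlt]) -> List[Region]:
    """Mask only scaffolds, whatever their kind; chromosomes stay intact."""
    return [(s.name, 0, s.length) for s in scaffolds]


def hard_mask(seqs: Dict[str, str], regions: List[Region]) -> Dict[str, str]:
    """Replace each region with N so reads cannot align there."""
    out = dict(seqs)
    for name, start, end in regions:
        s = out[name]
        out[name] = s[:start] + "N" * (end - start) + s[end:]
    return out


SEQS = {"chr1": "ACGT" * 5, "fix1": "ACGTAC", "alt1": "GGGCCC"}
SCAFFOLDS = [
    PatchOrAlt("fix1", "fix", 6, ("chr1", 4, 10)),
    PatchOrAlt("alt1", "alt", 6, ("chr1", 12, 18)),
]

if __name__ == "__main__":
    print(hard_mask(SEQS, mask1(SCAFFOLDS)))
    print(hard_mask(SEQS, mask2(SCAFFOLDS)))
```

In this toy, Mask 1 leaves exactly one unmasked copy of each region, preferring the fix-patch sequence over the chromosome sequence it corrects; Mask 2 keeps the chromosome sequence even where a fix patch exists.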
GRCh38: Alt Loci
GRC Assembly Model
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
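The GRCh38 figures can be cross-checked with simple arithmetic. These are derived averages, not numbers reported on the slide, and the first is only approximate because "2%" is rounded.

```latex
\begin{aligned}
\text{implied chromosome sequence} &\approx \frac{61.9\ \text{Mb}}{0.02} \approx 3.1\ \text{Gb} \\
\text{mean novel sequence per alt locus} &\approx \frac{3.6\ \text{Mb}}{261} \approx 13.8\ \text{kb} \\
\text{mean alt loci per region} &= \frac{261}{178} \approx 1.47
\end{aligned}
```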
GRCh38: Alt Loci
[Figure: reads aligned to chromosome and alt/patch sequences; on-target vs. off-target alignments (n=122,922)]
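The figure compares on-target and off-target alignments of reads. A minimal sketch of how such a tally could be computed, assuming each simulated read records its true origin and its aligned locus; the 5-base tolerance, the function name and the read records are hypothetical and are not taken from the analysis on the slide.

```python
from collections import Counter
from typing import Iterable, Tuple

Locus = Tuple[str, int]  # (sequence name, 0-based position)


def tally_alignments(reads: Iterable[Tuple[Locus, Locus]], tolerance: int = 5) -> Counter:
    """Count reads that align back to where they were simulated from.

    Each item is (true origin, aligned locus). A read is on-target when it
    aligns to its source sequence within `tolerance` bases, else off-target.
    """
    counts: Counter = Counter()
    for (true_seq, true_pos), (hit_seq, hit_pos) in reads:
        on_target = true_seq == hit_seq and abs(true_pos - hit_pos) <= tolerance
        counts["on-target" if on_target else "off-target"] += 1
    return counts


# Hypothetical simulated reads from a chromosome region and its alt locus.
READS = [
    (("chr6", 1000), ("chr6", 1000)),
    (("chr6", 2000), ("alt_MHC_a", 150)),  # drawn to the alt copy
    (("alt_MHC_a", 300), ("alt_MHC_a", 302)),
    (("chr6", 5000), ("chr6", 9000)),
]

if __name__ == "__main__":
    print(tally_alignments(READS))
```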
GRCh38: Alt Loci
The Changing Reference
GRC Credits
Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Matthew Hurles
• Richard Gibbs
Source/Recruitment of DNA Donors for Library Construction
Another implication of the fact that 99.9% of the human DNA sequence is shared by any two individuals is that the backgrounds of the individuals who donate DNA for the first human sequence will make no scientific difference in terms of the usefulness and applicability of the information that results from sequencing the human genome. At the same time, there will undoubtedly be some sensitivity about the choice of DNA sources. There are no scientific reasons why DNA donors should not be selected from diverse pools of potential donors.
http://www.genome.gov/10000921 (August 17, 1996)
Reference Composition
Today’s reference assembly does not represent:
1. The most common allele
2. The longest allele
3. The ancestral allele
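At any one site, the reference allele is simply the one carried by the sequenced DNA, so it need not satisfy any of these three criteria. A minimal sketch that checks a site against all three, assuming a hypothetical record of allele frequencies and an inferred ancestral allele; the class, the function and the toy values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Site:
    ref: str                 # allele carried by the reference assembly
    freqs: Dict[str, float]  # allele -> population frequency
    ancestral: str           # allele inferred to be ancestral


def reference_allele_is(site: Site) -> Dict[str, bool]:
    """Compare the reference allele with three criteria it does not consistently follow."""
    most_common = max(site.freqs, key=lambda a: site.freqs[a])
    longest = max(site.freqs, key=len)
    return {
        "most common": site.ref == most_common,
        "longest": site.ref == longest,
        "ancestral": site.ref == site.ancestral,
    }


# Toy site (invented values): the reference carries the minor, shorter,
# ancestral allele.
SITE = Site(ref="A", freqs={"A": 0.3, "AT": 0.7}, ancestral="A")

if __name__ == "__main__":
    print(reference_allele_is(SITE))
```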
Roles for the reference
• Getting the sequence
• Cataloging genes (and other features)
• Establishing a coordinate system
• Humans vs. other organisms