GRC Workshop
![Page 1: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/1.jpg)
GRC Workshop
ASHG
22 Oct 2013
![Page 2: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/2.jpg)
Outline
• Reference Assembly Basics
• GRC: Assembly management and dataflow
• GRCh38
• Accessing the assembly and data
http://genomereference.org
![Page 3: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/3.jpg)
What is the Reference Assembly?
Reference Assembly Basics
![Page 4: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/4.jpg)
![Page 5: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/5.jpg)
![Page 6: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/6.jpg)
An assembly is a MODEL of the genome
![Page 7: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/7.jpg)
![Page 8: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/8.jpg)
Reference Assembly Basics
Lander and Waterman (1988) Genomics
Assumptions:
• Reads are randomly distributed
• Overlap between reads does not vary
Variables:
• G = haploid genome length in bp
• L = sequence read length in bp
• N = number of reads sequenced
• T = amount of overlap needed for detection in bp
• C = coverage (C = LN/G)
Poisson distribution: $P(Y = y) = \frac{\lambda^{y} e^{-\lambda}}{y!}$
• y = number of events in an interval
• λ = mean number of events in an interval
For sequence calculations, coverage can be viewed as λ.
Using this equation, you can calculate the probability that a base has been sequenced y times.
By manipulating this formula, you can estimate the number of gaps for any given level of coverage.
![Page 9: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/9.jpg)
| Coverage | Not sequenced | Sequenced |
|---|---|---|
| 1X | 37% | 63% |
| 5X | 0.6% | 99.4% |
| 10X | 0.005% | 99.995% |
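The Poisson model from the Lander and Waterman slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names are made up for this example, and the contig estimate uses the standard Lander-Waterman result N·e^(−C(1 − T/L)), which the slides allude to but do not write out. With λ set to the coverage C, P(Y = 0) is the fraction of bases not sequenced, which should land close to the percentages shown for 1X, 5X and 10X.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson distribution with mean lam."""
    return (lam ** y) * math.exp(-lam) / math.factorial(y)


def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """Coverage C = L * N / G, using the variable names from the slide."""
    return read_length * num_reads / genome_length


def expected_contigs(read_length: int, num_reads: int,
                     genome_length: int, min_overlap: int) -> float:
    """Lander-Waterman expected contig count: N * exp(-C * (1 - T/L))."""
    c = coverage(read_length, num_reads, genome_length)
    sigma = 1 - min_overlap / read_length
    return num_reads * math.exp(-c * sigma)


# Fraction of bases never sampled (y = 0) at the coverages shown on the slide.
for c in (1, 5, 10):
    not_sequenced = poisson_pmf(0, c)
    print(f"{c}X: not sequenced {not_sequenced:.3%}, sequenced {1 - not_sequenced:.3%}")
```

The number of gaps between contigs is roughly the contig count, so `expected_contigs` gives a feel for how quickly gaps disappear as coverage rises under these idealized assumptions.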
Reference Assembly Basics
![Page 10: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/10.jpg)
2009 Sanger cost:
• shotgun sequence ~ $0.01/base
• finished sequence ~ $0.03/base
This clone: Shotgun = $1500, Finish = $3000
Reference Assembly Basics
![Page 11: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/11.jpg)
Reference Assembly Basics
![Page 12: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/12.jpg)
[Figure: "Sequence Gaps: Uncaptured vs. Total". Bar chart of average gaps per BAC (y-axis, 0 to 10) by species (x-axis), with bars split into uncaptured and captured gaps. Species shown: tetraodon, muntjak_indian, zebra finch, zebrafish, macaque, alligator, chicken, sheep, monodelphis, orangutan, gorilla, vervet, cpbat, chimp, owl_monkey, cat, pig, dusky_titi, cow, elephant, fugu, baboon, dog, hedgehog, shrew, armadillo, opossum, squirrel_monkey, rabbit, galago, lemur, rfbat, rat, mouse, marmoset, wallaby, colobus_monkey, platypus.]
Captured gap = no sequence, but a sub-clone spans the gap
Uncaptured gap = no sequence, no sub-clone spanning gap
Bob Blakesley, NISC
Reference Assembly Basics
![Page 13: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/13.jpg)
Biology
• Repetitive sequence (interspersed repeats, segmental duplications)
• Variation (regions of high diversity, structural variation)
Kidd et al., 2008
Reference Assembly Basics
![Page 14: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/14.jpg)
Reference Assembly Basics
Eugene Yaschenko, NCBI
![Page 15: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/15.jpg)
[Figure: enrichment by gene category, observed vs. expected; axis ticks run from -5 to 5 and from 60 through 0 to 60. Categories shown: Major histocompatibility complex antigen, Chemokine, Tumor necrosis factor receptor, Other cytokine receptor, Cysteine protease inhibitor, CAM family adhesion molecule, Apolipoprotein, KRAB box transcription factor, Intermediate filament, Immunoglobulin receptor family member, Other cell adhesion molecule, Zinc finger transcription factor, Defense/immunity protein, Structural protein, Cysteine protease, Cytokine receptor, Oxygenase, Cell adhesion molecule, Transcription factor, Miscellaneous function, Signaling molecule, Oxidoreductase, Unclassified, Nucleic acid binding, Select regulatory molecule, Kinase, Hydrolase, Ribosomal protein, Protein kinase, G-protein modulator, Extracellular matrix, Other transcription factor.]
Human - PANTHER classifications (biological process)
Evan Eichler, University of Washington
Reference Assembly Basics
![Page 16: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/16.jpg)
Technology
• Read length: long reads vs. short reads
• Mate lengths: distribution of insert sizes
• Read accuracy: error model for your technology
• Read depth: coverage at each base
• Genome distribution: reads covering entire genome equally
Ajay et al., 2011
![Page 17: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/17.jpg)
Genome Research, May, 1997
Reference Assembly Basics
![Page 18: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/18.jpg)
WGS: Sanger Reads
• Restrict and make libraries: 2, 4, 8, 10, 40, 150 kb
• End-sequence all clones and retain pairing information (“mate-pairs”)
• Each end sequence is referred to as a read
• Find sequence overlaps
[Diagram labels: tails, WGS contig, Scaffold]
Reference Assembly Basics
![Page 19: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/19.jpg)
Genome Vocabulary
Contig: a sequence constructed from smaller, overlapping sequences, which contains no gaps. Typically built from reads, but also from sequences in GenBank/EMBL/DDBJ.
Scaffold: a sequence constructed from smaller sequences, which may contain gaps. Typically built from sequences in GenBank/EMBL/DDBJ.
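To make the contig/scaffold distinction concrete, here is a small sketch that splits a scaffold sequence into its gap-free contigs. It assumes gaps are written as runs of `N`, which is a common FASTA convention but not something the slides specify; the function name and the `min_gap` parameter are illustrative only.

```python
import re


def split_scaffold(scaffold_seq: str, min_gap: int = 1):
    """Split a scaffold into contigs at runs of N that are at least min_gap long.

    Returns (contigs, gaps): contigs as (start, sequence) tuples and
    gaps as (start, length) tuples, all with 0-based start positions.
    """
    contigs, gaps = [], []
    pos = 0
    for match in re.finditer(r"[Nn]{%d,}" % min_gap, scaffold_seq):
        if match.start() > pos:
            contigs.append((pos, scaffold_seq[pos:match.start()]))
        gaps.append((match.start(), match.end() - match.start()))
        pos = match.end()
    if pos < len(scaffold_seq):
        contigs.append((pos, scaffold_seq[pos:]))
    return contigs, gaps


scaffold = "ACGT" + "N" * 5 + "GGCC" + "N" * 3 + "TTA"
contigs, gaps = split_scaffold(scaffold)
print(contigs)
print(gaps)
```

Each contig contains no gaps, while the scaffold as a whole keeps the order, orientation and approximate spacing of its contigs.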
Reference Assembly Basics
![Page 20: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/20.jpg)
Schatz et al, 2010
Reference Assembly Basics
![Page 21: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/21.jpg)
A T T T T C C C T T C T G A A A T G A T G A A A G A G T C
Reference Assembly Basics
![Page 22: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/22.jpg)
Clone based assemblies
BAC insert and BAC vector: Shotgun sequence, then Assemble
[Plot: gaps vs. fold sequence coverage] Deeper sequence coverage rarely resolves all gaps.
“Finishers” go in to manually fill the gaps, often by PCR.
Reference Assembly Basics
![Page 23: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/23.jpg)
Ideally…
[Diagram: markers A to O on a non-sequence based map (A, BCD, EFGH, IJKLMNO); sequence contigs ABCD, FGH, KL and ON line up with the map once ON is flipped.]
Reference Assembly Basics
![Page 24: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/24.jpg)
More like…
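[Diagram: the same map markers (A, BCD, EFGH, IJKLMNO), but the sequence contigs (e.g. A, BC, ZYX, W, HJ, M, V, N, O; AB, HIJ, CDY, LMNO) disagree with the map and with each other, leaving the placement uncertain (?).]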
Reference Assembly Basics
![Page 25: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/25.jpg)
Sequence vs. Non-sequence based maps: Mmu7
Maps shown: WI Genetic, WI/MRC RH
![Page 26: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/26.jpg)
Human assemblies available in the NCBI assembly database
http://www.ncbi.nlm.nih.gov/assembly
Reference Assembly Basics
![Page 27: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/27.jpg)
Reference Assembly Basics
![Page 28: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/28.jpg)
Reference Assembly Basics
N50: Measure of continuity. Half of the assembled bases are in contigs of this length or greater.
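N50 is commonly computed over bases rather than over contig counts: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal sketch (the function name and example lengths are illustrative, not taken from the slides):

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

The same function applies to scaffold lengths, which is how a scaffold N50 and a contig N50 can be reported side by side for one assembly.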
![Page 29: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/29.jpg)
Reference Assembly Basics
• Fragmented genomes tend to have more partial models
• Fragmented genomes have fewer frameshifts
Alexander Souvorov, NCBI
![Page 30: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/30.jpg)
Outline
• Reference Assembly Basics
• GRC: Assembly management and dataflow
• GRCh38
• Accessing the assembly and data
http://genomereference.org
![Page 31: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/31.jpg)
http://genomereference.org
![Page 32: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/32.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
GRC Assembly Management
Human Genome Project (HGP)
![Page 33: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/33.jpg)
GRC Assembly Management
![Page 34: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/34.jpg)
GRC Assembly Management
![Page 35: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/35.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
GRC Assembly Management
![Page 36: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/36.jpg)
Issue tracking system (based on JIRA)
GRC Assembly Management
http://genomereference.org
![Page 37: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/37.jpg)
GRC Assembly Management
![Page 38: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/38.jpg)
GRC Assembly Management
5 July 2011
![Page 39: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/39.jpg)
GRC Assembly Management
![Page 40: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/40.jpg)
GRC Assembly Management
![Page 41: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/41.jpg)
ACCESSION NAME CONTIG
GAP Telomere 10000
AP006221 XX-190A2 Hschr1_ctg1
AL627309 RP11-34P13 Hschr1_ctg1
GAP type-3
AC114498 RP5-857K21 Hschr1_ctg3
AL669831 RP11-206L10 Hschr1_ctg3
AL645608 RP11-54O7 Hschr1_ctg3
Tiling Path File (TPF)
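The TPF excerpt above is essentially an ordered list of components and gaps. A rough sketch of reading it into records follows; it only handles the simplified whitespace-separated layout shown on the slide (a real TPF carries additional header lines and fields), and the class and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Component:
    accession: str
    clone_name: str
    contig: str


@dataclass
class Gap:
    gap_type: str
    length: Optional[int] = None


def parse_tpf(text: str) -> List[Union[Component, Gap]]:
    """Parse the simplified TPF layout from the slide, preserving tiling order."""
    records: List[Union[Component, Gap]] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comment lines and the column header row.
        if not fields or fields[0].startswith("#") or fields[0] == "ACCESSION":
            continue
        if fields[0] == "GAP":
            if len(fields) < 2:
                raise ValueError(f"gap line without a type: {line!r}")
            has_length = len(fields) > 2 and fields[2].isdigit()
            records.append(Gap(fields[1], int(fields[2]) if has_length else None))
        else:
            if len(fields) < 3:
                raise ValueError(f"component line needs 3 fields: {line!r}")
            records.append(Component(fields[0], fields[1], fields[2]))
    return records


# The rows below are copied from the slide.
tpf_text = """ACCESSION NAME CONTIG
GAP Telomere 10000
AP006221 XX-190A2 Hschr1_ctg1
AL627309 RP11-34P13 Hschr1_ctg1
GAP type-3
AC114498 RP5-857K21 Hschr1_ctg3
AL669831 RP11-206L10 Hschr1_ctg3
AL645608 RP11-54O7 Hschr1_ctg3
"""
for record in parse_tpf(tpf_text):
    print(record)
```

Keeping the records in file order matters, because the tiling path is defined by the order of components and the gaps between them.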
GRC Assembly Management
![Page 42: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/42.jpg)
Full Dovetail
Half-dovetail
Contained
Short/Blunt
GRC Assembly Management
![Page 43: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/43.jpg)
GRC Assembly Management
![Page 44: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/44.jpg)
GRC Assembly Management
![Page 45: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/45.jpg)
GRC Assembly Management
![Page 46: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/46.jpg)
GRC Assembly Management
![Page 47: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/47.jpg)
Build sequence contigs based on contigs defined in TPF (Tiling Path File).
• Check for orientation consistencies
• Select switch points
• Instantiate sequence for further analysis
Switch point
Representative chromosome sequence
GRC Assembly Management
![Page 48: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/48.jpg)
GRC Assembly Management
AGP: A Golden Path
Provides instructions for building a sequence
• Defines component sequences used to build scaffolds/chromosome
• Switch points
• Defines gaps and types
GRC Produces
• AGP
• FASTA
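As an illustration of how an AGP acts as building instructions, the sketch below assembles an object sequence from AGP rows and a dictionary of component sequences. It follows the nine-column, tab-delimited AGP layout (gap rows use component type `N` or `U`), but it is a simplified reading that assumes rows are already in part-number order and does no validation. The object name `chrT` and the component names `cloneA.1` and `cloneB.1` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_from_agp(agp_text: str, components: dict) -> dict:
    """Build object sequences from AGP rows (assumed sorted by part number)."""
    parts = {}
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        obj, comp_type = cols[0], cols[4]
        if comp_type in ("N", "U"):
            # Gap row: column 6 holds the gap length.
            piece = "N" * int(cols[5])
        else:
            # Component row: 1-based inclusive span of the component to use.
            start, end = int(cols[6]), int(cols[7])
            piece = components[cols[5]][start - 1:end]
            if cols[8] == "-":
                piece = reverse_complement(piece)
        parts.setdefault(obj, []).append(piece)
    return {name: "".join(pieces) for name, pieces in parts.items()}


# Hypothetical toy data: two components joined across a 3 bp gap.
components = {"cloneA.1": "AAAACCCC", "cloneB.1": "GGGGTTTT"}
rows = [
    ["chrT", "1", "6", "1", "F", "cloneA.1", "1", "6", "+"],
    ["chrT", "7", "9", "2", "N", "3", "scaffold", "yes", "paired-ends"],
    ["chrT", "10", "13", "3", "F", "cloneB.1", "5", "8", "-"],
]
agp_text = "\n".join("\t".join(row) for row in rows)
print(build_from_agp(agp_text, components))
```

The component start and end columns play the role of switch points: they say where the path leaves one component and where it enters the next.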
![Page 49: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/49.jpg)
Distributed data
Old Assembly Model
Centralized Data
Updated Assembly Model
GRC Assembly Management
Genome not in INSDC Database
![Page 50: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/50.jpg)
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
GRC Assembly Management
![Page 51: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/51.jpg)
Assembly (e.g. GRCh37)
• Primary Assembly
• Non-nuclear assembly unit (e.g. MT)
• ALT 1 through ALT 9
• PAR
• Genomic Regions: MHC, UGT2B17, MAPT
GRC Assembly Management
![Page 52: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/52.jpg)
GRC Assembly Management
UGT2B17 Region
NCBI36 NC_000004.10 (chr4) Tiling Path (Xue Y et al, 2008)
• Components: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7
• Genes: TMPRSS11E, TMPRSS11E2
GRCh37 NC_000004.11 (chr4) Tiling Path
• Components: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7
• Gene: TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
• Components: AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7
• Gene: TMPRSS11E2
![Page 53: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/53.jpg)
GRC Assembly Management
7 alternate haplotypes at the MHC
Alternate loci released as:
• FASTA
• AGP
• Alignment to chromosome
Regions: UGT2B17, MHC, MAPT
GRCh37 (hg19)
![Page 54: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/54.jpg)
Assembly (e.g. GRCh37.p13)
• Primary Assembly
• Non-nuclear assembly unit (e.g. MT)
• ALT 1 through ALT 9
• PAR
• Patches
• Genomic Regions: MHC, UGT2B17, MAPT, ABO, SMA, PECAM1, …
GRC Assembly Management
![Page 55: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/55.jpg)
GRC Assembly Management
GRCh37.p13
• 178 Regions: 3.15% of chromosome sequence
• 131 FIX patches: add 6.8 Mb novel sequence
• 73 NOVEL patches: add >800 kb novel sequence
![Page 56: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/56.jpg)
MHC (chr6)
Chr 6 representation (PGF)
Alt_Ref_Locus_2 (COX)
GRC Assembly Management
![Page 57: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/57.jpg)
17q deletion
H1
H2
Zody et al, 2008
GRC Assembly Management
![Page 58: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/58.jpg)
GRC Assembly Management
![Page 59: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/59.jpg)
[Diagram: reads aligned to a chromosome and to an alt/patch scaffold, distinguishing the on-target alignment from off-target alignments (n=122,922).]
GRC Assembly Management
![Page 60: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/60.jpg)
GRC Assembly Management
![Page 61: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/61.jpg)
Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly.
• Mask1: mask chr for fix patches, scaffold for novel/alts
• Mask2: mask only on scaffolds
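One simple way to picture a mask is hard-masking: the stretch of a chromosome or scaffold that duplicates sequence elsewhere in the full assembly is replaced with `N`, so a read has only one place left to align. The sketch below shows just that replacement step. It is an assumption-laden illustration rather than the GRC's actual masking pipeline; the function name and the 0-based, half-open interval convention are chosen for this example, and in practice the intervals would come from the alignments of patches and alternate loci to the chromosomes.

```python
def hard_mask(seq: str, intervals) -> str:
    """Replace each 0-based, half-open [start, end) interval of seq with N."""
    masked = list(seq)
    for start, end in intervals:
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval {(start, end)} is outside the sequence")
        masked[start:end] = "N" * (end - start)
    return "".join(masked)


# Toy example: mask positions 2..4 of an 8 bp sequence.
print(hard_mask("ACGTACGT", [(2, 5)]))
```

Under the Mask1 scheme described above, the chromosome would be masked where a fix patch replaces it and the scaffold would be masked for novel patches and alternate loci; under Mask2, only scaffold sequence would be masked.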
GRC Assembly Management
![Page 62: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/62.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
Updated Assembly Model
Genome in INSDC Database
Genome not in INSDC Database
GRC Assembly Management
![Page 63: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/63.jpg)
Outline
• Reference Assembly Basics
• GRC: Assembly management and dataflow
• GRCh38
• Accessing the assembly and data
http://genomereference.org
![Page 64: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/64.jpg)
GRCh38 Impact
GRCh38
![Page 65: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/65.jpg)
GRCh38 Impact
| Assembly | Scaffold N50 | Contig N50 |
|---|---|---|
| GRCh37 | 44,983,201 | 38,440,852 |
| GRCh37B | 62,124,159 | 49,319,739 |
![Page 66: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/66.jpg)
GRCh38 Impact
![Page 67: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/67.jpg)
GRCh38 Impact
![Page 68: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/68.jpg)
GRCh38 Impact
Major Features of GRCh38
• Modeled Centromeres
• Individual base updates
• Fixed tiling path/assembly errors
• Addition of novel sequence
![Page 69: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/69.jpg)
CENTROMERES
GRCh38 Impact
![Page 70: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/70.jpg)
• Mismatches, MAF = 0: n=15,244 (Venn diagram of the 61-mer analysis set and the 1kG high-confidence set, with region counts 9664, 1358 and 4222)
• Insertions, MAF = 0: n=834
• Deletions, MAF = 0: n=1541
• Mismatch in pseudo/pr txpt, MAF < 5%: n=1413
• Annotator and clinical requests: n= ~260
GRCh38 Impact
![Page 71: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/71.jpg)
Pile-Up Analysis: “Never Seen” Mismatched Bases Originating from RP11 Components
GRCh38 Impact
79% of these bases are heterozygous in RP11 WGS
![Page 72: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/72.jpg)
GRCh38 Impact
GRCh37 Insertions Originating from RP11: 17% heterozygous in RP11 WGS
GRCh37 Deletions Originating from RP11: 18% heterozygous in RP11 WGS
![Page 73: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/73.jpg)
GRCh38 Impact
![Page 74: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/74.jpg)
GRCh38 Impact
![Page 75: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/75.jpg)
GRCh38 Impact
![Page 76: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/76.jpg)
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
Dennis et al., 2012
GRCh38 Impact
![Page 77: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/77.jpg)
HYDIN: chr16 (16q22.2)
HYDIN2: chr1 (1q21.1)
Missing in NCBI35/NCBI36, unlocalized in GRCh37, finished in GRCh38
Alignment of HYDIN2 Genomic, 300 Kb, 99.4% ID
Alignment of HYDIN CHM1_1.0, >99.9% ID
Doggett et al., 2006
GRCh38 Impact
![Page 78: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/78.jpg)
GRCh38 Impact
Other Major Tiling Path Updates
• Single CHM1 haplotype paths for:
  • 1p12, 1q21, 1q32: SRGAP2
  • IGH
  • LRC/KIR
  • CCL3L1 (17q21)
• OM-guided
  • 10q11
  • Chr. 9 peri-centromeric inversion
![Page 79: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/79.jpg)
GRCh38 Impact
NOVEL GENES!
GRCh37.p13: 211 genes found only on alt loci and patches
![Page 80: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/80.jpg)
GRCh38 Impact
Sudmant et al., 2010
![Page 81: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/81.jpg)
Genovese et al., 2013
![Page 82: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/82.jpg)
1000G decoy sequence, viewed by:
• GenBank alignment
• Percent Repeat Masked
• Repeat Mask type
• Sequence Source (HTG, HuRef, ALLPATHS)
GRCh38 Impact
In a preliminary analysis, 90% of NA12878 reads that previously aligned uniquely to the decoy sequence had an alignment to the updated assembly.
![Page 83: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/83.jpg)
GRCh38 Impact
Where is the decoy sequence in GRCh38?
• Alt loci (low repeat content)
• Model centromeres (high repeat content)
• Unlocalized/Unplaced Scaffolds
• Chromosomes
![Page 84: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/84.jpg)
Outline
• Reference Assembly Basics
• GRC: Assembly management and dataflow
• GRCh38
• Accessing the assembly and data
http://genomereference.org
![Page 85: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/85.jpg)
Accessing the Data
![Page 86: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/86.jpg)
Accessing the Data
![Page 87: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/87.jpg)
Accessing the Data
![Page 88: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/88.jpg)
Accessing the Data
![Page 89: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/89.jpg)
Accessing the Data
![Page 90: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/90.jpg)
NCBI Genes, Ensembl Genes, Annotated Clone Problems, Segmental Duplications
Accessing the Data
![Page 91: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/91.jpg)
Accessing the Data
![Page 92: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/92.jpg)
Accessing the Data
![Page 93: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/93.jpg)
Accessing the Data
![Page 94: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/94.jpg)
Accessing the Data
![Page 95: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/95.jpg)
GRCh38 in Ensembl
GRCh38 will be incorporated into the existing Ensembl interface. Features such as genes, variation, regulation will be remade or remapped onto the new genome. Nearly 500 tracks are available.
GENCODE gene set
![Page 96: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/96.jpg)
Accessing the Data
![Page 97: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/97.jpg)
Alternate sequences in Ensembl
Haplotypes and patches on the chromosome
A fix patch around the ABO gene
Use the Region comparison view to see the difference between the patch and primary assembly
The GRC alignment track indicates edits
![Page 98: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/98.jpg)
View your data on the Genome
Zoomed in
Zoomed out
Follow the link from the homepage
Red bases show mismatches
![Page 99: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/99.jpg)
Transition to GRCh38 in Ensembl
INSDC coordinates identify the assembly as well as the position
Convert coordinates between assemblies
Our blog series details our progress with GRCh38
Ensembl.info
![Page 100: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/100.jpg)
Remap Set up slide
![Page 101: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/101.jpg)
Accessing the Data
![Page 102: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/102.jpg)
Accessing the Data
![Page 103: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/103.jpg)
1000 Genomes Browser: http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
GeT-RM Browser: http://www.ncbi.nlm.nih.gov/variation/tools/getrm
Variation Viewer: http://www.ncbi.nlm.nih.gov/variation/view (coming Fall 2013!)
![Page 104: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/104.jpg)
![Page 105: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/105.jpg)
Tiling Path
Sequence Bar
Segmental Duplications, Eichler Lab
1000 Genomes strict accessibility mask
Annotated clone assembly problems
![Page 106: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/106.jpg)
dbSNP Build 138 based on annotation run 104
Model based paralogous sequence differences, NCBI annotation run #
Paralogous/pseudo gene alignments, NCBI annotation run #
Singly Unique Nucleotide (SUN) map, Sudmant 2010
ClinVar Long Variations
GRC Curation Issues
ClinVar Short Variations
![Page 108: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/108.jpg)
http://genomeref.blogspot.com/
Accessing the Data
![Page 109: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/109.jpg)
![Page 110: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/110.jpg)
![Page 111: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/111.jpg)
![Page 112: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/112.jpg)
![Page 113: GRC Workshop](https://reader031.fdocuments.us/reader031/viewer/2022020920/56816913550346895de02fe4/html5/thumbnails/113.jpg)
Accessing the Data