Summary stats for SNP density, 1mb windows (2,754 windows, no unbridged gaps)
          Sum       Mean  Stdev  0%   25%   50%   75%   100%
Venter    3248704   1179  420    17   936   1182  1415  6606
Watson    1989392   722   245    0    587   740   875   2172
Ceu       2670808   969   343    0    768   976   1178  3222
Yri       2938231   1066  364    0    847   1073  1293  3307
Chb       2520952   915   331    0    717   920   1114  3146
Jpt       2489482   903   328    0    710   909   1104  3102
* Using base coverage; feature coverage never finishes
** Sums reflect regions smaller than the window size being thrown out
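The per-window statistics in the table can be illustrated with a minimal sketch. It assumes 0-based SNP positions on a single chromosome, and it drops a trailing window shorter than the window size, in line with the second footnote. The function names and the quantile method are assumptions made for illustration; the slides do not say which tool produced the numbers.

```python
import statistics

def window_counts(positions, chrom_length, window=1_000_000):
    """Count SNPs per full-size window; a shorter trailing window is dropped."""
    n_windows = chrom_length // window
    counts = [0] * n_windows
    for pos in positions:
        idx = pos // window
        if idx < n_windows:  # SNPs in the partial final window are not counted
            counts[idx] += 1
    return counts

def summarize(counts):
    """Return the same columns as the table: sum, mean, stdev and quartiles."""
    # "inclusive" quantiles are an assumption; other methods differ slightly.
    q1, q2, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "sum": sum(counts),
        "mean": statistics.mean(counts),
        "stdev": statistics.stdev(counts),
        "0%": min(counts),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "100%": max(counts),
    }

if __name__ == "__main__":
    # Toy data: a 45 bp "chromosome" with 10 bp windows.
    counts = window_counts([5, 15, 25, 26, 38, 41], chrom_length=45, window=10)
    print(counts, summarize(counts))
```

Because partial windows are discarded, the sum over windows can be lower than the total number of SNPs in the track.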
SNP density 100kb windows (28,442)
            Sum        Mean  Stdev  0%   25%   50%   75%   100%
Venter      3347391    117   69     0    71    113   156   1851
Watson      2052667    72    41     0    45    71    97    1107
Ceu         2725973    95    47     0    63    93    125   557
Yri         2998514    105   50     0    71    102   136   557
Chb         2573242    90    46     0    58    87    119   622
Jpt         2541151    89    46     0    57    86    117   574
dbSNP 126   12266167   430   259    20   303   386   496   12479
dbSNP 126 has 12 million SNPs, including randoms, etc. The region with the most SNPs is chr16 44943302-45043302
Regions with no SNPs (100kb)
• Watson and Venter have 121 regions in common (292 & 162)
• All HapMap has 444 in common (469-487)
• They all have 111 in common
  – dbSNP has entries in these regions
  – Ensembl has a few Watson SNPs and many Venter SNPs (only 2 in chrY remained 0) here
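Finding the empty regions that several tracks share amounts to a set intersection over windows with a zero count. The sketch below assumes each track is a dict mapping a window key such as (chromosome, start) to its SNP count; the data layout and function names are hypothetical.

```python
def empty_windows(counts_by_window):
    """Return the set of window keys whose SNP count is zero."""
    return {key for key, count in counts_by_window.items() if count == 0}

def shared_empty_windows(*tracks):
    """Return the windows that are empty in every track given."""
    if not tracks:
        return set()
    empties = [empty_windows(track) for track in tracks]
    return set.intersection(*empties)

if __name__ == "__main__":
    # Toy tracks keyed by (chromosome, window start); values are SNP counts.
    watson = {("chr1", 0): 0, ("chr1", 100000): 3, ("chrY", 0): 0}
    venter = {("chr1", 0): 0, ("chr1", 100000): 0, ("chrY", 0): 5}
    print(shared_empty_windows(watson, venter))
```

A window that is missing from a track is not treated as empty here, so every track should list all windows explicitly.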
Genome Graphs import uses a 10kb window for computing depth and coverage. For these graphs depth was chosen and connections were drawn between items up to 1mb away. Ceu was done with both 1mb connections and 10kb connections and there wasn’t a noticeable difference.
Graph A   Graph B   R      R-Squared
Watson    ceu       .547   .299
Watson    yri       .511   .261
Watson    chb       .553   .306
Watson    jpt       .554   .307
Venter    ceu       .463   .214
Venter    yri       .424   .180
Venter    chb       .470   .221
Venter    jpt       .471   .221
Ceu       yri       .942   .888
Watson    Venter    .539   .290
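The R-Squared column is the square of R. As a sketch of how such a pair of values could be computed from two density tracks, the code below uses the Pearson correlation coefficient over per-window values. Treating R as Pearson's r is an assumption; the slides do not state how the comparison was calculated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of window values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    # Toy per-window densities for two tracks.
    track_a = [10, 12, 9, 20, 15]
    track_b = [11, 14, 8, 18, 16]
    r = pearson_r(track_a, track_b)
    print(round(r, 3), round(r * r, 3))
```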
Allele comparisons
        Watson                    Venter
        Exact match   1 or more   Exact match   1 or more
Ceu     38.9%         49.2%       21.2%         37.5%
Chb     37.0%         48.3%       21.3%         36.6%
Jpt     36.5%         48.0%       21.2%         36.4%
Yri     38.0%         47.3%       18.3%         35.9%
Percent = (matches/total SNPs)*100
Total SNPs is Watson or Venter
1 or more includes the exact matches
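The percentage formula can be sketched as follows. The code assumes that "exact match" means the individual's allele set at a SNP equals the population's allele set, and that "1 or more" means the two sets share at least one allele. Both definitions, the dict-of-sets layout and the function name are assumptions for illustration, not the method used for the slides.

```python
def allele_match_percentages(individual, population):
    """Return (exact %, one-or-more %) relative to the individual's SNP total.

    Both arguments map a SNP identifier to a set of alleles.
    """
    total = len(individual)
    if total == 0:
        raise ValueError("individual has no SNPs")
    exact = 0
    one_or_more = 0
    for snp, alleles in individual.items():
        pop_alleles = population.get(snp)
        if not pop_alleles:
            continue  # SNP absent from the population counts as no match
        if set(alleles) == set(pop_alleles):
            exact += 1
        if set(alleles) & set(pop_alleles):
            one_or_more += 1  # exact matches are included here as well
    return exact / total * 100, one_or_more / total * 100

if __name__ == "__main__":
    person = {"s1": {"A", "G"}, "s2": {"C"}, "s3": {"T"}, "s4": {"A"}}
    hapmap = {"s1": {"A", "G"}, "s2": {"C", "T"}, "s3": {"G"}}
    print(allele_match_percentages(person, hapmap))
```

The denominator is the individual's SNP count, matching the note that total SNPs is Watson or Venter.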
Coding SNPs (RefSeq Genes)
• Watson
  – 857 substitutions
    • 779 in dbSNP 128
    • 706 heterozygous
• Venter
  – 13 frameshifts
    • 1 in dbSNP 128
    • 13 heterozygous
  – 1109 substitutions
    • 1003 in dbSNP 128
    • 648 heterozygous
Comparing Venter’s deletions to alignments
• 96,181 deletions
• Extracted maf for +- 2bps of deletions
• Found no deletions in other species at the same locations
• Found from 0 to 27 species with alignments
  – Mean 2 per deletion, median 1, max 27
  – chr9 36092117 36092118 A/-
The max region
The gene
OMIM
Watson homozygous? SNPs
• Only 1 allele found, not guaranteed homozygous
• Found 382,024 SNPs
• matching species: min 0, max 27 (2 SNPs), ave 3, median 2
  – 18,935 with 10 or more species
• aligned but not matching: min 0, max 27 (2 SNPs), ave 3, median 2
  – 25,663 with 10 or more species
Venter homozygous? SNPs
• Only 1 allele found, not guaranteed homozygous
• 1,450,836 SNPs
• matching species: min 0, max 27, ave 3, median 2
• aligned but not matching: min 0, max 27, ave 3, median 2
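The two per-SNP tallies above (matching species, and species aligned but not matching) can be sketched as below. The code assumes the aligned base for each species has already been pulled out of the multiple alignment into a dict; reading the alignment files is outside this sketch, and all names here are hypothetical.

```python
import statistics

def species_support(human_allele, aligned_bases):
    """Return (matching, aligned but not matching) species counts for one SNP.

    aligned_bases maps species name -> base at the SNP position, and holds
    only species that have an alignment there.
    """
    allele = human_allele.upper()
    matching = sum(1 for base in aligned_bases.values() if base.upper() == allele)
    return matching, len(aligned_bases) - matching

def summarize_counts(counts):
    """Summarize per-SNP species counts the way the slides report them."""
    return {
        "min": min(counts),
        "max": max(counts),
        "ave": statistics.mean(counts),
        "median": statistics.median(counts),
        "10_or_more": sum(1 for c in counts if c >= 10),
    }

if __name__ == "__main__":
    snps = [
        ("A", {"chimp": "A", "mouse": "G"}),
        ("C", {}),
        ("T", {"chimp": "t", "dog": "T", "rat": "-"}),
    ]
    pairs = [species_support(allele, bases) for allele, bases in snps]
    print(summarize_counts([m for m, _ in pairs]))
    print(summarize_counts([n for _, n in pairs]))
```

A gap character in another species is counted as aligned but not matching in this sketch; whether the original analysis did the same is not stated.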