Post on 15-Jul-2015
Should I be dead? a very personal genomics
Neil Saunders
Digital Productivity
www.csiro.au
personal genomics: Slide 2 of 20
personal genomics pipelines
- we need reports that patients and clinicians can use
personal genomics: Slide 3 of 20
personal genetics is already reality
http://genomesunzipped.org/2011/06/3747.php
personal genomics: Slide 4 of 20
introduction to 23andme
https://www.23andme.com
Who Should Have Access to Your DNA?
https://medium.com/backchannel/who-should-have-access-to-your-dna-6830fbf8dc79
personal genomics: Slide 5 of 20
data visualization at 23andme
personal genomics: Slide 6 of 20
23andme “raw data”
rs4477212 1 82154 AA
rs3094315 1 752566 AA
rs3131972 1 752721 GG
rs12562034 1 768448 GG
rs12124819 1 776546 AA
rs11240777 1 798959 GG
rs6681049 1 800007 CC
rs4970383 1 838555 AC
rs4475691 1 846808 CT
rs7537756 1 854250 AG
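Each raw data line holds four fields: rsID, chromosome, position and genotype. A minimal Python sketch of a reader for this layout follows. It assumes whitespace-separated fields and comment lines starting with `#`; the names `Call` and `parse_raw` are illustrative and not part of any 23andMe tool.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_raw(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data line, skipping blanks and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()
        yield Call(rsid, chrom, int(pos), genotype)


# Three of the lines shown on the slide, preceded by a comment header.
sample = """# rsid\tchromosome\tposition\tgenotype
rs4477212\t1\t82154\tAA
rs3094315\t1\t752566\tAA
rs4970383\t1\t838555\tAC
"""
calls = list(parse_raw(sample.splitlines()))
```

Keeping the chromosome as a string rather than an integer lets the same reader handle X, Y and MT.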
personal genomics: Slide 7 of 20
promethease + SNPedia
http://www.snpedia.com/index.php/Promethease
personal genomics: Slide 8 of 20
ensembl variant effect predictor (VEP)
http://www.ensembl.org/info/docs/tools/vep/index.html
personal genomics: Slide 9 of 20
converting 23andMe data to VCF
personal genomics: Slide 10 of 20
VCF conversion attempt #1 - 23andme2vcf.pl
https://github.com/arrogantrobot/23andme2vcf
in vcf 946 275
not in reference 30 734
DI 26
DD 161
II 689
D 36
I 112
-- 13 752
total 991 785
in raw data 991 786
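A breakdown like this one can be approximated by tallying genotype strings before conversion. The sketch below assumes that indels are coded with `D` and `I` and no-calls as `--`, and it lumps every other genotype together as `snp`. The function name `tally` is illustrative; this is not how 23andme2vcf.pl produces its counts.

```python
from collections import Counter

# Genotype codes that are reported as their own rows in the breakdown.
INDEL_OR_NOCALL = {"DI", "DD", "II", "D", "I", "--"}


def tally(genotypes):
    """Count indel and no-call codes separately; everything else is 'snp'."""
    counts = Counter()
    for gt in genotypes:
        counts[gt if gt in INDEL_OR_NOCALL else "snp"] += 1
    return counts


counts = tally(["AA", "AG", "DD", "--", "II", "--", "C"])
```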
personal genomics: Slide 11 of 20
VCF conversion attempt #2 - plink
990 762 / 991 786 lines converted (but with issues)
sau103@spanxc-nh ~/vep/data $ grep -v "^#" vcf/plink19.vcf | head -4 | sort
1 752566 rs3094315 A . . . . GT 0/0
1 752721 rs3131972 G . . . . GT 0/0
1 768448 rs12562034 G . . . . GT 0/0
1 776546 rs12124819 A . . . . GT 0/0
sau103@spanxc-nh ~/vep/data $ grep -v "^#" vcf/23andme2vcf.vcf | head -4 | sort
chr1 752566 rs3094315 g A . . . GT 1/1
chr1 752721 rs3131972 A G . . . GT 1/1
chr1 776546 rs12124819 A . . . . GT 0/0
chr1 798959 rs11240777 g . . . . GT 0/0
http://www.snpedia.com/index.php/User:Donwulff
personal genomics: Slide 12 of 20
VCF conversion attempt #3 - python script
(not tried)
https://github.com/hammer/personal-genome-analysis/tree/master/scripts
personal genomics: Slide 13 of 20
VCF conversion attempt #4 - bcftools
(not tried)
http://samtools.github.io/bcftools/bcftools.html
personal genomics: Slide 14 of 20
VCF conversion attempt #5 - the winner
973 306 / 991 786 lines converted
http://apol1.blogspot.com.au/2013/08/impute-apoe-and-apol1-with-23andme.html
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Neil_Saunders
1 82154 rs4477212 A G . . . GT 0/0
1 752566 rs3094315 G A . . . GT 1/1
1 752721 rs3131972 A G . . . GT 1/1
1 768448 rs12562034 G A . . . GT 0/0
1 776546 rs12124819 A G . . . GT 0/0
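The core of any such conversion is mapping a genotype such as AA onto a VCF GT field, which requires knowing the REF and ALT alleles at that site. A minimal sketch follows. It assumes REF and ALT come from an external reference, since the raw data does not contain them. The helpers `genotype_to_gt` and `vcf_line` are hypothetical and are not the method used in the linked post.

```python
def genotype_to_gt(genotype, ref, alt):
    """Map a genotype string onto a VCF GT value (0 = REF, 1 = ALT).

    Returns './.' when any base matches neither allele (e.g. the no-call '--').
    """
    index = {ref: "0", alt: "1"}
    try:
        return "/".join(sorted(index[base] for base in genotype))
    except KeyError:
        return "./."


def vcf_line(chrom, pos, rsid, ref, alt, genotype):
    """Build one tab-separated VCF data line with empty QUAL, FILTER and INFO."""
    gt = genotype_to_gt(genotype, ref, alt)
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", ".", ".", "GT", gt])


# rs3094315 is AA in the raw data; with REF=G and ALT=A that is homozygous ALT.
line = vcf_line("1", 752566, "rs3094315", "G", "A", "AA")
```

With the alleles shown on the slide, this reproduces the 1/1 call for rs3094315 and the 0/0 call for rs4477212 (genotype AA, REF A, ALT G).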
personal genomics: Slide 15 of 20
running the VEP - summary output
personal genomics: Slide 16 of 20
parsing VEP output
you can read this later
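One way to pull stop-gained variants out of VEP results is sketched below. It assumes the default tab-delimited VEP output, in which `##` lines carry metadata and the header line begins with `#Uploaded_variation`. The function name and the three sample rows (with placeholder IDs) are made up for illustration; this is not the parsing code from the talk.

```python
import csv
import io


def stop_gained_rows(handle):
    """Yield VEP rows whose Consequence field includes stop_gained."""
    # Drop '##' metadata lines; the remaining first line is the column header.
    lines = (line for line in handle if not line.startswith("##"))
    for row in csv.DictReader(lines, delimiter="\t"):
        # Consequence may hold several comma-separated terms.
        if "stop_gained" in row["Consequence"].split(","):
            yield row


# Made-up rows with a reduced set of columns, for illustration only.
sample = io.StringIO(
    "## ENSEMBL VARIANT EFFECT PREDICTOR\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\tExtra\n"
    "rs0000001\t1:1000\tA\tGENE1\tTX1\tTranscript\tstop_gained\t-\n"
    "rs0000002\t1:2000\tG\tGENE2\tTX2\tTranscript\tmissense_variant\t-\n"
    "rs0000003\t2:3000\tT\tGENE3\tTX3\tTranscript\tstop_gained,splice_region_variant\t-\n"
)
hits = [row["#Uploaded_variation"] for row in stop_gained_rows(sample)]
```

Splitting the Consequence field on commas avoids accidental substring matches against other consequence terms.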
personal genomics: Slide 17 of 20
visualization of stop-gained variants
Figure: Genomic location of my stop_gained variants. Chromosomes 1 to 22 and X; position axis from 0 Mb to 250 Mb. Legend (clinical significance): benign; likely_pathogenic; not_provided; not_provided,not_provided; other; pathogenic; pathogenic,other.
personal genomics: Slide 18 of 20
so should I be dead?
yes, and so should you
“All genomes are dysfunctional: broken genes in healthy individuals”
“A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes”
http://genomesunzipped.org/2012/02/all-genomes-are-dysfunctional-broken-genes-in-healthy-individuals.php
http://www.sciencemag.org/content/335/6070/823.full
personal genomics: Slide 19 of 20
summary
a growing “hacker community” around personal genomics data
no shortage of inspiration for reporting and visualization tools
the challenge is interpretation for non-specialists