MongoDB and academia

  • date post

    01-Jul-2015

Transcript of MongoDB and academia

1. MongoDB and academia

  • Jan Aerts, PhD
  • Wellcome Trust Sanger Institute, Hinxton, UK
  • [email_address]
  • @jandot

2. Disclaimer 1
3. Disclaimer 2
4. Acknowledgments

  • MongoDB community
  • Caren Brockington
  • 10gen

5.
6. transcriptomics, genomics, proteomics, *omics
7. transcriptomics, genomics, proteomics, *omics, instantiationomics, metabolomics, spliceomics, interactomics, metallomics, lipidomics, orfeomics, phenomics, histomics
8. Academia != industry
9. heterogeneous systems
10. transitory
11. little optimization
12. slow adoption of new technology (don't break anything that works)
13. data management = afterthought; money
14. Who are the players?
15.

  • large genome/data centers
  • genome hackers (lone bioinformaticians)
  • bench-based scientists
  • Drawings by Morag Ann Lewis

16.

  • large genome/data centers
  • genome hackers (lone bioinformaticians)
  • bench-based scientists
  • heavy investment in infrastructure/pipelines
  • data exchange => standards!

17.

  • large genome/data centers
  • genome hackers (lone bioinformaticians)
  • bench-based scientists
  • little investment in infrastructure
  • little time/effort for optimization
  • one-off
  • getting it done
  • creating legacy
  • need IT support for heavier work
  • often self-taught

18.

  • large genome/data centers
  • genome hackers (lone bioinformaticians)
  • bench-based scientists
  • use whatever everyone else is using
  • "normalization?"

19. The data landscape
20. 1. Flat text files

  • LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999
  • DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p (AXL2)
  • and Rev7p (REV7) genes, complete cds.
  • VERSION U49845.1 GI:1293613
  • KEYWORDS .
  • SOURCE Saccharomyces cerevisiae (baker's yeast)
  • ORGANISM Saccharomyces cerevisiae
  • Eukaryota; Fungi; Ascomycota; Saccharomycotina;
  • Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces.
  • REFERENCE 1 (bases 1 to 5028)
  • AUTHORS Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  • TITLE Cloning and sequence of REV7, a gene whose function is required for DNA
  • damage-induced mutagenesis in Saccharomyces cerevisiae
  • JOURNAL Yeast 10 (11), 1503-1509 (1994)
  • PUBMED 7871890
  • FEATURES Location/Qualifiers
  • gene 687..3158
  • /gene="AXL2"
  • gene complement(3300..4037)
  • /gene="REV7"
  • ORIGIN
  • 1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
  • 61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
  • 121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
  • 181 gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taataaaccg
  • 241 ccacactgtc attattataa ttagaaacag aacgcaaaaa ttatccacta tataattcaa
  • 301 agacgcgaaa aaaaaagaac aacgcgtcat agaacttttg gcaattcgcg tcacaaataa
  • 361 attttggcaa cttatgtttc ctcttcgagc agtactcgag ccctgtctca agaatgtaat
  • 421 aatacccatc gtaggtatgg ttaaagatag catctccaca acctc...
  • //
  • LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999
  • DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p (AXL2)
  • and Rev7p (REV7) ...
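
The record layout above (one entry per LOCUS block, terminated by a line containing only //) could be read with a few lines of Python. This is a minimal sketch, not a full GenBank parser: the function names `split_genbank_records` and `parse_locus` are hypothetical, and the LOCUS line is assumed to be whitespace-separated in the order shown on the slide.

```python
def split_genbank_records(text):
    """Split flat-file text into records; a line holding only '//' ends a record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                records.append(current)
            current = []
        elif line.strip():
            current.append(line)
    if current:  # trailing record without a terminator (e.g. truncated input)
        records.append(current)
    return records


def parse_locus(line):
    """Parse a LOCUS line. Assumed field order (as on the slide):
    LOCUS, name, length, 'bp', molecule type, division, date."""
    fields = line.split()
    return {
        "name": fields[1],
        "length": int(fields[2]),
        "molecule": fields[4],
        "division": fields[5],
        "date": fields[6],
    }


# Shortened sample: two records, the second one truncated as on the slide.
sample = """LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999
DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds
//
LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999
"""

records = split_genbank_records(sample)
locus = parse_locus(records[0][0])
```

Everything beyond the LOCUS line (continuation lines, the FEATURES table, the ORIGIN sequence block) needs its own handling, which is part of what makes these files awkward to query.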

21. 1. Flat text files


22. 1. Flat text files

  • ##format=PCFv1
  • ##fileDate=20090805
  • ##source=myImputationProgramV3.1
  • ##reference=1000GenomesPilot-NCBI36
  • ##phasing=partial
  • #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
  • 1 967433 . G A 151.43 0 AB=0.42;AC=1 GT:DP:GQ 1/0:11:99.00
  • 1 970323 . G A 492.61 0 AB=0.41;AC=1;AF=0.50 GT:DP:GQ 1/0:28:99.00
  • 1 970950 . A G 1287.90 0 AB=0.55;AC=1;AF=0.50 GT:DP:GQ 0/1:108:99.00
  • 1 972804 . T C 210.56 0 AB=0.53;AC=1;AF=0.50 GT:DP:GQ 1/0:13:99.00
  • 1 972857 . T C 846.18 0 AB=0.53;AC=1;AF=0.50;AN=2 GT:DP:GQ 1/0:58:99.00
  • 1 974165 . T C 810.47 0 AB=0.38;AC=1;AF=0.50;AN=2 GT:DP:GQ 1/0:6:67.05
  • 1 977063 . C T 1110.31 0 AB=0.50;AC=1;AF=0.50;AN=2 GT:DP:GQ 0/1:67:99.00
  • 1 1006892 . C G 62.39 SF AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:2:6.02
  • 1 1148494 . A G 5237.88 0 AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:160:99.00
  • 1 1149380 . T C 165.10 0 AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:6:18.05
  • 1 1212553 . C T 426.61 0 AB=0.26;AC=1;AF=0.50;AN=2 GT:DP:GQ 0/1:18:99.00
  • 1 1235867 . A G 1158.08 0 AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:30:90.28
  • 1 1237357 . T C 142.01 0 AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:5:15.04
  • 1 1239050 . G A 13952.03 0 AC=2;AF=1.00;AN=2 GT:DP:GQ 1/1:340:99.00
  • 20 14370 . G A 29 0 NS=58;DP=258;AF=0.786 GT:GQ:DP:HQ 0|0:48:1:51,51
  • 20 13330 . T A 3 q10 NS=55;DP=202;AF=0.024 GT:GQ:DP:HQ 0|0:49:3:58,50
  • 20 1110696 . A G,T 67 0 AF=0.421,0.579;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27
  • 20 10237 . T . 47 0 NS=57;DP=257;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60
  • ...
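
Each data line in the listing above packs several levels of structure into one row: fixed columns, a semicolon-separated INFO field of key=value pairs and bare flags, and per-sample values whose meaning is given by the FORMAT column. A minimal Python sketch of unpacking one row into a nested dictionary follows. The function name `parse_vcf_line` is hypothetical, fields are assumed to be whitespace-separated, and all values except POS are left as strings.

```python
# Fixed leading columns, as named in the #CHROM header line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line, sample_names):
    """Turn one data line into a nested dict (a sketch, not a validating parser)."""
    fields = line.split()
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several alternate alleles are comma-separated

    # INFO: 'key=value' pairs and bare flags such as DB, separated by ';'
    info = {}
    for item in record["INFO"].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    record["INFO"] = info

    # FORMAT names the colon-separated values in each sample column
    keys = record.pop("FORMAT").split(":")
    record["samples"] = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return record


row = "20 1110696 . A G,T 67 0 AF=0.421,0.579;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27"
variant = parse_vcf_line(row, ["NA00001"])
```

Here `variant["INFO"]` holds the AF, AA and DB entries, and `variant["samples"]["NA00001"]` maps GT, GQ, DP and HQ to their values for that sample.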

23. 1. Flat text files


24. 1. Flat text files

  • ##format=PCFv1
  • ##fileDate=20090805
  • ##source=myImputationProgramV3.1
  • ##reference=1000GenomesPilot-NCBI36
  • ##phasing=partial
  • #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
  • 1 967433 . G A 151.43 0 AB=0.42;AC=1 GT:D