Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery
![Page 1: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/1.jpg)
Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery
Dr. Erik Bongcam-Rudloff
SGBC-SLU
Uppsala, Sweden
![Page 2: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/2.jpg)
Biologists' modus operandi
Observing a phenomenon that is in some way interesting or puzzling.
Making a guess as to the explanation of the phenomenon.
Devising a test to show how likely this explanation is to be true or false.
Carrying out the test, and, on the basis of the results, deciding whether the explanation is a good one or not. In the latter case, a new explanation will (with luck) 'spring to mind' as a result of the first test.
http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress2.html
![Page 3: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/3.jpg)
The observed phenomenon
![Page 4: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/4.jpg)
Selection of test times
![Page 5: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/5.jpg)
But what is the real event?
![Page 6: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/6.jpg)
Sometimes you could be lucky
Positive
“Positive” results are used, “negative” ones rejected
Why? Only positive results are publishable
![Page 7: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/7.jpg)
Next Generation techniques
![Page 8: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/8.jpg)
New challenges
1 TB data
![Page 9: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/9.jpg)
Gbases produced at Sanger
![Page 10: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/10.jpg)
World NGS Map
http://omicsmaps.com/
![Page 11: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/11.jpg)
But this is wonderful! Or?
Sequence without knowledge connected to it is worth: 0
The deluge of data produced by these hordes of machines worldwide demands automatic workflows
Completely new systems to shuffle data around
Storage in amounts never used before
Machines with gigantic amounts of RAM
![Page 12: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/12.jpg)
COSTS
![Page 13: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/13.jpg)
PROBLEMS
Nomenclature
Publishing culture
Moving target development
Old ways of working and resistance to changes in culture
![Page 14: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/14.jpg)
Publishing culture as an example
We get taxpayers' money, we pay publishers to publish, and the publishers sell the articles and obtain the copyrights
To connect knowledge to sequences we need automatic methods, workflows, text mining. Most of this is limited by closed database systems. The only one available is PubMed. But PubMed has only short abstracts. NO information about conditions, M&M (materials and methods), etc.
We need to change this culture
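As a rough illustration of the kind of automatic access that text mining depends on, the sketch below builds a search URL for PubMed through the NCBI E-utilities ESearch endpoint. The function name `build_pubmed_search_url` and the example query are assumptions made for illustration and are not part of the talk. The block only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries PubMed for `term`.

    Hypothetical helper: it only assembles the query string, so a
    workflow can decide separately when and how to fetch the result.
    """
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example query (an assumption, chosen to match the topics of the talk).
url = build_pubmed_search_url("gut microbiota AND insulin resistance", retmax=5)
print(url)
```

A search like this returns record identifiers, and the records themselves carry abstracts only, which is the limitation the slide points at: conditions and materials and methods are not reachable this way.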
![Page 15: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/15.jpg)
The BLAST analogy...
By far the tool most used by biologists
Not possible if databases were not Open Access and freely searchable
Imagine if nucleotide and protein databases followed the life science publishing model
![Page 16: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/16.jpg)
BLAST
![Page 17: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/17.jpg)
BLAST
![Page 18: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/18.jpg)
BLAST
![Page 19: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/19.jpg)
BLAST
![Page 20: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/20.jpg)
BLAST
![Page 21: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/21.jpg)
Human centric
What about all other areas of the Life Sciences?
Most genes are named by sequence similarity, but are the functions the same?
![Page 22: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/22.jpg)
Microbiome
http://www.secondgenome.com
A microbiome is the totality of microbes, their genetic elements (genomes), and environmental interactions in a particular environment.
![Page 23: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/23.jpg)
![Page 24: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/24.jpg)
Fat and lean
Metabolic effects of transplanting gut microbiota from lean donors to subjects with metabolic syndrome.
A. Vrieze et al., EASD abstracts, 24 September 2012.
The result was: Lean donor faecal infusion improves hepatic and peripheral insulin resistance as well as fasting lipid levels in obese individuals with the metabolic syndrome
![Page 25: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/25.jpg)
Genome sizes
![Page 26: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/26.jpg)
How many species?
Several orders of magnitude:
Some estimates:
3-50 million species of arthropods
1-100 million species of nematodes
Only a portion of bacteria have been identified; 99% of bacteria cannot be cultured.
“Once the diversity of the microbial world is catalogued, it will make astronomy look like a pitiful science”
Julian Davies, Professor Emeritus, UBC
![Page 27: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/27.jpg)
New research strategies
Microbial, Livestock, Plants
![Page 28: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/28.jpg)
Typical Sources of Metagenomics
Soil samples
Sea water samples
Air samples
Medical samples
Farm animal samples
Ancient bones
Human microbiome
![Page 29: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/29.jpg)
Ion Proton: "Personal Genome Machine".
LIFE TECHNOLOGIES CORPORATION
Real tests of transcriptome sequencing on the Proton. Using 500 ng of input poly-A RNA, it was possible to generate 50 million reads from a melanoma cancer sample.
Joe Boland of the National Cancer Institute, according to GenomeWeb.
![Page 30: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/30.jpg)
Oxford Nanopore
http://www.nanoporetech.com/
![Page 31: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/31.jpg)
High technology everywhere!
![Page 32: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/32.jpg)
New applications
Only imagination will limit what it is possible to do using Next Generation Technologies!
![Page 33: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/33.jpg)
The big challenge:
Open Access, Open Source, collaborative networks
Data sharing
Common language
Tool systems to glue it all together!
![Page 34: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/34.jpg)
SeqAhead
COST Action BM1006: Next Generation Sequencing Data Analysis Network. 2011-2014
COST Action, 25 countries
http://www.seqahead.eu/
![Page 35: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/35.jpg)
ALLBIO
10 partners, 8 countries
FP7 project
Broadening the Bioinformatics Infrastructure to unicellular, animal, and plant science
www.allbioinformatics.eu
![Page 36: Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery](https://reader036.fdocuments.us/reader036/viewer/2022062309/568152db550346895dc0f8ce/html5/thumbnails/36.jpg)
THANKS!!
Como 2012