ReComp for genomics
Our vision: selective re-computation of genomics pipelines in reaction to changes
Nov. 2016
Dr. Paolo Missier
School of Computing Science, Newcastle University
Data Analytics enabled by NGS
Genomics: WES/WGS, variant calling, variant interpretation and diagnosis
- E.g. the 100,000 Genomes Project, Genomics England, GeCIP
Metagenomics: species identification
- E.g. the EBI Metagenomics portal:
  - Submission of sequence data for archiving and analysis
  - Data analysis using selected EBI and external software tools
  - Data presentation and visualisation through a web interface
Understanding change: threats and opportunities
[Figure: Big Data, Life Sciences Analytics and “Valuable Knowledge” (versions V1, V2, V3), supported by meta-knowledge (algorithms and tools, middleware, reference datasets), all evolving over time t]
Key questions for the ReComp project:
• Threats: Will any of the changes invalidate prior findings?
• Opportunities: Can the findings from the pipelines be improved over time?
• Cost: We need to model future costs based on past history and pricing trends for virtual appliances.
• Impact:
  • Which patients/samples are likely to be affected?
  • How do we estimate the potential benefits for affected patients?
  • Re-computations are expensive. Can we estimate the impact of these changes without re-computing entire cohorts?
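As an illustration of the cost question, the sketch below fits a linear runtime model to past executions and prices a re-run of a cohort at a projected future VM rate. It is a minimal sketch under stated assumptions: runtime is assumed to grow linearly with input size, the hourly price is assumed to change by a constant fraction per month, and `PastRun`, `fit_linear` and `projected_cost` are hypothetical names, not part of ReComp.

```python
from dataclasses import dataclass


@dataclass
class PastRun:
    input_gb: float   # size of the sample's input data (hypothetical feature)
    runtime_h: float  # observed wall-clock hours on a cloud VM


def fit_linear(runs):
    """Ordinary least-squares fit of runtime_h = a + b * input_gb over past runs."""
    n = len(runs)
    mean_x = sum(r.input_gb for r in runs) / n
    mean_y = sum(r.runtime_h for r in runs) / n
    sxx = sum((r.input_gb - mean_x) ** 2 for r in runs)
    sxy = sum((r.input_gb - mean_x) * (r.runtime_h - mean_y) for r in runs)
    b = sxy / sxx  # requires at least two distinct input sizes
    a = mean_y - b * mean_x
    return a, b


def projected_cost(runs, cohort_gb, price_per_hour, monthly_price_change, months_ahead):
    """Estimated cost of re-running a cohort `months_ahead` months from now.

    Assumption: the hourly VM price changes by a constant fraction each month
    (e.g. -0.01 for a 1% monthly decrease)."""
    a, b = fit_linear(runs)
    future_price = price_per_hour * (1 + monthly_price_change) ** months_ahead
    total_hours = sum(a + b * size for size in cohort_gb)
    return total_hours * future_price


history = [PastRun(10, 2.0), PastRun(20, 4.0), PastRun(30, 6.0)]
print(projected_cost(history, [15, 25], price_per_hour=1.0,
                     monthly_price_change=-0.01, months_ahead=12))
```

A real cost model would likely need more features than input size (tool versions, VM type, parallelism), but the shape is the same: learn from the execution history, then extrapolate with a pricing trend.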
Many of the elements involved in producing analytical knowledge change over time:
• Algorithms and tools
• Accuracy of input sequences
• Reference databases (HGMD, ClinVar, OMIM GeneMap, GeneCards,…)
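When a reference database such as ClinVar is updated, a cheap first step for scoping which patients might be affected, without re-running any pipeline, is to diff the two versions and intersect the changed entries with each patient's variants. The sketch below assumes a simplified representation (a dict from a hypothetical variant ID to a clinical-significance label); real ClinVar releases have a much richer format, and the function names are illustrative only.

```python
def diff_annotations(old, new):
    """Variant IDs whose annotation was added, removed or changed between versions."""
    return {vid for vid in old.keys() | new.keys() if old.get(vid) != new.get(vid)}


def affected_patients(patient_variants, old, new):
    """Patients carrying at least one variant whose annotation changed."""
    changed = diff_annotations(old, new)
    return {pid for pid, variants in patient_variants.items() if variants & changed}


# Hypothetical snapshots: variant ID -> clinical significance label
old_db = {"v1": "benign", "v2": "uncertain"}
new_db = {"v1": "benign", "v2": "pathogenic", "v3": "likely_pathogenic"}
patients = {"P1": {"v1"}, "P2": {"v1", "v2"}, "P3": {"v3"}}

print(sorted(affected_patients(patients, old_db, new_db)))
```

This only narrows the candidate set; whether a patient's diagnosis actually changes still depends on the downstream interpretation steps.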
The ReComp vision
Observe change
• In big data
• In meta-knowledge
Assess and measure
• Knowledge decay
Estimate
• Cost and benefits of refresh
Enact
• Reproduce (analytics) processes
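The Estimate and Enact steps can be illustrated as a budgeted selection problem. The sketch below is a hypothetical policy, not ReComp's actual one: it assumes per-sample cost and benefit estimates are already available in the same units (with cost strictly positive), discards candidates whose benefit does not exceed their cost, and greedily picks the best benefit/cost ratio within a budget. Greedy selection is a simple heuristic and is not guaranteed to be optimal.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sample_id: str
    est_cost: float     # estimated cost of re-running the pipeline (assumed > 0)
    est_benefit: float  # estimated value of a refreshed result, same units as cost


def select_recomputations(candidates, budget):
    """Greedy selection by benefit/cost ratio under a total budget."""
    worthwhile = [c for c in candidates if c.est_benefit > c.est_cost]
    worthwhile.sort(key=lambda c: c.est_benefit / c.est_cost, reverse=True)
    chosen, spent = [], 0.0
    for c in worthwhile:
        if spent + c.est_cost <= budget:  # skip anything that would exceed the budget
            chosen.append(c.sample_id)
            spent += c.est_cost
    return chosen


cands = [Candidate("S1", 10, 50), Candidate("S2", 5, 40),
         Candidate("S3", 20, 15), Candidate("S4", 8, 16)]
print(select_recomputations(cands, budget=20))
```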
ReComp: a decision support system for selectively re-computing complex analytics in reaction to change
- Generic: not just for the life sciences!
- Customisable: e.g. for genomics pipelines
Approach and challenges
Challenges:
1. Learning from history and optimisation:
• What types of meta-knowledge need to be captured, and how much history is required to make optimal re-computation decisions?
• Can we use history to learn estimates of impact without the need for actual re-computation?
2. Software infrastructure and tooling: ReComp aims to deliver a metadata management and analytics stack.
3. Reproducibility: How do we ensure that the “ReComp” button will actually perform a valid re-computation?
4. Impact: Which areas of genomics, and of bioinformatics more broadly, can benefit from ReComp?
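For the first challenge, one very simple way to use history to estimate impact without re-computation is to count, for each type of change, how often past re-computations actually changed the outcome. The sketch below assumes a history of (change type, outcome changed) pairs with hypothetical labels, and applies add-alpha smoothing so that rarely seen change types do not receive estimates of exactly 0 or 1. It is an illustrative baseline, not the estimation model used in ReComp.

```python
from collections import defaultdict


def impact_rates(history, alpha=1.0):
    """Estimate P(outcome changed | change type) with add-alpha smoothing."""
    changed = defaultdict(int)
    total = defaultdict(int)
    for change_type, outcome_changed in history:
        total[change_type] += 1
        if outcome_changed:
            changed[change_type] += 1
    return {t: (changed[t] + alpha) / (total[t] + 2 * alpha) for t in total}


# Hypothetical record of past re-computations
past = [("clinvar_update", True), ("clinvar_update", False),
        ("clinvar_update", False), ("aligner_upgrade", True)]
print(impact_rates(past))
```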
Approach: It’s all in the metadata!
1. History of past computations: capture details of analytics tasks and their executions:
- Structure and dependencies of the process
- Cost
- Provenance of the outcomes
2. Metadata analytics: learn from history
- Estimation models for impact, cost, benefits
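Once the structure and dependencies of a process are captured, they can bound the scope of a re-computation: only steps that transitively depend on a changed tool or reference dataset need to be re-executed. The sketch below assumes a hypothetical dependency map from each step to its inputs; the step names are illustrative, and a real provenance store would hold far richer records than this.

```python
from collections import deque


def downstream(deps, changed):
    """Return every step that transitively depends on a changed node.

    `deps` maps each step to the set of nodes (steps, tools, data) it consumes."""
    consumers = {}
    for step, inputs in deps.items():
        for i in inputs:
            consumers.setdefault(i, set()).add(step)
    stale, queue = set(), deque(changed)
    while queue:  # breadth-first traversal over "is consumed by" edges
        node = queue.popleft()
        for c in consumers.get(node, ()):
            if c not in stale:
                stale.add(c)
                queue.append(c)
    return stale


# Hypothetical pipeline: step -> inputs
deps = {
    "align": {"reads", "bwa"},
    "call": {"align", "gatk"},
    "annotate": {"call", "clinvar"},
    "report": {"annotate"},
}
print(sorted(downstream(deps, {"clinvar"})))
```

A ClinVar update would then only require re-running annotation and reporting, whereas an aligner change invalidates every step downstream of alignment.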
Project structure
• 3 years of funding from the EPSRC (£585,000 grant) under the Making Sense from Data call
• Feb. 2016 to Jan. 2019
• 2 RAs employed full-time in Newcastle
• PI: Dr. Missier, School of Computing Science, Newcastle University (30%)
• Co-investigators (8% each):
  • Prof. Watson, School of Computing Science, Newcastle University
  • Prof. Chinnery, Department of Clinical Neurosciences, Cambridge University
  • Dr. Phil James, Civil Engineering, Newcastle University
Builds on the experience of the Cloud-e-Genome project (2013-2015)
- Aims:
  - To demonstrate cost-effective workflow-based processing of NGS pipelines on the cloud
  - To facilitate the adoption of reliable genetic testing in clinical practice
- A collaboration between the Institute of Genetic Medicine and the School of Computing Science at Newcastle University
- Funding: NIHR / Newcastle BRC (£180,000) plus a $40,000 Microsoft Research “Azure for Research” grant