Dutch medical research and clinical data infrastructure
Dutch medical research and clinical data infrastructure coordinated by NFU
Morris Swertz, Richard Sinke, Marc Rietveld and many others.
Outline
• NFU – Towards a research data infrastructure program
• VKGL – Towards data sharing for diagnostics
• UMCG – Some hooks for future collaboration
NFU, DTL, and beyond
NFU - Netherlands Federation of UMCs
The Netherlands Federation of University Medical Centres (Nederlandse Federatie van Universitair Medische Centra) (NFU) represents the eight cooperating UMCs in the Netherlands, as an advocate for and employer of 65,000 people. The NFU was founded in 2004 as a spin-off from the University Hospitals Association (Vereniging Academische Ziekenhuizen) (VAZ), which was established in 1989. The objective has remained the same: to ensure that agencies that decide healthcare issues in the Netherlands take into account the special role of the academic hospitals (in the past) and the UMCs (presently).
Desire to coordinate research data infrastructure
Challenges:
• Increasing demands from government and society
• Inefficient use of available data
• Barriers in access to data and facilities
• Hidden costs because of inadequate infrastructure
• Difficulties integrating health into research
• Fragmentation / duplication of the work
NFU needs:
• Coordination of the existing infrastructure programs, in particular because these go beyond one UMC's walls
• Better alignment with other NFU strategic activities (registration at the source; quality assurance of human-subject research)
• Improve position in EU
6 Sept: project initiated to define the program
Towards one integrated research infrastructure for UMCs in 10 years
Phase 1: 2014 – 2018: harmonization / standardization
Phase 2: 2018 – 2023: integration
+= DTL health (Jeroen Belien, David v Enckevort, Freek de Bruijn, et al.)
Themes (embracing existing working groups)
• Data stewardship guidelines
• Standards for process and architecture
• Coordination in EU calls
• Collaboration in ‘hard’ IT infrastructure
• Big data / HPC clusters
• Medical intelligence
• Use EHR for research
• TTPs / pseudonymisation and security policy
• Standards data models / interfaces
• Integration of registries
• Findability / catalogues
• Data access
• Sharing of expertise (“loket”, i.e. a help desk)
Relevant themes for NGS
Big data infrastructure [definition unclear]
Network, storage and compute needs are large and increasing. We need coordination for base capacity (in each house?) and peak capacity (shared?). Ability to scale out is key. We can coordinate via BBMRI-IT, TraIT, SURF, EYR.
Standards process & architecture
Reference architecture incl. business, service, process, apps, data, technology. NGS touches data (standardization of metadata), apps (pipelines, auth) and technology. We can coordinate via CTMM/TraIT, ACZIE/TACZIE and SURF.
Medical intelligence
Patients are increasingly classified based on many phenotypic/imaging profiles. DNA/RNA is becoming dominant in these approaches. Sharing is needed for suitably large populations and efficient IT development. It is not yet clear how to coordinate this, as the field is still fragmented.
Data expertise loket
Each researcher should have access to a local expertise center. Emerging centers should work together, using each other's specialties, and collaborate on SOPs etc. For NGS we can coordinate this via DTL theme meetings.
VKGL data sharing
Rien Blok, Marielle van Gijn, Ronald Lekanne, Pieter Neerincx, Rolph Pfundt, Claudia Ruivenkamp, Jasper Saris, Rolf Sijmons, Morris Swertz (secretary), Richard Sinke (chair), Peter Taschner, Maartje Vogel, Joeri van der Velde, Terry Vrijenhoek
What is VKGL
• Dutch Society of Clinical Genetic Diagnostic Laboratories
• Aims to promote clinical genetic diagnostics specifically and clinical genetics in general, via
  • Education and registration of its member specialists
  • Quality guidelines and certification
  • Spokesperson in government policy making
  • Coordination of care and diagnostics together with sister society VKGN (Society of Clinical Genetics NL)
  • Coordination of research activity with NVHG (Dutch Society for Human Genetics)
Motivation
• DNA diagnostics labs have a strong need for insight into the observations of other labs, e.g.
  • (How often) have variants been seen before?
  • (How often) have variants been seen in a patient?
  • Are we trying to solve the same families?
• There are many national and international initiatives
  • For known variants there is a range of services: HGMD, 1KG, GoNL, various LOVDs, DMuDB, EBI, NIH, etc.
  • For data sharing there are many models: centralized; decentralized; federated; closed; (partially) open; etc.
• What is needed to start sharing?
  • Remove technical barriers between hospital systems and sharing
  • Remove organizational barriers, as sharing is still labour intensive
  • Agree on content, purpose and conditions of data sharing
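The "how often has this variant been seen" questions come down to counting observations per variant across lab submissions. A minimal sketch, assuming the labs' observations are pooled as tab-separated lines with the hypothetical columns lab, patient and variant. This layout, the function name and the sample rows are illustrative placeholders, not a VKGL format.

```shell
#!/bin/sh
# Hypothetical pooled export: one observation per line, tab-separated
# columns: lab, patient, variant. All values are made-up placeholders.
observations=$(printf '%s\t%s\t%s\n' \
  labA p1 variantX \
  labA p2 variantX \
  labB p3 variantX \
  labB p4 variantY)

# Print "<number of observations> <number of distinct labs>" for a variant.
count_variant() {
  printf '%s\n' "$observations" | awk -F'\t' -v v="$1" '
    $3 == v { n++; if (!($1 in labs)) { labs[$1] = 1; nlabs++ } }
    END { printf "%d %d\n", n, nlabs }'
}

count_variant variantX   # prints "3 2": seen 3 times, in 2 labs
count_variant variantZ   # prints "0 0": never seen
```

Whether such counts live in one central table or are computed per lab and then merged is exactly the centralized versus federated choice listed above; the sketch only shows the query itself.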
Work plan
Step 1: share example data as representative test set
• Four pilots: ‘legacy’ BRCA1; cardio (CGD); NGS panel; phenotype
• Collect a list of all gene panels used
Step 2: gap analysis / standardization on format/content
• Evaluate to what extent notations/nomenclature diverge
• Expect to use HGVS nomenclature, references used (LRG)
• Incl. classifications, quality, coverage, etc.
Step 3: demonstrator implementations
• Evaluation of various software and architectures used
• User interfaces that can answer the desired questions
• Evaluation of how to integrate with existing infrastructure (e.g. Cartagenia)
Step 4: evaluate.
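The gap analysis in step 2 could start with a crude count of how many submitted variant descriptions already look like HGVS. A minimal sketch, assuming one description per line. The regular expression is a deliberately loose, hypothetical approximation (a reference sequence accession, a colon, then a c. or g. prefix) and is not an HGVS validator; the function name and the sample descriptions are illustrative only.

```shell
#!/bin/sh
# Loose, illustrative pattern: reference accession + ":" + "c." or "g." prefix.
# This is an assumption for the sketch; it does not validate full HGVS syntax.
hgvs_like='^(NM_|NG_|NC_|LRG_)[0-9]+([.][0-9]+)?(t[0-9]+)?:[cg][.]'

# Read descriptions on stdin; print "<hgvs-like count> <other count>".
notation_report() {
  awk -v re="$hgvs_like" '
    $0 ~ re { ok++; next }
    { other++ }
    END { printf "%d %d\n", ok, other }'
}

# Two descriptions with a reference sequence, one legacy-style name.
printf '%s\n' 'NM_007294.3:c.68_69del' 'LRG_292t1:c.68_69del' '185delAG' \
  | notation_report   # prints "2 1"
```

The "other" count gives a first impression of how much legacy notation a lab would need to convert before sharing.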
Other BBMRI/BioMedBridges/BioSHaRE/UMCG actions
NGS research, diagnostics, patient registries ... Can we share notes?
Mission
GWAS: Explore summary level GWAS data
Compute: Run analysis workflows on big data compute infrastructure
Catalogue: Find data item and sample collections
Protocol: CRFs, questionnaires, lab protocols, and assays
xQTL: Multi-omics association & visualization tools
Share: Friends, groups and permission management
NGS: Next-Generation Sequencing
File: File storage and drivers for images and data
Mutation: Explore genetic mutations and pathogenicity effects
XGAP: Multi-omics genotypes and phenotypes
Data: Filter individual data sets and download to Excel & SPSS
Organization: Institutes, departments, people, locations & containers
Download as open source at http://github.com/molgenis/molgenis
• 48h diagnostics
• Gene panels
• Lung cancer PM
• Leukemia PM
• CVD PM
• LifeLines deep / RP3 (RNA)
• +5 RD patient registries
Data management SOPs
‘light-weight’ solution for bioinformaticians
1. Design

workflow.csv
step   script         parameterMapping
step1  assessRisk.sh  sample=user_sample; glucose=user_glucose
step2  report.sh      dis=user_disease; risk=step1_risk

assessRisk.sh
    #input sample
    #input glucose
    #output risk
    # glucose can be a decimal (e.g. 5.6), which bash integer arithmetic rejects, so compare with bc
    if (( $(echo "$glucose > 10" | bc) )); then risk+=("yes"); else risk+=("no"); fi

report.sh
    #list risk
    #string dis
    # Count the samples flagged "yes", then print the fraction
    nRisk=0
    for r in "${risk[@]}"; do
        if [[ "yes" == "$r" ]]; then nRisk=$((nRisk + 1)); fi
    done
    echo "Fraction of samples with $dis risk:"
    echo "scale=2; $nRisk/${#risk[*]}" | bc

2. Parameters

parameters.csv
sample    glucose  disease
patient1  5.6      diabetes
patient2  7.8      diabetes
patient3  12.3     diabetes

3. Run + logs

assessRisk_0.sh, assessRisk_1.sh, assessRisk_2.sh (identical bodies)
    #input glucose
    #output risk
    if (( $(echo "$glucose > 10" | bc) )); then risk="yes"; else risk="no"; fi

report_0.sh
    #input exp
    #input list risk
    nRisk=0
    for r in "${risk[@]}"; do
        if [[ "yes" == "$r" ]]; then nRisk=$((nRisk + 1)); fi
    done
    echo "Risk in experiment $exp:"
    echo "scale=2; $nRisk/${#risk[*]}" | bc
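The design / parameters / run split can be imitated in a few lines of plain shell: run the per-sample step once for each row of parameters.csv, collect the outputs, then run the report step once over all of them. A minimal sketch, assuming comma-separated parameters with a header row. The function and variable names are illustrative, and this is not how MOLGENIS compute itself generates or schedules scripts.

```shell
#!/bin/sh
# parameters.csv from the slide, inlined so the sketch is self-contained.
parameters='sample,glucose,disease
patient1,5.6,diabetes
patient2,7.8,diabetes
patient3,12.3,diabetes'

# Step 1, once per row. glucose is a decimal, so compare with awk
# rather than shell integer arithmetic.
assess_risk() {
  awk -v g="$1" 'BEGIN { if (g + 0 > 10) print "yes"; else print "no" }'
}

# Step 2, once over all step-1 outputs: fraction of samples at risk.
report() {
  dis="$1"; shift
  n_risk=0
  for r in "$@"; do
    if [ "$r" = "yes" ]; then n_risk=$((n_risk + 1)); fi
  done
  echo "Fraction of samples with $dis risk:"
  awk -v a="$n_risk" -v b="$#" 'BEGIN { printf "%.2f\n", a / b }'
}

risks=""
disease_name=""
while IFS=, read -r sample glucose disease; do
  if [ "$sample" = "sample" ]; then continue; fi   # skip the header row
  risks="$risks $(assess_risk "$glucose")"
  disease_name="$disease"
done <<EOF
$parameters
EOF

# Word splitting of $risks is intentional: each word is one step-1 output.
report "$disease_name" $risks
```

With the three patients above only patient3 exceeds the threshold, so the report prints a fraction of 0.33.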
Data API – to deal with all data modalities
• F (front end): Java, REST or R, built on the Observ-OM format and model
• B (back end): Excel, csv, database, index and custom (VCF) formats
[Diagram: molgenis-data on top of various repositories (generic/specific): JPA repo, Mongo repo, Indexing Service, Specific repo (VCF, plink), Spreadsheet repo (Excel, csv)]
http://github.com/molgenis/molgenis
VCF / PED backend: https://github.com/molgenis/systemsgenetics
Self-describing file format (anything you want)
Submit1
Patient    Height  Weight  BMI  ..
LL_123041  176     68      25   ..
LL_123042  163     62      23   ..
LL_123043  188     75      25   ..
LL_123044  180     60      23   ..
LL_123045  165     106     32   ..
..         ..      ..      ..   ..

Patient
Name       Sex  …
LL_123041  M    …
LL_123042  F    …
LL_123043  M    …
LL_123044  F    …
LL_123045  F    …
..

Feature
name     description                             unit_name  dataType
Patient  Patient observed                                   ref Patient
Height   Height standing up by nurse             cm         decimal
Weight   Weight on digital floor scale by nurse  kg         decimal
BMI      Body Mass Index                         kg/m^2     decimal
..       ..                                      ..         ..

Protocol
name     features
general  Height, Weight, BMI, …
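A self-describing data set can be checked mechanically: every column of a data sheet should have a row in the Feature sheet. A minimal sketch, assuming both sheets are available as tab-separated text with a header line. The function name and the trimmed, inlined sheets are illustrative and not part of MOLGENIS; BMI is left out of the Feature sheet on purpose so the check has something to report.

```shell
#!/bin/sh
TAB=$(printf '\t')

# Trimmed Feature sheet: name, unit_name, dataType (descriptions omitted).
features="name${TAB}unit_name${TAB}dataType
Height${TAB}cm${TAB}decimal
Weight${TAB}kg${TAB}decimal"

# Header line of the data sheet to check.
header="Height${TAB}Weight${TAB}BMI"

# Print every data column that has no matching row in the Feature sheet.
# $1 = Feature sheet (with header line), $2 = header line of the data sheet.
undescribed_columns() {
  { printf '%s\n' "$1"; echo '---'; printf '%s\n' "$2"; } | awk -F'\t' '
    $0 == "---" { in_header = 1; next }             # separator between inputs
    !in_header  { if (NR > 1) known[$1] = 1; next } # collect feature names
    { for (i = 1; i <= NF; i++) if (!($i in known)) print $i }'
}

undescribed_columns "$features" "$header"   # prints "BMI"
```

The same idea extends to checking units and data types per value once the Feature sheet is treated as the schema.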
eDAS + Genome Browser (collaboration with University of Leicester)
Any data with genomic ‘positions’ gets a genome browser view
Annotation/integration wizards (NGS/PM)
Extensible (web service, command line, script)
Thanks!
• NFU – Towards a research data infrastructure program
• VKGL – Towards data sharing for diagnostics
• UMCG – Some hooks for future collaboration