Data Mining at VIB - SIM-Flanders
Data Mining at VIB
Alexander Botzki (BITS); Sven Degroeve (Proteomics); Saskia Lippens (BioImaging Core); Yvan Saeys (DaMBI)
Challenging data types @VIB
Structured data types
More structured than traditional vectorial representation of data
– Sequences
– Chemical structures
– Fragmentation spectra
– (3D) Images
– Numerical data
– Sometimes several of these, collected over time
Often specific techniques are needed to exploit all structural dependencies
Data types: Next-generation sequencing (NGS) data
Whole genome sequencing, RNA-seq
Generates massive amounts of short DNA/RNA sequences
Algorithms: alignment, mapping
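As a rough illustration of what "mapping" means here, the sketch below places short reads on a reference by looking up an exact seed in a k-mer index and then checking the full read. It is a toy under stated assumptions: real NGS mappers use compressed indexes and tolerate mismatches and gaps. The function names `build_kmer_index` and `map_read`, the seed length, and the sequences are all invented for this example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return the reference positions where the read matches exactly."""
    if len(read) < k:
        return []
    seed = read[:k]  # the first k bases of the read serve as the seed
    # Keep only seed hits where the whole read agrees with the reference.
    return [pos for pos in index.get(seed, [])
            if reference[pos:pos + len(read)] == read]


reference = "ACGTACGTTAGCCGATAGCTAGGCT"  # made-up reference sequence
index = build_kmer_index(reference, k=4)
for read in ["TAGCCGAT", "ACGT", "GGGGGGGG"]:  # made-up reads
    print(read, map_read(read, reference, index, k=4))
```

The index is built once and reused for every read, which is the basic reason index-based mapping scales to massive numbers of short sequences.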
Data types: Fragmentation spectra
Compounds are iteratively fragmented by mass spectrometry, resulting in a fragmentation tree
– Hierarchical structure of spectra
Used in proteomics, metabolomics
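One way to picture the hierarchical structure of a fragmentation tree is as a recursive data structure in which each spectrum can be the parent of further spectra. The sketch below is only an assumption about how such a tree could be held in memory. The class `SpectrumNode`, its fields, and all m/z and intensity values are hypothetical and do not come from any VIB tool.

```python
from dataclasses import dataclass, field


@dataclass
class SpectrumNode:
    """One spectrum in a fragmentation tree (hypothetical structure)."""
    precursor_mz: float
    ms_level: int
    peaks: list = field(default_factory=list)     # (m/z, intensity) pairs
    children: list = field(default_factory=list)  # spectra of selected fragments

    def fragment(self, precursor_mz: float, peaks: list) -> "SpectrumNode":
        """Attach the spectrum obtained by fragmenting one of this node's peaks."""
        child = SpectrumNode(precursor_mz, self.ms_level + 1, peaks)
        self.children.append(child)
        return child


def walk(node: SpectrumNode, depth: int = 0):
    """Yield (depth, node) pairs in pre-order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Made-up values, for illustration only.
root = SpectrumNode(500.25, 2, [(150.1, 30.0), (350.2, 100.0)])
branch = root.fragment(350.2, [(120.0, 45.0), (230.1, 80.0)])
branch.fragment(230.1, [(100.0, 60.0)])
root.fragment(150.1, [(75.0, 20.0)])

for depth, node in walk(root):
    print("  " * depth + f"MS{node.ms_level} precursor m/z {node.precursor_mz}")
```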
Data types: Imaging 2D - 3D - XD
Most extreme example:
– Image: 5000x5000 to 10000x10000 pixels (50 to 190 MB)
– Dataset: 100 to 2000 slides (5 to 380 GB)
– 1 week: 10 datasets (50 GB to 1 TB)
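The per-dataset figures above follow from multiplying the image size by the number of slides. The short sketch below reproduces that arithmetic. It assumes decimal units (1 GB = 1000 MB) and one image per slide; the helper name `dataset_size_gb` is made up for this example.

```python
MB_PER_GB = 1000  # assumption: decimal units, as the slide's rounding suggests


def dataset_size_gb(image_mb: float, slides: int) -> float:
    """Size of one dataset in GB, assuming one image per slide."""
    return image_mb * slides / MB_PER_GB


smallest = dataset_size_gb(image_mb=50, slides=100)
largest = dataset_size_gb(image_mb=190, slides=2000)
print(f"Dataset size: {smallest:g} to {largest:g} GB")
```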
High Content Screening: image-based phenotyping
Data types: Flow cytometry
VIB Databases and Tools
pE-DB
SNPEffect
LNCipedia
PLAZA
PeptideShaker
MS2PIP
CP-DT
ConTra2
Physbinder
DynaMine
Documentation and Meta-Data Domain-specific
Unlocking Data via Data Integration
CORNET EVEX
BioGraph
Developed @ VIB
Commercial Solutions offered to VIB scientists
Standardization and Ontologies
Pistoia Alliance Debates webinar: Ontologies as the glue for knowledge management: use cases and challenges
TraML, qcML, jqcML
PRIDE Converter 2
Data resources provided by EMBL-EBI
– Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, European Genome-phenome Archive, Metagenomics portal
– Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
– Protein sequences, families & motifs: InterPro, Pfam, UniProt
– Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
– Chemical biology: ChEMBL, ChEBI
– Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
– Systems: BioModels, Enzyme Portal, BioSamples
– Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
Complex data landscape aka federations of bio-nations
‘Too much data’, ‘too many applications’, ‘you need a PhD in IT to use this stuff’,
‘What does this really mean to my project?’
Data tombs
Bioinformatics Training
More info at http://www.bits.vib.be/training
– Basic bioinformatics
– Statistics
– Omics data analysis
– Programming & IT
Contact
Address
VIB BITS
Rijvisschestraat 126 3/R
9052 Gent - Belgium
Email: [email protected]
Tel: +32 (0)9 248 16 00
Thank you!
• Bioinformatics Training
• Scientific Software Support
• Research Informatics Solutions
• Bioinformatics Data Support