Data Mining at VIB - SIM-Flanders · for knowledge management: use cases and challenges TraML,...

Data Mining at VIB

Alexander Botzki (BITS), Sven Degroeve (Proteomics), Saskia Lippens (BioImaging Core), Yvan Saeys (DaMBI)


Page 1

Data Mining at VIB


Page 2

Page 3

Challenging data types @VIB

Structured data types

More structured than the traditional vector representation of data:

– Sequences

– Chemical structures

– Fragmentation spectra

– (3D) Images

– Numerical data

– Sometimes several of these, collected over time

Specific techniques are often needed to exploit all structural dependencies
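A minimal illustration of why a sequence loses information when squeezed into a plain vector: reducing DNA to its nucleotide composition discards order, whereas k-mer counts (the feature map behind the spectrum kernel) keep some local order. The function name `kmer_vector` and the example sequences are hypothetical; this is a sketch, not a VIB tool.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def kmer_vector(seq: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Count every possible k-mer over `alphabet` in `seq`.

    The result has one entry per possible k-mer in a fixed order, so vectors
    of different sequences can be compared position by position.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


a, b = "AACCGGTT", "ACGTACGT"
# k = 1 is plain nucleotide composition: the two sequences are indistinguishable.
print(kmer_vector(a, 1) == kmer_vector(b, 1))  # True
# k = 2 retains local order (which base follows which), so the vectors differ.
print(kmer_vector(a, 2) == kmer_vector(b, 2))  # False
```

Larger k captures more structure but the vector grows as 4^k, which is one reason dedicated kernels and algorithms are used instead of explicit vectors.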

Page 4

Data types

Next-generation sequencing (NGS) data

Whole-genome sequencing, RNA-seq

Generates massive numbers of short DNA/RNA sequences (reads)

Algorithms: alignment, mapping
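Read mapping can be sketched as seed-and-extend: index the reference by k-mers, look up a seed from each read, then check the whole read against each candidate position. This toy version is a hypothetical sketch (exact seed taken from the first k bases, substitutions only, no indels); production aligners such as BWA or Bowtie use compressed full-text indexes and handle gaps.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 1) -> list[int]:
    """Return reference start positions where `read` matches with at most
    `max_mismatches` substitutions, seeding on the read's first k bases."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTTGCAACGTAGCTAGGCTAACGTTGCA"  # hypothetical reference
k = 4
index = build_index(reference, k)
print(map_read("ACGTTGCA", reference, index, k))    # [0, 22]
print(map_read("ACGTAGCTTG", reference, index, k))  # [8] (one mismatch)
```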

Page 5

Data types

Fragmentation spectra

Compounds are iteratively fragmented by multistage mass spectrometry (MSn), resulting in a fragmentation tree

– Hierarchical structure of spectra

Used in proteomics, metabolomics
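The hierarchical structure of MSn spectra maps naturally onto a recursive data structure. A sketch, assuming each node stores the isolated precursor m/z, its MS level and a list of (m/z, intensity) peaks; the class name `SpectrumNode`, the `walk` helper and all numeric values are hypothetical and not taken from any VIB tool or file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpectrumNode:
    """One spectrum in an MSn experiment: the isolated precursor, the observed
    peaks, and child spectra obtained by fragmenting one of those peaks again."""
    precursor_mz: float
    ms_level: int
    peaks: list[tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)
    children: list[SpectrumNode] = field(default_factory=list)

    def fragment(self, peak_mz: float, peaks: list[tuple[float, float]]) -> SpectrumNode:
        """Attach the spectrum obtained by isolating and fragmenting `peak_mz`."""
        child = SpectrumNode(precursor_mz=peak_mz, ms_level=self.ms_level + 1, peaks=peaks)
        self.children.append(child)
        return child


def walk(node: SpectrumNode, depth: int = 0):
    """Depth-first traversal yielding (depth, node) pairs."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Hypothetical MS2 -> MS3 -> MS4 chain.
root = SpectrumNode(precursor_mz=500.0, ms_level=2, peaks=[(300.1, 80.0), (200.2, 100.0)])
ms3 = root.fragment(300.1, [(150.0, 60.0), (120.0, 40.0)])
ms3.fragment(150.0, [(90.0, 30.0)])

for depth, node in walk(root):
    print("  " * depth + f"MS{node.ms_level} precursor m/z {node.precursor_mz}")
```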

Page 6

Data types

Imaging: 2D, 3D, XD

Most extreme example:

Image: 5000 × 5000 to 10000 × 10000 pixels (50 to 190 MB)

Dataset: 100 to 2000 slides (5 to 380 GB)

1 week: 10 datasets (50 GB to 1 TB)

High-content screening, image-based phenotyping
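The image and dataset sizes above can be checked with simple arithmetic. This sketch assumes uncompressed, single-channel, 16-bit (2 bytes per pixel) images; the bit depth is an assumption, not stated on the slide. Under it, a 5000 × 5000 image is exactly 50 MB (decimal) and a 10000 × 10000 image is about 190 MiB (binary), matching the slide's per-image range; the dataset totals land in the same range as the slide's 5 to 380 GB.

```python
def image_bytes(width: int, height: int, bytes_per_pixel: int = 2) -> int:
    """Raw size of one uncompressed single-channel image (default: 16-bit pixels)."""
    return width * height * bytes_per_pixel


def dataset_bytes(n_slides: int, width: int, height: int, bytes_per_pixel: int = 2) -> int:
    """Raw size of a dataset of `n_slides` images with identical dimensions."""
    return n_slides * image_bytes(width, height, bytes_per_pixel)


MiB, GiB = 2**20, 2**30
small = image_bytes(5000, 5000)    # 50,000,000 bytes
large = image_bytes(10000, 10000)  # 200,000,000 bytes
print(f"image: {small / MiB:.1f} to {large / MiB:.1f} MiB")  # image: 47.7 to 190.7 MiB
print(f"dataset: {dataset_bytes(100, 5000, 5000) / GiB:.1f} to "
      f"{dataset_bytes(2000, 10000, 10000) / GiB:.1f} GiB")  # dataset: 4.7 to 372.5 GiB
```

Multi-channel or z-stack acquisitions multiply these figures by the number of channels or planes.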

Page 7

Data types

Flow cytometry

Page 8

VIB Databases and Tools

– pE-DB

– SNPEffect

– LNCipedia

– PLAZA

– PeptideShaker

– MS2PIP

– CP-DT

– ConTra2

– Physbinder

– DynaMine

Documentation and metadata

Domain-specific

Page 9

Unlocking Data via Data Integration

CORNET, EVEX, BioGraph

Developed @ VIB

Commercial Solutions offered to VIB scientists
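A toy illustration of how integration can unlock information: a link between a gene and a disease may only become visible once edges from separate sources are merged into one graph. The sketch assumes each source can be reduced to (entity, entity) pairs that share identifiers; the entity names, source labels and helper functions are hypothetical, and this is not how CORNET, EVEX or BioGraph are implemented.

```python
from __future__ import annotations

from collections import defaultdict, deque


def integrate(sources: dict[str, list[tuple[str, str]]]) -> dict[str, set[tuple[str, str]]]:
    """Merge edge lists from several named sources into one undirected graph.
    Each adjacency entry is (neighbour, source), so provenance is kept."""
    graph = defaultdict(set)
    for source, edges in sources.items():
        for a, b in edges:
            graph[a].add((b, source))
            graph[b].add((a, source))
    return graph


def shortest_path(graph, start: str, goal: str) -> list[str] | None:
    """Breadth-first search; returns the node path from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour, _source in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


sources = {
    "interactions": [("geneA", "geneB")],   # hypothetical protein interaction
    "literature": [("geneB", "diseaseX")],  # hypothetical text-mined association
}
graph = integrate(sources)
print(shortest_path(graph, "geneA", "diseaseX"))  # ['geneA', 'geneB', 'diseaseX']
# With only one source, the indirect gene-disease link is invisible.
print(shortest_path(integrate({"interactions": sources["interactions"]}), "geneA", "diseaseX"))  # None
```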

Page 11

Data resources provided by EMBL-EBI

– Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, European Genome-phenome Archive, Metagenomics portal

– Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE

– Protein sequences, families & motifs: InterPro, Pfam, UniProt

– Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank

– Chemical biology: ChEMBL, ChEBI

– Reactions, interactions & pathways: IntAct, Reactome, MetaboLights

– Systems: BioModels, Enzyme Portal, BioSamples

– Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

Page 12

Complex data landscape, a.k.a. a federation of bio-nations

‘Too much data’, ‘too many applications’, ‘you need a PhD in IT to use this stuff’, ‘what does this really mean for my project?’

Page 13

Data tombs

Page 14

Bioinformatics Training

More info at http://www.bits.vib.be/training

– Basic bioinformatics

– Statistics

– Omics data analysis

– Programming & IT

Page 15

Contact

Address

VIB BITS

Rijvisschestraat 126 3/R

9052 Gent - Belgium

Email: [email protected]

Tel: +32 (0)9 248 16 00

Thank you!

• Bioinformatics Training

• Scientific Software Support

• Research Informatics Solutions

• Bioinformatics Data Support