Data Management Challenges in the Human Brain Project
description
Transcript of Data Management Challenges in the Human Brain Project
Data Management Challenges in the Human Brain Project
Thomas HeinisData-Intensive Applications & Systems LabEcole Polytechnique Federale de Lausanne
1
2
Human Brain Project
Neocortex Model
Single Neuron 3D Model
Goal: develop platform to simulate human brain!
10 year and 1 billion Euro project to integrate all available knowledge/data
Involves coordination of more than 200 interdisciplinary partners
Lead by Henry Markram Based on EPFL’s Blue
Brain Project
3
Human Brain Project Workflow
Experimental Observations
Analysis
Visualization
Model Building
Simulation
Literature Research
Medical Records/Data
4
Present Goal
# of Neurons 1M 86 BillionFull Brain
Model DataCoarse grained
270GB 27PB
Model DataFine grained (3D Mesh)
5.8TB 0.5EB
Data Deluge Medical Data from hundreds of sources Model Data alone:
Simulation Trace: potentially infinite
5
Querying of Petascale Data Data Privacy/Anonymization Cloud Analytics Quick & efficient access to raw data Distributed Workflow Execution Provenance/Reproducibility Data Personalization
HBP Data Management Challenges
6
Spatial Indexing
1K 10K 100K 1M0
5
10
15
20
25
30
Simulation Size [# of Neurons]
Mod
el S
ize
[Gi-
gaBy
tes]
2010
20082006
3D Spatial Range Query
Bottleneck in Spatial Analysis
Model Building Analysis
Applications:
3D Spatial Range Query
c
Overlap Problem
7Overlap reduces performance, increases with level of detail
R-Trees: Hierarchy of Minimum Bounding Rectangles (MBR)
Overlap
Range Query
STEP 1: Index Construction STEP 2: Query Execution
“FLAT”, A Two Phase Spatial Index*
2) CRAWLING: Retrieve remaining results by traversing the neighbors.
8
c1) SEEDING: Find any one object arbitrarily inside the query region.
Add reachabilityinformation during index construction
*joint work with Farhan Tauheed, Laurynas Biveinis and Anastasia Ailamaki
9
50 100 150 200 250 300 350 400 4500
50
100
150
200
250
300 Hilbert R-TreeSTR R-TreePR-TreeFLAT
Dataset Density [Million of Elements per unit Volume]
Tim
e [s
econ
ds]
Performance Evaluation
10
Blue Brain Project:Part of the toolset used every dayFebruary 2012: first 1 million neuron model indexedStill 5 orders of magnitude smaller than human brain
General Applicability:3D surface mesh models N-Body simulation data set
1K 10K 100K 1M0
5
10
15
20
25
30
Simulation Size [# of Neurons]
Mod
el S
ize
[Gi-
gaBy
tes]
Impact
2010
20082006
2012(270 GB)
11
Querying of Petascale Data Data Privacy/Anonymization Cloud Analytics Quick & efficient access to raw data Distributed Workflow Execution Provenance/Reproducibility Data Personalization
HBP Data Management Challenges
12
Thank you
Data Management in the Human Brain Project
Thomas Heinis