Data Management Challenges in the Human Brain Project

12
Data Management Challenges in the Human Brain Project Thomas Heinis Data-Intensive Applications & Systems Lab Ecole Polytechnique Federale de Lausanne 1

description

Data Management Challenges in the Human Brain Project. Thomas Heinis Data-Intensive Applications & Systems Lab Ecole Polytechnique Federale de Lausanne. Human Brain Project. Goal: develop platform to simulate human brain! - PowerPoint PPT Presentation

Transcript of Data Management Challenges in the Human Brain Project

Page 1: Data Management Challenges in the Human Brain Project

Data Management Challenges in the Human Brain Project

Thomas HeinisData-Intensive Applications & Systems LabEcole Polytechnique Federale de Lausanne

1

Page 2: Data Management Challenges in the Human Brain Project

2

Human Brain Project

Neocortex Model

Single Neuron 3D Model

Goal: develop platform to simulate human brain!

10 year and 1 billion Euro project to integrate all available knowledge/data

Involves coordination of more than 200 interdisciplinary partners

Lead by Henry Markram Based on EPFL’s Blue

Brain Project

Page 3: Data Management Challenges in the Human Brain Project

3

Human Brain Project Workflow

Experimental Observations

Analysis

Visualization

Model Building

Simulation

Literature Research

Medical Records/Data

Page 4: Data Management Challenges in the Human Brain Project

4

Present Goal

# of Neurons 1M 86 BillionFull Brain

Model DataCoarse grained

270GB 27PB

Model DataFine grained (3D Mesh)

5.8TB 0.5EB

Data Deluge Medical Data from hundreds of sources Model Data alone:

Simulation Trace: potentially infinite

Page 5: Data Management Challenges in the Human Brain Project

5

Querying of Petascale Data Data Privacy/Anonymization Cloud Analytics Quick & efficient access to raw data Distributed Workflow Execution Provenance/Reproducibility Data Personalization

HBP Data Management Challenges

Page 6: Data Management Challenges in the Human Brain Project

6

Spatial Indexing

1K 10K 100K 1M0

5

10

15

20

25

30

Simulation Size [# of Neurons]

Mod

el S

ize

[Gi-

gaBy

tes]

2010

20082006

3D Spatial Range Query

Bottleneck in Spatial Analysis

Model Building Analysis

Applications:

3D Spatial Range Query

Page 7: Data Management Challenges in the Human Brain Project

c

Overlap Problem

7Overlap reduces performance, increases with level of detail

R-Trees: Hierarchy of Minimum Bounding Rectangles (MBR)

Overlap

Range Query

STEP 1: Index Construction STEP 2: Query Execution

Page 8: Data Management Challenges in the Human Brain Project

“FLAT”, A Two Phase Spatial Index*

2) CRAWLING: Retrieve remaining results by traversing the neighbors.

8

c1) SEEDING: Find any one object arbitrarily inside the query region.

Add reachabilityinformation during index construction

*joint work with Farhan Tauheed, Laurynas Biveinis and Anastasia Ailamaki

Page 9: Data Management Challenges in the Human Brain Project

9

50 100 150 200 250 300 350 400 4500

50

100

150

200

250

300 Hilbert R-TreeSTR R-TreePR-TreeFLAT

Dataset Density [Million of Elements per unit Volume]

Tim

e [s

econ

ds]

Performance Evaluation

Page 10: Data Management Challenges in the Human Brain Project

10

Blue Brain Project:Part of the toolset used every dayFebruary 2012: first 1 million neuron model indexedStill 5 orders of magnitude smaller than human brain

General Applicability:3D surface mesh models N-Body simulation data set

1K 10K 100K 1M0

5

10

15

20

25

30

Simulation Size [# of Neurons]

Mod

el S

ize

[Gi-

gaBy

tes]

Impact

2010

20082006

2012(270 GB)

Page 11: Data Management Challenges in the Human Brain Project

11

Querying of Petascale Data Data Privacy/Anonymization Cloud Analytics Quick & efficient access to raw data Distributed Workflow Execution Provenance/Reproducibility Data Personalization

HBP Data Management Challenges

Page 12: Data Management Challenges in the Human Brain Project

12

Thank you

Data Management in the Human Brain Project

Thomas Heinis