Cloud hpc-bigdata-challenges


IT

PAC


Melbourne

Sydney

Brazil

Beijing


Programming tools: Scala, IPython, Azure ML, …

Frameworks: Spark, Hadoop, YARN, HDInsight, REEF, Twister, Brisk

Software Defined Storage

Software Defined Networks

Hardware Abstraction/Virtualization


http://tce.technion.ac.il/files/2012/06/Scott-shenker.pdf

www.opennetsummit.org/pdf/2013/presentations/albert_greenberg.pdf

http://www.cs.princeton.edu/~jrex/papers/pyretic-login13.pdf


The Science Perspective


Thousand years ago

• Description of natural phenomena

Last few hundred years

• Newton’s laws, Maxwell’s equations…

• $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$

Last few decades

• Simulation of complex phenomena

Today and the Future

• Unify theory, experiment and simulation with large multidisciplinary data

• Using data exploration and data mining (from instruments, sensors, humans…)

• Distributed communities


Inputs (training data)

Labels

Hidden layers

Input data

Detected features

Mona Lisa


• The Genetic Causes of Disease (David Heckerman)

• Wellcome Trust GWAS (genome-wide association study) for a large population

• Looking for causes of seven common diseases (bipolar disorder, rheumatoid arthritis, coronary artery disease, hypertension, …)

• Confounding is a problem. Needed a new algorithm.

• Ran on the Azure cloud using 35,000 cores in 3 weeks.
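
To give a sense of the scale of that run, the figures quoted above can be turned into core-hours. This is only a back-of-envelope sketch: it assumes all 35,000 cores were busy for the full three weeks, which the slide does not state, so the result is an upper bound. The variable names are illustrative.

```python
# Back-of-envelope scale of the run quoted above (35,000 cores, 3 weeks).
# Assumption: every core was busy for the whole period, so this is an upper bound.
CORES = 35_000
WEEKS = 3
HOURS_PER_WEEK = 7 * 24

core_hours = CORES * WEEKS * HOURS_PER_WEEK
core_years = core_hours / (365 * 24)  # the same work on one core, in years

print(f"{core_hours:,} core-hours")
print(f"{core_years:,.0f} core-years")
```

Under that assumption the job comes to about 17.6 million core-hours, or roughly 2,000 years on a single core.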


Chameleon Cloud SDN

NIH data commons


Mesos

Tachyon

Docker

Spark

Data Analytics and ML programming tools

REEF

Twister


• Many Examples

• The Challenge: sustainability

Data acquisition & modelling

Collaboration and visualisation

Analysis & data mining

Dissemination & sharing

Archiving and preserving