Massive Data Computing

18
Massive Data Computing Symposium on Application Accelerators in HPC July 28, 2009 Pradeep K. Dubey IEEE Fellow and Director of Throughput Computing Intel Corporation

Transcript of Massive Data Computing

Page 1: Massive Data Computing

Massive Data Computing

Symposium on Application Accelerators in HPC

July 28, 2009

Pradeep K. Dubey

IEEE Fellow and Director of Throughput Computing

Intel Corporation

Page 2: Massive Data Computing

Who Needs Acceleration

7/28/[email protected]

Traditional drivers of compute

• Norman’s Gulf: Quest for natural human-machine interface

• Entertainment: Unending fascination with virtual and unreal

• The data deluge: The problem of drinking out of fire hydrant

• Real-time analytics: Decision delayed is objective denied

• Curious minds want to know (HPC): Science moves on!

Recent catalysts of compute

• Changing demographics of computer users

• Massive compute meets massive data

• Connected computing

Page 3: Massive Data Computing

Norman’s Gulf

7/28/[email protected]

Human’s Conceptual ModelComputer’s Simulated Model

Evaluation Gap

Execution Gap

Page 4: Massive Data Computing

Decomposing Compute-Intensive Apps

7/28/[email protected]

Photo-real Synthesis

Real-world animation

Ray tracing

Global Illumination

Behavioral Synthesis

Physical simulation

Kinematics

Emotion synthesis

Audio synthesis

Video/Image synthesis

Document synthesis

Multimodal event/object Recognition

Statistical Computing

Machine Learning

Clustering / Classification

Model-based:

Bayesian network/Markov Model

Neural network / Probability networks

LP/IP/QP/Stochastic OptimizationSynthesis

Large dataset mining

Semantic Web/Grid Mining

Streaming Data MiningDistributed Data Mining

Content-based Retrieval

Collaborative Filters

Multidimensional Indexing

Dimensionality Reduction

Dynamic Ontologies

Efficient access to large, unstructured, sparse datasets

Stream Processing

Mining

Indexing

Recognition

(Modeling)

Streaming

Graphics

Human’s

Conceptual Model

Computer’s

Simulated Model

EvaluationExecution

Mining or Synthesis Quality Determines Model Goodness

Page 5: Massive Data Computing

Interactive RMS Loop

5Feb 7, 2007 Pradeep K. Dubey [email protected]

Find an existing

model instance

What is …? What if …?Is it …?

Recognition Mining Synthesis

Create a new

model instanceModel

5

Most RMS apps are about enabling interactive (real-time) RMS Loop (iRMS)

7/28/[email protected]

Page 6: Massive Data Computing

Visual and Analytics Computing

6 7/28/[email protected]

Model Physical

SimulationRendering

Data Acquisition and

Mining Based

Procedural or

Analytical

Visual Computing (Graphics and Vision)

Model Real-time Indexing What-if evaluation

Summary synthesis

Procedural or

Analytical

Non-Visual Computing (Search and Analytics)

Data Mining Based

(Crawling)

Growing Importance of Data Driven Models

Page 7: Massive Data Computing

Massive Data & Ubiquitous Connectivity

7/28/[email protected]

Data-driven models are now tractable and usable

We are not limited to analytical models any more

No need to rely on heuristics alone for unknown models

Massive data offers new algorithmic opportunities

Many traditional compute problems worth revisiting

Web connectivity significantly speeds up model-

training

Real-time connectivity enables continuous model

refinement

Poor model is an acceptable starting point

Classification accuracy improves over time

Page 8: Massive Data Computing

Image Completion

8

Scene completion using millions of photographs James Hays, Alexei A.

Efros, Siggraph 2007, Also, October 2008 Communications of the ACM,

Volume 51 Issue 10

“Dramatic improvement when moving from ten thousand to one million images”

“Brute force many currently unsolvable vision and graphics problems!” For example: “Feasibility of sampling from the entire space of scenes as a way of

exhaustively modeling our visual world”

7/28/[email protected]

Page 9: Massive Data Computing

Language Translation

9

Rule-based system exceeds human performance in a structured, deterministic domain

[Kasparov vs. Deep Blue]

1997

• Statistical inference (not rules)• 100s of TB of training data• Racks of computation

[Google MT wins NIST contest]

Today

Newcomer Google beats decades of rule-based translation research with a language-unaware statistical approach to MT

http://www.nature.com/news/2006/061106/full/news061106-6.html

7/28/[email protected]

Page 10: Massive Data Computing

Semantic Search

10

Google Rolls Out Semantic Search

http://www.pcworld.com/businesscenter/article/161869/google_rolls_out_semant

ic_search_capabilities.html

"What we're seeing actually is that with a lot of data, you ultimately

see things that seem intelligent even though they're done through

brute force," she said. "Because we're processing so much data, we

have a lot of context around things like acronyms. Suddenly, the

search engine seems smart, like it achieved that semantic

understanding, but it hasn't really. It has to do with brute force. That

said, I think the best algorithm for search is a mix of both brute-force

computation and sheer comprehensiveness and also the qualitative

human component."

-- Marissa Mayer, VP of Search Products, Google

7/28/[email protected]

Page 11: Massive Data Computing

Mind Reading

11

How Technology May Soon "Read" Your Mind http://www.cbsnews.com/stories/2008/12/31/60minutes/main4694713.shtml

“… Tom Mitchell at Carnegie Mellon University have done is combine fMRI's

ability to look at the brain in action with computer science's new power to sort

through massive amounts of data. The goal: to see if they could identify exactly

what happens in the brain when people think specific thoughts.”

7/28/[email protected]

Page 12: Massive Data Computing

3D Content Creation for Amateurs

7/28/[email protected]

Send images to a

desktop/server

3D model creation from

images

Camera recovery resultsRough model by fast

creation

Refined model by dense

creationTextured 3D model

Outputs

Page 13: Massive Data Computing

Nested RMS

13

What is …? What if …?Is it …?

Recognition Mining Synthesis

Visual Input

Streams

Synthesized

Visuals

Learning &

Modeling

Graphics Rendering + Physical Simulation

Computer

Vision

Reality

Augmentation

Structured Data +

Unstructured

Blogs

Synthesized

Structures

Learning &

Filling Ontologies

Semantic Web Mining

Mining Structured

Augmentation

7/28/[email protected]

Page 14: Massive Data Computing

Nested RMS Instance: Virtual World

14

Outer loop:iRMS Visual Loop:One Per Real User

Inner Loop:iRMS Analytics LoopOne per bot per user

Performance Needs:Limited by input/output limits of

human perception

Performance Needs:Can far exceed typical i/o limits of

human perception

Shop Bots

Trade Bots

Chat Bots

Player Bots

Gambler Bots

Reporter Bots

7/28/[email protected]

Page 15: Massive Data Computing

Putting it all together

7/28/[email protected]

Sensory Immersion

Machine learningNeural networks

Probabilistic reasoning

Fuzzy logicBelief networks

Evolutionary computingChaos theory

Soft Computing

RenderingSimulation

collision detectionforce solver

global illumination…

Physics Soft Physics?

Dynamics Constraints ConstraintDynamics

Behavioral Immersion

Super Immersion

Outer loop:iRMS Visual Loop:One Per Real User

Inner Loop:iRMS Analytics LoopOne per bot per user Shop Bots

Trade Bots

Chat Bots

Player BotsGambler Bots

Performance Needs:Limited by input/output limits of

human perception

Performance Needs:Can far exceed typical i/o limits of

human perception

Reporter BotsComputational Requirements for Bridging Norman’s Gulf Are Huge!

Page 16: Massive Data Computing

Where is my computer

16

Massive Data Analytics

(Cloud of Servers)

Intersection of massive data with massive compute

real-time analytics, massive data mining-learning

Visual Computing

(Clients)

Private data, sensory inputs, streams/feeds

immersive 3D graphics output, interactive visualization

7/28/[email protected]

Architectural Implications Are Radical!

Page 17: Massive Data Computing

Summary

7/28/[email protected]

Connected Computing

• It’s all about three C’s:

• Content – Connect -- Compute

Algorithmic Opportunity

• Massive data approach to traditional compute problems

• Data … data everywhere, … not a bit of sense …

Architectural Challenge

• Distributed decomposition

• Feeding the Beast: Data management challenge

• Programming barrier

Page 18: Massive Data Computing

Thank You!

Questions?

7/28/[email protected]