Massive Data Computing
Transcript of Massive Data Computing
Massive Data Computing
Symposium on Application Accelerators in HPC
July 28, 2009
Pradeep K. Dubey
IEEE Fellow and Director of Throughput Computing
Intel Corporation
Who Needs Acceleration
7/28/[email protected]
Traditional drivers of compute
• Norman’s Gulf: Quest for natural human-machine interface
• Entertainment: Unending fascination with virtual and unreal
• The data deluge: The problem of drinking out of fire hydrant
• Real-time analytics: Decision delayed is objective denied
• Curious minds want to know (HPC): Science moves on!
Recent catalysts of compute
• Changing demographics of computer users
• Massive compute meets massive data
• Connected computing
Norman’s Gulf
7/28/[email protected]
Human’s Conceptual ModelComputer’s Simulated Model
Evaluation Gap
Execution Gap
Decomposing Compute-Intensive Apps
7/28/[email protected]
Photo-real Synthesis
Real-world animation
Ray tracing
Global Illumination
Behavioral Synthesis
Physical simulation
Kinematics
Emotion synthesis
Audio synthesis
Video/Image synthesis
Document synthesis
Multimodal event/object Recognition
Statistical Computing
Machine Learning
Clustering / Classification
Model-based:
Bayesian network/Markov Model
Neural network / Probability networks
LP/IP/QP/Stochastic OptimizationSynthesis
Large dataset mining
Semantic Web/Grid Mining
Streaming Data MiningDistributed Data Mining
Content-based Retrieval
Collaborative Filters
Multidimensional Indexing
Dimensionality Reduction
Dynamic Ontologies
Efficient access to large, unstructured, sparse datasets
Stream Processing
Mining
Indexing
Recognition
(Modeling)
Streaming
Graphics
Human’s
Conceptual Model
Computer’s
Simulated Model
EvaluationExecution
Mining or Synthesis Quality Determines Model Goodness
Interactive RMS Loop
5Feb 7, 2007 Pradeep K. Dubey [email protected]
Find an existing
model instance
What is …? What if …?Is it …?
Recognition Mining Synthesis
Create a new
model instanceModel
5
Most RMS apps are about enabling interactive (real-time) RMS Loop (iRMS)
7/28/[email protected]
Visual and Analytics Computing
6 7/28/[email protected]
Model Physical
SimulationRendering
Data Acquisition and
Mining Based
Procedural or
Analytical
Visual Computing (Graphics and Vision)
Model Real-time Indexing What-if evaluation
Summary synthesis
Procedural or
Analytical
Non-Visual Computing (Search and Analytics)
Data Mining Based
(Crawling)
Growing Importance of Data Driven Models
Massive Data & Ubiquitous Connectivity
7/28/[email protected]
Data-driven models are now tractable and usable
We are not limited to analytical models any more
No need to rely on heuristics alone for unknown models
Massive data offers new algorithmic opportunities
Many traditional compute problems worth revisiting
Web connectivity significantly speeds up model-
training
Real-time connectivity enables continuous model
refinement
Poor model is an acceptable starting point
Classification accuracy improves over time
Image Completion
8
Scene completion using millions of photographs James Hays, Alexei A.
Efros, Siggraph 2007, Also, October 2008 Communications of the ACM,
Volume 51 Issue 10
“Dramatic improvement when moving from ten thousand to one million images”
“Brute force many currently unsolvable vision and graphics problems!” For example: “Feasibility of sampling from the entire space of scenes as a way of
exhaustively modeling our visual world”
7/28/[email protected]
Language Translation
9
Rule-based system exceeds human performance in a structured, deterministic domain
[Kasparov vs. Deep Blue]
1997
• Statistical inference (not rules)• 100s of TB of training data• Racks of computation
[Google MT wins NIST contest]
Today
Newcomer Google beats decades of rule-based translation research with a language-unaware statistical approach to MT
http://www.nature.com/news/2006/061106/full/news061106-6.html
7/28/[email protected]
Semantic Search
10
Google Rolls Out Semantic Search
http://www.pcworld.com/businesscenter/article/161869/google_rolls_out_semant
ic_search_capabilities.html
"What we're seeing actually is that with a lot of data, you ultimately
see things that seem intelligent even though they're done through
brute force," she said. "Because we're processing so much data, we
have a lot of context around things like acronyms. Suddenly, the
search engine seems smart, like it achieved that semantic
understanding, but it hasn't really. It has to do with brute force. That
said, I think the best algorithm for search is a mix of both brute-force
computation and sheer comprehensiveness and also the qualitative
human component."
-- Marissa Mayer, VP of Search Products, Google
7/28/[email protected]
Mind Reading
11
How Technology May Soon "Read" Your Mind http://www.cbsnews.com/stories/2008/12/31/60minutes/main4694713.shtml
“… Tom Mitchell at Carnegie Mellon University have done is combine fMRI's
ability to look at the brain in action with computer science's new power to sort
through massive amounts of data. The goal: to see if they could identify exactly
what happens in the brain when people think specific thoughts.”
7/28/[email protected]
3D Content Creation for Amateurs
7/28/[email protected]
Send images to a
desktop/server
3D model creation from
images
Camera recovery resultsRough model by fast
creation
Refined model by dense
creationTextured 3D model
Outputs
Nested RMS
13
What is …? What if …?Is it …?
Recognition Mining Synthesis
Visual Input
Streams
Synthesized
Visuals
Learning &
Modeling
Graphics Rendering + Physical Simulation
Computer
Vision
Reality
Augmentation
Structured Data +
Unstructured
Blogs
Synthesized
Structures
Learning &
Filling Ontologies
Semantic Web Mining
Mining Structured
Augmentation
7/28/[email protected]
Nested RMS Instance: Virtual World
14
Outer loop:iRMS Visual Loop:One Per Real User
Inner Loop:iRMS Analytics LoopOne per bot per user
Performance Needs:Limited by input/output limits of
human perception
Performance Needs:Can far exceed typical i/o limits of
human perception
Shop Bots
Trade Bots
Chat Bots
Player Bots
Gambler Bots
Reporter Bots
7/28/[email protected]
Putting it all together
7/28/[email protected]
Sensory Immersion
Machine learningNeural networks
Probabilistic reasoning
Fuzzy logicBelief networks
Evolutionary computingChaos theory
…
Soft Computing
RenderingSimulation
collision detectionforce solver
global illumination…
Physics Soft Physics?
Dynamics Constraints ConstraintDynamics
Behavioral Immersion
Super Immersion
Outer loop:iRMS Visual Loop:One Per Real User
Inner Loop:iRMS Analytics LoopOne per bot per user Shop Bots
Trade Bots
Chat Bots
Player BotsGambler Bots
Performance Needs:Limited by input/output limits of
human perception
Performance Needs:Can far exceed typical i/o limits of
human perception
Reporter BotsComputational Requirements for Bridging Norman’s Gulf Are Huge!
Where is my computer
16
Massive Data Analytics
(Cloud of Servers)
Intersection of massive data with massive compute
real-time analytics, massive data mining-learning
Visual Computing
(Clients)
Private data, sensory inputs, streams/feeds
immersive 3D graphics output, interactive visualization
7/28/[email protected]
Architectural Implications Are Radical!
Summary
7/28/[email protected]
Connected Computing
• It’s all about three C’s:
• Content – Connect -- Compute
Algorithmic Opportunity
• Massive data approach to traditional compute problems
• Data … data everywhere, … not a bit of sense …
Architectural Challenge
• Distributed decomposition
• Feeding the Beast: Data management challenge
• Programming barrier