Data-Intensive Computing Symposium: Report Out
Phillip B. Gibbons, Intel Research Pittsburgh

Data-Intensive Computing Symposium*
Sponsored by:
Yahoo! Research
Computing Community Consortium supports the computing research community in creating compelling research visions and the mechanisms to realize these visions (http://www.cra.org/ccc/)
~100 invited attendees, ~12 invited talks
Slides and video to be posted on CCC web site
* DISC has been renamed
*
Big Data: Should focus more on the user experience
How to manage resources
Cloud computing can help organically orchestrate resources on demand
Initiative to bring academics, business, and users together under the big data problem (PCAST NITRD review)
Phillip B. Gibbons, Data-Intensive Computing Symposium
*
Broad Institute has 4.8K processors, 1.4 PB of storage on site
Big Data Problem: Mining genome expression arrays
Rows: patients; Columns: genes; Values: expression values
Example: classify leukemias based on expression arrays
Solved by grad student over the weekend using web sources
Challenge: Computation/Analysis/Provenance infrastructure needed
Usable by biologists
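To make the expression-array layout above concrete, here is a minimal sketch of classifying a new patient from such a matrix. The slides do not say which method was used, so a nearest-centroid classifier stands in as an assumption; the numbers, the labels "ALL" and "AML", and the function names are all illustrative, not real data or the speaker's code.

```python
from math import sqrt

def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(expression, labels):
    """Group patients (rows) by label and compute one centroid per class."""
    by_class = {}
    for row, label in zip(expression, labels):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}

def classify(centroids, sample):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(centroids[label], sample))

# Toy matrix: 4 patients (rows) x 3 genes (columns). Values are made up.
expression = [
    [9.1, 1.0, 4.2],
    [8.7, 1.3, 3.9],
    [2.0, 7.5, 4.1],
    [1.6, 8.1, 4.4],
]
labels = ["ALL", "ALL", "AML", "AML"]  # hypothetical class labels

model = train(expression, labels)
prediction = classify(model, [8.0, 2.0, 4.0])
print(prediction)
```

A real analysis would involve thousands of genes, normalization, and held-out validation; the sketch only shows how rows, columns, and values map onto a classification task.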
*
Petascale Data Storage Institute
Understanding disk failures, cfdr.usenix.org
Another local speaker, so I’ll skip in interest of time
*
Iraq war authorization protest
*
Scalable Hyperlink Store: used internally within MSR, for web graphs
Query-dependent link-based ranking algorithms (HITS, SALSA)
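As an illustration of the link-based ranking named above, here is a minimal sketch of HITS on a tiny in-memory graph. It does not use or model the Scalable Hyperlink Store; the graph, the function name `hits`, and the fixed iteration count are assumptions made for the example.

```python
from math import sqrt

def hits(graph, iterations=50):
    """Compute hub and authority scores for a dict of node -> outgoing links."""
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to a node.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages a node links to.
        hub = {n: sum(auth[dst] for dst in graph.get(n, ())) for n in nodes}
        # Normalize both vectors to unit length so scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(s * s for s in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical query-specific subgraph: a and b both link to c; a also links to d.
graph = {"a": ["c", "d"], "b": ["c"]}
hub, auth = hits(graph)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
print(best_authority, best_hub)
```

In the query-dependent setting, the graph would be the neighborhood of the result set for one query, which is why fast access to a web-graph store matters.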
*
Hardware revolution: datacenters/virtualization, many-core
Why “What” (declarative specification) rather than “How”:
Rapid prototyping
Allow optimization and adaptability
*
Evita Raced: Overlog Metacompiler (compiler is written declaratively)
matches datalog optimizations (dynamic prog.), cycle tests
Datalog with known extensions and tweaks
Centrality of Rendezvous & graphs
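To give a feel for the Datalog-style recursion over graphs mentioned above, here is a minimal sketch of semi-naive evaluation of a two-rule reachability program. It is plain Python, not Overlog and not the Evita Raced compiler; the rule names `link` and `path` and the sample edges are assumptions chosen for illustration.

```python
def transitive_closure(links):
    """Semi-naive evaluation of the Datalog program:

        path(X, Y) :- link(X, Y).
        path(X, Z) :- link(X, Y), path(Y, Z).

    Each round joins link only against the facts derived in the previous
    round (delta), so old derivations are not recomputed.
    """
    path = set(links)
    delta = set(links)
    while delta:
        derived = {(x, z) for (x, y) in links for (y2, z) in delta if y == y2}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path

# Hypothetical link facts forming a chain a -> b -> c -> d.
links = {("a", "b"), ("b", "c"), ("c", "d")}
paths = transitive_closure(links)
print(sorted(paths))
```

The loop stops when a round derives nothing new, which also guarantees termination on cyclic graphs because the set of possible facts is finite.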
*
*
Memory hierarchy issues:
where the (intermediate) data are at, over the course of the computation
Pervasive multimedia sensing:
processing & querying must be pushed out of the data center to where the sensors are at
Focus of this talk:
Phillip Gibbons (Intel Research)
I know where it’s at, man!
*
Hierarchy-savvy:
Sweet spot between ignorant and fully aware
architectural features, etc. to realize the model
Important component: fine-grain threading
computation for parallel algorithm design
*
Allocate in units of rack weeks
NSF will review proposals for use: Cluster Exploratory (CluE)
Running Xen; Won’t open up performance monitoring
Goal: Show applicable outside of computer science
Academic-Industry-Government partnership
*
Collection of ~20 people (looking for volunteers)
Goals:
Building community