PowerPoint Presentation · TechEd 2012 Keywords: TechEd 2012 Created Date: 3/4/2014 9:37:35 AM ...
Sources: The Economist, Feb ‘10; IDC
By 2016 the new Large Synoptic Survey Telescope in Chile will acquire 140 terabytes in 5
days - more than Sloan acquired in 10 years
In 2000 the Sloan Digital Sky Survey collected more data in its first week than had been collected in
the entire history of astronomy
The Large Hadron Collider at CERN generates 40 terabytes of data every second
Power Map for Excel is a three-dimensional (3D) data visualization tool for Excel 2013.
http://www.microsoft.com/en-us/powerbi
Big Data in Research
Microsoft Research ATL Europe, Munich
Marcel Tilly, Program Manager
Big Data.
Sources: The Economist, Feb ‘10; DBMS2; Microsoft Corp
Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes
The Twitter community generates over 1 terabyte of tweets every day
Bing ingests more than 7 petabytes a month
Talks
• From Text to Entities and from Entities to Insight: a Perspective on
Unstructured Big Data
• Querying and Exploring Big Brain Data
• Big Data with Stratosphere
• SCOPE: Parallel Databases Meet MapReduce
• Online Data Processing with S4 and Omid
• Predictable Data Centers
• From Terabytes to Megabytes: Finding the Needle by Shrinking the
Haystack
• Incremental, Iterative, and Interactive Computation using
Differential Dataflow
• Big Data on Small Machines
• Graphs and Linear Measurements
• Partitioning & Clustering Big Graphs
• Online Team Formation in Social Networks
• Big Data and Enterprise Analytics
• Streaming Verification of Outsourced Computation
• Big Data Analytics: A Happy Marriage of Systems and Theory?
• Fast Algorithms for Perfect Matchings in Regular Bipartite Graphs
• Cuts, Trees, and Electrical Flows
• Neighborhood Sampling for Estimating Local Properties on a
Graph Stream
• What Can't We Compute on Data Streams?
• Querying Big, Dynamic, Distributed Data
http://research.microsoft.com/en-US/events/bda2013/default.aspx
Scope
We are witnessing rapid development of
research and technology for the efficient
processing of big data. There is a surge of
commercial and open source platforms for big
data analytics, including platforms for querying
massive datasets, batch processing, real-time
analytics, streaming computations, iterative
computations, graph data processing, and
distributed machine learning.
Database queries
How can we efficiently resolve database queries on massive
amounts of input data?
Here the input data may be presented in the form of a distributed
data stream.
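One common way to approach this question is to have each node keep a small, mergeable summary of its own stream and to combine the summaries only when a query arrives. The sketch below is illustrative only and is not taken from the talk: the PartialAggregate class, the summarize and query helpers, and the sample readings are all hypothetical names and values chosen for the example. It answers a simple aggregate query (count, mean, maximum) without ever moving the raw data to one place.

```python
from dataclasses import dataclass


@dataclass
class PartialAggregate:
    """Constant-size summary of the values one node has seen (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Fold one stream element into the summary; the element is then discarded.
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        # Combining two summaries gives the summary of the concatenated streams.
        return PartialAggregate(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )

    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize(stream) -> PartialAggregate:
    """Runs on a single node: one pass over its local stream."""
    agg = PartialAggregate()
    for value in stream:
        agg.update(value)
    return agg


def query(node_streams) -> PartialAggregate:
    """Runs on a coordinator: merges the per-node summaries."""
    result = PartialAggregate()
    for stream in node_streams:
        result = result.merge(summarize(stream))
    return result


# Made-up readings arriving at three nodes (assumption for illustration).
node_streams = [[3.0, 5.0], [10.0], [2.0, 4.0, 6.0]]
answer = query(node_streams)
print(answer.count, answer.mean(), answer.maximum)
```

Only the three numbers in each summary cross the network, so the communication cost is independent of stream length. This works for aggregates that can be merged exactly; queries such as medians or distinct counts generally need approximate summaries instead.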
Machine learning
How can we efficiently solve large-scale machine learning problems?
Here the input data may be massive, stored in a distributed cluster of
machines.
Distributed computing
How can we efficiently solve large-scale optimization problems in
distributed computing environments? For example, how can we
efficiently solve large-scale combinatorial problems, such as processing
large-scale graphs?
RedFIR® is unrivaled worldwide as a tool
for analyzing performance in team sports,
making it possible to objectively analyze
games and assess players against a
consistent set of criteria.
http://www.orgs.ttu.edu/debs2013/index.php?goto=cfchallengedetails
“How to Fit when No One Size Fits”, Lim et al., CIDR 2013