The Rise of Big Data Science
![Page 1: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/1.jpg)
GILAD BARKAN
The Rise of Big Data Science
![Page 2: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/2.jpg)
Big Data Science
Big Data + Data Science = Big Data Science
![Page 3: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/3.jpg)
Big Data
Why? What? How?
![Page 5: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/5.jpg)
Why Big Data?
It’s the information-flooded era we live in.
In a world where data is power, big data is big power.
![Page 6: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/6.jpg)
Why Big Data?
Web 2.0
![Page 7: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/7.jpg)
Why should we care about Big Data?
The big business opportunities:
- Competitive, fast-moving marketplace
- Capitalize on business opportunities before everyone else
- Existing channels to every person on the planet
- Maximizing revenue from customers
- Segment-of-1: more personal customer experiences
![Page 8: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/8.jpg)
Big Data
Why? What? How?
![Page 9: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/9.jpg)
What is Big Data?
Volume
Variety
Velocity
The 3 V’s
![Page 11: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/11.jpg)
Big Data - Volume
![Page 12: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/12.jpg)
Big Data - Volume
Big Users: More Users, All the Time
- Global online population: 2 billion
- Smartphone users: 1 billion+
- Hours spent online: 35 billion hours
![Page 13: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/13.jpg)
More Users + More Data = Big Data
![Page 14: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/14.jpg)
What is Big Data?
Volume
Variety
Velocity
The 3 V’s
![Page 15: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/15.jpg)
Big Data - Variety
Heterogeneous sources of data: structured and unstructured
Trillions of gigabytes (zettabytes)
Text, Log Files, Click Streams, Blogs, Tweets, Audio, Video, etc.

| Data type | Typical size | Structure | Storage |
|---|---|---|---|
| Tables | 5 KB / record | Structured | Traditional SQL |
| Text | 50 KB / record | Un/semi-structured | NoSQL |
| Images | 1,000 KB / image | Un/semi-structured | NoSQL |
| Audio | 5,000 KB / song | Un/semi-structured | NoSQL |
| Video | 700 MB / movie | Un/semi-structured | NoSQL |
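Assuming decimal (SI) units, where 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes, the step from "trillions of gigabytes" to zettabytes works out as:

```latex
10^{12}\,\text{GB} \times 10^{9}\,\frac{\text{bytes}}{\text{GB}} = 10^{21}\,\text{bytes} = 1\,\text{ZB}
```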
![Page 16: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/16.jpg)
What is Big Data?
Volume
Variety
Velocity
The 3 V’s
![Page 17: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/17.jpg)
Big Data - Velocity
How the hell does Google return an answer in 0.28 seconds by looking at 4 Billion pages?
![Page 18: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/18.jpg)
Big Data - Velocity
Online Advertisement - Real Time Bidding (RTB)
![Page 19: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/19.jpg)
Big Data - Velocity
Recommendations
![Page 20: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/20.jpg)
Big Data
Why? What? How?
![Page 21: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/21.jpg)
How is Big Data Handled?
The challenge is huge: store, analyze, and serve a huge volume and variety of data at high velocity.
We can’t achieve this with a single machine, no matter how powerful it is. Why?
- Expensive (stay tuned)
- Load balancing requests:
  - Outbrain serves 3,000 requests per second
  - DG (MediaMind) serves 500K requests per second!!!
- Not fault tolerant
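A cluster spreads that request load over many machines. The talk does not say which strategy Outbrain or MediaMind use; one of the simplest is round-robin dispatch, sketched below as an assumption. The node names and the `round_robin_dispatch` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical node names; a real deployment would discover its nodes dynamically.
NODES = ["node-1", "node-2", "node-3"]


def round_robin_dispatch(requests, nodes):
    """Assign each request to the next node in turn (round-robin)."""
    assignment = {node: [] for node in nodes}
    node_cycle = cycle(nodes)
    for request in requests:
        assignment[next(node_cycle)].append(request)
    return assignment


if __name__ == "__main__":
    plan = round_robin_dispatch(range(10), NODES)
    for node, reqs in plan.items():
        print(node, reqs)
```

Round-robin ignores how busy each node is; real load balancers often weight nodes or route to the least-loaded one, but the point is the same: no single machine has to absorb all the traffic.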
![Page 22: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/22.jpg)
Distributing the Data
The Big Data Paradigm Shifts: Volume
- Scale Up (Vertical): SQL Server
- Scale Out (Horizontal): HDFS (GFS) across the nodes of a Hadoop cluster
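Scaling out means a file no longer lives on one machine: it is cut into blocks that are spread, with copies, over the cluster's nodes. The sketch below assumes the HDFS defaults of a 128 MB block size (Hadoop 2.x and later) and a replication factor of 3, and uses a simplified round-robin placement; real HDFS placement is rack-aware. The `place_blocks` helper and node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # Assumption: HDFS default block size since Hadoop 2.x
REPLICATION = 3      # Assumption: HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and put each block's replicas on distinct nodes.

    Simplified round-robin placement, not HDFS's rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(num_blocks):
        # Start each block on a different node and use the next `replication` nodes.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    cluster = [f"datanode-{i}" for i in range(1, 6)]
    for block, replicas in place_blocks(1000, cluster).items():
        print(block, replicas)
```

Because every block exists on three different nodes, losing one machine loses no data, which addresses the "not fault tolerant" problem of a single server.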
![Page 23: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/23.jpg)
Big Data - Reducing Costs
Hadoop is 5 times cheaper infrastructure!!!
TCO (purchase + maintenance) over 3 years for 300 TB:
- 75-node cluster = 1 M$
- DBMS server = 5 M$
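The "5 times cheaper" figure can be checked from the slide's own numbers by normalizing each TCO to cost per terabyte per year. Only the totals, capacity, and period come from the slide; the helper name is illustrative.

```python
# Figures from the slide: 3-year TCO for 300 TB of storage.
CAPACITY_TB = 300
YEARS = 3
HADOOP_TCO_USD = 1_000_000  # 75-node Hadoop cluster
DBMS_TCO_USD = 5_000_000    # single DBMS server


def cost_per_tb_year(total_usd, capacity_tb=CAPACITY_TB, years=YEARS):
    """Normalize a total cost of ownership to dollars per terabyte per year."""
    return total_usd / (capacity_tb * years)


hadoop = cost_per_tb_year(HADOOP_TCO_USD)
dbms = cost_per_tb_year(DBMS_TCO_USD)
print(f"Hadoop: ${hadoop:,.0f} per TB-year")  # $1,111
print(f"DBMS:   ${dbms:,.0f} per TB-year")    # $5,556
print(f"Ratio:  {dbms / hadoop:.1f}x")        # 5.0x
```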
![Page 24: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/24.jpg)
Big Data Paradigm Shift - Computing
MapReduce Computing Paradigm
Exploiting the distributed architecture for large-scale parallel computations
![Page 25: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/25.jpg)
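MapReduce
“Hello MapReduce”: counting words
Each mapper counts the words (W) in one of the pages (URL 1, URL 2, URL 3) and emits its partial counts (C) as {w, c} pairs; the reducer sums them into the final counts.

| Word (W) | Mapper 1 (C) | Mapper 2 (C) | Mapper 3 (C) | Reducer total (C) |
|---|---|---|---|---|
| the | 5 | 7 | 9 | 21 |
| Cow | 0 | 1 | 1 | 2 |
| quick | 2 | 0 | 3 | 5 |

MapReduce + Hadoop Cluster: Master, Mappers, Reducer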
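The same word count can be simulated on one machine to show the three phases: map (per-page counts), shuffle (group by word), and reduce (sum). This is a single-process sketch, not Hadoop's API; the page contents are hypothetical, chosen so the per-page counts match the slide.

```python
from collections import Counter, defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, count) pairs for one page, pre-aggregated locally
    like the per-URL counts on the slide."""
    return list(Counter(document.split()).items())


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate counts by word, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the partial counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


# Hypothetical page contents giving the slide's per-page counts.
pages = [
    " ".join(["the"] * 5 + ["quick"] * 2),
    " ".join(["the"] * 7 + ["Cow"]),
    " ".join(["the"] * 9 + ["Cow"] + ["quick"] * 3),
]
print(word_count(pages))  # {'the': 21, 'quick': 5, 'Cow': 2}
```

On a real cluster each `map_phase` call would run on the node holding that page, and only the small (word, count) pairs travel over the network to the reducer.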
![Page 26: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/26.jpg)
Big Data Paradigm Shift - NoSQL
- Schema-less databases to support the variety of data
- Complex SQL queries (joins, etc.) are extremely inefficient in a distributed data framework
Key-Value Store (NoSQL):
- Key: user_id, url, image_id, video_id, ... (any key, not a single primary key as in SQL)
- Value: tables, text, images, video, any
Variety
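A key-value store reduces the data model to "look up a value by its key", so values of any shape can sit side by side and a distributed store can locate a key without joins. The sketch below is a toy in-memory store; the hash-based `partition_for` helper is an assumed, commonly used way to spread keys over nodes, not something the slide specifies. All keys and values are hypothetical.

```python
import zlib
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory key-value store: any value type under a string key, no shared schema."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def partition_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key by hashing it (crc32 is stable across runs, unlike built-in hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


store = KeyValueStore()
# Hypothetical keys and values: each value has its own shape; no table schema is declared.
store.put("user_id:42", {"name": "Dana", "country": "IL"})
store.put("url:https://example.com", "<html>...</html>")
store.put("image_id:7", b"\x89PNG")
store.put("video_id:9", {"title": "demo", "duration_s": 120, "tags": ["intro"]})

for key in ("user_id:42", "video_id:9"):
    print(key, "-> node", partition_for(key, num_nodes=4), "->", store.get(key))
```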
![Page 27: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/27.jpg)
Big Data Paradigm Shift - Velocity
- RAM-based DBs instead of traditional disk-based DBs
- Store critical data in memory (much more expensive)
- If the data doesn't come to the algorithm, the algorithm will come to the data
Traditional: the algorithm (Alg) reads from and writes to separate data storage
Today: the algorithm runs where the data lives and reads/writes it locally
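A common, modest version of "store critical data in memory" is a least-recently-used (LRU) cache in front of a slower disk-based store: hot records are served from RAM, cold ones are loaded and may evict the stalest entry. This is a sketch of that caching technique, not a full in-memory database; `slow_disk_lookup` is a hypothetical stand-in for a disk read.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Keep the most recently used items in RAM; fall back to a slower loader on a miss."""

    def __init__(self, capacity: int, load_from_disk: Callable[[Hashable], Any]) -> None:
        self.capacity = capacity
        self.load_from_disk = load_from_disk
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self.load_from_disk(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item
        return value


def slow_disk_lookup(key: Hashable) -> str:
    # Hypothetical stand-in for a disk-based DB read.
    return f"record-{key}"


cache = LRUCache(capacity=2, load_from_disk=slow_disk_lookup)
for k in [1, 2, 1, 3, 2]:
    cache.get(k)
print(cache.hits, cache.misses)  # 1 4
```

The capacity limit reflects the slide's caveat: RAM is much more expensive than disk, so only the critical (hot) data stays in memory.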
![Page 28: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/28.jpg)
Big Data - Summary
![Page 29: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/29.jpg)
Big Data - Summary
- BIG business opportunities
- The 3 V’s: Volume, Variety, Velocity
- Technological paradigm shifts
![Page 30: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/30.jpg)
Big Data Technological Paradigm Shifts
- Volume: Scale Out instead of Scale Up, computing with MapReduce (Master, Mappers, Reducer)
- Variety: NoSQL key-value stores
- Velocity: bring the algorithm (Alg) to the data
![Page 31: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/31.jpg)
Big Data - Summary
- BIG business opportunities
- The 3 V’s: Volume, Variety, Velocity
- Computing and DB paradigm shifts
- Flood of new (open source) technologies
![Page 32: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/32.jpg)
Flood of New Big Data Technologies
Open Source
![Page 33: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/33.jpg)
Big Data - Summary
- BIG business opportunities
- The 3 V’s: Volume, Variety, Velocity
- Computing and DB paradigm shifts
- Flood of new (open source) technologies
- It’s definitely not just a buzz
![Page 34: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/34.jpg)
Big Buzz?
![Page 35: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/35.jpg)
Big Data - Summary
- BIG business opportunities
- The 3 V’s: Volume, Variety, Velocity
- Computing and DB paradigm shifts
- Flood of new (open source) technologies
- It’s definitely not just a buzz:
  - It’s a real response to the world’s hectic pace of evolution
  - It reduces costs by an order of magnitude
  - Still, it doesn’t mean every business today will (or should) transform its technology stack to support big data
![Page 36: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/36.jpg)
Big Data Science
Big Data + Data Science = Big Data Science
![Page 37: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/37.jpg)
Data Science
Why? What? How?
![Page 39: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/39.jpg)
Why Data Science?
data scientists
![Page 40: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/40.jpg)
Data has real value
Facebook acquires Onavo for ~150M$
![Page 41: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/41.jpg)
Data Science
Why? What? How?
![Page 42: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/42.jpg)
Welcome to the Intelligent World
Data Science: Data Analysis, Data Mining, Automatic Decisioning, Predictive Analytics, Machine Learning, Data Analytics
![Page 43: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/43.jpg)
Data Miners are the New Gold Miners
![Page 44: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/44.jpg)
Search
![Page 45: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/45.jpg)
Online Advertisement - Real Time Bidding (RTB)
![Page 46: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/46.jpg)
Recommendations
![Page 47: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/47.jpg)
Text Analysis
![Page 48: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/48.jpg)
CRM - Customer Churn Prediction
![Page 49: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/49.jpg)
Time Series Analysis
![Page 50: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/50.jpg)
Machine Learning
- Classification
- Clustering
- Regression
- Recommendation
![Page 51: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/51.jpg)
Classification
Amdocs Insight™: why is the customer calling the call center?
Classes: Bill too high, Overage, Third Party Charges, Pay Bill, Abnormal fee
![Page 52: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/52.jpg)
Clustering
- Market Segmentation
- Social Network Analysis
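Market segmentation is often done with k-means: each customer joins the nearest segment center, then each center moves to the mean of its members, until nothing changes. The slide does not name an algorithm; this is a minimal k-means (Lloyd's algorithm) sketch with a deterministic initialization, and the customer features (monthly spend in $, visits per month) are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means (Lloyd's algorithm), using the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


# Hypothetical customers: (monthly spend in $, visits per month).
customers = [(20, 2), (80, 9), (25, 3), (22, 1), (85, 10), (78, 8)]
centroids, labels = kmeans(customers, k=2)
print(labels)     # [0, 1, 0, 0, 1, 1]
print(centroids)
```

Production k-means usually uses smarter seeding (such as k-means++) and several restarts; the fixed initialization here just keeps the example reproducible.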
![Page 53: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/53.jpg)
Regression
Housing price prediction
[Chart: Price ($, in 1000’s) vs. Size in m²]
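For a single feature such as size, housing price prediction can be framed as fitting a straight line price = a + b·size by ordinary least squares. This is a sketch under that linear assumption; the training points below are hypothetical and are not read off the slide's chart.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x with one feature; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: size in m^2 -> price in $1000's.
sizes = [50, 100, 150, 200, 250]
prices = [110, 170, 230, 290, 350]
a, b = fit_line(sizes, prices)
print(f"price = {a:.1f} + {b:.2f} * size")               # price = 50.0 + 1.20 * size
print(f"predicted price for 180 m^2: {a + b * 180:.0f}")  # 266 (in $1000's)
```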
![Page 54: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/54.jpg)
The Data Scientist
![Page 55: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/55.jpg)
Data Scientist Skillset
- Hands-on with tools, languages, and technologies
- MSc / PhD in Math, CS, Stats, or Physics
- Hands-on in the specific problem domain
![Page 56: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/56.jpg)
Data Science ≠ BI
Apply advanced statistical machine learning algorithms to:
- dig deeper and find patterns that traditional BI tools may not reveal
- cover a much wider spectrum of domains / applications
Predictive Analytics ≠ Exploratory Analytics
![Page 57: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/57.jpg)
Traditional BI vs. Big Data Science
- Business Intelligence (traditional BI): Exploratory Analytics
- Data Science (Big Data Science): Predictive Analytics
![Page 58: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/58.jpg)
Academia Response to Data Science
![Page 59: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/59.jpg)
Data Science
Why? What? How?
![Page 60: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/60.jpg)
The Art of Data Science
We would need at least a one-semester course for it. Still…
![Page 61: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/61.jpg)
Data Science Life Cycle
Business Goal
- Offline Data Analysis: Understand Data → Prepare Data → Model → Evaluate
- Run Time: Deploy → Monitor
![Page 62: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/62.jpg)
Closing the Loop
Big Data + Data Science = Big Data Science
Technically speaking, what do you think? Is Big Data good or bad for Data Science?
![Page 63: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/63.jpg)
The Bad - Finding a Needle in a Haystack
It’s the same treasure hiding in there; the problem is that the pile is now huge.
Big Data = Big Noise
![Page 65: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/65.jpg)
The Good - The Statistical View
Statistics is predictive analytics’ fuel! The more data you have (Big Data), the better your predictive models will perform.
![Page 66: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/66.jpg)
Law of Large Numbers
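The Law of Large Numbers says that a sample average converges to the true expected value as the number of observations grows. A quick simulation of fair coin flips shows the effect; the seed and sample sizes are arbitrary choices for the sketch.

```python
import random
from typing import List


def running_mean_of_coin_flips(n: int, p: float = 0.5, seed: int = 0) -> List[float]:
    """Return the running sample mean after each of n Bernoulli(p) trials."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        means.append(total / i)
    return means


means = running_mean_of_coin_flips(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: sample mean = {means[n - 1]:.4f}")
```

The printed means wander at small n and settle close to 0.5 as n grows: with more data, the estimate of the underlying quantity becomes more reliable, which is the statistical argument for Big Data.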
![Page 72: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/72.jpg)
Combining the Good & Bad
Data is a function of quality and quantity
[Chart: Quantity (Small to Big) vs. Quality (Low to High)]
![Page 73: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/73.jpg)
Big Data Science - Summary
Big Data:
- Big Numbers, Big Opportunities
- The buzziest technology nowadays
Data Scientists:
- The ones who coax the treasures out of the big data for their companies
- Multi-discipline skilled
- The new industry rock stars
![Page 74: The Rise of Big Data Science](https://reader033.fdocuments.us/reader033/viewer/2022052504/54c6ef124a7959c83b8b45a1/html5/thumbnails/74.jpg)
Thank you for your attention