By: Shrikant Gawande · Twitter Example Twitter has over 500 million registered users. The USA,...
By: Shrikant Gawande (Cloudera Certified)
What is Big Data?
For every 30 minutes of flying time, an airline jet collects 10 terabytes of sensor data.
The NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades.
Facebook Example
Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network.
An average of 3.2 billion likes and comments are posted on Facebook every day.
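The "almost 20,000 years" figure can be checked with a quick unit conversion. This is a minimal sketch that assumes a 365-day year; the variable names are illustrative only.

```python
# Convert the 10.5 billion minutes quoted on the slide into years.
# Assumption: a 365-day year (leap days are ignored).
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes

minutes_online = 10.5e9
years_online = minutes_online / MINUTES_PER_YEAR

print(f"{years_online:,.0f} years")
```

The result lands just under 20,000 years, which matches the rounded figure on the slide.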
Twitter Example
Twitter has over 500 million registered users.
The USA accounts for 141.8 million of them, or 27.4 percent of all Twitter users, which puts it well ahead of Brazil, Japan, the UK and Indonesia.
79% of US Twitter users are more likely to recommend brands they follow.
67% of US Twitter users are more likely to buy from brands they follow.
57% of all companies that use social media for business use Twitter.
Hadoop is being used across industries
Industries using Hadoop
Source: Karmasphere
Why Learn Big Data?
What Big Companies Have to Say
Data Volume Is Growing Exponentially
Estimated Global Data Volume:
2011: 1.8 ZB
2015: 7.9 ZB
The world's information doubles every two years.
Over the next 10 years:
The number of servers worldwide will grow by 10x
The amount of information managed by enterprise data centers will grow by 50x
The number of “files” enterprise data centers handle will grow by 75x
Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
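The doubling rule can be compared against the two volume estimates with a short calculation. This is only a sketch: the function name `projected_volume_zb` is made up for illustration, and the growth model is simply the "doubles every two years" rule stated on the slide.

```python
def projected_volume_zb(base_zb: float, years_elapsed: float,
                        doubling_years: float = 2.0) -> float:
    """Project data volume, assuming it doubles every `doubling_years` years."""
    return base_zb * 2 ** (years_elapsed / doubling_years)

volume_2011 = 1.8  # ZB, from the slide
projection_2015 = projected_volume_zb(volume_2011, 2015 - 2011)
print(f"Projected 2015 volume: {projection_2015:.1f} ZB")
```

Doubling every two years gives 7.2 ZB for 2015, a little below the 7.9 ZB estimate, so the quoted estimate implies growth slightly faster than a strict two-year doubling.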
IBM’s Definition
IBM’s definition – Big Data Characteristics
http://www-1.ibm.com/software/data/bigdata/
A collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications.
Big Data is the amount of data that is beyond the storage and processing capabilities of a single physical machine.
Data that has extra-large volume, comes from a variety of sources in a variety of formats, and arrives with great velocity is normally referred to as Big Data.
It’s more unstructured data than structured data
A Traditional Approach Under Pressure
Why Big Data?
Enterprise data
• ERP
• CRM data (a few TBs)
Data we have been adding in the last 3-4 years
• Customer experience
• Click streams
• Online campaigns
• Banner ads – capturing every click (100 n TBs)
• User-entered data
• Search – in-product search
• Social media – to understand general sentiment
Common Business Applications

| Industry | Use Case | Types of Data |
| --- | --- | --- |
| Financial Services | New Account Risk Screens | Text, Server Logs |
| Financial Services | Trading Risk | Server Logs |
| Financial Services | Insurance Underwriting | Geographic, Sensor, Text |
| Telecom | Call Detail Records (CDR) | Machine, Geographic |
| Telecom | Infrastructure Investment | Machine, Server Logs |
| Telecom | Real-Time Bandwidth Allocation | Server Logs, Text, Social |
| Retail | 360 Degree View of Customer | ClickStream, Text |
| Retail | Localized, Personal Promotion | Geographic |
| Retail | Website Optimization | ClickStream |
| Manufacturing | Supply Chain and Logistics | Sensor |
| Manufacturing | Assembly Line Quality Assurance | Sensor |
| Manufacturing | Crowd-Sourced Quality Assurance | Social |
| Healthcare | Use Genomic Data in Medical Trials | Structured |
| Healthcare | Monitor Patient Vitals in Real Time | Sensor |
| Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream |
| Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic |
| Oil and Gas | Unify Exploration and Production Data | Sensor, Unstructured, Geographic |
| Oil and Gas | Monitor Rig Safety in Real Time | Sensor, Unstructured |

How can we find products that customers are interested in BUT DON’T BUY?
Leveraging ALL Business Data
How to Extract Insights from 9 TB of Web Logs?
How do you make sense of this?
Leveraging ALL Business Data
How to Extract Insights from 9 TB of Web Logs?
What did users do when they came to our website?
Which products did they view?
Which products were seen but not purchased? Why? Can we create new offerings based on past data?
In the first line, the user has viewed a product with a particular ID.
Leveraging ALL Business Data
How to Extract Insights from 9 TB of Web Logs? (contd.)
The visitor views a 2nd product. We want to do this not just for one customer but for all customers.
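The "seen but not purchased" question can be sketched in a few lines of Python. The deck shows the web logs only as an image, so the tab-separated layout below (visitor ID, action, product ID) and the function name `viewed_not_purchased` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical log lines: visitor_id, action, product_id (tab-separated).
# The real web-log layout is not given in the deck; this format is an assumption.
SAMPLE_LOG = """\
v1\tview\tp100
v1\tview\tp200
v1\tpurchase\tp200
v2\tview\tp100
v2\tview\tp300
"""

def viewed_not_purchased(log_text: str) -> dict:
    """Return, per visitor, the products that were viewed but never purchased."""
    viewed = defaultdict(set)
    purchased = defaultdict(set)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        visitor, action, product = line.split("\t")
        if action == "view":
            viewed[visitor].add(product)
        elif action == "purchase":
            purchased[visitor].add(product)
    # Set difference: viewed minus purchased, for every visitor.
    return {v: viewed[v] - purchased[v] for v in viewed}

result = viewed_not_purchased(SAMPLE_LOG)
print(result)
```

On one machine this works for a small file. At 9 TB the same per-visitor logic has to be spread across many machines, which is the problem the rest of the deck addresses.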
Hidden Treasure
Insight into data can provide a business advantage.
Some key early indicators can mean fortunes to a business.
More precise analysis with more data
New offerings to the customer
Limitations of Existing Data Analytics Architecture
Solution: A Combined Storage and Compute Layer
Differentiating factors
Some of the Hadoop Users
Why DFS?
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
Hadoop Key Characteristics
Hadoop History
Hadoop Eco-System
Hadoop Core Components
HDFS – Hadoop Distributed File System (Storage)
Self-healing, high-bandwidth clustered storage
• Distributed across “nodes”
• Natively redundant
• NameNode tracks locations
MapReduce (Processing)
• Splits a task across processors “near” the data and assembles results
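The "split a task, then assemble results" idea can be illustrated in plain Python. This is a single-process simulation, not the Hadoop API: the function names `map_phase`, `shuffle` and `reduce_phase` are chosen for illustration, and each string in `chunks` stands in for a block of data that would live on a different node.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk stands in for a block of data stored on a different node.
chunks = ["big data big insight", "data near the data"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster the map step would run on the machines that already hold each block, and only the much smaller intermediate pairs would travel over the network.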
Hadoop Core Components (contd.)
HDFS Architecture
Main Components of HDFS
NameNode
• Master of the system
• Maintains and manages the blocks which are present on the DataNodes
DataNodes
• Slaves which are deployed on each machine and provide the actual storage
• Responsible for serving read and write requests for the clients
NameNode and DataNode
NameNode Metadata
Metadata in Memory
• The entire metadata is in main memory
• No demand paging of FS metadata
Types of Metadata
• List of files
• List of blocks for each file
• List of DataNodes for each block
• File attributes, e.g. access time, replication factor
A Transaction Log
• Records file creations, file deletions, etc.
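The metadata types listed above can be pictured as a few in-memory dictionaries plus an append-only log. The sketch below is a toy model, not Hadoop's actual classes: `ToyNameNode`, `FileEntry` and their methods are hypothetical names, and the host names H1 to H4 are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class FileEntry:
    """Per-file metadata: attributes plus the ordered list of block IDs."""
    replication: int
    blocks: list = field(default_factory=list)

@dataclass
class ToyNameNode:
    """Toy model only: keeps all metadata in memory and logs every change."""
    files: dict = field(default_factory=dict)            # path -> FileEntry
    block_locations: dict = field(default_factory=dict)  # block ID -> DataNode names
    edit_log: list = field(default_factory=list)         # transaction log

    def create(self, path, replication=3):
        # Assumption: 3 replicas, the commonly used HDFS default.
        self.files[path] = FileEntry(replication)
        self.edit_log.append(("create", path))

    def add_block(self, path, block_id, datanodes):
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = list(datanodes)
        self.edit_log.append(("add_block", path, block_id))

    def delete(self, path):
        for block_id in self.files.pop(path).blocks:
            self.block_locations.pop(block_id, None)
        self.edit_log.append(("delete", path))

    def locate(self, path):
        """Return (block ID, DataNodes) pairs for a file."""
        return [(b, self.block_locations[b]) for b in self.files[path].blocks]

nn = ToyNameNode()
nn.create("/logs/web.log")
nn.add_block("/logs/web.log", "blk_1", ["H1", "H2", "H3"])
nn.add_block("/logs/web.log", "blk_2", ["H2", "H3", "H4"])
print(nn.locate("/logs/web.log"))
```

Because every lookup is a dictionary access in memory, finding a block is fast. The trade-off is that the amount of metadata is bounded by the memory of the one machine that holds it.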
Storage: NameNode and DataNodes
Processing: JobTracker and TaskTrackers
H1 H2 H3 H4
Poll - 01
Poll - 02
Poll - 03
Poll - 04
Poll - 05
Hadoop courses and their fees across major training institutes…
Hadoop Course fee at Cloudera
Cloudera Hadoop Training:
Hadoop Course fee at HortonWorks and Edureka
$ 2,795 = Rs. 1,73,290
Thank You …