Microsoft Big Data Essentials Module 1 - Introduction to Big Data
1.Big Data Introduction
-
8/10/2019 1.Big Data Introduction
Big Data
A SOFT INTRODUCTION TO BIG DATA
COPYRIGHT CHIRAG AHUJA RESTRICTED CIRCULATION
-
Contents
What is Big Data
Conventional Approaches
Problems with Conventional Approaches
Welcome to the world of Big Data
-
What is Big Data
Every day, the world creates 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
-
What is Big Data
According to IBM, 80% of the data captured today is unstructured. It comes from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is also Big Data.
-
Why Big Data
Huge competition in the market:
Retail: customer analytics
Travel: travel patterns of the customer
Website: understand users' navigation patterns, interests, conversion, etc.
Sensors, satellite, geospatial data
Military and intelligence
-
Essence of Big Data
-
Volume
Today we are living in a world of data. Multiple factors contribute to data growth.
Huge volumes of data are generated from various sources:
Transaction-based data (stored through the years)
Text, images, and videos from social media
Increased amounts of data generated by sensors
-
Volume
Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption
Analyze billions of customer complaints to find the root cause of customer churn
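To put the figures above in perspective, the sketch below converts them into sustained per-second rates. It assumes decimal units (1 TB = 10^12 bytes) and a 365-day year; the function names are illustrative and do not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def bytes_per_second(terabytes_per_day: float) -> float:
    """Average ingest rate in bytes/s for a daily volume given in decimal terabytes."""
    return terabytes_per_day * 10**12 / SECONDS_PER_DAY


def events_per_second(events_per_year: float) -> float:
    """Average event rate per second for a yearly event count."""
    return events_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    tweet_rate = bytes_per_second(12)        # 12 TB of Tweets per day
    meter_rate = events_per_second(350e9)    # 350 billion readings per year
    print(f"Tweets: about {tweet_rate / 10**6:.1f} MB/s")           # about 138.9 MB/s
    print(f"Meter readings: about {meter_rate:,.0f} per second")    # about 11,098 per second
```

These are averages; real workloads peak well above them, which is part of why volume is a design concern rather than only a storage bill.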
-
Velocity
According to Gartner, velocity "means both how fast data is being produced and how fast the data must be processed to meet demand."
Scrutinize 5 million trade events created each day to identify potential fraud
Analyze customers' searching and buying patterns and show them advertisements for attractive offers in real time
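Processing data fast enough "to meet demand" usually means handling each event as it arrives instead of waiting for a nightly batch. A minimal sketch of that idea follows, assuming a hypothetical rule that flags an account making more than a fixed number of trades within a short time window. The rule, the thresholds, and the event layout are illustrative assumptions, not something the slides specify.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags accounts that exceed max_events within window_seconds (hypothetical rule)."""

    def __init__(self, max_events: int = 3, window_seconds: float = 10.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # account -> timestamps inside the window

    def observe(self, account: str, timestamp: float) -> bool:
        """Process one event as it arrives; return True if the account looks suspicious."""
        window = self._recent[account]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_events


if __name__ == "__main__":
    detector = BurstDetector(max_events=3, window_seconds=10.0)
    # (account, timestamp in seconds); events are assumed to arrive in time order.
    stream = [("A", 0), ("A", 1), ("B", 2), ("A", 3), ("A", 4), ("A", 30)]
    for account, ts in stream:
        if detector.observe(account, ts):
            print(f"possible fraud: account {account} at t={ts}")
```

Each event is examined once and only a small window of state is kept per account, so the decision is available immediately rather than after the day's data has been loaded.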
-
Velocity (example)
Take Google's example of data processing:
As soon as a blog is posted, it appears in the search results.
If we search for travel, shopping (electronics, apparel, shoes, watches, etc.), jobs, etc., it shows us relevant advertisements while browsing.
Even ads in the mail are highly content driven.
-
Variety
Data today comes in all types of formats: from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data, and financial transactions.
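One practical consequence of variety is that the same kind of fact can arrive in several shapes. The sketch below, using only the Python standard library, brings a structured (CSV), a semi-structured (JSON), and an unstructured (free text) input into one common record layout. The field names, sample inputs, and the "name paid amount" text pattern are assumptions made for illustration.

```python
import csv
import io
import json
import re


def from_csv(text: str) -> list[dict]:
    """Structured input: rows with a header line."""
    return [
        {"source": "csv", "customer": row["customer"], "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text: str) -> list[dict]:
    """Semi-structured input: a JSON array of objects."""
    return [
        {"source": "json", "customer": item["user"], "amount": float(item["total"])}
        for item in json.loads(text)
    ]


def from_free_text(text: str) -> list[dict]:
    """Unstructured input: pull 'name paid amount' phrases out of plain text."""
    pattern = re.compile(r"(\w+) paid (\d+(?:\.\d+)?)")
    return [
        {"source": "text", "customer": name, "amount": float(amount)}
        for name, amount in pattern.findall(text)
    ]


if __name__ == "__main__":
    records = (
        from_csv("customer,amount\nasha,10.5\n")
        + from_json('[{"user": "ben", "total": 7}]')
        + from_free_text("Yesterday carla paid 3.25 at the store.")
    )
    for record in records:
        print(record)
```

The free-text parser is the fragile one: it only finds what its pattern anticipates, which hints at why unstructured data needs different tooling than a fixed database schema.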
-
Veracity
Big Data veracity refers to the biases, noise, and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed?
Veracity in data analysis is the biggest challenge when compared to things like volume and velocity. Keep your data clean, and put processes in place to keep dirty data from accumulating in your systems.
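A process that keeps dirty data out can start as a small cleaning step at ingestion. The sketch below drops records with missing values, exact duplicates, and extreme outliers from a list of sensor readings. The record layout and the outlier rule (a value more than ten times the median) are assumptions chosen for illustration; real pipelines pick rules per data set.

```python
from statistics import median


def clean_readings(readings: list[dict], max_ratio: float = 10.0) -> list[dict]:
    """Drop records with missing values, exact duplicates, and extreme outliers.

    The outlier rule (value more than max_ratio times the median) is an
    illustrative assumption, not a general-purpose test.
    """
    # 1. Missing values: keep only records that actually carry a reading.
    complete = [r for r in readings if r.get("value") is not None]

    # 2. Duplicates: keep the first occurrence of each (sensor, time, value).
    seen = set()
    unique = []
    for r in complete:
        key = (r["sensor"], r["time"], r["value"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Outliers: compare each value against the median of the remaining data.
    if not unique:
        return []
    typical = median(r["value"] for r in unique)
    return [r for r in unique if abs(r["value"]) <= max_ratio * abs(typical)]


if __name__ == "__main__":
    raw = [
        {"sensor": "s1", "time": 1, "value": 20.0},
        {"sensor": "s1", "time": 1, "value": 20.0},    # duplicate
        {"sensor": "s1", "time": 2, "value": None},    # missing
        {"sensor": "s1", "time": 3, "value": 21.0},
        {"sensor": "s1", "time": 4, "value": 9999.0},  # implausible spike
    ]
    print(clean_readings(raw))  # keeps the readings at time 1 and time 3
```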
-
Conventional Approaches
Storage
RDBMS (Oracle, DB2, MySQL, etc.)
OS filesystem
Processing
SQL queries
Custom framework
C/C++
Python/Perl
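For readers who have not seen the conventional stack in action, here is a small sketch of the "RDBMS plus SQL query" approach. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a single-machine RDBMS; the table, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3


def total_spend_per_customer(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Load rows into a throwaway table and aggregate them with plain SQL."""
    # In-memory SQLite stands in for a conventional single-machine RDBMS (assumption).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)", rows)
        query = (
            "SELECT customer, SUM(amount) AS total FROM sales "
            "GROUP BY customer ORDER BY total DESC"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_spend_per_customer([("asha", 10.5), ("ben", 7.0), ("asha", 4.5)]))
    # [('asha', 15.0), ('ben', 7.0)]
```

This works well while the table fits on one machine and has a fixed schema, which are exactly the two assumptions Big Data workloads tend to break.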
-
Why Big Data Technologies
Conventional approaches/technologies are not able to solve current problems
They are good for certain use cases
But they cannot handle data in the petabyte range
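A rough calculation shows why the petabyte range is out of reach for a single machine. The sketch below estimates how long it takes just to read one petabyte once. The 100 MB/s read speed and the cluster size of 1000 machines are assumptions chosen for round numbers, not measurements.

```python
def scan_time_seconds(total_bytes: float, bytes_per_second: float, machines: int = 1) -> float:
    """Time to read total_bytes once when the data is split evenly across machines."""
    return total_bytes / (bytes_per_second * machines)


if __name__ == "__main__":
    PETABYTE = 10**15
    DISK_THROUGHPUT = 100 * 10**6  # assumption: 100 MB/s sequential read per machine

    single = scan_time_seconds(PETABYTE, DISK_THROUGHPUT)
    cluster = scan_time_seconds(PETABYTE, DISK_THROUGHPUT, machines=1000)
    print(f"1 machine: about {single / 86400:.0f} days")        # about 116 days
    print(f"1000 machines: about {cluster / 3600:.1f} hours")   # about 2.8 hours
```

The estimate ignores coordination overhead and failures, but the gap between months and hours is large enough that the conclusion does not depend on the exact numbers.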
-
Problems with Conventional Approaches
Limited storage capacity
Limited processing capacity
No scalability
Single point of failure
Sequential processing
RDBMSs can handle only structured data
Requires preprocessing of data
Information is collected according to current business needs
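The "sequential processing" item in the list above can be contrasted with a split, process, and merge pattern. The sketch below counts words both ways. A thread pool stands in for separate workers; this is an assumption made to keep the example self-contained, and in CPython threads will not speed up this CPU-bound work, whereas Big Data frameworks run the chunks on separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines: list[str]) -> Counter:
    """Work done on one chunk, independent of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def sequential_count(lines: list[str]) -> Counter:
    """Conventional approach: one worker walks the whole input."""
    return count_words(lines)


def chunked_count(lines: list[str], workers: int = 4) -> Counter:
    """Split the input, count each chunk in a worker, then merge the partial counts."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    lines = ["big data", "Big Data is big", "data"]
    print(sequential_count(lines) == chunked_count(lines, workers=2))  # True
```

The pattern works because each chunk can be counted without knowing anything about the others, and the partial results are cheap to merge.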
-
Limited Storage Capacity
Installed on a single machine
Has specified storage limits
Requires archiving the data again and again
Problems with reloading data back into the repository according to business needs
Can only process the data that can be stored on a single machine
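The usual way around the single-machine limit is to spread records over several machines by key. The sketch below shows simple hash partitioning; the node names and keys are hypothetical, and the modulo placement is the simplest possible scheme, not the one any particular product uses.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node for a key using a stable hash.

    hashlib is used because Python's built-in hash() for strings is salted
    per process and would place the same key differently on each run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Spread keys over nodes so no single machine has to hold everything."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]  # hypothetical machine names
    keys = [f"customer-{i}" for i in range(100)]
    for node, held in partition(keys, nodes).items():
        print(node, len(held))
```

With plain modulo placement, changing the number of nodes moves most keys to a different node, so systems that need to grow smoothly tend to use more careful placement schemes.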
-
Limited Processing Capacity
Installed on a single machine
Has specified processing limits
Has a fixed number of processing elements (CPUs)
Not able to process large amounts of data efficiently
-
No Scalability
One of the biggest limitations of conventional RDBMSs is the lack of scalability
We cannot add more resources on the fly
-
Thank You