1.Big Data Introduction


  • 8/10/2019 1.Big Data Introduction

    Big Data
    A SOFT INTRODUCTION TO BIG DATA

    COPYRIGHT CHIRAG AHUJA RESTRICTED CIRCULATION

    Contents
    What is Big Data

    Conventional Approaches

    Problems with Conventional Approaches

    Welcome to the world of Big Data

    What is Big Data
    Every day, the world creates 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.

    Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

    What is Big Data
    According to IBM, 80% of data captured today is unstructured, from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is also Big Data.

    Why Big Data
    Huge competition in the market:
        Retail: customer analytics
        Travel: travel patterns of customers
        Website: understanding users' navigation patterns, interests, conversion, etc.

    Sensor, satellite and geospatial data

    Military and intelligence

    Essence of Big Data

    Volume
    Today we live in a world of data, and multiple factors contribute to data growth.

    Huge volumes of data are generated from various sources:
        Transaction-based data (stored over the years)
        Text, images and videos from social media
        Increasing amounts of data generated by sensors

    Volume
    Turn the 12 terabytes of Tweets created each day into improved product sentiment analysis

    Convert 350 billion annual meter readings into better predictions of power consumption

    Analyze billions of customer complaints to find the root cause of customer churn
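
    To get a feel for these volumes, the quoted figures can be converted into average per-second rates. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes) and an even spread over the day or year, neither of which the slides specify, and the helper name per_second is made up for illustration.

```python
# Back-of-the-envelope rates for the volume figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
TB = 10**12  # decimal terabyte (assumption; the unit convention is not stated)
EB = 10**18  # decimal exabyte


def per_second(total, period_seconds):
    """Average rate per second for a total spread evenly over a period."""
    return total / period_seconds


tweet_bytes_per_sec = per_second(12 * TB, SECONDS_PER_DAY)
meter_readings_per_sec = per_second(350 * 10**9, SECONDS_PER_YEAR)
world_exabytes_per_day = 2.5 * 10**18 / EB  # 2.5 quintillion bytes

print(f"Tweets: about {tweet_bytes_per_sec / 10**6:.0f} MB per second")
print(f"Meter readings: about {meter_readings_per_sec:.0f} per second")
print(f"World data: {world_exabytes_per_day} exabytes per day")
```

    Even as averages, these rates have to be sustained continuously, which is what makes volume a storage and processing problem at the same time.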

    Velocity
    According to Gartner, velocity "means both how fast data is being produced and how fast the data must be processed to meet demand."

    Scrutinize the 5 million trade events created each day to identify potential fraud

    Analyze customers' searching and buying patterns and show them advertisements for attractive offers in real time
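
    Five million trade events a day works out to roughly 58 events per second on average, so each event has to be examined as it arrives rather than in a nightly batch. The following is a minimal sketch of that idea, not a real fraud model: the rule (too many trades by one account inside a time window), the thresholds, the event layout and the function name flag_bursts are all assumptions made for illustration.

```python
from collections import deque


def flag_bursts(events, window_seconds=60, max_trades=3):
    """Yield (timestamp, account) for trades that exceed the burst limit.

    events: iterable of (timestamp_seconds, account_id), sorted by time.
    An event is flagged when its account has made more than max_trades
    trades within the last window_seconds (an illustrative rule only).
    """
    recent = {}  # account_id -> deque of that account's recent timestamps
    for ts, account in events:
        window = recent.setdefault(account, deque())
        window.append(ts)
        # Discard timestamps that have fallen out of the time window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_trades:
            yield ts, account


# Hypothetical sample stream: account "A" trades four times within 20 seconds.
sample_events = [(0, "A"), (5, "A"), (10, "B"), (12, "A"), (20, "A"), (200, "A")]
flagged = list(flag_bursts(sample_events))
print(flagged)
```

    Because the function is a generator that keeps only a small window per account, it handles one event at a time and never needs the whole day's data in memory.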

    Velocity (example)
    Take Google as an example of how quickly data is processed:

    As soon as a blog is posted, it appears in the search results.

    If we search for travel, shopping (electronics, apparel, shoes, watches, etc.), jobs, etc., relevant advertisements are shown to us while we browse.

    Even the ads shown in mail are highly content-driven.

    Variety
    Data today comes in all types of formats, from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions.

    Veracity
    Big Data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed?

    Veracity is the biggest challenge in data analysis when compared to things like volume and velocity. Keep your data clean, and put processes in place to keep dirty data from accumulating in your systems.
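
    One way to keep dirty data from accumulating is to validate records before they are stored. The sketch below shows the idea on meter readings; the record layout (meter_id, value), the valid range and the function name clean_readings are hypothetical, chosen only to make the example concrete.

```python
def clean_readings(readings, low=0.0, high=100.0):
    """Split raw meter readings into (clean, rejected) lists.

    readings: list of dicts with hypothetical keys 'meter_id' and 'value'.
    A record is rejected if its value is missing, not numeric,
    outside [low, high], or an exact duplicate of an earlier clean record.
    """
    clean, rejected, seen = [], [], set()
    for record in readings:
        value = record.get("value")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        key = (record.get("meter_id"), value)
        if not is_number or not (low <= value <= high) or key in seen:
            rejected.append(record)
            continue
        seen.add(key)
        clean.append(record)
    return clean, rejected


raw = [
    {"meter_id": 1, "value": 42.0},
    {"meter_id": 1, "value": 42.0},   # duplicate
    {"meter_id": 2, "value": None},   # missing value
    {"meter_id": 3, "value": -7.5},   # out of range
    {"meter_id": 4, "value": 18},
]
clean, rejected = clean_readings(raw)
print(len(clean), "clean,", len(rejected), "rejected")
```

    Keeping the rejected records instead of silently dropping them makes it possible to check later whether the noise itself is biased.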

    Conventional Approaches
    Storage
        RDBMS (Oracle, DB2, MySQL, etc.)
        OS filesystem

    Processing
        SQL queries
        Custom frameworks
            C/C++
            Python/Perl
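
    As a small illustration of the conventional approach, the sketch below stores rows in a relational table and processes them with a SQL query from a Python script. It uses SQLite through Python's built-in sqlite3 module purely as a stand-in, since it needs no server; the slides name Oracle, DB2 and MySQL, and the purchases table and its contents are invented for the example.

```python
import sqlite3

# An in-memory database: everything lives on a single machine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# Processing is expressed as a SQL query over structured rows.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)
conn.close()
```

    This works well while the table fits comfortably on one machine and the data has a fixed schema.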

    Why Big Data Technologies
    Conventional approaches and technologies are not able to solve current problems

    They are good for certain use cases

    But they cannot handle data in the petabyte range

    Problems with Conventional Approaches
    Limited storage capacity

    Limited processing capacity

    No scalability

    Single point of failure

    Sequential processing

    RDBMSs can handle only structured data

    Require preprocessing of data

    Information is collected according to current business needs

    Limited Storage Capacity
    Installed on a single machine

    Has specified storage limits

    Requires archiving the data again and again

    Reloading archived data back into the repository when the business needs it is a problem

    Can only process data that can be stored on a single machine
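
    A quick calculation shows why a single machine forces repeated archiving. The sketch below estimates how long a machine lasts before its disk is full; the 100 TB capacity and 40 TB already in use are made-up numbers, the 12 TB per day growth reuses the Tweet volume quoted earlier, and days_until_full is a hypothetical helper.

```python
def days_until_full(capacity_tb, used_tb, daily_growth_tb):
    """Days of headroom left on one machine at a constant growth rate."""
    if daily_growth_tb <= 0:
        raise ValueError("daily_growth_tb must be positive")
    free_tb = max(capacity_tb - used_tb, 0)
    return free_tb / daily_growth_tb


# Hypothetical machine: 100 TB of disk, 40 TB used, growing by 12 TB per day.
headroom_days = days_until_full(capacity_tb=100, used_tb=40, daily_growth_tb=12)
print(f"Disk is full in {headroom_days:.1f} days")
```

    With only days of headroom, older data must be archived continually, and anything archived is no longer available for processing until it is reloaded.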

    Limited Processing Capacity
    Installed on a single machine

    Has specified processing limits

    Has a certain number of processing elements (CPUs)

    Not able to process large amounts of data efficiently

    No Scalability
    One of the biggest limitations of conventional RDBMSs is the lack of scalability

    We cannot add more resources on the fly

    Thank You
