Big Data: A big step towards innovation, competition and productivity

Transcript of Big data

Page 1: Big data

Big Data: A big step towards innovation, competition and productivity

Page 2: Big data

Contents

Big Data Definition
Example of Big Data
Big Data Vectors
Cost Problem
Importance of Big Data
Big Data Growth
Some Challenges in Big Data
Big Data Implementation

Page 3: Big data

Big Data Definition

Big data describes a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques.

In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity.

The term "big data" is believed to have originated with Web search companies, which had to query very large distributed aggregations of loosely structured data.

Page 4: Big data

An Example of Big Data

An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. Web, sales, customer contact centers, social media, mobile data, and so on). The data is typically loosely structured and often incomplete and inaccessible.
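As a quick sanity check on the unit conversions above, here is a minimal Python sketch of the binary storage units the slide references:

```python
# Binary storage units: each step up multiplies by 1,024.
TB = 1024 ** 4  # bytes in a terabyte
PB = 1024 * TB  # petabyte = 1,024 terabytes
EB = 1024 * PB  # exabyte  = 1,024 petabytes

print(PB // TB)  # 1024 (terabytes per petabyte)
print(EB // PB)  # 1024 (petabytes per exabyte)
```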

When dealing with such large datasets, organizations face difficulties in creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.

Page 5: Big data

Big Data Vectors

Page 6: Big data

Cost Problem

What does it cost to process 1 petabyte of data with 1,000 nodes?

1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes
At a rate of 15 MB/s, each node takes about 9 hours to process 500 GB: 15 MB/s × 60 × 60 × 9 = 486,000 MB ≈ 500 GB
1,000 nodes × 9 h × $0.34/h = $3,060 for a single run
1 PB = 1,000,000 GB / 500 GB = 2,000 runs; on a single node, 2,000 × 9 h = 18,000 h / 24 = 750 days
The cost for 1,000 cloud nodes to each process 1 PB: 2,000 × $3,060 = $6,120,000
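The slide's back-of-the-envelope arithmetic can be reproduced in a few lines of Python, using the same assumed figures (15 MB/s per node, $0.34 per node-hour):

```python
# Reproducing the slide's estimate for processing 1 PB on 1,000 nodes.
rate_mb_s = 15
hours_per_run = 9
mb_per_run = rate_mb_s * 60 * 60 * hours_per_run   # 486,000 MB ≈ 500 GB per node

nodes = 1000
cost_per_hour = 0.34                               # assumed cloud price per node-hour
cost_single_run = nodes * hours_per_run * cost_per_hour   # ≈ $3,060

runs_for_1pb = 1_000_000 // 500                    # 2,000 runs of 500 GB each
single_node_hours = runs_for_1pb * hours_per_run   # 18,000 h ≈ 750 days on one node
total_cost = runs_for_1pb * cost_single_run        # ≈ $6,120,000

print(mb_per_run, single_node_hours // 24)
print(f"single run: ${cost_single_run:,.0f}, total: ${total_cost:,.0f}")
```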

Page 7: Big data

Importance of Big Data

Government: In 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.

Private sector: Wal-Mart handles more than 1 million customer transactions every hour, imported into databases estimated to contain more than 2.5 petabytes of data. Facebook handles 40 billion photos from its user base. The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.

Science: The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.

Page 8: Big data

The Large Hadron Collider produced 13 petabytes of data in 2010.

Medical computation, such as decoding the human genome.
A social science revolution.
A new way of doing science (the microscope example).

Page 9: Big data

Technology Players in this Field

Google
Oracle
Microsoft
IBM
Hadapt
Nike
Yelp
Netflix
Dropbox
Zipdial

Page 10: Big data

Big Data Growth

Page 11: Big data

Some Challenges in Big Data

While big data can yield extremely useful information, it also presents new challenges with respect to:

How much data to store?
How much will this cost?
Whether the data will be secure?
How long it must be maintained?

Page 12: Big data

Implementation of Big Data

Platforms for large-scale data analysis:

The Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes; and

The MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers the results and resolves them into a single value.
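The Map/Reduce pattern described above can be illustrated with a minimal single-process word-count sketch. This is not Hadoop itself (which is Java-based and distributed); it only shows the shape of the model: map emits (key, value) pairs, a shuffle groups them by key, and reduce resolves each group into a single value.

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    # Reduce: resolve all values for one key into a single value.
    return key, sum(values)

def mapreduce(lines):
    groups = defaultdict(list)
    # Map + shuffle: group emitted values by key.
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Reduce: one result per key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = mapreduce(["big data big step", "data everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'step': 1, 'everywhere': 1}
```

In a real cluster the map calls run on different nodes over different chunks of the input, and the shuffle moves each key's values to the node running its reduce; the sequential loop here stands in for that distribution.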

Page 13: Big data

Thank You!!

By:
Harshita Rachora
Trainee Software Consultant
Knoldus Software LLP