Big data intro.pptx


Transcript of Big data intro.pptx

Page 1: Big data intro.pptx

1. Introduction to Big Data


Page 2: Big data intro.pptx

A Glance at Data in the Modern Era


Page 3: Big data intro.pptx

Data Classification

Structured

• Granular Queryability

• Tables with rows & columns

• E.g.: relational databases (RDBMS) queried with SQL

• Contribution: roughly 5% of all data

Semi-Structured

• Spectrum between Structured & Unstructured

• Contains tags, schema contained within the data

• E.g.: XML, JSON, NoSQL stores

Unstructured

• Not directly queryable

• E.g.: audio, video, text, images, e-mail

• Contribution: roughly 80% of all data
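
The three classes can be illustrated with a minimal Python sketch that uses only the standard library. The table name, field names and sample values are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Structured: rows and columns with a fixed schema, queryable with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Chennai")],
)
structured_result = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Pune",)
).fetchall()

# Semi-structured: the schema (keys, nesting) travels with the data itself.
document = json.loads('{"id": 3, "name": "Meera", "tags": ["new", "mobile"]}')
semi_structured_result = document["tags"]

# Unstructured: free text has no schema, so anything beyond a crude
# keyword search needs extra processing.
email_body = "Hi team, the delivery to Pune was late again. Please check."
unstructured_result = "Pune" in email_body

print(structured_result)       # [('Asha',)]
print(semi_structured_result)  # ['new', 'mobile']
print(unstructured_result)     # True
```

The structured query filters on a named column, the JSON document is navigated by its own keys, and the e-mail text can only be searched as a raw string.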


Page 4: Big data intro.pptx

Overview of Big Data

• What?

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.

• Why?

i. Data sets are so large and complex that they are hard to process with traditional data processing methods.

ii. This warrants innovative solutions that handle a variety of new and existing data and deliver real business benefits.

• Where?

i. Analyse data for insights that lead to better decisions and strategic business moves.

ii. Processing large volumes or wide varieties of data remains merely a technological solution unless it is tied to business goals and objectives.

iii. Greater operational efficiency, reduced risk and lower costs.

iv. Reveal patterns, trends and associations related to human behavior and interactions.

v. Better understand consumer habits and target marketing campaigns more precisely.


Page 5: Big data intro.pptx

Characteristics of Big Data

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s, when industry analyst Doug Laney articulated it in terms of the V’s below.


Big Data: Volume, Velocity, Variety, Veracity

Volume: Data will grow from 4.4 zettabytes today to around 44 zettabytes.

Velocity: By 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

Variety: Smartphones are shipped packed with sensors capable of collecting all kinds of data, not to mention the data their users create themselves.
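
To put the Volume and Velocity figures in perspective, the arithmetic can be sketched in a few lines of Python. This assumes decimal (SI) units, which the slides do not specify; the variable names are illustrative.

```python
# Decimal (SI) units are an assumption: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
ZB = 10**21
GB = 10**9
MB = 10**6

# Volume: growth from 4.4 ZB to 44 ZB (integer arithmetic avoids rounding).
start_bytes = 44 * 10**20   # 4.4 ZB
end_bytes = 44 * ZB         # 44 ZB
growth_factor = end_bytes // start_bytes
end_in_gb = end_bytes // GB

# Velocity: 1.7 MB per second per person, accumulated over one day.
bytes_per_second = 17 * MB // 10   # 1.7 MB
seconds_per_day = 24 * 60 * 60
gb_per_person_per_day = bytes_per_second * seconds_per_day / GB

print(growth_factor)           # 10
print(end_in_gb)               # 44000000000000
print(gb_per_person_per_day)   # 146.88
```

In other words, the Volume figure is a tenfold increase to 44 trillion gigabytes, and the Velocity figure works out to roughly 147 GB per person per day.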

Page 6: Big data intro.pptx

4 V’s

Volume

The enormous amount of data generated by machines, networks and human interaction on systems such as social media.

Velocity

The pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc. The flow of data is massive and continuous.

Variety

Variety refers to the many sources and types of data, both structured and unstructured. Data now comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. This variety of unstructured data creates problems for storing, mining and analyzing data.

Veracity

Refers to the biases, noise and abnormalities in data. The question is whether the data being stored and mined is meaningful to the problem being analyzed.
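
As a rough illustration of a veracity check, the sketch below flags abnormal values in a series of readings. It uses one common rule of thumb (the modified z-score based on the median absolute deviation), which is an assumption here and not a method taken from the slides; the function name, threshold and sample readings are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # No spread to compare against in this simple sketch.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical sensor readings with one obviously abnormal value.
readings = [21.0, 21.5, 20.8, 21.2, 95.0, 21.1, 20.9]
suspect = flag_outliers(readings)
print(suspect)  # [95.0]
```

Flagged values are only candidates for review; whether they are noise or a meaningful signal depends on the problem being analyzed.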

