Getting your head around big data

My talk on Big Data from Dallas Day of .NET 2014

Transcript of Getting your head around big data

Page 1: Getting your head around big data

Getting your head around

BIG Data

Page 2: Getting your head around big data

https://github.com/glennblock

https://twitter.com/gblock

“I should be tweeting”

Page 3: Getting your head around big data

Make machine data accessible, usable and valuable to everyone.

Page 4: Getting your head around big data

Platform for Machine Data

Any Machine Data

HA Indexes and Storage

Search and Investigation

Proactive Monitoring

Operational Visibility

Real-time Business Insights

Commodity Servers

Online Services

Web Services

Servers

Security

GPS Location

Storage

Desktops

Networks

Packaged Applications

Custom Applications

Messaging

Telecoms

Online Shopping Cart

Web Clickstreams

Databases

Energy Meters

Call Detail Records

Smartphones and Devices

RFID

Page 5: Getting your head around big data

DATA

Page 6: Getting your head around big data

15,000 BC – Pictures

Lascaux, France

Page 7: Getting your head around big data

6,000 BC – Symbols

Page 8: Getting your head around big data

3,500 BC – Language

Page 9: Getting your head around big data

1,275 BC – Papyrus

Page 10: Getting your head around big data

1st - 13th Century - Codex

Page 11: Getting your head around big data

13th Century – Movable type

Page 12: Getting your head around big data

15th Century – Printing press

Page 13: Getting your head around big data

19th to 20th century Babbage Analytical engine

Page 14: Getting your head around big data

1936 – Turing machine

Page 15: Getting your head around big data

1945 – ENIAC

Page 16: Getting your head around big data

1947 – The first bug

Page 17: Getting your head around big data

1977 - Arpanet

Page 18: Getting your head around big data

1990s Internet

Page 19: Getting your head around big data

Phones and Tablets

Page 20: Getting your head around big data

RFID

Page 21: Getting your head around big data

Cloud

Page 22: Getting your head around big data

Services

Page 23: Getting your head around big data

New consumer devices

Page 24: Getting your head around big data
Page 25: Getting your head around big data

90 percent of all the data in the world has been generated over the last two years

source: sciencedaily.com

Page 26: Getting your head around big data

Every day 2.5 quintillion bytes of data is generated

1 quintillion = 1 + 18 zeros!

57.5 billion 32 GB iPads

source: storagenewsletter.com
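To make the scale concrete, the slide's figures can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, where 1 GB = 10^9 bytes, and the constant and variable names are illustrative. Under that assumption, one day's 2.5 quintillion bytes fills about 78 million 32 GB iPads, while 57.5 billion such iPads hold about 1.84 zettabytes, so the iPad figure appears to describe a much larger total than a single day's output.

```python
# Assumption: decimal (SI) units, so 1 GB = 10**9 bytes.
QUINTILLION = 10**18
ZETTABYTE = 10**21
GB = 10**9

IPAD_BYTES = 32 * GB                  # capacity of one 32 GB iPad
daily_bytes = 25 * QUINTILLION // 10  # 2.5 quintillion bytes per day

# How many 32 GB iPads does one day's data fill?
ipads_per_day = daily_bytes // IPAD_BYTES

# How much data do 57.5 billion 32 GB iPads hold?
ipad_count = 575 * 10**8              # 57.5 billion
fleet_bytes = ipad_count * IPAD_BYTES
fleet_zettabytes = fleet_bytes / ZETTABYTE

print(f"iPads filled per day: {ipads_per_day:,}")
print(f"57.5 billion iPads hold {fleet_zettabytes:.2f} ZB")
```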

Page 27: Getting your head around big data

2.7 zettabytes exist in the digital universe

1 zettabyte = 1 + 21 zeros!

42 ZB = All human speech digitized

source: highscalability.com

Page 28: Getting your head around big data

How big is big?

Page 29: Getting your head around big data

That’s A LOT of data!

Page 30: Getting your head around big data

How do you harness it?

Page 31: Getting your head around big data

This is what big data is really about.

Page 32: Getting your head around big data

Asking questions and getting answers

Page 33: Getting your head around big data

Massive amounts of data.

Machine generated

VOLUME

Page 34: Getting your head around big data

Data is coming from a multitude of sources

Mix of structured and unstructured (JSON, XML, CSV, Plain Text)

Need a way to store it and query it

VARIETY
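One way to picture the variety problem is to normalize each format into the same record shape so a single query can run across all of them. The sketch below uses only Python's standard library; the sample events, the field names (host, status), and the key=value plain-text layout are assumptions made for illustration, not anything taken from the talk.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def from_json(text):
    """One JSON object becomes one record."""
    return [json.loads(text)]

def from_xml(text):
    """Child elements of the root become the record's fields."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in root}]

def from_csv(text):
    """Each CSV row becomes a record keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_plain(text):
    """Assumed layout: whitespace-separated key=value pairs on one line."""
    return [dict(pair.split("=", 1) for pair in text.split())]

# Hypothetical sample events, one per format.
records = (
    from_json('{"host": "web1", "status": "500"}')
    + from_xml("<event><host>web2</host><status>200</status></event>")
    + from_csv("host,status\nweb3,500\n")
    + from_plain("host=web4 status=404")
)

# With a common shape, one query covers every source.
failing_hosts = [r["host"] for r in records if r["status"] == "500"]
print(failing_hosts)
```

Real systems usually defer this step (schema on read) or index the raw text, but the idea of querying across formats is the same.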

Page 35: Getting your head around big data

VARIETY

Log files

Activity Feeds

Emails

Device Streams

Audio Files

Videos

Page 36: Getting your head around big data

Data arrives at many different frequencies

Need to be able to process in real time.

VELOCITY

Page 37: Getting your head around big data

Not all data that is stored is useful.

Need to identify the useful data

Need to wade through all the noise

VERACITY

Page 38: Getting your head around big data

SOLUTIONS

Page 39: Getting your head around big data

Map/Reduce

function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
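The word-count pseudocode above can be tried on a single machine. The following is a minimal sketch in Python, not a distributed implementation: the map_reduce driver and the sample documents are assumptions added for illustration, and the grouping step stands in for the shuffle that a framework such as Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict

def map_fn(name, document):
    """Emit (word, 1) for every word in the document."""
    for word in document.split():
        yield word, 1

def reduce_fn(word, partial_counts):
    """Sum the partial counts collected for one word."""
    return word, sum(partial_counts)

def map_reduce(documents):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for name, document in documents.items():
        for word, count in map_fn(name, document):
            groups[word].append(count)  # the "shuffle": group values by key
    return dict(reduce_fn(word, counts) for word, counts in groups.items())

# Illustrative input: document name -> contents.
docs = {"a.txt": "big data big", "b.txt": "data is big"}
counts = map_reduce(docs)
print(counts)
```

Because each map call looks at one document and each reduce call looks at one word, both phases can be spread across many machines, which is what makes the pattern suit large datasets.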

Page 40: Getting your head around big data

High-scale, high-availability databases

Page 41: Getting your head around big data

Distributed processing of large datasets

Page 42: Getting your head around big data

Data Visualization and analysis

Page 43: Getting your head around big data

End to end tools

Page 44: Getting your head around big data

More information

www.mongodb.org

www.memsql.com

cassandra.apache.org

hadoop.apache.org

www.tableausoftware.com

www.elasticsearch.org

splunk.com

Page 45: Getting your head around big data

@gblock http://github.com/glennblock

http://www.flickr.com/photos/11812960@N04/4050576435