
Post on 17-Jan-2016


CC-BY-NC-SA 3.0 License, 2015 Hope Bay Technologies, Inc. 和沛科技

Big Data Philosophy

Ben Jai

CEO, Hope Bay Technologies, Inc.


What is Big Data?

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Translation: DB2/Oracle/MSSQL/MySQL can’t handle it


Difference from just Data?

Volume (amount of data)
Velocity (speed of data in and out)
Variety (range of data types and sources)
Veracity (truthfulness of the data)

Translation: DB2/Oracle/MSSQL/MySQL can’t handle it


Is that It?

Big, fast, unstructured, uncertain, so?

Conventional wisdom: I can solve the problem if my computer is 1000 times bigger and faster!
• I have a question looking for an answer.

Unconventional wisdom: I can find the pattern if my data set is 1000 times bigger!
• I have a data set looking for useful information.


Example 1: Machine Translation

Conventional method: dictionary + grammar rules + exception database…
• Good enough initially, but will not improve
• Exceptions grow naturally and DB can’t follow

Statistical machine translation: compare same article in different languages and build statistical models
• Need a LOT of data to see initial results but will grow automatically without human knowledge of the languages
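
As a rough illustration of "compare the same article in different languages and build statistical models", the sketch below counts how often words co-occur in aligned sentence pairs and picks the best-scoring partner for each word. It is a toy under stated assumptions, not the method of any production system: the four-sentence `CORPUS`, the Dice scoring, and the names `train` and `best_translation` are all invented for this example.

```python
from collections import Counter
from itertools import product


def train(pairs):
    """Count how often each source word co-occurs with each target word."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Every source word in a sentence pair co-occurs with every target word.
        joint.update(product(s_words, t_words))
    return src_count, tgt_count, joint


def best_translation(word, model):
    """Return the target word with the highest Dice score for `word`, or None."""
    src_count, tgt_count, joint = model
    scores = {t: 2 * n / (src_count[s] + tgt_count[t])
              for (s, t), n in joint.items() if s == word}
    return max(scores, key=scores.get) if scores else None


# Hypothetical parallel corpus, invented for illustration only.
CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

model = train(CORPUS)
print(best_translation("house", model))
```

No dictionary or grammar rule appears anywhere in the code; the word pairings come only from the counts, which is why more parallel text tends to improve the result.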


How big is BIG?

[Figure: BLEU% score, Arabic to English (axis label AE BLEU[%], ticks from 47.5 to 53.5). Annotation: +weblm = 219B words of web data!]


Example 2: Spell Check

Conventional method: compare words against dictionary and find minimal editing distance. But:
• Proper names are not in the dictionary
• Word may be in the dictionary but not what the user wants
• Can’t break ties

Google method: look at what others do
• Britney Spears: brittany spears, brittney spears, britany spears, britny spears, briteny spears, britteny spears, brine spears, …
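
The contrast between the two methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's actual implementation: the `DICTIONARY` and `QUERY_LOG` contents, the counts, the `max_distance` threshold, and all function names are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def dictionary_candidates(word, dictionary):
    """Conventional method: every dictionary word at the minimal edit distance."""
    best = min(edit_distance(word, w) for w in dictionary)
    return sorted(w for w in dictionary if edit_distance(word, w) == best)


def log_correction(query, query_log, max_distance=3):
    """Query-log method: the most frequent logged query within max_distance edits."""
    nearby = {q: n for q, n in query_log.items()
              if edit_distance(query, q) <= max_distance}
    return max(nearby, key=nearby.get) if nearby else query


# Hypothetical data, invented for illustration only.
DICTIONARY = ["cat", "cot", "coat"]
QUERY_LOG = Counter({"britney spears": 1000,
                     "brittany spears": 80,
                     "britny spears": 20})

print(dictionary_candidates("cet", DICTIONARY))
print(log_correction("britny spears", QUERY_LOG))
```

The dictionary lookup returns two equally close candidates for "cet" and has no way to choose between them, while the query-log lookup resolves a proper name that no dictionary contains simply by following the majority.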


300 Ways to Spell Her Name


Also Look at the Context

Kofi Annan is correct

Kofee Annan → Kofi Annan

Kofee Shop → Coffee Shop
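
One simple way to use context is to score each candidate correction by how often it appears next to the neighbouring word. The sketch below assumes the candidates have already been generated and uses hypothetical bigram counts; `BIGRAMS` and `correct_in_context` are names invented for this example.

```python
from collections import Counter

# Hypothetical counts of adjacent word pairs, invented for illustration only.
BIGRAMS = Counter({
    ("kofi", "annan"): 500,
    ("coffee", "shop"): 900,
    ("coffee", "annan"): 1,
})


def correct_in_context(candidates, next_word, bigrams):
    """Pick the candidate seen most often immediately before next_word."""
    # Counter returns 0 for unseen pairs, so unknown combinations score lowest.
    return max(candidates, key=lambda c: bigrams[(c, next_word)])


CANDIDATES = ["kofi", "coffee"]  # assumed corrections for the typo "kofee"
print(correct_in_context(CANDIDATES, "annan", BIGRAMS))
print(correct_in_context(CANDIDATES, "shop", BIGRAMS))
```

The same misspelling resolves to different words depending on what follows it, which a word-by-word dictionary check cannot do.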


Example 3: Detecting Defaults

Conventional method: look at FICO score

MIT method: use machine learning to look at hundreds of variables in customer activities

An NTU professor used a similar method to reduce the return rate on a TV shopping channel


Example 4: Who’s in Love?


Big Data System Architecture

[Diagram labels: Data Source (×4), Data Storage Platform, Data Analysis Platform, Data Analysis Application, Amazing Results]


Levels of Big Data Processing

STORAGE is what you can ACCESS
DATA is what you can PROCESS
INFORMATION is what you can UNDERSTAND
INTELLIGENCE is what you can USE


Big Data System Components

[Diagram labels: Data Source (×4), Data Storage Platform, Data Storage, Data Analysis Platform, Data Analysis Tool, Data Analysis Application, Data, Information, Intelligence, Amazing Results]


Who Can Use Big Data?

Those who own a lot of data
Those who can see the information
Those who can convert the information into intelligence
Those who can monetize the intelligence


Who Can Make Money?

Those who own a lot of data
Those who can see the information
Those who can convert the information into intelligence
Those who can monetize the intelligence


It’s Just the Beginning…

1952: UNIVAC I predicted the landslide victory of Dwight D. Eisenhower


BEN.JAI@HOPEBAYTECH.COM

Q&A