Post on 17-Jan-2016
CC-BY-NC-SA 3.0 License, 2015 Hope Bay Technologies, Inc. 和沛科技
Big Data Philosophy
Ben Jai
CEO, Hope Bay Technologies, Inc.
What is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Translation: DB2/Oracle/MSSQL/MySQL can’t handle it
Difference from just Data?
• Volume (amount of data)
• Velocity (speed of data in and out)
• Variety (range of data types and sources)
• Veracity (truthfulness of the data)
Translation: DB2/Oracle/MSSQL/MySQL can’t handle it
Is that It?
Big, fast, unstructured, uncertain, so?
Conventional wisdom: I can solve the problem if my computer is 1000 times bigger and faster!
• I have a question looking for an answer.
Unconventional wisdom: I can find the pattern if my data set is 1000 times bigger!
• I have a data set looking for useful information.
Example 1: Machine Translation
Conventional method: dictionary + grammar rules + exception database…
• Good enough initially, but will not improve
• Exceptions grow naturally and the database can’t keep up
Statistical machine translation: compare the same article in different languages and build statistical models
• Needs a LOT of data to show initial results, but will improve automatically without human knowledge of the languages
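One way to picture "build statistical models from the same text in two languages" is the classic IBM Model 1 word-alignment algorithm. The sketch below is a stand-in chosen for illustration, not necessarily the method behind the slide. The three-sentence English/German corpus and the function names `train_ibm1` and `best_translation` are assumptions made up for this example.

```python
from collections import defaultdict

# Toy parallel corpus (English, German); invented for illustration only.
CORPUS = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

def train_ibm1(corpus, iterations=20):
    """Estimate t[(f, e)] = P(foreign word f | English word e) with EM."""
    foreign_vocab = {f for _, f_sent in corpus for f in f_sent}
    # Start from a uniform distribution: no linguistic knowledge is assumed.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) alignments
        total = defaultdict(float)   # expected count of e being aligned
        # E-step: share each foreign word among the English words in its sentence.
        for e_sent, f_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate the translation probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

def best_translation(t, e_word):
    """Most probable foreign word for an English word under the model."""
    options = {f: p for (f, e), p in t.items() if e == e_word}
    return max(options, key=options.get)

model = train_ibm1(CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule appears anywhere in the code; the word pairings emerge only from which words co-occur across sentence pairs, which is why more parallel text tends to help.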
How big is BIG?
[Chart: BLEU (%) score, Arabic to English (AE), y-axis from 47.5 to 53.5. Annotation: +weblm = 219B words of web data!]
Example 2: Spell Check
Conventional method: compare words against a dictionary and find the minimal edit distance. But:
• Proper names are not in the dictionary
• A word may be in the dictionary but not be what the user wants
• Can’t break ties
Google method: look at what others do
• Britney Spears: brittany spears, brittney spears, britany spears, britny spears, briteny spears, britteny spears, brine spears, …
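The two approaches can be combined in a small sketch: edit distance finds nearby candidates, and a log of what other users typed breaks the ties. The query counts below are invented for illustration, and `correct_with_log` is a hypothetical helper, not Google's actual implementation.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical query log; the counts are made up for this example.
QUERY_LOG = Counter({
    "britney spears": 1000,
    "brittany spears": 80,
    "brittney spears": 60,
    "britany spears": 30,
})

def correct_with_log(query: str, log: Counter, max_dist: int = 2) -> str:
    """Return the most frequent logged query within max_dist edits of query."""
    candidates = [q for q in log if edit_distance(query, q) <= max_dist]
    if not candidates:
        return query  # nothing similar has been seen; leave the query alone
    return max(candidates, key=lambda q: log[q])

print(correct_with_log("britny spears", QUERY_LOG))
```

A proper name never has to be in a dictionary here: the most popular spelling wins simply because most users type it.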
300 Ways to Spell Her Name
Also Look at the Context
Kofi Annan is correct
Kofee Annan → Kofi Annan
Kofee Shop → Coffee Shop
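A minimal sketch of context-sensitive correction, assuming a table of word-pair (bigram) counts is available. The counts, the candidate list, and the helper `correct_in_context` are all hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical bigram counts; invented for illustration only.
BIGRAMS = Counter({
    ("kofi", "annan"): 500,
    ("coffee", "shop"): 900,
})

# Hypothetical candidate corrections for a misspelled word.
CANDIDATES = {"kofee": ["kofi", "coffee"]}

def correct_in_context(word: str, next_word: str) -> str:
    """Pick the candidate that most often precedes next_word."""
    options = CANDIDATES.get(word, [word])
    # Counter returns 0 for unseen pairs, so unknown contexts are harmless.
    return max(options, key=lambda w: BIGRAMS[(w, next_word)])

print(correct_in_context("kofee", "annan"))
print(correct_in_context("kofee", "shop"))
```

The same misspelling resolves to different words depending on its neighbor, which a dictionary lookup on the single word cannot do.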
Example 3: Detecting Defaults
Conventional method: look at the FICO score
MIT method: use machine learning to look at hundreds of variables in customer activities
An NTU professor used a similar method to reduce the return rate on a TV shopping channel
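To make "use machine learning on customer activity variables" concrete, here is a minimal sketch using logistic regression trained by stochastic gradient descent. The slide does not say which model was used, so logistic regression is an assumption. The two features (late payments, credit utilization), the eight customer records, and the function names are all invented for illustration; a real system would use far more variables and data.

```python
import math

# Hypothetical customers: (late_payments, credit_utilization) -> defaulted?
# All values are made up for this example.
DATA = [
    ((0, 0.10), 0), ((1, 0.20), 0), ((0, 0.30), 0), ((1, 0.40), 0),
    ((4, 0.90), 1), ((5, 0.80), 1), ((3, 0.95), 1), ((6, 0.70), 1),
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of log loss with respect to the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def default_probability(model, x) -> float:
    """Predicted probability that a customer with features x defaults."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

model = train(DATA)
print(round(default_probability(model, (5, 0.80)), 3))
print(round(default_probability(model, (0, 0.20)), 3))
```

Adding a variable only means adding a column to each record; the training loop does not change, which is what lets this style of model scale to hundreds of variables.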
Example 4: Who’s in Love?
Big Data System Architecture
[Diagram labels: Data Source (×4), Data Storage Platform, Data Analysis Platform, Data Analysis Application, Amazing Results]
Levels of Big Data Processing
STORAGE is what you can ACCESS
DATA is what you can PROCESS
INFORMATION is what you can UNDERSTAND
INTELLIGENCE is what you can USE
Big Data System Components
[Diagram labels: Data Source (×4), Data Storage Platform, Data Storage, Data Analysis Platform, Data Analysis Tool, Data Analysis Application, Data, Information, Intelligence, Amazing Results]
Who Can Use Big Data?
• Those who own a lot of data
• Those who can see the information
• Those who can convert the information into intelligence
• Those who can monetize the intelligence
Who Can Make Money?
• Those who own a lot of data
• Those who can see the information
• Those who can convert the information into intelligence
• Those who can monetize the intelligence
It’s Just the Beginning…
1952: UNIVAC I predicted the landslide victory of Dwight D. Eisenhower
BEN.JAI@HOPEBAYTECH.COM
Q&A