Transcript of Introduction to Big Data
- Slide 1
- Introduction to Big Data
- Slide 2
- World Cup soccer, 2014.07.05 (Money Today): IoT + big data and the
German soccer team
- Slide 3
- What is big data? Big data is the term for a collection of data
sets so large and complex that it becomes difficult to process
using on-hand database management tools or traditional data
processing applications.
- Slide 4
- Big data is everywhere! Lots of data is being collected and
warehoused: web data, e-commerce, purchases at department/grocery
stores, bank/credit card transactions, social networks.
- Slide 5
- Slide 6
- How much data? Google processes 20 PB a day (2008); the Wayback
Machine has 3 PB + 100 TB/month (3/2009); Facebook has 2.5 PB of
user data + 15 TB/day (4/2009); eBay has 6.5 PB of user data + 50
TB/day (5/2009). "640K ought to be enough for anybody."
- Slide 7
- What does big data do?
- Slide 8
- Government: In 2012, the Obama administration announced the Big
Data Research and Development Initiative, which explored how big
data could be used to address important problems faced by the
government. The initiative was composed of 84 different big data
programs spread across six departments. Big data analysis played a
large role in Barack Obama's successful 2012 re-election campaign.
The United States Federal Government owns six of the ten most
powerful supercomputers in the world. The Utah Data Center is a data
center currently being constructed by the United States National
Security Agency. When finished, the facility will be able to handle
yottabytes of information collected by the NSA over the Internet.
- Slide 9
- Business: Amazon.com handles millions of back-end operations
every day, as well as queries from more than half a million
third-party sellers. The core technology that keeps Amazon running
is Linux-based, and as of 2005 it had the world's three largest
Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Walmart handles more than 1 million customer transactions every
hour, which are imported into databases estimated to contain more
than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167
times the information contained in all the books in the US Library
of Congress. Facebook handles 50 billion photos from its user base.
The FICO Falcon Credit Card Fraud Detection System protects 2.1
billion active accounts worldwide. The volume of business data
worldwide, across all companies, doubles every 1.2 years, according
to estimates. Windermere Real Estate uses anonymous GPS signals from
nearly 100 million drivers to help new home buyers determine their
typical drive times to and from work throughout various times of the
day.
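The estimate that business data doubles every 1.2 years describes compound (exponential) growth. As a minimal sketch, assuming the doubling rate holds steadily, the hypothetical helpers below turn a doubling time into a growth multiplier for a given horizon and into an equivalent annual growth rate.

```python
def growth_factor(years: float, doubling_time: float = 1.2) -> float:
    """How many times larger the data volume is after `years` of steady doubling."""
    return 2 ** (years / doubling_time)


def annual_growth_rate(doubling_time: float = 1.2) -> float:
    """Equivalent compound growth per year (0.5 would mean +50% per year)."""
    return 2 ** (1 / doubling_time) - 1


if __name__ == "__main__":
    for years in (1, 5, 12):
        print(f"after {years:2d} years: x{growth_factor(years):,.1f}")
    print(f"equivalent annual growth: {annual_growth_rate():.1%}")
```

For example, twelve years at this pace gives 2^(12/1.2) = 2^10 = 1,024 times the starting volume, which corresponds to roughly 78% growth per year.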
- Slide 10
- Examples of free big data sites: Google Trends, Google Flu Trends,
Google Correlate, Social Metrics Insight
- Slide 11
- Big data in Google Trends
- Slide 12
- Movement of carts for product display: a big data case
- Slide 13
- Wildfires in Korea (1991-2011)
- Slide 14
- Google Flu Trends service
- Slide 15
- Finding a location for your business
- Slide 16
- Crime mapping in San Francisco: 71% accuracy
- Slide 17
- Similar names for big data: data science, business analytics,
data analytics, data mining, business intelligence, machine
learning
- Slide 18
- Slide 19
- Slide 20
- Case 1: Big data analysis with MBA (market basket
analysis)
- Slide 21
- 1) POS data (1,000 transactions), one basket per line, e.g.:
  bananas
  plums, lettuce, tomatoes
  celery, bean
  bean
  apples, carrots, tomatoes, potatoes
  potatoes
  bean
  carrots
  bean
  apples, oranges, lettuce, tomatoes
  peaches, oranges, celery, potatoes, bean
  beans
  oranges, lettuce, carrots, tomatoes
  apples, bananas, plums, carrots, tomatoes, onions, bean
  apples, potatoes
  lettuce, peas, beans
- Slide 22
- 2) Association rules as output (model): only 55 rules satisfy the
specified constraints, e.g.:
  tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
  lettuce -> tomatoes [Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
  tomatoes -> carrots [Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
  carrots -> tomatoes [Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
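The rule statistics above are consistent with the standard association-rule definitions: coverage = P(A), support = P(A and B), strength (confidence) = P(B | A), lift = P(B | A) / P(B), and leverage = P(A and B) - P(A)P(B). For tomatoes -> lettuce, 0.111 / 0.263 = 0.422 and 0.422 / 0.217 = 1.94, matching the slide. The Python sketch below computes these measures from basket counts; the function names are hypothetical, and the p-value (a significance test) is left out.

```python
from collections.abc import Iterable


def count_rule(baskets: Iterable[set[str]], a: str, b: str) -> tuple[int, int, int, int]:
    """Return (n, count_a, count_b, count_ab) for the rule a -> b."""
    n = count_a = count_b = count_ab = 0
    for basket in baskets:
        n += 1
        has_a, has_b = a in basket, b in basket
        count_a += has_a
        count_b += has_b
        count_ab += has_a and has_b
    return n, count_a, count_b, count_ab


def rule_metrics(n: int, count_a: int, count_b: int, count_ab: int) -> dict[str, float]:
    """Coverage, support, strength, lift and leverage of the rule A -> B."""
    p_a = count_a / n              # coverage: P(A)
    p_b = count_b / n              # P(B)
    support = count_ab / n         # P(A and B)
    strength = count_ab / count_a  # confidence: P(B | A)
    return {
        "coverage": p_a,
        "support": support,
        "strength": strength,
        "lift": strength / p_b,            # P(B | A) / P(B)
        "leverage": support - p_a * p_b,   # P(A and B) - P(A) P(B)
    }


if __name__ == "__main__":
    # Counts read off the slide for tomatoes -> lettuce (1,000 baskets).
    metrics = rule_metrics(1000, 263, 217, 111)
    print({name: round(value, 4) for name, value in metrics.items()})
    # Counting from raw baskets (a few made-up baskets for illustration).
    baskets = [{"tomatoes", "lettuce"}, {"tomatoes"}, {"lettuce", "carrots"}, {"carrots"}]
    print(rule_metrics(*count_rule(baskets, "tomatoes", "lettuce")))
```

After rounding, the first line reproduces the slide's Strength=0.422, Lift=1.94 and Leverage=0.0539 for tomatoes -> lettuce.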
- Slide 23
- 3) Graphic representation
- Slide 24
- Relationship graph when the link is set to 0
- Slide 25
- Association rules: relationship graph when the link is set to 0
(graphic representations of association rules)
- Slide 26
- Relationship graph when the distance is set by value (network
form)
- Slide 27
- Application of MBA: product recommendation system
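One simple way such rules can drive recommendations: when a basket contains a rule's antecedent, suggest the rule's consequent, ranking candidates by lift. The sketch below assumes single-item rules like those listed in Case 1; `Rule`, `recommend` and the ranking-by-lift choice are illustrative assumptions, not a description of a particular production system.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    lift: float


def recommend(basket: set[str], rules: list[Rule], top_k: int = 3) -> list[str]:
    """Suggest items not yet in the basket, ranked by the best lift of any rule that fires."""
    best: dict[str, float] = {}
    for rule in rules:
        if rule.antecedent in basket and rule.consequent not in basket:
            best[rule.consequent] = max(best.get(rule.consequent, 0.0), rule.lift)
    ranked = sorted(best, key=lambda item: best[item], reverse=True)
    return ranked[:top_k]


# The four example rules from the MBA output, keeping only their lift.
RULES = [
    Rule("tomatoes", "lettuce", 1.94),
    Rule("lettuce", "tomatoes", 1.94),
    Rule("tomatoes", "carrots", 1.85),
    Rule("carrots", "tomatoes", 1.85),
]

if __name__ == "__main__":
    print(recommend({"tomatoes"}, RULES))  # ['lettuce', 'carrots']
```

Ranking by lift favours items that are bought together more often than chance; ranking by strength (confidence) instead would favour items that are simply popular.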
- Slide 28
- Case 2: SNS analysis
- Slide 29
- Social Network (http://nexus.ludios.net/view/demo)
- Slide 30
- Analysis of Human Relations (NodeXL)
- Slide 31
- Friends Networks
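Tools such as NodeXL compute network metrics such as degree for each person in a social graph. As a hypothetical illustration (the names and friendships are made up), this Python sketch stores an undirected friends network as adjacency sets and computes normalized degree centrality and mutual friends.

```python
from collections import defaultdict


def build_graph(friendships):
    """Undirected friends network as {person: set of friends}."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Number of friends divided by the maximum possible number (n - 1)."""
    n = len(graph)
    if n < 2:
        return {person: 0.0 for person in graph}
    return {person: len(friends) / (n - 1) for person, friends in graph.items()}


def mutual_friends(graph, a, b):
    """Friends that persons a and b have in common."""
    return graph.get(a, set()) & graph.get(b, set())


if __name__ == "__main__":
    friendships = [("Ann", "Bob"), ("Ann", "Cho"), ("Ann", "Dan"),
                   ("Bob", "Cho"), ("Dan", "Eve")]
    graph = build_graph(friendships)
    for person, score in sorted(degree_centrality(graph).items(), key=lambda kv: -kv[1]):
        print(f"{person}: {score:.2f}")
    print("Bob & Dan share:", mutual_friends(graph, "Bob", "Dan"))
```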
- Slide 32
- Case 3. Bankruptcy prediction: The data are yearly financial
statements collected by the Korea Credit Guarantee Fund and consist
of 944 bankrupt corporations and 944 healthy (non-bankrupt)
corporations from fiscal years 1999 to 2002.
- Slide 33
- List of selected financial variables (variable: definition):
  X13: interest expenses to sales = (interest expenses / sales) × 100
  X17: profit to sales = (profit / sales) × 100
  X24: operating profit to sales = (operating profit / sales) × 100
  X27: ordinary profit to total capital = (ordinary profit / total capital) × 100
  X28: current liabilities to total capital = (current liabilities / total capital) × 100
  X103: growth rate of tangible assets = (tangible assets at the end of the year / tangible assets at the beginning of the year × 100) − 100
  X108: turnover of managerial assets = sales / {total assets − (construction in progress + investment assets)}
  net financing cost = interest expenses − interest incomes
  X127: net working capital to total capital = {(current assets − current liabilities) / total capital} × 100
  X129: growth rate of current assets = (current assets at the end of the year / current assets at the beginning of the year × 100) − 100
  X140: ordinary income to net worth = (ordinary income / net worth) × 100
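The ratios above are simple arithmetic on financial-statement items. A minimal sketch with hypothetical function names, assuming percentages are computed as (numerator / denominator) × 100 and growth rates as (end-of-year / beginning-of-year × 100) − 100:

```python
def ratio_pct(numerator, denominator):
    """'A to B' ratio in percent, e.g. X13 = interest expenses / sales x 100."""
    return numerator / denominator * 100


def growth_rate_pct(end_of_year, beginning_of_year):
    """Growth rate in percent (assumed form): (end / beginning x 100) - 100."""
    return end_of_year / beginning_of_year * 100 - 100


def net_working_capital_to_total_capital(current_assets, current_liabilities, total_capital):
    """X127: {(current assets - current liabilities) / total capital} x 100."""
    return (current_assets - current_liabilities) / total_capital * 100


def net_financing_cost(interest_expenses, interest_incomes):
    """Net financing cost: interest expenses - interest incomes."""
    return interest_expenses - interest_incomes


if __name__ == "__main__":
    # Hypothetical figures for one firm (any consistent currency unit).
    print("X13 :", ratio_pct(30, 1_000))        # interest expenses / sales
    print("X103:", growth_rate_pct(120, 100))   # tangible assets grew 100 -> 120
    print("X127:", net_working_capital_to_total_capital(300, 200, 1_000))
    print("net financing cost:", net_financing_cost(30, 5))
```

For example, tangible assets growing from 100 to 120 give a growth rate of about 20 (percent).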
- Slide 34
- Decision tree analysis
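The slides do not say which decision tree algorithm was used. As a sketch of the core step only, the code below uses Gini impurity (the splitting criterion of CART) to find the best threshold on a single ratio, labelling bankrupt firms 1 and healthy firms 0; the sample values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of bankrupt firms (label 1)
    return 1 - p ** 2 - (1 - p) ** 2


def best_threshold(values, labels):
    """Best split 'value <= threshold' as (threshold, weighted Gini).

    Returns (None, gini(labels)) if no split lowers the impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    # Hypothetical X17 values (profit to sales, %) and labels (1 = bankrupt).
    profit_to_sales = [-12.0, -5.0, -1.0, 2.0, 4.0, 8.0]
    bankrupt = [1, 1, 1, 0, 0, 0]
    threshold, impurity = best_threshold(profit_to_sales, bankrupt)
    print(f"split: X17 <= {threshold} (weighted Gini {impurity:.3f})")
    # split: X17 <= 0.5 (weighted Gini 0.000)
```

A full tree repeats this search over every variable at every node and recurses on the two resulting subsets.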
- Slide 35
- Case 4. Income prediction: For our study we selected the United
States Census (5%) 1990 Public Use Microdata Sample (Census 1990).
This data, which was divided into 18 files, contained the entire 5%
sample made public domain from the 1990 U.S. Census in STATA 6.0
format. Combined, these 18 files included about 4.5 million males
and 5 million females, totaling 9.1 million records.
- Census 1990: http://www.macalester.edu/econdata/United_States/pums.html
- Slide 36
- Data sampling: We converted the 18 data files into flat files and
then, using Java code, merged them into a single file of 9.1 million
records with 85 variables (approximately 1.5 GB in size).
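The merge was done in Java in the original study. The same streaming idea in Python, as a minimal sketch: files are copied line by line so the roughly 1.5 GB result never has to fit in memory. The UTF-8 encoding and the optional single-header handling are assumptions about the flat-file layout.

```python
def merge_flat_files(inputs, output, has_header=False):
    """Concatenate text files line by line into `output`.

    If has_header is True, only the first file's header line is kept.
    Returns the number of data records written (header excluded).
    """
    records = 0
    with open(output, "w", encoding="utf-8") as out:
        for file_index, path in enumerate(inputs):
            with open(path, encoding="utf-8") as src:
                for line_no, line in enumerate(src):
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing final newline
                    if has_header and line_no == 0:
                        if file_index == 0:
                            out.write(line)  # keep only the first header
                        continue
                    out.write(line)
                    records += 1
    return records
```

Usage, with hypothetical file names: `merge_flat_files([f"census1990_part{i:02d}.txt" for i in range(1, 19)], "census1990_merged.txt")` merges the 18 parts and returns the record count.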
- Slide 37
- Algorithm analogy for discovering the complete set of rules
(drawing the perfect picture by coin scrubbing)
- Slide 38
- The repetitive methodology of merging new rules into the domain
knowledge base
- Slide 39
- The relationship between IRA's accuracy level and the number of
iterations in this study
- Slide 40
- Performance comparison (accuracy in %, 2/3 training / 1/3 test):
  CHAID, Answer Tree (SPSS): 80.195
  CART, Answer Tree (SPSS): 80.30
  ANN, Neural Connection (SPSS): RBF 76.12, MLP 80.68
  LR, SPSS: 81.1
  DA, SPSS: 78.3
  See5 (with default rule): 82.3
  This study, IRA: 82.7
  Training sample sizes: 3.24m, 10000, 300k
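The "(2/3-1/3)" in the accuracy row most likely denotes a holdout evaluation: train on two thirds of the records, test on the remaining third. A minimal Python sketch of that protocol, with a hypothetical majority-class baseline and made-up records standing in for the actual classifiers and census data:

```python
import random
from collections import Counter


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle records reproducibly and split them into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy_pct(predictions, labels):
    """Percentage of predictions that equal the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally long, non-empty sequences")
    return 100 * sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical (features, label) records: label 1 = income above a cut-off.
    data = [({"age": 20 + i % 40}, int(i % 3 == 0)) for i in range(300)]
    train, test = holdout_split(data)
    # Baseline "model": always predict the most common training label.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    predictions = [majority for _ in test]
    test_labels = [label for _, label in test]
    print(f"train={len(train)} test={len(test)} "
          f"accuracy={accuracy_pct(predictions, test_labels):.2f}%")
```

A real comparison would fit each method on the same training third and report its accuracy on the same test third, so that differences such as 82.7 vs. 82.3 reflect the methods rather than the split.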
- Slide 41
- Mining tools: Enterprise Miner (SAS), Clementine (SPSS), R, Python,
RapidMiner, Hadoop, RHive, and many visualisation tools
(infographics, etc.)
- Slide 42
- Future direction of bigdata
- Slide 43
- bigdata 2013 bigdata 2014
- Slide 44
- Google Glass: mashup, big data, visualisation -> analysis of a
commercial area
- Slide 45
- IoT Key: Smart & Intelligence
- Slide 46
- 3D printers: healthy food, organs, faces recommended?
- Slide 47
- Evolution of bigdata
- Slide 48
- cup
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Cup with Art
- Slide 59
- Slide 60
- Cup with emotion
- Slide 61
- Slide 62
- Slide 63
- Cup without cup