Thriving and surviving the Big Data revolution
1 Global Marketing, Confidential
REMINDER
Check in on the COLLABORATE mobile app
C14LV
207 Surviving and thriving in the big data revolution
Guy Harrison
Executive Director, R&D
Information Management Group
Dell Software
3 Software Group
Introductions
Web: guyharrison.net
Email: guy.harrison@software.dell.com
Twitter: @guyharrison
Google Plus: https://www.google.com/+GuyHarrison1
8 Software Group
Dell and Quest – a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data?
11 Software Group
Three or Four "V"s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
21 Software Group
Data means more
1993: Generated internally; key to operational efficiency
2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail?
25 Software Group
Prevalence of Showrooming
[Bar chart: showrooming prevalence (Pct, axis 0 to 70) for Consumer Electronics and Home Improvement]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic Pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy the Kindle version
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
50 Software Group
Brain Control
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google Software Architecture
[Layer diagram, top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: a Start step fans out to many parallel Map tasks, whose outputs feed a Reduce step]
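The fan-out of Map tasks into a Reduce step can be sketched in a few lines. This is a single-process illustration only, not how Hadoop or Google implement it; the word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`) are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    emitted = []
    for doc in documents:  # in a real cluster each split runs on a different node
        emitted.extend(map_phase(doc))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    splits = ["big data big value", "data means more"]
    print(map_reduce(splits))
```

Because each call to `map_phase` only sees its own split, the map work can be spread over as many nodes as there are splits; only the shuffle and reduce steps need to bring data together.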
68 Software Group
Multi-stage Map-Reduce
[Diagram: a Client submits work over data in HDFS; successive stages of MAPPER tasks (scan, sort, aggregate) feed a final REDUCE]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels]
Schema on Write: Data; Extract, Transform, Load (Code, Cleanse, Normalize, Aggregate); Data Warehouse; Analyse; Utilize
Schema on Read: Data; Load; Hadoop; Code, Cleanse, Analyse; Utilize
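The difference between the two approaches can be illustrated with a small sketch. The record layout, the sample lines, and the function names are all hypothetical; the point is only where the structure gets applied.

```python
import json

# Hypothetical raw records, as they might arrive from an application log.
RAW_LINES = [
    '{"user": "dick", "site": "Ebay", "counter": "507018"}',
    '{"user": "jane", "site": "Google"}',   # missing field
    'not json at all',                      # malformed record
]

def load_schema_on_write(lines):
    """Validate and convert at load time; reject anything that does not fit."""
    table, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append((rec["user"], rec["site"], int(rec["counter"])))
        except (ValueError, KeyError):
            rejected.append(line)
    return table, rejected

def load_schema_on_read(lines):
    """Store everything as-is; no structure is imposed at load time."""
    return list(lines)

def query_schema_on_read(store):
    """Apply structure only when reading, skipping records that cannot be parsed."""
    for line in store:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        yield rec.get("user"), rec.get("site"), int(rec.get("counter", 0))

if __name__ == "__main__":
    print(load_schema_on_write(RAW_LINES))
    print(list(query_schema_on_read(load_schema_on_read(RAW_LINES))))
```

With schema on write the cleansing cost is paid once, before the data is stored, and non-conforming records never reach the table. With schema on read the load is trivial and nothing is lost, but every query has to carry the parsing and cleansing logic.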
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a HADOOP CLIENT (Java, Pig, Hive) talks to MAP REDUCE (distributed processing), coordinated by the JOB TRACKER, and to HDFS (distributed storage), coordinated by the NAME NODE and SECONDARY NAME NODE; each worker node runs a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0: YARN (Yet Another Resource Negotiator)
[Diagram: a HADOOP CLIENT (Java, Pig, Hive) talks to the RESOURCE MANAGER; each worker runs a NODE MANAGER hosting CONTAINERs; an APPLICATION MASTER runs alongside them]
77 Software Group
Tez (Hindi for "fast")
[Diagram: three chained Map-Reduce jobs (Job 1, Job 2, Job 3), each with MAP and REDUCE steps and HDFS between them, compared with the same work run as a single job (Job 1) over HDFS]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks. Oracle-style: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
79 Software Group
Hbase Data Model

Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (wide-column) model:
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
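The pivot from relational rows to wide rows can be sketched with plain dictionaries. This is an illustration of the data model only and does not use any HBase client API; the function name `to_wide_rows` is an assumption.

```python
from collections import defaultdict

# (name, site, counter) rows taken from the slide's source table.
OBSERVATIONS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(observations):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    rows = defaultdict(dict)
    for name, site, counter in observations:
        rows[name][site] = counter  # the site value becomes a column name
    return dict(rows)

if __name__ == "__main__":
    for name, columns in to_wide_rows(OBSERVATIONS).items():
        print(name, columns)
```

Each row carries only the columns it actually has, so Jane's row has no Ebay column at all rather than a NULL, and a new site adds a column to one row without altering any schema.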
80 Software Group
Hive
82 Software Group
SQL → JAVA → RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• YARN, EC2
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
[Bar chart, axis $0 to $6000]
Exadata: $4911
Hadoop: $750
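To put the two per-terabyte figures side by side, a quick calculation helps. This sketch assumes the $4911 bar is Exadata and the $750 bar is Hadoop, as the chart labels suggest; the 100 TB sizing is a made-up example, not a figure from the talk.

```python
# Hardware-only dollars per terabyte, as read from the chart.
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750

def cost_ratio(expensive, cheap):
    """How many times more one option costs per terabyte than the other."""
    return expensive / cheap

def hardware_cost(terabytes, dollars_per_tb):
    """Hardware cost for a given capacity at a flat per-terabyte price."""
    return terabytes * dollars_per_tb

ratio = cost_ratio(EXADATA_PER_TB, HADOOP_PER_TB)

# Hypothetical 100 TB deployment, used only to make the gap concrete.
difference_100tb = hardware_cost(100, EXADATA_PER_TB) - hardware_cost(100, HADOOP_PER_TB)

if __name__ == "__main__":
    print(f"Exadata costs about {ratio:.1f}x more per TB")
    print(f"Difference at 100 TB: ${difference_100tb:,}")
```

The comparison covers hardware only, so software licences, administration, and the very different workloads each platform suits are outside this arithmetic.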
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48GB RAM per node (864GB total)
  – 2x6 Core CPU per node (216 total)
  – 12x2TB HDD per node (216 spindles, 864 TB)
  – 40Gb/s Infiniband between nodes
  – 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data with a fitted trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic Regression
• CART
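A trend line of the form f(x) = slope * x + intercept, like the one on the chart, can be obtained with ordinary least squares. The sketch below is a plain-Python illustration; the sample points are invented and are not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept

if __name__ == "__main__":
    # Invented sample points that roughly follow y = x.
    xs = [0, 20, 40, 60, 80, 100]
    ys = [2, 19, 41, 58, 82, 99]
    slope, intercept = fit_line(xs, ys)
    print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
    print("prediction at x = 120:", predict(slope, intercept, 120))
```

The same two numbers that describe the past data are all that is needed to score a new value, which is what makes a fitted model cheap to use once it has been built.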
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
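Spam detection is the classic example of building a model from labelled data and then applying it to new data. The sketch below uses a small naive Bayes classifier with add-one smoothing, which is one common technique among many and not necessarily what any particular product uses; the training messages are invented.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training messages, for illustration only.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("project status meeting", "ham"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify(model, "free cash prize"))
    print(classify(model, "status meeting"))
```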
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
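Grouping without predefined classes can be illustrated with k-means, one widely used clustering algorithm. The sketch assumes two-dimensional points and caller-supplied starting centroids; the sample points (imagine items per basket and spend per basket) are invented.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented sample data: (items in basket, spend) for six baskets.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(POINTS, centroids=[(0, 0), (10, 10)])

if __name__ == "__main__":
    for centre, members in zip(centroids, clusters):
        print(centre, members)
```

No labels are supplied anywhere; the two groups emerge purely from the distances between points, which is the distinction from classification.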
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Data (New Business, Existing Business); Prediction]
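The workflow in the diagram (split cleaned data into training and validation sets, build a candidate model, validate it, then score new data) can be sketched as follows. The one-feature threshold model and the synthetic data are assumptions made to keep the example short; real models are considerably richer.

```python
import random

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle and split rows into a training set and a validation set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train_threshold(training):
    """Candidate model: a cut-off halfway between the two class means."""
    positives = [x for x, label in training if label]
    negatives = [x for x, label in training if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def predict(threshold, x):
    """Score a new value with the model."""
    return x >= threshold

def accuracy(threshold, rows):
    """Fraction of rows whose prediction matches the known label."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)

# Synthetic, already-clean data: (feature value, known outcome).
ROWS = [(x, x >= 50) for x in range(0, 100, 5)]
training, validation = split(ROWS)
threshold = train_threshold(training)
validation_accuracy = accuracy(threshold, validation)

if __name__ == "__main__":
    print("threshold:", threshold)
    print("validation accuracy:", validation_accuracy)
    print("prediction for new value 95:", predict(threshold, 95))
```

Holding the validation set back from training is what makes the accuracy figure meaningful: it measures how the candidate behaves on data it has not seen, before it is promoted to production.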
122 Software Group
Unsupervised learning
inmaps.linkedin.com
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
132 Software Group, Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group, Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
135 Software Group, Confidential
Data Visualization
136 Software Group, Confidential
Live scoring – integration into operational systems
137 Software Group, Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
11 Software Group
Three or Four "V"s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Data means more
1993: Generated internally; key to operational efficiency
2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail?
25 Software Group
Prevalence of Showrooming
[Bar chart: prevalence of showrooming (Pct, axis 0 to 70) for Consumer Electronics and Home Improvement]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming: Selection, Stock, Faster, Cheaper
Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy the Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram labels: Google Applications; Map Reduce; BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
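The map and reduce phases named on this slide can be illustrated with a toy, in-process word count. This is only a sketch under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, and nothing here uses Hadoop's or Google's actual APIs.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Each document stands in for a split handled by a separate mapper.
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data is big", "data means more"]))
```

In a real cluster the mappers run in parallel on different nodes and the shuffle moves data over the network; here everything runs sequentially in one process to keep the idea visible.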
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: Client; HDFS; a stage of MAPPER tasks (SCAN, SORT); a stage of MAPPER tasks (AGGREGATE); REDUCE]
69 Software Group
Schema on Read vs Schema on Write
[Diagram, two flows]
Schema on Write: Data; Extract, Transform, Load; Data Warehouse; steps shown: Code, Cleanse, Normalize, Aggregate, Analyse; Utilize
Schema on Read: Data; Load; Hadoop; steps shown: Code, Cleanse, Analyse; Utilize
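One way to picture the difference is the sketch below, which loads the same raw JSON lines in both styles. It is an illustration only: the field names, the `REQUIRED_FIELDS` schema, the sample records and all function names are hypothetical and are not taken from the talk.

```python
import json

REQUIRED_FIELDS = ("customer", "amount")  # hypothetical warehouse schema


def load_schema_on_write(raw_lines):
    """Schema on write: validate and shape every record before storing it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        if not all(field in record for field in REQUIRED_FIELDS):
            continue  # rejected at load time because it does not fit the schema
        table.append((str(record["customer"]), float(record["amount"])))
    return table


def load_schema_on_read(raw_lines):
    """Schema on read: store the raw lines untouched; impose no structure yet."""
    return list(raw_lines)


def total_by_customer(raw_store):
    """Apply structure only at query time, tolerating irregular records."""
    totals = {}
    for line in raw_store:
        record = json.loads(line)
        customer = record.get("customer", "unknown")
        totals[customer] = totals.get(customer, 0.0) + float(record.get("amount", 0))
    return totals


RAW = [
    '{"customer": "dick", "amount": 10}',
    '{"customer": "jane", "amount": 5, "coupon": "X1"}',
    '{"amount": 2}',
]

if __name__ == "__main__":
    print(load_schema_on_write(RAW))
    print(total_by_customer(load_schema_on_read(RAW)))
```

The schema-on-write loader silently drops the third record, while the schema-on-read store keeps it and leaves the decision about how to treat it to the query.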
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
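Dividing the cluster totals by the node count gives a feel for how modest each machine is. The short calculation below assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and an even spread of resources across nodes; both are assumptions made for illustration, not statements from the talk.

```python
# Cluster totals as quoted on the slide.
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000

# Per-node averages, assuming decimal units and evenly spread resources.
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

if __name__ == "__main__":
    print(f"{disk_tb_per_node} TB disk, {ram_gb_per_node} GB RAM, "
          f"{cores_per_node} cores per node")
```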
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (Java, Pig, Hive); MAP REDUCE (distributed processing) with a JOB TRACKER; HDFS (distributed storage) with a NAME NODE and a SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram labels: HADOOP CLIENT (Java, Pig, Hive); RESOURCE MANAGER; APPLICATION MASTER; several NODE MANAGERs, each hosting a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a pipeline of MAP and REDUCE steps split across Job 1, Job 2 and Job 3 with HDFS between jobs, shown alongside the same MAP and REDUCE steps run as a single Job 1 over HDFS]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle-style stack: Tables; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks
HBase stack: Tables; MemStore; WA Log; HFiles; HDFS; Disks
79 Software Group
HBase Data Model
Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Relational (normalized) form:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase form (one sparse, wide row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
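The wide-row idea on this slide can be sketched with plain Python dictionaries: each name becomes one row, and each site becomes a column that exists only in the rows that need it. This is a conceptual illustration, not the HBase client API; the function name `to_wide_rows` is hypothetical, and the site names are written here as domain names.

```python
from collections import defaultdict

# The (name, site, counter) rows shown on the slide.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot narrow rows into one sparse, wide row per name."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter  # the site acts as a dynamic column name
    return dict(wide)


wide = to_wide_rows(ROWS)

if __name__ == "__main__":
    for name, columns in wide.items():
        print(name, columns)
```

Note that Jane's row simply has no Ebay column; no null placeholder is stored, which is what makes the model sparse.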
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL; JAVA; RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME moves WebLogs into HDFS; SQOOP moves RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
[Layered diagram; labels also include YARN, EC2, Map Reduce and Hive (SQL) alongside the BDAS components]
• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib - Machine Learning
• Shark (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
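Reading the chart as Exadata at $4,911 per TB and Hadoop at $750 per TB (hardware only), the gap can be put into numbers with a short calculation. The 100 TB data set and the helper name `hardware_cost` are hypothetical, chosen only to illustrate the scale.

```python
EXADATA_PER_TB = 4911  # USD per TB, hardware only, as read from the chart
HADOOP_PER_TB = 750    # USD per TB, hardware only, as read from the chart

# How many times more expensive Exadata hardware is per terabyte.
ratio = EXADATA_PER_TB / HADOOP_PER_TB


def hardware_cost(terabytes, price_per_tb):
    """Hardware-only cost of storing the given volume at a flat per-TB price."""
    return terabytes * price_per_tb


if __name__ == "__main__":
    print(f"Exadata costs about {ratio:.2f}x as much per TB")
    print("100 TB on Exadata:", hardware_cost(100, EXADATA_PER_TB))
    print("100 TB on Hadoop: ", hardware_cost(100, HADOOP_PER_TB))
```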
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line: f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
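A trend line like the one on this slide comes from an ordinary least-squares fit. The sketch below shows the calculation in plain Python; the function names and the sample points are hypothetical, and the last lines simply plug x = 100 into the coefficients printed on the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Coefficients as printed on the slide's chart.
SLIDE_SLOPE = 0.971521231456065
SLIDE_INTERCEPT = 0.71906459527154

if __name__ == "__main__":
    print(slope, intercept)
    print(predict(100, SLIDE_SLOPE, SLIDE_INTERCEPT))
```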
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
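As a minimal illustration of "a model that classifies new data", the sketch below trains a nearest-centroid classifier on labelled examples and then labels unseen messages. The technique is chosen here for brevity and is not one the talk names; the two features (links and exclamation marks per message), the training data and the function names are all hypothetical.

```python
def centroid(points):
    """Average position of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labelled):
    """Build the model: one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in labelled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest to the new data point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], features))


# Hypothetical features: (links in message, exclamation marks in message)
TRAINING = [
    ((8, 6), "spam"), ((7, 9), "spam"), ((9, 7), "spam"),
    ((0, 1), "ham"), ((1, 0), "ham"), ((2, 1), "ham"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify(model, (6, 5)))
    print(classify(model, (1, 1)))
```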
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
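Grouping without predefined labels can be sketched with a one-dimensional k-means loop. This is an illustrative choice of algorithm, not one the talk specifies; the basket sizes, the starting centres and the function name `k_means_1d` are hypothetical.

```python
def k_means_1d(values, centres, iterations=10):
    """Repeatedly assign each value to its nearest centre, then move each
    centre to the mean of the values assigned to it."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous centre.
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical number of items per shopping basket.
BASKET_SIZES = [1, 2, 3, 20, 21, 22]
centres, clusters = k_means_1d(BASKET_SIZES, centres=[0, 10])

if __name__ == "__main__":
    print(centres)
    print(clusters)
```

No labels are supplied at any point; the two groups (small and large baskets) emerge from the data alone.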
121 Software Group
Supervised Machine Learning
[Workflow diagram. Steps: Clean, Validate, Model. Labels: Raw Data; Training Set; Validation Set; Candidate Model; Production Model; New Data (New Business, Existing Business); Prediction]
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics (Data Science)
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Products shown: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlexreg for Hadoop
- Toad BI Suite
- Slide 131
- Dellrsquos offering was not completehellip
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring ndash integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
-
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlexreg for Hadoop
- Toad BI Suite
- Slide 131
- Dellrsquos offering was not completehellip
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring ndash integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
-
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Layered diagram: Google Applications on top of Map Reduce and BigTable, which sit on the Google File System (GFS)]
67 Software Group
[Diagram: a Start node fans out to many parallel Map tasks, whose outputs feed a Reduce step]
Map Reduce
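The map and reduce steps in the diagram can be sketched on a single machine. This is a minimal illustration of the pattern only, not Google's or Hadoop's implementation; the word-count task, the sample documents, and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # In a real cluster each document (split) would be mapped on a different node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits
counts = map_reduce(["big data", "big database", "data means more"])
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across many nodes, which is the point of the many parallel Map boxes in the diagram.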
68 Software Group
[Diagram: a Client and HDFS; a first stage of MAPPER tasks labelled SCAN and SORT, a second stage of MAPPER tasks labelled AGGREGATE, and a final REDUCE]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
[Diagram comparing two pipelines]
Schema on Write: Data; Extract, Transform, Load (Code, Cleanse, Normalize, Aggregate); Data Warehouse; Analyse; Utilize
Schema on Read: Data; Load; Hadoop; Code, Cleanse, Analyse; Utilize
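A small sketch may make the schema-on-read idea concrete: raw data is loaded untouched, and structure is imposed only when the data is read. The JSON event format, the field names, and the `read_with_schema` helper are all hypothetical, chosen only for illustration.

```python
import json

# Raw data is stored exactly as it arrived; nothing was validated at load time.
RAW_EVENTS = [
    '{"user": "jane", "site": "Google", "ms": 120}',
    '{"user": "dick", "site": "Ebay"}',
    'not json at all',
]


def read_with_schema(raw_lines):
    """Apply a schema while reading: parse, cleanse, and default missing fields."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens at read time: skip malformed input
        if not isinstance(record, dict):
            continue
        yield {
            "user": str(record.get("user", "unknown")),
            "site": str(record.get("site", "unknown")),
            "ms": int(record.get("ms", 0)),
        }


rows = list(read_with_schema(RAW_EVENTS))
print(rows)
```

With schema on write, the malformed line and the missing field would have had to be dealt with before the load; here a different reader could apply a different schema to the same raw data later.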
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a HADOOP CLIENT (Java, Pig, Hive) works with the JOB TRACKER for MAP REDUCE (distributed processing) and with the NAME NODE and SECONDARY NAME NODE for HDFS (distributed storage); each worker node runs a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram: a HADOOP CLIENT (Java, Pig, Hive), a RESOURCE MANAGER, and several NODE MANAGERs each hosting a CONTAINER, together with an APPLICATION MASTER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: three chained Map Reduce jobs (Job 1, Job 2, Job 3) over HDFS, compared with the same MAP and REDUCE steps combined into a single job (Job 1) over HDFS]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle: Tables; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks
HBase: Tables; MemStore; WA Log (write-ahead log); HFiles; HDFS; Disks
79 Software Group
Hbase Data Model

Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one row per name, one column per site visited):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
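The shift from the relational rows to the wide, sparse HBase rows can be sketched with plain Python dictionaries. This only illustrates the data model; it does not use the HBase API, and the `to_wide_rows` helper is a hypothetical name.

```python
from collections import defaultdict

# The slide's source data as (name, site, counter) tuples.
observations = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot narrow rows into one sparse row per name, keyed by site column."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # a column exists only where a value exists
    return dict(table)


wide = to_wide_rows(observations)
print(wide["Jane"])
```

Each row carries only the columns it actually uses, so Jane's row has no Ebay column at all rather than a NULL, which is what lets the column set differ from row to row.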
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL is translated to JAVA, which produces RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME moves WebLogs into HDFS; SQOOP moves RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
[Bar chart, axis $0 to $6000: bars for Exadata and Hadoop, with data labels $4911 and $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6 core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart with a fitted line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
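The first technique in the list, linear regression, fits a line of the form f(x) = slope * x + intercept. A minimal sketch using the ordinary least-squares formulas follows; the data points are made up for illustration and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical observations lying exactly on y = x + 1
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]
slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 120)
print(slope, intercept, forecast)
```

Real data would not sit exactly on the line, which is why a fitted equation such as the one on the slide has a slope slightly below 1 and a small non-zero intercept.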
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
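As one possible illustration of classification, the sketch below trains a nearest-centroid classifier on labelled examples and uses it to label new data. The technique, the churn features (monthly logins, support tickets), and the sample values are assumptions picked for the example; the slide does not name a specific algorithm.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labelled):
    """Build the model: one centroid per class label."""
    by_label = {}
    for features, label in labelled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, features):
    """Assign new data to the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical training data: (monthly logins, support tickets) -> outcome
training = [
    ((30, 0), "stays"),
    ((25, 1), "stays"),
    ((2, 5), "churns"),
    ((4, 7), "churns"),
]
model = train(training)
print(classify(model, (3, 6)), classify(model, (28, 1)))
```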
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
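Clustering differs from classification in that no labels are supplied. A minimal k-means sketch shows the idea; k-means is only one of several clustering techniques and is an assumption here, as are the sample points and the simplistic choice of the first k points as starting centroids.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly re-centring each cluster."""
    centroids = list(points[:k])  # simple deterministic start, for illustration only
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled data with two obvious groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

No one told the algorithm which group each point belongs to; the grouping emerges from the distances alone.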
121 Software Group
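Supervised Machine Learning
[Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is validated against the Validation Set and becomes the Production Model; the Production Model turns New Data into a Prediction, applied to New Business and Existing Business]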
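The train, validate, and promote flow in the diagram can be sketched as follows. The threshold model, the 0.8 acceptance score, the 25% validation share, and the synthetic data are all assumptions made to keep the example small.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle cleaned records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training_set):
    """Candidate model: the threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, dataset):
    """Share of records where the model's prediction matches the known label."""
    hits = sum((x > threshold) == label for x, label in dataset)
    return hits / len(dataset)


# Hypothetical cleaned data: 20 records labelled True when x > 50
clean_data = [(x, x > 50) for x in range(0, 100, 5)]
training_set, validation_set = split(clean_data)
candidate = train_threshold(training_set)
score = accuracy(candidate, validation_set)

# Promote the candidate only if it performs well on data it has not seen.
production_model = candidate if score >= 0.8 else None
print(candidate, score, production_model)
```

Holding the validation set back from training is what makes the score a fair estimate of how the model will behave on new data.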
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram elements: Redo-logs; Change Data Capture; Hadoop Poster; JMS Queue; Batched HDFS File Copy; Audit Change Data; HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
To address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlexreg for Hadoop
- Toad BI Suite
- Slide 131
- Dellrsquos offering was not completehellip
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring ndash integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
-
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; many worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; three NODE MANAGERs, each with a CONTAINER]
77 Software Group
Tez (Hindi for “fast”)
[Diagram labels: HDFS; MAP and REDUCE stages grouped as Job 1, Job 2 and Job 3; HDFS; Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram labels, first stack: Table, Table; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks]
[Diagram labels, second stack: Table, Table; MemStore; WA Log; HFile, HFile; HDFS; Disks]
79 Software Group
HBase Data Model

Denormalized:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase wide rows (sparse columns):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
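The move from one row per (name, site) pair to one wide, sparse row per name can be sketched in Python with a dictionary of dictionaries. This is an illustration of the data model only, not the HBase client API; the function name `to_wide_rows` is hypothetical, and the site names are written with dots on the assumption that the extraction dropped them.

```python
from collections import defaultdict

# (name, site, counter) rows, as in the denormalized table on the slide.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot rows so each name is one row key with a sparse set of columns."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter   # a column exists only where a value exists
    return dict(wide)

wide = to_wide_rows(ROWS)
for row_key, columns in wide.items():
    print(row_key, columns)
```

Jane's row simply has no Ebay column; nothing is stored for the missing value, which is what makes very wide tables practical.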
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL; JAVA; RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram labels: WebLogs, FLUME, HDFS; RDBMS tables CUSTOMERS and PRODUCTS, SQOOP]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib - Machine Learning
• Shark (SQL)
• BlinkDB
[Other diagram labels: Yarn, EC2, Map Reduce, Hive (SQL)]
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
• Exadata: $4911
• Hadoop: $750
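The size of the gap on this chart is easier to see as a ratio. The short Python sketch below assumes the two figures are hardware cost in dollars per terabyte, for Exadata and Hadoop in that order; the variable names are illustrative.

```python
exadata_per_tb = 4911   # dollars per TB, hardware only (from the chart)
hadoop_per_tb = 750     # dollars per TB, hardware only (from the chart)

# How many times more expensive one terabyte is on Exadata.
ratio = round(exadata_per_tb / hadoop_per_tb, 1)

# Hardware cost of a hypothetical 100 TB store on each platform.
cost_100tb = {"exadata": exadata_per_tb * 100, "hadoop": hadoop_per_tb * 100}
print(ratio, cost_100tb)
```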
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: Programs that evolve with “experience”
• Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with a fitted trend line; both axes run from about 0 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic Regression
• CART
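A trend line of the form f(x) = slope * x + intercept, like the one on this slide, comes from an ordinary least-squares fit. The Python sketch below shows the calculation on made-up points; the data and the function names `fit_line` and `predict` are assumptions for illustration, not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept

xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]   # made-up points lying exactly on y = x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(slope, intercept, 120))
```

Real data will not sit exactly on a line, so the fitted slope and intercept would come out as awkward decimals, much like the coefficients shown on the slide.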
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
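Grouping data without predefined classes can be illustrated with a tiny k-means loop in Python. This is a sketch under stated assumptions: the slide does not name an algorithm, k-means is simply one common choice, and the basket totals and starting centroids below are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means on one-dimensional data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break   # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

basket_totals = [5, 6, 7, 48, 50, 52]   # hypothetical spend per shopping basket
centroids, clusters = kmeans_1d(basket_totals, centroids=[0, 100])
print(centroids, clusters)
```

No labels were supplied: the two groups (small baskets and large baskets) emerge from the data alone.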
121 Software Group
Supervised Machine Learning
[Diagram labels: Existing Business; Raw Data; Clean; Training Set; Model; Candidate Model; Validation Set; Validate; Production Model; New Business; New Data; Prediction]
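The workflow on this slide (train on one part of the historical data, validate on a held-out part, then score new data) can be sketched in Python with a deliberately simple one-feature classifier. The churn data, the threshold model and the function names are all hypothetical, chosen only to make the three stages visible.

```python
def train_threshold(training_set):
    """Pick the threshold on one feature that best separates the labels."""
    best_threshold, best_correct = None, -1
    for candidate, _ in training_set:
        correct = sum((x >= candidate) == label for x, label in training_set)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(threshold, labelled):
    """Fraction of labelled examples the threshold classifies correctly."""
    return sum((x >= threshold) == label for x, label in labelled) / len(labelled)

# Hypothetical history: (support calls last month, did the customer churn?)
history = [(0, False), (1, False), (2, False), (5, True), (6, True), (7, True),
           (1, False), (6, True)]
training_set, validation_set = history[:6], history[6:]

model = train_threshold(training_set)      # candidate model from the training set
score = accuracy(model, validation_set)    # validate before promoting to production
prediction = 4 >= model                    # score a new, unseen customer
print(model, score, prediction)
```

The validation set is never shown to `train_threshold`, so its accuracy is an honest estimate of how the model will behave on new business.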
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: Server and Storage, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires Statsoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: STATISTICA, Server and Storage, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Data means more
• 1993: Generated internally; key to operational efficiency
• 2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
[Bar chart: percentage (axis 0 to 70) for Consumer Electronics and Home Improvement]
Source: Gartner Research G00249458, “Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable”, published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming
• Why: Selection, Stock, Faster, Cheaper
• Defences: Dynamic Pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization
35 Software Group
It’s not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There’s a similar story in every industry
• Web
• Transport
• Power Grid
• Dating
• Retail
• Security
• Finance
• Government
• Science
• Healthcare
• Insurance
• Telecom
• Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don’t Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
• Amazon softcover: $45.99
• Amazon Kindle: $39.99
Say “screw you bookseller” to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
“Siri, call me an ambulance”
From now on I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
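A trend line like the one on this slide can be obtained with ordinary least squares. The sketch below is a minimal pure-Python version of simple linear regression; the function name `fit_line` and the sample points are made up for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that roughly follow y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 42, 58, 83, 99]

slope, intercept = fit_line(xs, ys)
prediction_at_120 = slope * 120 + intercept  # extrapolate beyond the data
print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
```

Extrapolating with the fitted line, as in the last step, is the "predictive" part: the model is used on an x value that was never observed.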
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
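Grouping without a pre-existing scheme can be illustrated with k-means, one common clustering technique (the slide does not name a specific algorithm, so this choice is an assumption). The sketch clusters hypothetical basket totals in one dimension; `kmeans_1d` and the sample values are invented for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data.

    `centroids` holds the starting guesses; no labels are supplied,
    the groups emerge from the data alone.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical basket totals in dollars: small top-up shops and big weekly shops.
basket_totals = [5, 7, 6, 52, 48, 50]
centroids, clusters = kmeans_1d(basket_totals, [min(basket_totals), max(basket_totals)])
print(centroids, clusters)
```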
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data (New Business, Existing Business), Prediction]
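The supervised learning flow on this slide (split labelled data into a training set and a validation set, build a candidate model, validate it, then score new data) might look roughly like this in Python. Everything here is a toy under stated assumptions: the records are invented (support calls per month and whether the customer churned), and the "model" is just a threshold between the two class means.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned records and cut them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold midway between the two class means."""
    churned = [x for x, label in training_set if label]
    stayed = [x for x, label in training_set if not label]
    return (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2


def predict(threshold, x):
    """Predict churn when the feature exceeds the learned threshold."""
    return x > threshold


def accuracy(threshold, dataset):
    hits = sum(predict(threshold, x) == label for x, label in dataset)
    return hits / len(dataset)


# Hypothetical labelled history: (support calls per month, churned?)
history = [(0, False), (1, False), (1, False), (2, False),
           (6, True), (7, True), (8, True), (9, True)]

training_set, validation_set = split(history)
model = train(training_set)                        # candidate model
validation_accuracy = accuracy(model, validation_set)
new_customer_will_churn = predict(model, 10)       # scoring new data
print(model, validation_accuracy, new_customer_will_churn)
```

The validation set is never shown to `train`, so its accuracy gives a first indication of how the candidate model behaves on data it has not seen before it is promoted to production.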
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase Real-Time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
[Chart: prevalence of showrooming (Pct, axis 0 to 70) by category: Consumer Electronics, Home Improvement]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: Start, many parallel Map tasks, Reduce]
Map Reduce
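The Map Reduce pattern in this diagram (many independent map tasks feeding a reduce step) can be sketched as a word count in plain Python. This is a single-process illustration of the programming model only, not Hadoop or Google code; the function names and the sample documents are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Each document could be mapped on a different node; here it is a simple loop.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = map_reduce(["big data", "big database technologies", "data means more"])
print(counts)
```

Because each call to `map_phase` depends only on its own input, the map work can be spread across as many machines as there are input splits, which is what the wall of Map boxes in the diagram conveys.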
68 Software Group
[Diagram: HDFS; first stage of MAPPER tasks (SCAN, SORT); second stage of MAPPER tasks (AGGREGATE); REDUCE; Client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
[Diagram, Schema on Write: Data; Extract, Load, Transform (Code, Cleanse, Normalize, Aggregate, Analyse); Data Warehouse; Utilize]
[Diagram, Schema on Read: Data; Load; Hadoop; Code, Cleanse, Analyse; Utilize]
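The difference between the two approaches can be shown with a toy schema-on-read example: raw lines are stored untouched, and structure is imposed only when a question is asked. The log format, field names and `read_with_schema` function below are all assumptions made up for this sketch.

```python
# Raw data is loaded as-is; nothing is validated or transformed at load time.
raw_store = [
    "2013-02-12 Dick Ebay 3",
    "2013-02-12 Jane Google 5",
    "malformed line",
    "2013-02-13 Dick Google 2",
]


def read_with_schema(lines):
    """Apply a schema (date, name, site, hits) at read time, skipping bad lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[3].isdigit():
            continue  # cleansing happens here, in the reading code
        date, name, site, hits = fields
        yield {"date": date, "name": name, "site": site, "hits": int(hits)}


# Analysis code decides what the data means only when it runs.
total_by_name = {}
for record in read_with_schema(raw_store):
    total_by_name[record["name"]] = total_by_name.get(record["name"], 0) + record["hits"]
print(total_by_name)
```

With schema on write, the malformed line would have been rejected or repaired before loading; here it stays in the store and each reader chooses how to treat it.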
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
[Diagram: multiple worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
HADOOP CLIENT (JAVA, PIG, HIVE)
RESOURCE MANAGER
APPLICATION MASTER
[Diagram: three worker nodes, each with a NODE MANAGER and a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of three Map Reduce jobs (Job 1, Job 2, Job 3) between HDFS reads and writes, contrasted with a single job (Job 1) running the same MAP and REDUCE stages]
78 Software Group
HBase
A Real time database built on Hadoop
[Diagram, Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks]
[Diagram, HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase real-time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud, social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
20 Software Group
21 Software Group
1993: Generated internally; key to operational efficiency
2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives
Data means more
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
[Bar chart: prevalence of showrooming (Pct, 0 to 70) for Consumer Electronics and Home Improvement]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming
Reasons: Selection; Stock; Faster; Cheaper
Defences: Dynamic pricing; Predictive ordering; Assortment optimization; Predictive recommendations; Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce
BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: Start, followed by many parallel Map tasks feeding a single Reduce step]
Map Reduce
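The Map Reduce pattern can be illustrated on a single machine with a word count, the customary first example. This is only a sketch of the programming model: a real Hadoop job distributes the map and reduce calls across the cluster, and the function names and input splits here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

splits = ["big data big", "data revolution"]
# Each split is mapped independently, which is what makes the map step parallelisable.
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because no mapper needs to see another mapper's input, adding nodes lets more splits be processed at the same time.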
68 Software Group
[Diagram: HDFS feeds a first stage of MAPPER tasks (SCAN, SORT), then a second stage of MAPPER tasks (AGGREGATE), then a REDUCE step that returns results to the Client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data → Extract, Transform, Load (code to cleanse, normalize, aggregate) → Data Warehouse → Analyse → Utilize
Schema on Read: Data → Load → Hadoop → Code (cleanse, analyse) → Utilize
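The schema-on-read idea is that raw data is loaded as-is and structure is applied only when the data is queried. A small sketch, assuming JSON log lines (the field names `user`, `site` and `ms` are invented for the example):

```python
import json

# Raw records are stored untouched; nothing is rejected at load time.
raw_lines = [
    '{"user": "dick", "site": "ebay", "ms": 120}',
    '{"user": "jane", "site": "google"}',  # missing the "ms" field
    'not json at all',                      # malformed record
]

def read_with_schema(lines):
    """Apply structure at query time; skip records that cannot be parsed."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        yield {
            "user": record.get("user"),
            "site": record.get("site"),
            "ms": record.get("ms", 0),  # default applied on read
        }

rows = list(read_with_schema(raw_lines))
```

With schema on write, the second and third records would have had to be fixed or rejected before loading; here that decision is deferred to whoever reads the data.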
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
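Dividing the cluster totals by the node count shows that each machine is modest commodity hardware. A quick check of the arithmetic, assuming decimal units (1 PB = 1,000 TB):

```python
nodes = 4000
disk_tb = 16 * 1000  # 16 PB expressed in TB (decimal units assumed)
ram_gb = 64 * 1000   # 64 TB expressed in GB
cores = 32000

per_node = {
    "disk_tb": disk_tb / nodes,
    "ram_gb": ram_gb / nodes,
    "cores": cores / nodes,
}
```

That works out to about 4 TB of disk, 16 GB of RAM and 8 cores per node.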
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram: the HADOOP CLIENT (JAVA, PIG, HIVE) talks to the JOB TRACKER for MAP REDUCE (distributed processing) and to the NAME NODE and SECONDARY NAME NODE for HDFS (distributed storage); every worker node runs a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: the HADOOP CLIENT (JAVA, PIG, HIVE) submits work to the RESOURCE MANAGER; each NODE MANAGER hosts CONTAINERs, one of which runs the APPLICATION MASTER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: MAP and REDUCE steps run as three separate jobs (Job 1, Job 2, Job 3) that read from and write to HDFS, compared with the same steps chained inside a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing Oracle and HBase storage layers]
Oracle: Tables; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks
HBase: Tables; MemStore; WAL (write-ahead log); HFiles; HDFS; Disks
79 Software Group
Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one wide, sparse row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
HBase Data Model
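The wide, sparse rows of the HBase model behave much like a map of maps: each row key holds only the columns that actually have values. The sketch below imitates that shape with plain Python dictionaries using the counters from the slide. It does not use the HBase client API, and the `get` helper is hypothetical.

```python
from collections import defaultdict

clicks = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]

# Row key -> {column name -> value}; a column exists only where a value was written.
table = defaultdict(dict)
for name, site, counter in clicks:
    table[name][site] = counter

def get(row_key, column, default=None):
    """Look up one cell; absent columns cost nothing and simply return the default."""
    return table.get(row_key, {}).get(column, default)
```

Jane's row has no Ebay column at all, which is different from a relational row that stores a NULL in a fixed column.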
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); axis $0 to $6,000]
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
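The rack totals follow directly from the per-node figures. A quick check of the arithmetic, using only the numbers on this slide (note that 216 drives of 2 TB each come to 432 TB of raw disk):

```python
nodes = 18
ram_total_gb = nodes * 48    # 48 GB RAM per node
cores_total = nodes * 2 * 6  # two 6-core CPUs per node
spindles = nodes * 12        # 12 drives per node
raw_disk_tb = spindles * 2   # 2 TB per drive
```

This gives 864 GB of RAM, 216 cores, 216 spindles and 432 TB of raw disk for the full rack.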
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail?
25 Software Group
Prevalence of Showrooming
[Bar chart: prevalence of showrooming by category (Consumer Electronics, Home Improvement); axis in Pct, 0 to 70]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Oracle Performance Survival Guide
Buying choices
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram, top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
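The Map Reduce slide is a diagram only, so here is a minimal single-process sketch of the pattern it depicts: many independent map tasks, a grouping step, then a reduce. It is an illustration under stated assumptions, not Hadoop or Google code; the word-count task, the function names and the sample input are all invented for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Each document stands in for one input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input splits, invented for this sketch.
counts = map_reduce(["big data big plans", "data beats opinion"])
print(counts)
```

In a real cluster the map calls run on different nodes and the shuffle moves data across the network; the structure of the computation is the same.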
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS feeding successive stages of MAPPER tasks (stage labels: SCAN, SORT, AGGREGATE), then REDUCE, then the Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram, two pipelines]
Schema on Write: Data; Extract, Transform, Load (code to Cleanse, Normalize, Aggregate); Data Warehouse; Analyse; Utilize
Schema on Read: Data; Load; Hadoop; code to Cleanse and Analyse; Utilize
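The difference between the two pipelines can be sketched in a few lines. This is an assumption-laden illustration rather than anything from the slides: the JSON record format, the field names and both loader functions are hypothetical.

```python
import json

# Hypothetical raw input; the second and third records are deliberately messy.
RAW_EVENTS = [
    '{"user": "jane", "site": "Google", "hits": "3"}',
    '{"user": "dick", "site": "Ebay"}',
    'not json at all',
]


def load_schema_on_write(lines):
    """Cleanse and convert while loading; rows that do not fit are rejected."""
    table, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append((rec["user"], rec["site"], int(rec["hits"])))
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """Store the raw lines untouched; loading never rejects anything."""
    return list(lines)


def query_hits(raw_store):
    """Apply structure only when the question is asked."""
    total = 0
    for line in raw_store:
        try:
            total += int(json.loads(line).get("hits", 0))
        except (ValueError, AttributeError):
            continue  # skip records this particular query cannot interpret
    return total


table, rejected = load_schema_on_write(RAW_EVENTS)
store = load_schema_on_read(RAW_EVENTS)
print(len(table), len(rejected), len(store), query_hits(store))
```

With schema on write the cost of cleansing is paid once at load time and bad rows never reach the warehouse; with schema on read everything is kept and each query decides how to interpret it.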
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (Java, Pig, Hive); MAP REDUCE (distributed processing) with a JOB TRACKER; HDFS (distributed storage) with a NAME NODE and a SECONDARY NAME NODE; many worker nodes, each pairing a DATA NODE with a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: HADOOP CLIENT (Java, Pig, Hive); RESOURCE MANAGER; APPLICATION MASTER; several NODE MANAGERs, each with a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: three MAP/REDUCE jobs (Job 1, Job 2, Job 3) over HDFS, shown alongside a single job (Job 1) over HDFS]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks. First stack: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. Second stack: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
79 Software Group
HBase Data Model

Single table, one row per name and site:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase rows, one sparse row per name with one column per site:
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
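The wide-row layout on this slide can be mimicked with a dictionary of dictionaries. This is only a sketch of the data shape, not the HBase API; the function name is hypothetical and the site names are written with their dots restored, which is an assumption about the original slide.

```python
from collections import defaultdict

# Rows from the slide's first table: (name, site, counter).
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    rows = defaultdict(dict)
    for name, site, counter in visits:
        rows[name][site] = counter  # the site name itself becomes the column
    return dict(rows)


wide = to_wide_rows(VISITS)
print(sorted(wide["Jane"]))                 # only the columns Jane actually has
print(wide["Dick"].get("ILoveLarry.com"))   # absent column: nothing is stored
```

Each row carries only the columns it needs, so Dick and Jane end up with different column sets and no space is spent on empty cells.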
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - machine learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
[Bar chart, axis $0 to $6000]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6 core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; axes run from about 0 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
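The trend line on this slide is the simplest of the techniques listed: a straight line fitted by ordinary least squares. The sketch below shows how such a line could be computed and then used to extrapolate. The slide's underlying data is not available, so the sample points are invented and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Invented sample points that lie close to a straight line.
xs = [10, 20, 30, 40, 50]
ys = [11, 19, 31, 42, 48]
model = fit_line(xs, ys)
print(model, predict(model, 60))
```

The other bullets on the slide (curve fitting, multivariate models, time series and so on) replace the straight line with a richer model but keep the same fit-then-predict shape.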
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
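As a rough illustration of grouping data with no predefined classes, here is a small k-means sketch. The slides do not name an algorithm, so k-means is a swapped-in example; the data points (two invented numeric features per shopping basket) and the choice to seed centroids with the first k points are assumptions made to keep the example deterministic.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def centroid(points):
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids."""
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current centroid.
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Invented baskets: (feature 1, feature 2), two obvious groups.
BASKETS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(BASKETS, k=2)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances alone, which is the point of the slide.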
121 Software Group
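Supervised Machine Learning
[Diagram: Existing Business supplies Raw Data, which is Cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is Validated to become the Production Model; New Business supplies New Data, and the Production Model returns a Prediction]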
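The workflow in this diagram (split the data, train a candidate, validate it, then use it on new data) can be sketched as follows. Everything specific here is an assumption for illustration: the one-feature records, the threshold "model", the 25% validation share and the function names.

```python
import random

# Invented history: (feature value, label). The classes are well separated.
RECORDS = ([(x, False) for x in range(10)] +
           [(x, True) for x in range(30, 40)])


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle, then hold back a share of the records for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Production use: classify a new, unlabeled value."""
    return x > threshold


def accuracy(threshold, dataset):
    """Validation: share of held-back records the model gets right."""
    hits = sum(predict(threshold, x) == label for x, label in dataset)
    return hits / len(dataset)


training, validation = split(RECORDS)
candidate = train(training)
print(accuracy(candidate, validation), predict(candidate, 35))
```

The candidate is only promoted to production if it scores well on records it never saw during training; that held-back validation set is what separates this from simply memorizing the history.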
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security
  - Vulnerability
  - Penetration Detection
• Fraud Detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products shown: STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft = completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
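The Map Reduce pattern named on this slide can be illustrated with the usual word-count exercise. The following is a minimal single-process sketch in Python, not Hadoop code: the function names and the sample input are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, then shuffle, then reduce each key."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input splits, one string per mapper.
    print(word_count(["big data big ideas", "data beats ideas"]))
```

Each call to `map_phase` depends only on its own split, which is what allows the many Map boxes in the diagram to run independently before a Reduce step combines their output.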
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; MAPPER (first stage, x8); SCAN, SORT; MAPPER (second stage, x4); AGGREGATE; REDUCE; Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram]
Schema on Write: Data, Extract, Transform, Load, Data Warehouse; Code, Cleanse, Normalize, Aggregate, Analyse, Utilize
Schema on Read: Data, Load, Hadoop; Code, Cleanse, Analyse, Utilize
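One way to picture the contrast is the small Python sketch below. It is only an illustration under stated assumptions: the JSON record layout, the field names and the function names are all invented here and do not come from the slides. Schema on write validates and shapes records before storing them; schema on read stores the raw lines and applies structure only when a query runs.

```python
import json

# Hypothetical raw events; note the missing field and the extra field.
RAW_EVENTS = [
    '{"user": "dick", "site": "ebay", "hits": "3"}',
    '{"user": "jane", "site": "google"}',
    '{"user": "jane", "site": "google", "hits": 5, "agent": "mobile"}',
]


def load_schema_on_write(lines):
    """Schema on write: coerce each record to a fixed shape at load time."""
    table = []
    for line in lines:
        record = json.loads(line)
        if "hits" not in record:
            continue  # record rejected because it does not fit the schema
        table.append((record["user"], record["site"], int(record["hits"])))
    return table


def load_schema_on_read(lines):
    """Schema on read: keep every raw line exactly as it arrived."""
    return list(lines)


def hits_per_user(raw_store):
    """Interpret the raw lines only when the question is asked."""
    totals = {}
    for line in raw_store:
        record = json.loads(line)
        hits = int(record.get("hits", 0))
        totals[record["user"]] = totals.get(record["user"], 0) + hits
    return totals


if __name__ == "__main__":
    print(load_schema_on_write(RAW_EVENTS))
    print(hits_per_user(load_schema_on_read(RAW_EVENTS)))
```

In this sketch the schema-on-write loader drops the incomplete record for good, while the schema-on-read store still holds it and leaves the decision about how to treat it to each query.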
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16PB disk
• 64 TB of RAM
• 32000 Cores
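The cluster totals are easier to relate to ordinary hardware when divided per node. The short Python sketch below does that arithmetic; it assumes decimal units (1 PB = 10**15 bytes) and an even spread of resources across nodes, neither of which the slide states.

```python
# Cluster totals as listed on the slide (decimal units are an assumption).
NODES = 4000
DISK_BYTES = 16 * 10**15   # 16 PB
RAM_BYTES = 64 * 10**12    # 64 TB
CORES = 32000


def per_node_averages():
    """Return average disk (TB), RAM (GB) and cores for a single node."""
    return {
        "disk_tb": DISK_BYTES / NODES / 10**12,
        "ram_gb": RAM_BYTES / NODES / 10**9,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    print(per_node_averages())
```

Under those assumptions each node averages about 4 TB of disk, 16 GB of RAM and 8 cores, that is, commodity-class servers rather than specialised hardware.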
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; many worker nodes, each a DATA NODE plus TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; three NODE MANAGERs, each with a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; MAP and REDUCE stages grouped into Job 1, Job 2 and Job 3; HDFS; Job 1]
78 Software Group
HBase
A Real time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
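The last two tables on this slide show the idea of a wide, sparse row: each row key carries only the columns it actually has. A rough way to picture that in Python is a dictionary of dictionaries, as sketched below. This is an in-memory illustration only and does not use any HBase API; the function name is hypothetical and the data is taken from the slide's counters.

```python
from collections import defaultdict

# (name, site, counter) triples from the slide's first table.
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(triples):
    """Pivot triples into one sparse row per name: {row_key: {column: value}}."""
    wide = defaultdict(dict)
    for name, site, counter in triples:
        wide[name][site] = counter  # the site name becomes a column name
    return dict(wide)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(VISITS).items():
        print(row_key, columns)
```

Jane's row has no Ebay entry at all, rather than an Ebay column holding a null, which is the difference from the fixed-column relational tables earlier on the slide.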
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in memory File system
Spark - memory optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores, 576 GB RAM
Storage Servers
112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
[Bar chart; axis $0 to $6,000]
Exadata $4,911
Hadoop $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line; x-axis 0 to 120, y-axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
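A trend line like the one on this slide is typically obtained by ordinary least squares. The sketch below shows that calculation in plain Python as an illustration only: the data points are made up, the function names are hypothetical, and nothing here reproduces the slide's actual data set or fitted coefficients.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up observations that lie exactly on y = x + 1.
    xs = [0, 10, 20, 30]
    ys = [1, 11, 21, 31]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept, predict(slope, intercept, 40))
```

The other techniques in the list (curve fitting, multivariate, time series and so on) follow the same pattern of fitting parameters to existing data and then extrapolating, with different model shapes.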
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
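As a rough illustration of grouping data without predefined labels, the sketch below runs a one-dimensional k-means loop in plain Python. K-means is only one of many clustering techniques and is chosen here as an assumption; the slide does not name an algorithm. The basket totals and starting centroids are made up.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around centroids by repeated assign-and-average steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical basket totals; no labels are supplied.
    basket_totals = [1, 2, 3, 20, 21, 22]
    print(kmeans_1d(basket_totals, centroids=[1, 22]))
```

The algorithm is never told what "small basket" or "large basket" means; the two groups emerge from the data alone.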
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Validate, Candidate Model, Production Model, New Data, Prediction, Existing Business, New Business]
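The stages named in this diagram (training set, validation set, model, prediction on new data) can be walked through with a deliberately tiny example. The Python sketch below uses a nearest-centroid classifier on a single feature to predict churn. The classifier choice, the feature (monthly logins), the records and all function names are assumptions made for illustration.

```python
def split(records, train_fraction=0.75):
    """Divide cleaned records into a training set and a validation set."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def train(training_set):
    """Learn the mean feature value for each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for value, label in training_set:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Return the label whose learned mean is closest to the new value."""
    return min(model, key=lambda label: abs(model[label] - value))


def validate(model, validation_set):
    """Fraction of held-back records the candidate model labels correctly."""
    correct = sum(1 for value, label in validation_set
                  if predict(model, value) == label)
    return correct / len(validation_set)


# Hypothetical existing customers: (monthly logins, known outcome).
RECORDS = [
    (1, "churn"), (9, "stay"), (2, "churn"), (8, "stay"),
    (3, "churn"), (10, "stay"), (2, "churn"), (9, "stay"),
]

if __name__ == "__main__":
    training_set, validation_set = split(RECORDS)
    candidate = train(training_set)
    print(validate(candidate, validation_set))
    print(predict(candidate, 4))  # score a new customer
```

Only records with known outcomes are used for training and validation; once the candidate model scores acceptably on the held-back set, it is applied to new data where the outcome is not yet known.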
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft = a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
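A minimal sketch of the wide-row layout shown on this slide, assuming a plain Python dictionary stands in for an HBase table. It does not use the HBase API; the function name is hypothetical and the data is copied from the slide's first table.

```python
from collections import defaultdict

# (name, site, counter) triples copied from the slide's first table.
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(visits):
    """Pivot triples into one sparse row per name; each site becomes a column."""
    rows = defaultdict(dict)
    for name, site, counter in visits:
        rows[name][site] = counter
    return dict(rows)

rows = to_wide_rows(VISITS)
```

Each row holds only the columns it actually uses, so Jane's row simply has no Ebay column instead of storing a null.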
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4911, Hadoop $750; axis from $0 to $6000]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with a fitted line, f(x) = 0.971521231456065 x + 0.71906459527154; both axes run to about 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
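The first technique in the list, linear regression, can be sketched with ordinary least squares in a few lines. The sample points below are made up for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Extrapolate from existing data to a value not seen before.
prediction = slope * 10 + intercept
```

The fitted slope and intercept play the same role as the coefficients in the f(x) line on the chart: once they are known, any new x can be turned into a predicted y.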
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
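A minimal sketch of clustering, assuming a tiny one-dimensional k-means is an acceptable stand-in for whatever algorithm a real tool would use. The basket totals and starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, rounds=10):
    """Tiny k-means on numbers: assign each value to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical basket totals: no labels are supplied, only the raw numbers.
basket_totals = [5, 6, 7, 50, 52, 54]
centroids, clusters = kmeans_1d(basket_totals, centroids=[0.0, 60.0])
```

No one tells the code which baskets are "small" or "large"; the two groups emerge from the data, which is the point of grouping without a pre-existing classification scheme.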
121 Software Group
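Supervised Machine Learning
[Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Model, which is validated as a Candidate Model and promoted to a Production Model; New Data (New Business, Existing Business) goes into the Production Model to give a Prediction]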
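The workflow in the diagram can be sketched roughly as follows, assuming a deliberately simple one-feature threshold model. The churn data, the feature and every function name are hypothetical.

```python
def train_threshold(training_set):
    """'Train' a one-feature classifier: the midpoint between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Classify a new value by which side of the threshold it falls on."""
    return x > threshold

def accuracy(threshold, labelled):
    """Share of labelled examples the model classifies correctly."""
    hits = sum(predict(threshold, x) == label for x, label in labelled)
    return hits / len(labelled)

# Hypothetical cleaned data: (monthly support calls, churned?)
data = [(1, False), (2, False), (3, False), (8, True), (9, True), (10, True),
        (2, False), (9, True)]
training_set, validation_set = data[:6], data[6:]

threshold = train_threshold(training_set)    # candidate model
score = accuracy(threshold, validation_set)  # validate before promoting
prediction = predict(threshold, 7)           # production model scoring new data
```

Holding the validation set back from training is what makes the score meaningful: the model is judged on examples it has not already seen.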
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest – a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic Pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It’s not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There’s a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don’t Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say “screw you bookseller” to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
“Siri, call me an ambulance”
From now on I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Dell + StatSoft = a strong end-to-end, analytics-driven information management value proposition
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring - integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Amazon softcover: $45.99
Oracle Performance Survival Guide
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on I'll call you 'An Ambulance', OK?
"Siri, call me an ambulance"
I found 14 bridges nearby
"I want to jump off a bridge"
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start, followed by many parallel Map tasks whose outputs feed a Reduce step]
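The map and reduce pattern can be sketched in a single process with the classic word count. This is only an in-memory illustration of the idea, not Hadoop code; the function names and sample documents are hypothetical, and a real cluster would run the map calls in parallel across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data", "Big Data analytics", "data science"])
```

Because each map call sees only its own document, the map work can be spread over as many machines as there are input splits.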
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (repeated), SCAN, SORT, AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract, Transform, Load, Data Warehouse, Utilize]
[Diagram labels, Schema on Read: Data, Load, Hadoop, Analyse, Cleanse, Code, Utilize]
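The contrast can be sketched as follows: with schema on read, raw records are stored untouched and structure is applied only when a question is asked; with schema on write, records are checked and shaped before they are stored. The record layout, field names, and function names here are hypothetical.

```python
import json

raw_store = []  # schema on read: keep the raw lines exactly as they arrived


def load_raw(line):
    """Loading is trivial: no parsing, no validation."""
    raw_store.append(line)


def query_total(field):
    """Apply structure only at read time; skip records that lack the field."""
    total = 0
    for line in raw_store:
        record = json.loads(line)
        value = record.get(field)
        if isinstance(value, (int, float)):
            total += value
    return total


def load_validated(table, line, required=("customer", "amount")):
    """Schema on write: reject rows that do not fit before storing them."""
    record = json.loads(line)
    missing = [name for name in required if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    table.append({name: record[name] for name in required})


load_raw('{"customer": "a", "amount": 10}')
load_raw('{"customer": "b", "clicks": 3}')
load_raw('{"amount": 5, "note": "x"}')
amount_total = query_total("amount")
click_total = query_total("clicks")
```

The raw store can answer a question about `clicks` that nobody planned for at load time, whereas the validated table would have rejected that record.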
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; three NODE MANAGERs, each with a CONTAINER]
77 Software Group
Tez¹
¹ Hindi for "fast"
[Diagram labels: HDFS; MAP and REDUCE stages grouped as Job 1, Job 2 and Job 3; HDFS; MAP and REDUCE stages grouped as a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing storage layers]
Oracle: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Table, MemStore, WA Log, HFile, HDFS, Disks
79 Software Group
HBase Data Model

Denormalized table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase rows (one row per Id; the set of columns varies by row):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
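The wide-row layout shown for HBase can be pictured as a map of maps: one entry per row key, each holding only the columns that row actually has. The sketch below pivots the slide's name, site and counter rows into that shape. It uses the name as the row key for brevity (the slide uses a numeric Id), writes the site names with the dots that appear to have been lost in extraction, and does not use any HBase API.

```python
from collections import defaultdict

# (name, site, counter) rows taken from the denormalized table on the slide.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot relational rows into one sparse, wide row per row key."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # the column name is itself data
    return dict(table)


wide = to_wide_rows(rows)
dick_columns = sorted(wide["Dick"])
jane_columns = sorted(wide["Jane"])
```

Each row stores only its own columns, so Jane has no Ebay entry at all rather than a NULL.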
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); axis $0 to $6,000]
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
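The rack totals follow directly from the per-node figures, and a few lines of arithmetic make them easy to check. The constant names are illustrative. Note that 216 spindles at 2 TB each works out to 432 TB of raw disk.

```python
# Per-node figures as listed for the 18-server rack.
NODES = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE = 2
CORES_PER_CPU = 6
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE                # 18 * 48
total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU   # 18 * 2 * 6
total_spindles = NODES * DISKS_PER_NODE               # 18 * 12
raw_disk_tb = total_spindles * TB_PER_DISK            # spindles * 2 TB
```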
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
Google Applications
Map Reduce, BigTable
Google File System (GFS)
67 Software Group
Map Reduce
[Diagram: a Start node, many parallel Map tasks, and a Reduce step]
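The Map Reduce flow on this slide can be sketched in a few lines of plain Python. This is only an illustration of the pattern on a single machine, not Hadoop's API: the word-count task, the function names and the sample input are all assumptions made for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    mapped = []
    for doc in documents:  # on a cluster, each split could be mapped on a different node
        mapped.extend(map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits
splits = ["big data big", "data revolution"]
word_counts = map_reduce(splits)
print(word_counts)
```

The point of the split into map and reduce functions is that neither needs to know how many machines are involved; only the shuffle step moves data between them.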
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS; a first stage of Mapper tasks (scan/sort); a second stage of Mapper tasks (aggregate); Reduce; Client]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data; Extract, Transform, Load (Analyse, Aggregate, Normalize, Cleanse, Code); Data Warehouse; Utilize
Schema on Read: Data; Load; Hadoop; Analyse, Cleanse, Code; Utilize
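A small Python sketch may help contrast the two approaches. It assumes a made-up JSON event format and hypothetical function names; nothing here is a Hadoop or data warehouse API.

```python
import json

# Hypothetical raw events; the second one carries a field nobody modelled in advance
RAW_EVENTS = [
    '{"user": "jane", "site": "Google", "hits": "3"}',
    '{"user": "dick", "site": "Ebay", "hits": "5", "referrer": "email"}',
]


def load_schema_on_write(raw_lines):
    """Cleanse and normalize up front; only the modelled columns survive the load."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        table.append((record["user"], record["site"], int(record["hits"])))
    return table


def load_schema_on_read(raw_lines):
    """Store the raw lines untouched; structure is applied later."""
    return list(raw_lines)


def query_schema_on_read(stored_lines, field):
    """Apply a schema at query time, tolerating records that lack the field."""
    return [json.loads(line).get(field) for line in stored_lines]


warehouse = load_schema_on_write(RAW_EVENTS)
raw_store = load_schema_on_read(RAW_EVENTS)
referrers = query_schema_on_read(raw_store, "referrer")
print(warehouse)
print(referrers)
```

With schema on write the unmodelled "referrer" field is lost at load time; with schema on read it is still available when someone later thinks to ask for it, at the cost of doing the parsing on every query.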
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
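Dividing the cluster totals by the node count gives a feel for how modest each individual machine is. The sketch below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread of resources across nodes, neither of which the slide states.

```python
# Cluster totals as quoted on the slide
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000

# Assumption: decimal units and resources spread evenly across nodes
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(disk_tb_per_node, ram_gb_per_node, cores_per_node)
```

Under those assumptions each node works out to roughly commodity-server size, which is the economic argument for scaling out rather than up.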
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; worker nodes each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each with a Container; Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; Map and Reduce stages grouped as Job 1, Job 2 and Job 3; a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two storage stacks side by side]
First stack: Tables, Buffer Cache, Log Buffer, Datafiles, Redo, ASM, Disks
Second stack: Tables, MemStore, HFiles, WA Log, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
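The move from narrow relational rows to wide, sparse HBase-style rows can be sketched with ordinary Python dictionaries. This is an illustration of the data model only, not the HBase client API; the function name is hypothetical and the data is the slide's sample.

```python
from collections import defaultdict

# The slide's sample data as (name, site, counter) rows
hits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(records):
    """Pivot narrow rows into one sparse row per name, with one column per site."""
    rows = defaultdict(dict)
    for name, site, counter in records:
        rows[name][site] = counter
    return dict(rows)


wide = to_wide_rows(hits)
for name, columns in wide.items():
    print(name, columns)
```

Each row only stores the columns it actually has, so Jane's row carries no Ebay column at all rather than a null.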
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL), Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
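To make the gap concrete, the two per-terabyte figures can be compared directly. The sketch reads the chart values as $4,911 per TB for Exadata and $750 per TB for Hadoop, which is an assumption about the chart, and the 100 TB workload is a made-up example.

```python
# Hardware-only cost per terabyte, as read from the chart (assumed mapping of bars to values)
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750


def hardware_cost(terabytes, usd_per_tb):
    """Hardware cost in dollars for a given raw capacity."""
    return terabytes * usd_per_tb


ratio = EXADATA_USD_PER_TB / HADOOP_USD_PER_TB

# Hypothetical 100 TB data set
saving_100tb = hardware_cost(100, EXADATA_USD_PER_TB) - hardware_cost(100, HADOOP_USD_PER_TB)

print(round(ratio, 2), saving_100tb)
```

The comparison covers hardware only, as the chart title says; software, staffing and operational costs are outside it.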
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted line, f(x) = 0.971521231456065 x + 0.71906459527154; both axes run to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
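The first technique in the list, linear regression, can be sketched as an ordinary least-squares fit in plain Python. The data points below are invented for the example (they are not the points behind the slide's chart), and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that lie exactly on y = 2x + 1
xs = [0, 20, 40, 60, 80, 100]
ys = [2 * x + 1 for x in xs]

slope, intercept = fit_line(xs, ys)

# Extrapolate from existing data to an unseen x value
prediction = slope * 120 + intercept
print(slope, intercept, prediction)
```

Real data would scatter around the line, so the fitted slope and intercept would be estimates rather than exact recoveries as they are here.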
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
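As a rough illustration of classification, the sketch below trains a nearest-centroid classifier on a handful of labelled messages and then labels a new one. The two features, the training data and the choice of nearest-centroid are all assumptions made for the example; production spam filters use far richer models.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labelled_examples):
    """Build the model: one centroid per label."""
    by_label = {}
    for features, label in labelled_examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical features: (links in message, exclamation marks)
training = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((5, 4), "spam"),
    ((6, 6), "spam"),
]

model = train(training)
print(classify(model, (4, 5)))
print(classify(model, (1, 1)))
```

The same shape (train on labelled history, then label new records) applies to the slide's other examples such as churn risk and customer value.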
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
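Clustering can be sketched with a one-dimensional k-means: no labels are supplied, and the groups emerge from the data. The basket totals and the starting centers are invented for the example, and k-means is just one of several clustering techniques that could be used here.

```python
def k_means_1d(values, centers, iterations=10):
    """Simple k-means on scalar values with caller-supplied starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical shopping-basket totals in dollars
basket_totals = [5, 6, 7, 48, 50, 52]
centers, clusters = k_means_1d(basket_totals, centers=[0.0, 100.0])
print(centers, clusters)
```

Nobody told the algorithm that there are "small" and "large" baskets; the two groups fall out of the distances alone.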
121 Software Group
Supervised Machine Learning
[Diagram: Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Data; Prediction; Existing Business; New Business]
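The workflow in this diagram (split clean data into training and validation sets, fit a candidate model, validate it, then use the production model on new data) can be sketched end to end. The churn data, the single-threshold model and the 0.9 acceptance score are all assumptions for illustration.

```python
def split(records, train_fraction=0.75):
    """Split cleaned records into a training set and a validation set."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def train_threshold(training_set):
    """Candidate model: a cut-off halfway between the two class means."""
    churned = [x for x, label in training_set if label]
    stayed = [x for x, label in training_set if not label]
    return (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2


def accuracy(threshold, validation_set):
    """Fraction of validation records the candidate model labels correctly."""
    correct = sum((x > threshold) == label for x, label in validation_set)
    return correct / len(validation_set)


# Hypothetical cleaned data: (support calls last month, customer churned?)
clean_data = [
    (1, False), (8, True), (2, False), (9, True),
    (0, False), (7, True), (3, False), (10, True),
]

training_set, validation_set = split(clean_data)
candidate = train_threshold(training_set)
score = accuracy(candidate, validation_set)

# Promote the candidate only if it validates well enough (0.9 is an arbitrary bar)
production_model = candidate if score >= 0.9 else None

# Score a new customer who made 6 support calls
prediction = production_model is not None and 6 > production_model
print(candidate, score, prediction)
```

Keeping the validation records out of training is what makes the score a fair estimate of how the model will behave on new business.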
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase Real-Time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring - integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with a JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with a NAME NODE and a SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; NODE MANAGERs, each hosting a CONTAINER]
77 Software Group
Tez (Hindi for “fast”)
[Diagram: MAP and REDUCE stages over HDFS, shown as three separate jobs (Job 1, Job 2, Job 3) and as a single job (Job 1)]
78 Software Group
HBase
A Real time database built on Hadoop
[Diagram: two storage stacks side by side]
Oracle-style RDBMS: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, Write-Ahead Log, HFiles, HDFS, Disks
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
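The wide-row layout can be sketched with plain dictionaries, using the sample visit counters from this slide. This is only an illustration of the data model, not the HBase client API; the function name `to_wide_rows` is an assumption.

```python
# Rows from the (Name, Site, Counter) table on this slide.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one sparse wide row per name."""
    wide = {}
    for name, site, counter in rows:
        # Each site becomes a column that exists only in rows that use it.
        wide.setdefault(name, {})[site] = counter
    return wide

wide_rows = to_wide_rows(visits)
```

Each name ends up with its own set of columns: the row for Jane has no Ebay column at all, rather than an Ebay column holding a null, which is the sparse behaviour the slide contrasts with the normalized relational tables.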
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL → JAVA → RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: WebLogs → FLUME → HDFS; RDBMS tables (CUSTOMERS, PRODUCTS) → SQOOP → HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop: $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: programs that evolve with “experience”
Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted linear trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
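A trend line of the form f(x) = slope · x + intercept, like the one on this slide, can be obtained with ordinary least squares. The sketch below is a minimal pure-Python version; the data points are hypothetical (chosen to lie exactly on y = 2x + 1) and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations lying exactly on y = 2x + 1
xs = [0, 10, 20, 30, 40]
ys = [1, 21, 41, 61, 81]
slope, intercept = fit_line(xs, ys)

# "Predictive" use: extrapolate beyond the observed range.
prediction = slope * 50 + intercept
```

Extrapolating to x = 50 is the predictive step: the fitted line is applied to an input that was never observed.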
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
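Grouping without predefined labels can be sketched with k-means, one common clustering technique (the slide does not name a specific algorithm). The basket totals and starting centroids below are hypothetical values chosen for illustration.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster;
        # leave it in place if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical basket totals in dollars; no labels are supplied.
baskets = [12, 15, 14, 95, 102, 99]
centroids, clusters = k_means(baskets, centroids=[0.0, 50.0])
```

No category such as "small basket" or "large basket" is given to the algorithm; the two groups emerge from the data alone.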
121 Software Group
Supervised Machine Learning
[Diagram: Raw Data → Clean → Training Set and Validation Set → Model → Validate → Candidate Model → Production Model; New Data (New Business, Existing Business) → Production Model → Prediction]
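The supervised workflow (train on one set, validate on another, then predict for new data) can be sketched with a deliberately tiny model: a single cut-off on one feature. The churn data, function names and the choice of model are assumptions made for illustration.

```python
def train_threshold(training_set):
    """Pick the cut-off on a single feature that best separates the labels."""
    best_threshold, best_correct = None, -1
    for threshold, _ in training_set:
        correct = sum((x >= threshold) == label for x, label in training_set)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def accuracy(threshold, labelled):
    """Fraction of labelled examples the cut-off classifies correctly."""
    hits = sum((x >= threshold) == label for x, label in labelled)
    return hits / len(labelled)

# Hypothetical cleaned data: (support calls last month, customer churned?)
cleaned = [(0, False), (1, False), (2, False), (5, True),
           (6, True), (7, True), (1, False), (8, True)]
training_set, validation_set = cleaned[:6], cleaned[6:]

candidate = train_threshold(training_set)    # candidate model
score = accuracy(candidate, validation_set)  # validate before production

def predict(calls):
    """Apply the validated model to new data."""
    return calls >= candidate
```

The validation set is never shown to `train_threshold`, so `score` estimates how the candidate model would behave on data it has not seen, which is the reason for the split.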
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs → Change Data Capture → JMS Queue → Hadoop Poster; outputs: Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell’s offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
36 Software Group
There’s a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don’t Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say “screw you, bookseller” to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
“Siri, call me an ambulance”
From now on I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: Start, many parallel Map tasks, Reduce]
Map Reduce
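The map, shuffle and reduce steps can be sketched in a single process with the classic word-count example. This only illustrates the programming model, not Google's or Hadoop's implementation; the function names are made up.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(map_reduce(["big data", "Big Data big"]))
```

In a real cluster the map calls run in parallel on many nodes, and the shuffle moves data across the network so that each reducer sees all values for its keys.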
68 Software Group
[Diagram labels: HDFS; MAPPER (x8); SCAN; SORT; MAPPER (x4); AGGREGATE; REDUCE; Client]
Multi-stage Map-Reduce
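A multi-stage job feeds the output of one map-reduce pass into the next. A rough single-process sketch, with a hypothetical `run_stage` helper and made-up click records: stage 1 counts hits per site, stage 2 regroups the sites by hit count.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map-reduce pass and return its output records, ordered by key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: count hits per site from (user, site) records.
def site_mapper(record):
    _user, site = record
    yield site, 1


def count_reducer(site, ones):
    return site, sum(ones)


# Stage 2: take stage-1 output and regroup the sites by their hit count.
def count_mapper(record):
    site, hits = record
    yield hits, site


def sites_reducer(hits, sites):
    return hits, sorted(sites)


def pipeline(records):
    stage1 = run_stage(records, site_mapper, count_reducer)
    return run_stage(stage1, count_mapper, sites_reducer)


if __name__ == "__main__":
    clicks = [("dick", "ebay"), ("jane", "google"),
              ("dick", "google"), ("jane", "facebook")]
    print(pipeline(clicks))
```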
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data; Extract; Transform; Load; Data Warehouse; Cleanse; Normalize; Aggregate; Analyse; Code; Utilize]
[Diagram labels, Schema on Read: Data; Load; Hadoop; Cleanse; Code; Analyse; Utilize]
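The difference between the two approaches is when structure is imposed. In this sketch, schema on write validates records before storing them, while schema on read stores raw lines and parses them only at query time. The JSON record format, field names and function names are assumptions for illustration.

```python
import json

REQUIRED_FIELDS = ("name", "site", "counter")   # hypothetical warehouse schema


def write_with_schema(store, raw_line):
    """Schema on write: parse and validate before anything is stored."""
    record = json.loads(raw_line)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"rejected at load time, missing {missing}")
    store.append({f: record[f] for f in REQUIRED_FIELDS})


def write_raw(store, raw_line):
    """Schema on read: store the raw text untouched."""
    store.append(raw_line)


def read_with_schema(store, field):
    """Apply structure only when the data is queried."""
    for raw_line in store:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue            # unparseable lines are skipped at read time
        if isinstance(record, dict) and field in record:
            yield record[field]


if __name__ == "__main__":
    lake = []
    write_raw(lake, '{"name": "Dick", "site": "Ebay", "counter": 507018}')
    write_raw(lake, '{"name": "Jane"}')
    write_raw(lake, "not json at all")
    print(list(read_with_schema(lake, "name")))
```

The trade-off shown here is that the raw store accepts everything and defers the cost of cleansing to each query, while the validated store rejects incomplete data up front.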
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
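Dividing the cluster totals by the node count shows how modest each machine is. A quick sketch of that arithmetic, assuming decimal units (1 PB = 1000 TB, 1 TB = 1000 GB); the names are illustrative:

```python
# Cluster totals as listed on the slide; names are illustrative.
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000


def per_node():
    """Average resources per node, assuming decimal units."""
    return {
        "disk_tb": DISK_PB * 1000 / NODES,
        "ram_gb": RAM_TB * 1000 / NODES,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    print(per_node())
```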
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING); HDFS (DISTRIBUTED STORAGE); JOB TRACKER; NAME NODE; SECONDARY NAME NODE; 15 worker boxes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; NODE MANAGER and CONTAINER (x3)]
77 Software Group
Tez¹
¹ Hindi for “fast”
[Diagram labels: HDFS; MAP, REDUCE, MAP, MAP, REDUCE, MAP, MAP, REDUCE, MAP; Job 1, Job 2, Job 3; HDFS; Job 1]
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
Log Buffer
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
Hbase Data Model
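The wide-row layout above, one row per name with a column for each site that person visited, can be mimicked with a dictionary of dictionaries. This is only a model of the idea, not the HBase client API; the function name is made up and the data is taken from the slide.

```python
from collections import defaultdict

# (name, site, counter) rows from the slide.
HITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(hits):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    rows = defaultdict(dict)
    for name, site, counter in hits:
        rows[name][site] = counter      # each site becomes a column in that row
    return {name: dict(columns) for name, columns in rows.items()}


if __name__ == "__main__":
    for name, columns in to_wide_rows(HITS).items():
        print(name, columns)
```

Rows are sparse: Jane's row simply has no Ebay column, rather than storing a NULL as a relational table would.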
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores, 576 GB RAM
Storage Servers
112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram labels: Start, many parallel Map tasks, Reduce]
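The Map Reduce flow on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the idea only, not Hadoop or Google code; the word-count task, the function names and the sample input are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for one input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits.
splits = ["big data is big", "data about data"]
counts = word_count(splits)
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves data across the network; here everything runs in one process to keep the shape of the computation visible.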
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, several stages of parallel MAPPER tasks, SCAN, SORT, AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Code, Cleanse, Normalize, Aggregate, Analyse, Utilize]
[Diagram labels, Schema on Read: Data, Load, Hadoop, Code, Cleanse, Analyse, Utilize]
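A minimal sketch of the schema-on-read side, assuming raw JSON lines as the stored format: the data is loaded untouched, and structure and cleansing are applied only when it is read. The field names and sample records are hypothetical.

```python
import json

# Raw events are stored exactly as they arrived; nothing was validated on load.
raw_store = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',  # "hits" is missing
    'not even json',                       # malformed record
]


def read_with_schema(raw_lines):
    """Apply structure at query time: parse, cleanse, and default missing fields."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens on read, not on load
        if not isinstance(record, dict):
            continue
        yield {
            "user": str(record.get("user", "")),
            "site": str(record.get("site", "")),
            "hits": int(record.get("hits", 0)),
        }


rows = list(read_with_schema(raw_store))
total_hits = sum(row["hits"] for row in rows)
print(rows, total_hits)
```

With schema on write, the malformed third record would have been rejected or repaired before loading; here it stays in storage and is simply skipped by this particular reader.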
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16PB disk
• 64 TB of RAM
• 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce
YARN
Hbase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; many worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; three NODE MANAGERs, each with a CONTAINER]
77 Software Group
Tez¹
¹ Hindi for "fast"
[Diagram labels: HDFS; Job 1, Job 2 and Job 3, each made of MAP and REDUCE steps; HDFS; a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram labels, first stack: Table, Table, Buffer Cache, Log Buffer, Datafiles, Redo, ASM, Disks]
[Diagram labels, second stack: Table, Table, MemStore, HFile, HFile, WA Log, HDFS, Disks]
79 Software Group
Hbase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
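The move from the normalized tables to the wide rows on this slide can be sketched as below. This is plain Python dictionaries standing in for the idea, not the HBase client API; the function name is hypothetical, and the SiteId values follow the site lookup table on the slide.

```python
from collections import defaultdict

# Lookup tables and fact rows from the slide's normalized model.
names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook",
         4: "ILoveLarry.com", 5: "MadBillFans.com"}
counters = [
    (1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
    (2, 3, 643261), (2, 4, 856767), (1, 5, 675230),
]


def to_wide_rows(names, sites, counters):
    """Pivot (NameId, SiteId, Counter) facts into one sparse row per name.

    Each row only carries columns for the sites that person actually has,
    which is the property the wide-row tables on the slide illustrate.
    """
    columns = defaultdict(dict)
    for name_id, site_id, counter in counters:
        columns[name_id][sites[site_id]] = counter
    return {name_id: {"Name": names[name_id], **cols}
            for name_id, cols in columns.items()}


wide = to_wide_rows(names, sites, counters)
for row_id, row in wide.items():
    print(row_id, row)
```

Jane's row has no Ebay column at all, rather than an Ebay column holding NULL; that sparseness is what lets every row have a different set of columns.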
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase / MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores, 576 GB RAM
Storage Servers
112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop $/TB (Hardware only), axis $0 to $6000]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48GB RAM per node (864GB total)
  – 2x6 Core CPU per node (216 total)
  – 12x2TB HDD per node (216 spindles 864 TB)
  – 40Gb/s Infiniband between nodes
  – 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted line, x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
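A fitted line of the form f(x) = slope * x + intercept, as on the chart, can be obtained with ordinary least squares. The sketch below uses only the standard library; the sample points are hypothetical and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical observations that lie exactly on y = 0.5x + 3.
xs = [0, 20, 40, 60, 80, 100]
ys = [0.5 * x + 3 for x in xs]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 120)
print(slope, intercept, forecast)
```

Real data would scatter around the line, and the fit would then minimize the squared vertical distances rather than pass through every point.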
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
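One of the simplest ways to build a model that classifies new data is a nearest-centroid classifier: average the labelled examples of each class, then assign new data to the closest average. The spam features and training rows below are hypothetical, and this technique is an illustration chosen for brevity, not one named in the talk.

```python
import math


def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [value / counts[label] for value in acc]
            for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical features per message: (number of links, number of "!").
training = [
    ((8, 5), "spam"), ((10, 7), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"),
]

model = train_centroids(training)
print(classify(model, (7, 4)), classify(model, (1, 1)))
```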
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
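Grouping data without a pre-existing scheme can be illustrated with k-means, which repeatedly assigns each point to its nearest centre and then moves each centre to the mean of its points. The basket features, the sample values and the naive choice of starting centres are all assumptions made for this sketch.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means; the first k points are used as starting centroids."""
    centroids = [list(point) for point in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, clusters


# Hypothetical baskets: (items bought, total spend).
baskets = [(1, 10), (9, 95), (2, 12), (10, 100), (1, 8), (8, 90)]
centroids, clusters = k_means(baskets, k=2)
print(centroids)
print(clusters)
```

No labels were supplied; the two groups (small baskets and large baskets) emerge from the distances alone, which is what separates clustering from classification.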
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, Existing Business, New Business, New Data, Prediction]
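The workflow in this diagram (split cleaned data into a training set and a validation set, build a candidate model, validate it, then use it on new data) might look like the sketch below. The one-feature threshold model, the churn data and the 0.8 promotion cut-off are hypothetical choices made to keep the example short.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle a copy of the records and cut it into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    return x > threshold


def accuracy(threshold, dataset):
    hits = sum(predict(threshold, x) == label for x, label in dataset)
    return hits / len(dataset)


# Hypothetical cleaned history: (support calls per month, customer churned).
existing_business = [(calls, calls >= 5) for calls in range(10)] * 2

training_set, validation_set = split(existing_business)
candidate = train(training_set)
validation_accuracy = accuracy(candidate, validation_set)

# Promote the candidate to production only if it validates well enough.
production_model = candidate if validation_accuracy >= 0.8 else None
print(candidate, validation_accuracy, production_model)
```

The validation set is held back from training so that the accuracy figure reflects data the model has not seen, which is the closest available stand-in for new business.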
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires Statsoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app
We appreciate your feedback and insight
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with a fitted trend line; both axes run from about 0 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
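The first technique in the list, linear regression, produces a trend line of the form f(x) = slope * x + intercept. A minimal sketch of an ordinary least-squares fit follows; the data points are made up for illustration and are not the points plotted on the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical observations that lie exactly on y = x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1.0, 21.0, 41.0, 61.0, 81.0, 101.0]

slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope} x + {intercept}")
print("prediction at x = 120:", predict(slope, intercept, 120))
```

Real data is noisy, so the fitted slope and intercept would normally come out as long decimals like the ones in the slide's formula.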
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
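The slide does not name a clustering algorithm. As one common choice, the sketch below uses k-means on made-up two-dimensional points; the starting centroids are fixed by hand so the run is repeatable, and all names are illustrative.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move every centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # keep a centroid that attracted no points
        centroids = updated
    return centroids, clusters


# Hypothetical data: two visibly separate groups, no labels supplied.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

No class labels are given to the algorithm; the two groups emerge from the distances alone, which is what "without a pre-existing classification scheme" means in practice.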
121 Software Group
Supervised Machine Learning
[Flow diagram. Labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data, Prediction, Existing Business, New Business]
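The training set / validation set split in the diagram can be sketched in a few lines. Everything below is illustrative: the data is made up, and the "model" is a deliberately trivial threshold rule rather than anything shown in the talk.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle labelled rows and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: the midpoint between the mean feature value of each class."""
    positives = [x for x, label in training if label]
    negatives = [x for x, label in training if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def accuracy(threshold, rows):
    """Share of rows for which the threshold rule predicts the true label."""
    hits = sum((x > threshold) == label for x, label in rows)
    return hits / len(rows)


# Hypothetical cleaned data: one numeric feature and a True/False label.
rows = [(x, x >= 50) for x in range(0, 100, 5)]

training, validation = split(rows)
threshold = train_threshold(training)
print("candidate threshold:", threshold)
print("validation accuracy:", accuracy(threshold, validation))
```

The validation rows are never seen during training, so their accuracy is a fairer estimate of how a promoted production model would behave on new data.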
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Architecture diagram. Labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete...
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don't Mention the WAR
43 Software Group
Buying choices
Amazon softcover $45.99
Oracle Performance Survival Guide
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram. Labels: Start, Map (many parallel tasks), Reduce]
Map Reduce
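The map and reduce steps can be illustrated with the usual word-count example. This is a single-process sketch of the idea only; it is not Hadoop or Google code, and the function names and input documents are made up.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # On a real cluster each document (or block) would be mapped on a
    # different node in parallel; here the maps simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data about data"]
counts = map_reduce(docs)
print(counts)
```

Because each map call only looks at its own document, the map work can be spread over as many machines as there are input splits, which is the point of the many Map boxes in the diagram.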
68 Software Group
[Diagram. Labels: HDFS, Mapper (repeated), Scan/Sort, Aggregate, Reduce, Client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
[Diagram comparing two pipelines]
Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Utilize (labels: Code, Cleanse, Normalize, Aggregate, Analyse)
Schema on Read: Data, Load, Hadoop, Utilize (labels: Code, Cleanse, Analyse)
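The difference between the two approaches is when structure gets imposed on the data. A minimal sketch, assuming a made-up comma-separated log format; the record layout and all names are illustrative.

```python
# Hypothetical raw input: comma-separated event records, one of them malformed.
raw_lines = [
    "2014-04-07,guy,login",
    "2014-04-07,jane,purchase,45.99",
    "not a valid record",
]


def parse(line):
    """Apply the schema to one raw line; return None if the line does not fit."""
    fields = line.split(",")
    if len(fields) < 3:
        return None
    amount = float(fields[3]) if len(fields) > 3 else None
    return {"date": fields[0], "user": fields[1],
            "event": fields[2], "amount": amount}


def load_schema_on_write(lines):
    """Schema on write: parse and cleanse once, at load time.
    Rows that do not fit the schema never reach the store."""
    return [row for row in map(parse, lines) if row is not None]


def load_schema_on_read(lines):
    """Schema on read: store the raw lines untouched."""
    return list(lines)


def purchases_on_read(raw_store):
    """With schema on read, each query applies its own structure as it scans."""
    rows = (parse(line) for line in raw_store)
    return [row for row in rows if row is not None and row["event"] == "purchase"]


warehouse = load_schema_on_write(raw_lines)
raw_store = load_schema_on_read(raw_lines)
print(len(warehouse), "structured rows;", len(raw_store), "raw lines kept")
print(purchases_on_read(raw_store))
```

Schema on read keeps the malformed line, so a later query with a different parser could still make use of it; schema on write has already discarded it.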
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
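Dividing the cluster totals by the node count shows how modest each machine is. The sketch assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread of resources across nodes; neither assumption is stated on the slide.

```python
# Cluster totals as given on the slide.
cluster = {"nodes": 4000, "disk_pb": 16, "ram_tb": 64, "cores": 32000}


def per_node(totals):
    """Average resources per node, assuming an even spread and decimal units."""
    nodes = totals["nodes"]
    return {
        "disk_tb": totals["disk_pb"] * 1000 / nodes,
        "ram_gb": totals["ram_tb"] * 1000 / nodes,
        "cores": totals["cores"] / nodes,
    }


average = per_node(cluster)
print(average)
```

On these assumptions an average node has about 4 TB of disk, 16 GB of RAM and 8 cores: commodity server hardware rather than a specialised appliance.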
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
MapReduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Architecture diagram. Labels: Hadoop client (Java, Pig, Hive); MapReduce (distributed processing) with Job Tracker; HDFS (distributed storage) with Name Node and Secondary Name Node; worker nodes, each labelled Data Node and Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Architecture diagram. Labels: Hadoop client (Java, Pig, Hive), Resource Manager, Node Manager (three), Container (three), Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram. Labels: HDFS, Map, Reduce, Job 1, Job 2, Job 3]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing storage layers. Oracle-style labels: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase labels: Table, MemStore, WA Log, HFile, HDFS, Disks]
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
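The last two tables show the key idea: each row carries only the columns it actually has. A rough way to picture this is a dictionary of dictionaries. The sketch below is a plain Python illustration of that shape, using the slide's sample data; it does not use any HBase API.

```python
# (name, site, counter) rows, as in the first table on the slide.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot relational rows into one sparse, wide row per name.
    Each wide row is a dict holding only the columns that row has."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide


wide = to_wide_rows(rows)
for name, columns in wide.items():
    print(name, columns)

# A missing column is simply absent; there is no NULL placeholder to store.
print("Ebay" in wide["Jane"])
```

Adding a new site for one person only adds a key to that person's row, with no change to a table definition and no effect on other rows.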
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram. Labels: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlexreg for Hadoop
- Toad BI Suite
- Slide 131
- Dellrsquos offering was not completehellip
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring ndash integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications on top of Map Reduce and BigTable, which sit on the Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, many Map tasks running in parallel, Reduce]
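The map and reduce steps on this slide can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not Hadoop or Google code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made for the example, and each input string stands in for one split handled by one mapper.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for one input split handled by one mapper.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Calling `word_count(["big data is big", "data means more"])` counts "big" and "data" twice each. On a real cluster the mappers would run on separate nodes and the shuffle would move data across the network.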
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (x8), SCAN, SORT, MAPPER (x4), AGGREGATE, REDUCE, Client]
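Chaining stages, where the output of one map-reduce pass feeds the next, might look like the sketch below. It runs entirely in memory; `run_stage`, the mapper and reducer names, and the visit records are hypothetical and are not taken from the slide.

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    """Run one map-reduce stage in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Stage 1: count visits per (user, site) pair.
def map_visit(record):
    user, site = record
    yield (user, site), 1

def reduce_count(key, values):
    return key, sum(values)

# Stage 2: from the stage 1 output, find each user's most visited site.
def map_by_user(record):
    (user, site), count = record
    yield user, (count, site)

def reduce_top_site(user, counted_sites):
    count, site = max(counted_sites)
    return user, site, count

def top_site_per_user(visits):
    stage1 = run_stage(visits, map_visit, reduce_count)
    return run_stage(stage1, map_by_user, reduce_top_site)
```

In classic Hadoop each stage would be a separate job, with the intermediate result written to HDFS between them.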
69 Software Group
Schema on Read vs Schema on Write
[Diagram, Schema on Write: Data, Extract, Transform, Load (Code: Cleanse, Normalize, Aggregate), Data Warehouse, Analyse, Utilize]
[Diagram, Schema on Read: Data, Load, Hadoop, Analyse (Code: Cleanse), Utilize]
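The difference between the two approaches can be illustrated with a small sketch. Everything here is an assumption made for the example: the JSON event format, the field names and the function names are hypothetical. Schema on write cleanses and converts records before storing them and rejects what does not fit. Schema on read stores the raw lines and applies structure only when a question is asked.

```python
import json

RAW_EVENTS = [
    '{"user": "dick", "site": "ebay", "hits": "3"}',
    '{"user": "jane", "site": "google"}',   # no hits field
    'not json at all',                      # malformed record
]

def load_schema_on_write(lines):
    """Validate and convert every record before it is stored."""
    table, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append({"user": rec["user"], "site": rec["site"],
                          "hits": int(rec["hits"])})
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected

def load_schema_on_read(lines):
    """Store the raw lines untouched; no structure is imposed yet."""
    return list(lines)

def hits_per_user(raw_lines):
    """Apply a schema only at query time, tolerating missing fields."""
    totals = {}
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip lines that cannot be parsed for this query
        if not isinstance(rec, dict) or "user" not in rec:
            continue
        totals[rec["user"]] = totals.get(rec["user"], 0) + int(rec.get("hits", 0))
    return totals
```

With schema on write, only the first event reaches the table. With schema on read, all three lines are kept, and `hits_per_user` decides at query time how to treat the incomplete ones.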
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Job Tracker for Map Reduce (distributed processing); Name Node and Secondary Name Node for HDFS (distributed storage); many worker nodes, each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers hosting Containers and the Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of MAP and REDUCE steps over HDFS run as three jobs (Job 1, Job 2, Job 3), shown beside the same chain run as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: Oracle components (Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks) shown beside their HBase counterparts (Tables, MemStore, WAL, HFiles, HDFS, Disks)]
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
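The wide, sparse rows at the bottom of this slide can be mimicked with nested dictionaries. The sketch below is only an illustration of the data model, not the HBase client API. The data comes from the slide, with site names written with their dots restored (an assumption), and `to_wide_rows` is a hypothetical helper.

```python
from collections import defaultdict

# Relational-style rows from the slide: one row per (name, site) pair.
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(visits):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    rows = defaultdict(dict)
    for name, site, counter in visits:
        rows[name][site] = counter  # the site name becomes the column name
    return dict(rows)
```

Each row holds only the columns that actually have a value, so Jane's row has no Ebay entry at all rather than a NULL.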
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase / MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
Exadata  $4911
Hadoop   $750
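The gap between the two bars can be worked through with simple arithmetic. The sketch below uses only the two hardware-only $/TB figures from the slide. The helper names and the 100 TB sizing are hypothetical.

```python
# Hardware-only cost per terabyte, as shown on the slide.
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750

def cost_ratio(dollars_per_tb_a, dollars_per_tb_b):
    """How many times more expensive option A is than option B per TB."""
    return dollars_per_tb_a / dollars_per_tb_b

def hardware_cost(terabytes, dollars_per_tb):
    """Hardware cost of storing a given number of terabytes."""
    return terabytes * dollars_per_tb

ratio = cost_ratio(EXADATA_PER_TB, HADOOP_PER_TB)        # about 6.5
exadata_100tb = hardware_cost(100, EXADATA_PER_TB)       # 491100
hadoop_100tb = hardware_cost(100, HADOOP_PER_TB)         # 75000
```

On these numbers, storage on Exadata costs roughly six and a half times as much per terabyte as on Hadoop, before any software or operational costs are considered.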
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6 core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s Infiniband between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
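The rack totals follow from the per-node figures, and a short calculation makes the arithmetic explicit. The constants below come from the slide and the function name is hypothetical. The RAM, core and spindle totals match the slide. Twelve 2 TB disks across 18 nodes work out to 432 TB raw, not the 864 TB printed, so one of those two figures appears to be a slip. The slide does not say which.

```python
# Per-node figures from the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

def rack_totals():
    """Multiply the per-node figures out to whole-rack totals."""
    return {
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "cores": NODES * CORES_PER_NODE,
        "spindles": NODES * DISKS_PER_NODE,
        "raw_tb": NODES * DISKS_PER_NODE * TB_PER_DISK,
    }
```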
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: Programs that evolve with "experience"
• Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: Programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
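A trend line like the one on this chart comes from an ordinary least-squares fit. The sketch below is a minimal pure-Python version, assuming a single input variable. `fit_line` and `predict` are hypothetical names, and the slide's underlying data points are not available, so the usage line reuses only the fitted coefficients printed on the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept

# Using the coefficients shown on the chart to predict at x = 100.
forecast = predict(0.971521231456065, 0.71906459527154, 100)
```

The other techniques in the list (curve fitting, multivariate models, time series, logistic regression, CART) follow the same pattern of fitting parameters to known data and then predicting for new inputs.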
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
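Spam detection is a convenient way to show what creating a model that classifies new data means in practice. The sketch below is a tiny naive Bayes classifier with add-one smoothing, written from scratch. The training messages, the labels and the function names are invented for illustration, and a production filter would use far more data and a tested library.

```python
import math
from collections import Counter

def train(labelled_messages):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labelled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]
```

After `model = train(TRAINING)`, calling `classify(model, "free cash prize")` labels the message as spam. Churn risk or customer value scoring would follow the same train-then-classify shape with different features.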
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
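Grouping data with no predefined classes can be sketched with k-means, one common clustering algorithm. The slide does not name a specific technique, so k-means is an assumption here. The points are hypothetical baskets described by (items, spend), and the first k points seed the centroids so the result is deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; assumes iterations >= 1.

    The first k points are used as the starting centroids.
    """
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

BASKETS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
```

`kmeans(BASKETS, 2)` separates the small baskets from the large ones without ever being told that those two groups exist.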
121 Software Group
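Supervised Machine Learning
[Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; modelling on the Training Set gives a Candidate Model; validating against the Validation Set gives a Production Model; New Data (new business, existing business) is fed to the Production Model to give a Prediction]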
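The workflow in this diagram, from training set to candidate model to validation to production model, can be sketched as follows. The two candidate models (a constant mean and a straight line), the deterministic hold-out split and all function names are assumptions chosen to keep the example short and repeatable.

```python
def split(records, every=4):
    """Hold out every `every`-th record for validation; train on the rest."""
    training = [r for i, r in enumerate(records) if i % every != 0]
    validation = [r for i, r in enumerate(records) if i % every == 0]
    return training, validation

def fit_mean(training):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in training) / len(training)
    return lambda x: mean_y

def fit_line(training):
    """Candidate 2: least-squares straight line through the training set."""
    n = len(training)
    mean_x = sum(x for x, _ in training) / n
    mean_y = sum(y for _, y in training) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def validation_error(model, validation):
    """Mean squared error of a candidate on data it has not seen."""
    return sum((model(x) - y) ** 2 for x, y in validation) / len(validation)

def choose_production_model(records):
    training, validation = split(records)
    candidates = {"mean": fit_mean(training), "line": fit_line(training)}
    errors = {name: validation_error(m, validation) for name, m in candidates.items()}
    best = min(errors, key=errors.get)
    return best, candidates[best]
```

The candidate with the lowest validation error is promoted to production and then applied to new data to produce predictions.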
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clustering
- Group data without a pre-existing classification scheme
- For instance, basket analysis
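One common way to group data without predefined classes is k-means. The sketch below is a deliberately tiny one-dimensional version; the `spend` values and the starting centers are hypothetical, and k-means is used here only as an illustration, not as the method the slide refers to.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means: assign each value to its nearest center, then move each center to its cluster mean."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer spend figures with three visible groups.
spend = [5, 7, 6, 52, 48, 50, 98, 102, 100]
centers, clusters = kmeans_1d(spend, centers=[0, 50, 100])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the groups emerge from the distances alone.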
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Validate, Candidate Model, Production Model, New Data (New Business, Existing Business), Prediction]
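The supervised learning flow (split cleaned data, train a candidate model, validate it, then promote it) can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a single threshold, and the 0.9 promotion cut-off is arbitrary.

```python
import random

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned rows and split them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train_threshold(training):
    """Candidate model: predict True when x exceeds the midpoint of the two class means."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose label matches the threshold prediction."""
    return sum((x > threshold) == label for x, label in rows) / len(rows)

# Hypothetical cleaned data: (feature, label) pairs.
data = [(x, x >= 50) for x in range(0, 100, 5)]
training, validation = split(data)
threshold = train_threshold(training)
score = accuracy(threshold, validation)
print("validation accuracy:", score)

# Promote the candidate to production only if it validates well enough.
production_model = threshold if score >= 0.9 else None
```

The validation rows are never used for training, which is what makes the accuracy figure a fair estimate for new data.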
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
- Vulnerability
- Penetration Detection
Fraud Detection
CRM
- Churn
- Defaults
Medical
- Risk analysis
- Diagnosis
- Prognosis
Game optimization
Advertising
- Targeting
- Tailoring
125 Software Group
Data Science is hard
- Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
- Small-medium businesses need help to compete
- Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Dell + StatSoft = completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
- How could data and algorithms transform your business?
- What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
- Where is the data?
  - Start collecting now
139 Software Group
For your career
- Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
- Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
- Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
- Bluetooth Personal Area Network
- 3G/WiFi Wide Area Network
- GPS
- Storage
- Pulse, temp monitor
- Silent alarms
- Pedometer, sleep monitoring
- Compass
- Camera
- Mike/earphones
- Heads up display
- Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
Google Applications
Map Reduce, BigTable
Google File System (GFS)
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
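The map, shuffle and reduce steps can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop code; the function names and the two input lines are made up for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data is data"]
# On a cluster each line could be mapped on a different node; here it is a plain loop.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at one line in isolation, the map work can be spread over as many nodes as there are input splits, which is the point of the many Map boxes in the diagram.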
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: Client, HDFS, MAPPER (multiple, in stages), SCAN, SORT, AGGREGATE, REDUCE]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Utilize (Analyse, Aggregate, Normalize, Cleanse, Code)
Schema on Read: Data, Load, Hadoop, Utilize (Analyse, Cleanse, Code)
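The difference between the two approaches can be sketched in a few lines. This is an illustration only: the JSON event format, the field names and both function names are assumptions, not anything taken from Hadoop or a data warehouse product.

```python
import json

# Hypothetical raw events; the second one does not fit the expected schema.
raw_events = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',
]

def load_schema_on_write(lines):
    """Schema on write: validate and convert every record before storing it."""
    table = []
    for line in lines:
        record = json.loads(line)
        if "hits" not in record:
            continue  # rejected at load time because it does not fit the schema
        table.append((record["user"], record["site"], int(record["hits"])))
    return table

def query_schema_on_read(lines):
    """Schema on read: keep raw lines as-is and apply structure only when querying."""
    for line in lines:
        record = json.loads(line)
        yield record["user"], record["site"], int(record.get("hits", 0))

warehouse = load_schema_on_write(raw_events)  # structure fixed up front
raw_store = list(raw_events)                  # loaded untouched
read_result = list(query_schema_on_read(raw_store))
print(warehouse)
print(read_result)
```

With schema on write the non-conforming record is lost at load time; with schema on read it is still in the store and each query decides how to interpret it.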
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
- 4000 nodes
- 16PB disk
- 64 TB of RAM
- 32000 Cores
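Dividing the cluster totals by the node count shows how modest each machine is. A quick back-of-envelope calculation, assuming decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes:

```python
nodes = 4000
disk_tb = 16 * 1000   # 16 PB expressed in TB (decimal units assumed)
ram_gb = 64 * 1000    # 64 TB expressed in GB (decimal units assumed)
cores = 32000

disk_per_node_tb = disk_tb / nodes
ram_per_node_gb = ram_gb / nodes
cores_per_node = cores // nodes

print("disk per node (TB):", disk_per_node_tb)
print("RAM per node (GB):", ram_per_node_gb)
print("cores per node:", cores_per_node)
```

The scale comes from the number of commodity nodes rather than from any single large server.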
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
Worker nodes (repeated in the diagram): DATA NODE, TASK TRACKER
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
HADOOP CLIENT (JAVA, PIG, HIVE)
RESOURCE MANAGER
APPLICATION MASTER
NODE MANAGER, CONTAINER (repeated per node in the diagram)
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS, MAP and REDUCE stages, Job 1, Job 2, Job 3, HDFS, Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing storage layers]
Oracle: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Table, MemStore, WA Log, HFile, HDFS, Disks
79 Software Group
HBase Data Model
Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Relational (normalized) model:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase model (one row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
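The wide, sparse rows on this slide can be mimicked with a dictionary of dictionaries. The sketch below uses the slide's own figures, but it is plain Python standing in for the idea, not the HBase client API.

```python
# (name, site, counter) rows as in the slide's source table.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

# One wide row per name; a site becomes a column only if it has a value.
wide = {}
for name, site, counter in rows:
    wide.setdefault(name, {})[site] = counter

for name, columns in wide.items():
    print(name, columns)
```

Jane's row simply has no `Ebay` column, so nothing is stored for it, whereas a fixed relational schema would need either a NULL or a separate join table.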
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
WebLogs, FLUME, HDFS
RDBMS (CUSTOMERS, PRODUCTS), SQOOP, HDFS
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores, 576 GB RAM
Storage Servers
112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start, then a large fan-out of parallel Map tasks, then Reduce]
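The Map Reduce pattern named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea (map, group by key, reduce) on a single machine, not Google's or Hadoop's implementation; the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Each document stands in for one input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data", "Big deal"])
```

In a real cluster the map calls would run in parallel on many nodes and the shuffle would move data across the network; here everything runs in one process so the three phases are easy to see.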
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; a first bank of Mappers; Scan; Sort; a second bank of Mappers; Aggregate; Reduce; Client]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write (Data Warehouse): Data, Extract, Transform, Load, Code, Cleanse, Normalize, Aggregate, Analyse, Utilize
Schema on Read (Hadoop): Data, Load, Code, Cleanse, Analyse, Utilize
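A minimal sketch of the schema-on-read idea, assuming raw events are stored as untouched text lines and only given structure when a program reads them. The sample records, the field names (`user`, `site`, `hits`), and the function `read_with_schema` are hypothetical and chosen only for illustration.

```python
import json

# Raw data is loaded exactly as it arrived; nothing was validated at load time.
RAW_EVENTS = [
    '{"user": "Dick", "site": "Ebay", "hits": "3"}',
    '{"user": "Jane", "site": "Google"}',
    'not valid json',
]


def read_with_schema(raw_lines):
    """Apply a schema while reading: parse, cleanse, and type each record."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens at read time, not at load time
        if not isinstance(record, dict):
            continue
        yield {
            "user": record.get("user", "unknown"),
            "site": record.get("site", "unknown"),
            "hits": int(record.get("hits", 0)),
        }


events = list(read_with_schema(RAW_EVENTS))
```

With schema on write, the malformed third line would have been rejected or repaired before loading; here it sits in storage and is simply skipped by this particular reader, while a different reader could choose to interpret it differently.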
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; many worker nodes, each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: a Hadoop client (Java, Pig, Hive); a Resource Manager; an Application Master; several Node Managers, each hosting a Container]
77 Software Group
Tez¹
¹ Hindi for “fast”
[Diagram labels: HDFS; chained Map and Reduce stages grouped as Job 1, Job 2, and Job 3; HDFS; Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
RDBMS stack (ASM): Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase stack: Tables, MemStore, WA Log, HFiles, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
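The wide-row layout on this slide can be imitated with a dictionary of dictionaries: one entry per row key, and within it only the columns that row actually has. This is a sketch of the data model only, not of the HBase API; the function name `to_wide_rows` is an assumption, and the site names are written with the ".com" that the transcript appears to have lost.

```python
from collections import defaultdict

# (Name, Site, Counter) rows as they appear in the flat table on the slide.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot flat rows into one sparse, wide row per name."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        # The site becomes a column name; absent sites take no space at all.
        wide[name][site] = counter
    return dict(wide)


wide = to_wide_rows(ROWS)
```

Here `wide["Dick"]` has four columns and `wide["Jane"]` has three, and the two rows do not share the same column set, which is the point the slide makes about sparse column families.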
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL; JAVA; RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
[Bar chart: categories Exadata and Hadoop; axis from $0 to $6000; data labels $4911 and $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40Gb/s InfiniBand between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
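The rack totals quoted on this slide follow from the per-node figures, and a short calculation makes the arithmetic explicit. The constant names below are assumptions for illustration. Note that, with the per-node figures as transcribed (12 disks of 2 TB on each of 18 nodes), raw disk works out to 432 TB rather than the 864 TB shown, so one of those two figures may have been garbled in the transcript.

```python
# Per-node figures as transcribed from the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE            # matches "864GB total"
total_cores = NODES * CORES_PER_NODE              # matches "216 total"
total_spindles = NODES * DISKS_PER_NODE           # matches "216 spindles"
raw_disk_tb = total_spindles * TB_PER_DISK        # differs from the "864 TB" on the slide

print(total_ram_gb, total_cores, total_spindles, raw_disk_tb)
```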
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with a fitted trend line; x-axis from 0 to 120, y-axis from -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
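The trend line on this slide is the kind of result an ordinary least-squares fit produces. The sketch below shows that calculation for a single predictor, assuming plain Python lists of x and y values; the function names `fit_line` and `predict` and the sample points are hypothetical, and the slide's own data points are not available.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


slope, intercept = fit_line([0, 10, 20, 30], [1, 21, 41, 61])
forecast = predict(slope, intercept, 40)
```

The other techniques in the list (curve fitting, multivariate models, time series, logistic regression, CART) follow the same pattern of fitting parameters to existing data and then applying them to unseen inputs.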
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
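One common way to group data without a pre-existing classification scheme is k-means. The sketch below runs it on one-dimensional values; k-means is a swapped-in illustration rather than a technique the slide names, and the function name `k_means_1d`, the sample values, and the starting centroids are all assumptions.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around centroids that move to the mean of their group."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each value to its nearest current centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], [0, 5])
```

No labels were supplied: the two groups emerge purely from how close the values are to each other, which is what distinguishes clustering from classification.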
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Data (New Business, Existing Business); Prediction]
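The workflow on this slide (split cleaned data into a training set and a validation set, build a candidate model, validate it, then use it on new data) can be sketched as follows. The threshold classifier is a deliberately trivial stand-in for a real learning algorithm, and the data, the function names, and the 25% validation fraction are assumptions made for the example.

```python
import random


def split(records, validation_fraction=0.25, seed=0):
    """Shuffle labelled records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Score new data with the trained model."""
    return x > threshold


def accuracy(threshold, validation_set):
    """Validate: fraction of held-out records the model labels correctly."""
    hits = sum(predict(threshold, x) == label for x, label in validation_set)
    return hits / len(validation_set)


# Hypothetical cleaned data: a value and a known True/False outcome.
labelled = [(x, x > 50) for x in range(0, 100, 5)]
training_set, validation_set = split(labelled)
model = train(training_set)
score = accuracy(model, validation_set)
```

Only if the validation score is acceptable would the candidate be promoted to a production model and applied to new records, for example `predict(model, 95)`.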
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
[Diagram: Data Science at the centre, surrounded by the following application areas]
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
-
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
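The fan-out of many Map tasks into a Reduce step can be sketched with a word count, the customary first example. This is a single-process illustration in plain Python, not Hadoop's API; the function names and sample lines are assumptions made for the sketch.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big ideas", "data beats ideas"]
# Each line is mapped independently, so on a cluster each could run on a different node.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each map call sees only its own line, the map work parallelizes without coordination; only the shuffle needs to bring matching keys together.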
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (x8), SCAN/SORT, MAPPER (x4), AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract, Transform, Load, Data Warehouse, Utilize]
[Diagram labels, Schema on Read: Data, Load, Hadoop, Analyse, Cleanse, Code, Utilize]
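A small Python sketch may help show what "schema on read" means in practice: raw records are loaded untouched, and structure is imposed only when a query reads them. The JSON event format, field names and read_with_schema function are hypothetical, chosen only for illustration.

```python
import json

# Raw events are stored exactly as they arrived; no schema was enforced at load time.
raw_lines = [
    '{"user": "dick", "site": "ebay", "hits": 3}',
    '{"user": "jane", "site": "google"}',             # "hits" is missing
    '{"user": "dick", "hits": "2", "agent": "ios"}',  # "hits" is text, plus an extra field
]

def read_with_schema(line):
    """Apply the schema at read time: keep the fields this query needs and coerce types."""
    record = json.loads(line)
    return {"user": str(record.get("user", "")), "hits": int(record.get("hits", 0))}

hits_per_user = {}
for line in raw_lines:
    row = read_with_schema(line)
    hits_per_user[row["user"]] = hits_per_user.get(row["user"], 0) + row["hits"]
```

Under schema on write, the second and third records would have been rejected or cleansed before loading; here the cleansing decision is deferred to each reader.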
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING); HDFS (DISTRIBUTED STORAGE); JOB TRACKER; NAME NODE; SECONDARY NAME NODE; many worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; NODE MANAGER with CONTAINER (x3)]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; MAP and REDUCE stages grouped as Job 1, Job 2 and Job 3; HDFS; the same stages grouped as a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram labels, first stack: ASM, Datafiles, Buffer Cache, Table, Table, Redo, Disks, Log Buffer]
[Diagram labels, second stack: HDFS, HFile, MemStore, Table, Table, WA Log, Disks, HFile]
79 Software Group
HBase Data Model

Denormalized:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide, sparse row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
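The wide-row layout can be imitated with nested dictionaries to show how it differs from the relational tables. This is a plain Python sketch of the idea only, not the HBase client API; the get_cell helper is hypothetical, and the site names are written with their dots restored.

```python
# Denormalized rows from the slide: (name, site, counter).
rows = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]

# One wide row per name: the site becomes the column name, the counter the cell value.
wide = {}
for name, site, counter in rows:
    wide.setdefault(name, {})[site] = counter

def get_cell(table, row_key, column):
    """Look up one cell; a column that was never written simply does not exist."""
    return table.get(row_key, {}).get(column)
```

For example, get_cell(wide, "Jane", "Ebay") returns None rather than a stored NULL: each row carries only the columns it actually has, so rows may have entirely different column sets.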
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores, 576 GB RAM
Storage servers
112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
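As a worked illustration of the first technique in the list, the sketch below fits a straight line by ordinary least squares and then extrapolates, which is the sense of "predictive" on this slide. The data points are hypothetical (they are not the points behind the chart) and fit_line is a name chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]  # hypothetical observations lying exactly on y = x + 1
slope, intercept = fit_line(xs, ys)

# Extrapolate beyond the observed range of x.
prediction = slope * 120 + intercept
```

With noisy real data the fitted slope and intercept would not be round numbers, which is why the trend line on the chart carries so many decimal places.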
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
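Spam detection is a convenient case to sketch. The example below trains a tiny naive Bayes classifier on labelled messages and then classifies new text. Naive Bayes is an assumed choice (the slide names no algorithm), and the training messages and function names are hypothetical.

```python
from collections import Counter
from math import log

def train(examples):
    """Count words per label from (text, label) training examples."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())

    def score(label):
        counts = word_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        return log(label_counts[label] / total) + sum(
            log((counts[word] + 1) / denominator) for word in text.lower().split()
        )

    return max(label_counts, key=score)

model = train([
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
])
```

Calling classify(model, "win a prize") labels the message as spam, while classify(model, "monday meeting") labels it as ham; unlike clustering, the categories here were fixed in advance by the labelled examples.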
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL -> Java -> Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase / MLlib - machine learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
• Exadata: $4,911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: x axis 0 to 120, y axis -20 to 120, with fitted line f(x) = 0.971521231456065x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
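As a rough illustration of the first technique in the list, the sketch below fits a straight line by ordinary least squares using only the standard library. The data points are made up so that the result is easy to check; they are not the data behind the chart on the slide, and fit_line is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # because the common factor cancels in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations lying exactly on y = 2x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [2 * x + 1 for x in xs]

slope, intercept = fit_line(xs, ys)
prediction = slope * 120 + intercept  # extrapolate beyond the observed range
print(slope, intercept, prediction)
```

Extrapolating to x = 120 is the "predictive" step: the fitted line is used to estimate a value outside the data it was built from.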
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
121 Software Group
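Supervised Machine Learning
[Diagram: raw data is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and promoted to a production model; new data (new business, existing business) is fed to the production model to yield a prediction.]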
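The workflow in the diagram can be sketched in plain Python. Everything here is an assumption made for illustration: the data is synthetic (a single numeric feature with a churn-style True/False label), and the "model" is a deliberately simple threshold placed halfway between the two class means rather than any particular library's algorithm.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Apply the model to one new observation."""
    return x > threshold


def accuracy(threshold, rows):
    """Fraction of rows for which the prediction matches the known label."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


# Synthetic labelled data: the label is True when the feature is 5 or more.
data = [(x, x >= 5) for x in range(10)] * 2

training_set, validation_set = split(data)
model = train(training_set)               # candidate model
score = accuracy(model, validation_set)   # validate before promoting to production
print(model, score, predict(model, 9))
```

The validation set is kept out of training so that the score reflects how the model behaves on data it has not seen.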
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: redo logs -> change data capture -> JMS queue -> Hadoop poster, which delivers batched HDFS file copy, audit change data, and HBase real-time replication.]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications built on Map Reduce and BigTable, which in turn sit on the Google File System (GFS).]
67 Software Group
Map Reduce
[Diagram: Start -> many Map tasks running in parallel -> Reduce]
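The pattern in the diagram can be imitated on a single machine. The sketch below is a toy word count in plain Python, not Hadoop code: the function names are chosen for this example, and the explicit "shuffle" step stands in for the grouping by key that a real framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different node; here they run in sequence.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data is big", "data is data"])
print(counts)
```

Because each call to map_phase depends only on its own line, the map work can be spread across as many nodes as there are input splits, which is what the many parallel Map boxes in the diagram represent.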
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS -> mappers (scan, sort) -> mappers (aggregate) -> reduce -> client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram: Schema on Write: data is extracted, transformed (cleansed, normalized, aggregated) and loaded into a data warehouse by code, then analysed and utilized. Schema on Read: data is loaded into Hadoop as-is, then cleansed and analysed in code and utilized.]
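A minimal sketch of the schema-on-read idea, under assumed conditions: the raw records are JSON lines invented for this example, they are stored untouched, and structure is imposed only when they are read. The field names user, site and hits are hypothetical.

```python
import json

# Raw records are kept exactly as they arrived, including incomplete
# and malformed ones. Nothing is validated at load time.
raw_records = [
    '{"user": "Dick", "site": "Ebay", "hits": 3}',
    '{"user": "Jane", "site": "Google"}',
    'not valid json',
]


def read_with_schema(lines):
    """Apply a schema at read time: parse, skip bad lines, fill in defaults."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens here, at query time
        if not isinstance(record, dict):
            continue
        yield {
            "user": str(record.get("user", "")),
            "site": str(record.get("site", "")),
            "hits": int(record.get("hits", 0)),
        }


rows = list(read_with_schema(raw_records))
total_hits = sum(row["hits"] for row in rows)
print(rows, total_hits)
```

With schema on write, the malformed line would have been rejected or repaired before loading; here the decision is deferred to each reader, which can apply a different schema to the same raw data.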
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications run on top of Map Reduce and BigTable, which sit on the Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, followed by many Map tasks running in parallel, feeding a Reduce step]
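The pattern on this slide can be sketched in a few lines of single-process Python. This is only an illustration of the map, group-by-key and reduce steps using the classic word-count example; the function names are hypothetical and nothing here is the Hadoop or Google API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster each line (or block) could be mapped on a different node;
    # here the map calls simply run one after another.
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data revolution"]))
```

The map calls are independent of each other, which is what lets a real framework spread them across many machines.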
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; eight MAPPER tasks (SCAN/SORT); four MAPPER tasks (AGGREGATE); REDUCE; Client]
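A multi-stage job is just one map-reduce pass feeding the next. The sketch below chains two stages in plain Python: the first totals hits per user and site, the second picks each user's busiest site. All names and the sample records are hypothetical, and the whole thing runs in one process rather than on a cluster.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map-reduce stage: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: total hits per (user, site) from raw click records.
def hits_mapper(record):
    user, site, hits = record
    yield (user, site), hits


def sum_reducer(key, values):
    return key, sum(values)


# Stage 2: for each user, keep the site with the most hits.
def user_mapper(record):
    (user, site), total = record
    yield user, (total, site)


def top_site_reducer(user, totals):
    total, site = max(totals)
    return user, site, total


def top_site_per_user(clicks):
    stage1 = run_stage(clicks, hits_mapper, sum_reducer)
    return run_stage(stage1, user_mapper, top_site_reducer)
```

In classic Hadoop each stage would write its output to HDFS before the next stage reads it, which is the overhead that later engines try to avoid.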
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, two flows]
Schema on Write: Data, Extract, Transform (Code, Cleanse, Normalize, Aggregate), Load, Data Warehouse, Analyse, Utilize
Schema on Read: Data, Load, Hadoop, Code, Cleanse, Analyse, Utilize
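To make the contrast concrete, here is a small schema-on-read sketch: raw records are stored untouched, and parsing, cleansing and typing happen only when a query reads them. The record layout, the schema dictionary and the function name are all assumptions made for the example.

```python
import json

# Raw data is loaded as-is; nothing is validated or transformed at load time.
RAW_EVENTS = [
    '{"user": "dick", "site": "ebay", "hits": 3}',
    '{"user": "jane", "site": "google"}',  # no "hits" field
    'not json at all',                     # malformed record
]

# A schema chosen by the reader: column name -> (type cast, default value).
HITS_SCHEMA = {"user": (str, ""), "hits": (int, 0)}


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: parse, skip bad rows, fill defaults, cast types."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens at query time, not at load time
        if not isinstance(record, dict):
            continue
        yield {name: cast(record.get(name, default))
               for name, (cast, default) in schema.items()}


if __name__ == "__main__":
    for row in read_with_schema(RAW_EVENTS, HITS_SCHEMA):
        print(row)
```

A different reader could apply a different schema to the same raw lines, which is the flexibility schema on read trades against the up-front guarantees of schema on write.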
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
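Dividing the cluster totals by the node count shows how ordinary each machine is. The arithmetic below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000


def per_node():
    """Divide the slide's cluster totals by the node count (decimal units assumed)."""
    return {
        "disk_tb": DISK_PB * 1000 / NODES,
        "ram_gb": RAM_TB * 1000 / NODES,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    print(per_node())
```

Under those assumptions each node works out to about 4 TB of disk, 16 GB of RAM and 8 cores.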
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with a JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with a NAME NODE and a SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; several NODE MANAGERs, each hosting a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; Job 1, Job 2 and Job 3, each made up of MAP and REDUCE steps; HDFS; a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two storage stacks side by side]
Relational database stack: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase stack: Table, MemStore, WA Log, HFile, HDFS, Disks
79 Software Group
Hbase Data Model

Denormalized (one row per name and site):
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized (relational):
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name, with a column only for the sites that name visited):
Id  Name  Columns
1   Dick  Ebay=507018, Google=690414, Facebook=723649, (other columns), MadBillFans.com=675230
2   Jane  Google=716426, Facebook=643261, (other columns), ILoveLarry.com=856767
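The wide, sparse row layout can be mimicked with a dictionary of dictionaries. The class below is a toy stand-in written for this illustration, not the HBase client API; it uses a subset of the slide's counters to show that each row carries only the columns it actually has.

```python
from collections import defaultdict


class WideTable:
    """Toy wide-column table: row key -> {column name -> value}. Not the HBase API."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Columns are created on first write; there is no fixed column list.
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))


def load_counters(records):
    """Turn (name, site, counter) records into one wide row per name."""
    table = WideTable()
    for name, site, counter in records:
        table.put(name, site, counter)
    return table


if __name__ == "__main__":
    table = load_counters([
        ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
        ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
        ("Jane", "Facebook", 643261),
    ])
    print(table.columns("Dick"), table.columns("Jane"))
```

A missing column costs nothing to store, which is why a row for Jane needs no placeholder for Ebay.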
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
[Diagram labels]
• Yarn, EC2
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $$/TB (Hardware only)
• Exadata: $4911
• Hadoop: $750
[Bar chart; axis runs from $0 to $6000]
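The gap is easier to appreciate as a ratio. The snippet below assumes the chart pairs $4911 with Exadata and $750 with Hadoop, as the order of the labels suggests, and that cost scales linearly with capacity; both are assumptions, and the figures cover hardware only.

```python
# Hardware-only cost per terabyte, read off the slide's chart (pairing assumed).
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform, terabytes):
    """Linear estimate: cost per TB times capacity."""
    return COST_PER_TB[platform] * terabytes


def cost_ratio():
    """How many times more expensive Exadata is per TB than Hadoop."""
    return COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]


if __name__ == "__main__":
    print(round(cost_ratio(), 2), hardware_cost("Hadoop", 100))
```

On these numbers the per-terabyte hardware cost differs by a factor of roughly six and a half.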
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s Infiniband between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with a fitted trend line; x-axis from 0 to 120, y-axis from -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
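A trend line like the one on this slide comes from an ordinary least-squares fit. The sketch below implements the standard closed-form solution for a single input variable in plain Python; the function names are made up for the example and the slide's underlying data points are not available, so the sample data is invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(model, predict(model, 10))
```

Predicting for an x beyond the observed range is exactly the "extrapolate from existing data into the future" idea, with all the usual caution that implies.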
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
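One of the simplest classifiers is nearest centroid: average the examples of each class, then assign new data to the closest average. The sketch below applies it to an invented churn example; the features (months inactive, support calls), labels and function names are all assumptions, and real churn models are considerably more involved.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign new data to the label whose centroid is closest (squared distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Hypothetical features: (months inactive, support calls).
    history = [([5, 4], "churn"), ([6, 5], "churn"), ([0, 1], "stay"), ([1, 0], "stay")]
    model = train_centroids(history)
    print(classify(model, [4, 4]), classify(model, [1, 1]))
```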
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
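k-means is a common way to group data when no labels exist. The sketch below runs it on one-dimensional values, which could stand for basket totals; the starting centers are passed in explicitly to keep the run deterministic, and the function name and data are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on numbers: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))
```

No one told the algorithm that there were "small" and "large" baskets; the two groups fall out of the data.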
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Validation Set; Model; Validate; Candidate Model; Production Model; New Data; Prediction; Existing Business; New Business]
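The workflow in this diagram can be sketched end to end: split the cleaned data into a training set and a validation set, fit a candidate model, and promote it to production only if it validates. The one-feature threshold model, the split rule, the accuracy bar and every name below are assumptions chosen to keep the example short.

```python
def split(rows, validation_every=4):
    """Hold out every Nth row for validation (deterministic, for illustration only)."""
    training = [r for i, r in enumerate(rows) if (i + 1) % validation_every != 0]
    validation = [r for i, r in enumerate(rows) if (i + 1) % validation_every == 0]
    return training, validation


def train_threshold(training):
    """Candidate model: the midpoint between the mean feature value of each class."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    return x > threshold


def accuracy(threshold, rows):
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


def build_production_model(rows, min_accuracy=0.9):
    """Train on the training set, then promote only if validation accuracy is high enough."""
    training, validation = split(rows)
    candidate = train_threshold(training)
    if accuracy(candidate, validation) < min_accuracy:
        raise ValueError("candidate model failed validation")
    return candidate


if __name__ == "__main__":
    rows = [(1, False), (2, False), (8, True), (9, True),
            (3, False), (7, True), (2, False), (10, True)]
    model = build_production_model(rows)
    print(model, predict(model, 6))
```

Keeping the validation rows out of training is the point of the split: the model is judged on data it has not seen.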
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time Replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207: Surviving and thriving in the big data revolution
- 207: Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest – a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: a Start step fans out to many parallel Map tasks, whose outputs feed a Reduce step]
Map Reduce
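The map, group and reduce steps can be imitated in a single process. The sketch below counts words across three input splits; it assumes everything fits in memory and runs sequentially, whereas a real cluster would run the map tasks in parallel on different nodes. The function names and input strings are illustrative only.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every split, group by key, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Each string stands in for an input split handled by a separate mapper.
splits = ["big data big", "data revolution", "big revolution"]
print(map_reduce(splits))
```

Because each call to `map_phase` depends only on its own split, the map work can be spread over as many machines as there are splits; only the grouped values for a key have to meet in one place.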
68 Software Group
[Diagram labels: HDFS; eight MAPPER boxes; SCAN/SORT; four MAPPER boxes; AGGREGATE; REDUCE; Client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
[Diagram, Schema on Write: Data, Extract, Transform, Load, Data Warehouse; Code: Cleanse, Normalize, Aggregate, Analyse; Utilize]
[Diagram, Schema on Read: Data, Load, Hadoop; Code: Cleanse, Analyse; Utilize]
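The difference between the two approaches is where the structure gets imposed. The sketch below contrasts them on two raw JSON events: the schema-on-write loader cleanses and fixes the columns at load time, while the schema-on-read loader stores the lines untouched and parses them only when queried. The event fields and function names are hypothetical.

```python
import json

# Hypothetical raw events; the second carries a field the first does not.
RAW_EVENTS = [
    '{"user": "dick", "site": "Google", "hits": "3"}',
    '{"user": "jane", "site": "Facebook", "hits": "5", "referrer": "ad"}',
]


def load_schema_on_write(raw_lines):
    """Warehouse style: cleanse and transform into a fixed schema at load time."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Only the columns fixed up front survive the load; extras are dropped.
        table.append((record["user"], record["site"], int(record["hits"])))
    return table


def load_schema_on_read(raw_lines):
    """Hadoop style: store the raw lines untouched, so loading is just a copy."""
    return list(raw_lines)


def query_on_read(stored_lines, field):
    """Structure is applied only now, by the code that reads the data."""
    return [json.loads(line).get(field) for line in stored_lines]


print(load_schema_on_write(RAW_EVENTS))
print(query_on_read(load_schema_on_read(RAW_EVENTS), "referrer"))
```

The `referrer` field is lost by the schema-on-write load because nobody planned a column for it, but it is still available to the schema-on-read query, at the cost of parsing every line on every read.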
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce
YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
[Diagram: 15 worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
HADOOP CLIENT (JAVA, PIG, HIVE)
RESOURCE MANAGER
APPLICATION MASTER
[Diagram: three NODE MANAGER boxes, each with a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS, three MAP/REDUCE groups marked Job 1, Job 2 and Job 3; alongside, HDFS and a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram, Oracle side: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks]
[Diagram, HBase side: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
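The HBase labels on this slide (WA Log, MemStore, HFile) describe a write path that can be modelled in a few lines: a change is appended to the write-ahead log, applied to an in-memory store, and later flushed to an immutable sorted file. The class below is a toy single-process model for illustration only. `ToyRegion` and its methods are hypothetical names, not the HBase client API, and the flush threshold is counted in keys rather than bytes.

```python
class ToyRegion:
    """Toy model of an HBase-style write path; not the HBase client API."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log: appended before anything else
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable key-sorted files, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the change for recovery
        self.memstore[key] = value     # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new immutable, key-sorted file."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        """Check memory first, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


region = ToyRegion(flush_threshold=2)
region.put("a", 1)
region.put("b", 2)   # second key reaches the threshold and triggers a flush
region.put("a", 3)   # newer value stays in memory and shadows the flushed one
print(region.get("a"), region.get("b"), len(region.hfiles))
```

Read this way, the two halves of the diagram line up roughly as redo log and WA Log, buffer cache and MemStore, datafiles and HFiles, though that pairing is an interpretation of the slide rather than something it states.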
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
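The last two tables on this slide show the wide, sparse row layout: one row per name, with a column only for the sites that person actually has a counter for. A short sketch of that pivot, using the slide's own sample values and plain dictionaries as a stand-in for HBase rows (the function name is hypothetical):

```python
# Rows from the slide's (Name, Site, Counter) listing.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) tuples into one sparse row per name."""
    wide = {}
    for name, site, counter in rows:
        # Each site becomes a column that exists only on rows that use it.
        wide.setdefault(name, {})[site] = counter
    return wide


wide_rows = to_wide_rows(visits)
print(sorted(wide_rows["Dick"]))
print(sorted(wide_rows["Jane"]))  # no Ebay or MadBillFans.com column for Jane
```

Nothing is stored for the missing combinations, which is what lets a table of this kind carry a very large number of possible columns without paying for empty cells.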
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase / MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); Exadata $4911, Hadoop $750; axis from $0 to $6000]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6 core CPU per node (216 total)
  – 12x2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
YARN, EC2 (diagram labels)
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase / MLlib – machine learning
MapReduce
Shark (SQL), Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only); axis from $0 to $6,000]
Exadata: $4,911
Hadoop: $750
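To put the two chart values side by side, the short sketch below works out the ratio and a sample bill. It assumes the chart pairs $4,911 with Exadata and $750 with Hadoop; the function names and the 100 TB example size are illustrative only.

```python
# Hardware-only cost per terabyte, as read off the chart (US dollars).
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def cost_ratio(costs, expensive, cheap):
    """Return how many times more the `expensive` option costs per TB."""
    return costs[expensive] / costs[cheap]


def hardware_cost(costs, platform, terabytes):
    """Hardware-only cost of storing `terabytes` on the given platform."""
    return costs[platform] * terabytes


ratio = cost_ratio(COST_PER_TB, "Exadata", "Hadoop")
print(f"Exadata costs about {ratio:.1f}x as much per TB as Hadoop")
for platform in COST_PER_TB:
    # 100 TB is an arbitrary example size, not a figure from the slide.
    print(platform, hardware_cost(COST_PER_TB, platform, 100))
```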
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPUs per node (216 cores total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
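As a quick cross-check, the per-node figures can be multiplied out to rack totals. The constant and function names below are illustrative. The RAM, core and spindle totals match the slide, but 12 x 2 TB disks on 18 nodes comes to 432 TB of raw disk rather than the 864 TB quoted, so one of those two figures may be off.

```python
# Per-node figures as quoted on the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE = 2
CORES_PER_CPU = 6
DISKS_PER_NODE = 12
TB_PER_DISK = 2


def rack_totals():
    """Multiply the per-node figures out to whole-rack totals."""
    return {
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "cores": NODES * CPUS_PER_NODE * CORES_PER_CPU,
        "spindles": NODES * DISKS_PER_NODE,
        "raw_disk_tb": NODES * DISKS_PER_NODE * TB_PER_DISK,
    }


totals = rack_totals()
print(totals)
```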
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: programs that evolve with “experience”
Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065 x + 0.71906459527154; both axes run to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
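A trend line of the form f(x) = slope * x + intercept, like the one on the chart, can be fitted with ordinary least squares. The sketch below is a minimal pure-Python version; the function names and the sample points are made up for illustration and are not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


# Made-up sample points lying exactly on y = 2x + 1.
xs = [0, 10, 20, 30, 40]
ys = [1, 21, 41, 61, 81]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(slope, intercept, 120))
```

Extrapolating to x = 120, well beyond the sample data, is the "predictive" step: the model is only as good as the assumption that the trend continues.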
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
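One common way to group data without predefined classes is k-means. The sketch below is a bare-bones version that assumes 2-D points and caller-supplied starting centroids; the function name, the sample points and the starting centroids are all made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Made-up points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print(final_centroids, labels)
```

No labels are supplied anywhere: the two groups emerge purely from the distances between points.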
121 Software Group
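Supervised Machine Learning
[Diagram: raw data from the existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and promoted to a production model; the production model takes new data from new business and outputs a prediction.]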
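The train, validate, promote, predict loop in the diagram can be sketched in a few lines. Everything here is an assumption made for illustration: the churn data is invented, the "model" is a single cut-off value, and the 0.9 accuracy bar for promotion is arbitrary.

```python
def split(rows, every=4):
    """Hold out every `every`-th row for validation; train on the rest."""
    training = [r for i, r in enumerate(rows) if i % every != every - 1]
    validation = [r for i, r in enumerate(rows) if i % every == every - 1]
    return training, validation


def train_threshold(training):
    """Candidate model: a cut-off halfway between the two class means."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Classify a value as positive when it exceeds the cut-off."""
    return x > threshold


def accuracy(threshold, rows):
    """Fraction of rows whose prediction matches the known label."""
    hits = sum(predict(threshold, x) == label for x, label in rows)
    return hits / len(rows)


# Made-up history: (monthly support calls, customer churned?)
history = [(1, False), (9, True), (8, True), (2, False),
           (3, False), (7, True), (2, False), (9, True)]

training, validation = split(history)
candidate = train_threshold(training)
score = accuracy(candidate, validation)
# Promote the candidate to production only if it validates well enough.
production = candidate if score >= 0.9 else None
if production is not None:
    print(predict(production, 6))  # score a new, unseen customer
```

The validation rows never influence the cut-off, which is what makes the accuracy score a fair check before the model is used on new business.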
122 Software Group
Unsupervised learning
inmaps.linkedin.com
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: redo logs → Change Data Capture → JMS queue → Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell offerings shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell offerings shown: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: Start, a fan-out of many parallel Map tasks, Reduce]
Map Reduce
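The map, shuffle and reduce steps in the diagram can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not Hadoop or Google code; the function names and the two input splits are made up for the example, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for a split handled by its own Map task.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits, purely for illustration.
splits = ["big data is big", "data about data"]
counts = word_count(splits)
print(counts)
```

In a real cluster each call to `map_phase` would run on the node holding that split, and only the grouped pairs would move over the network.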
68 Software Group
[Diagram: HDFS, a stage of Mappers (Scan, Sort), a stage of Mappers (Aggregate), Reduce, Client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
[Diagram: two pipelines]
Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Code, Cleanse, Normalize, Aggregate, Analyse, Utilize
Schema on Read: Data, Load, Hadoop, Code, Cleanse, Analyse, Utilize
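The difference between the two pipelines can be sketched as follows. This is a toy illustration in plain Python, not warehouse or Hadoop code: the field names, the `REQUIRED_FIELDS` schema and the sample records are assumptions invented for the example.

```python
import json

# Hypothetical schema for the schema-on-write path.
REQUIRED_FIELDS = {"user": str, "site": str, "hits": int}


def load_schema_on_write(raw_lines):
    """Shape every record to the schema before it is stored."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # A missing or malformed field raises here, at load time.
        row = {name: cast(record[name]) for name, cast in REQUIRED_FIELDS.items()}
        table.append(row)  # fields outside the schema are dropped
    return table


def load_schema_on_read(raw_lines):
    """Store the raw lines untouched; no structure is imposed yet."""
    return list(raw_lines)


def query_hits_by_user(raw_store, user):
    """Apply structure only when the question is asked."""
    total = 0
    for line in raw_store:
        record = json.loads(line)
        if record.get("user") == user:
            total += int(record.get("hits", 0))
    return total


raw = [
    '{"user": "jane", "site": "google", "hits": 3}',
    '{"user": "dick", "site": "ebay", "hits": 5, "referrer": "mail"}',
    '{"user": "jane", "site": "facebook", "hits": "4"}',
]
warehouse = load_schema_on_write(raw)
lake = load_schema_on_read(raw)
print(query_hits_by_user(lake, "jane"))
```

The schema-on-write table has lost the `referrer` field because nobody planned for it, while the raw store still holds it for any later question.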
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 Cores
72 Software Group
73 Software Group
74 Software Group
• Hadoop File System (HDFS)
• Map Reduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop Client (Java, Pig, Hive); Map Reduce (distributed processing) with the Job Tracker; HDFS (distributed storage) with the Name Node and Secondary Name Node; many worker nodes, each a Data Node and Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop Client (Java, Pig, Hive); Resource Manager; Application Master; several Node Managers, each with a Container]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of Map and Reduce stages over HDFS, shown once as three separate jobs (Job 1, Job 2, Job 3) and once as a single job (Job 1)]
78 Software Group
HBase
A Real time database built on Hadoop
[Diagram: two storage stacks side by side. First stack: Disks, ASM, Datafiles, Redo, Log Buffer, Buffer Cache, Tables. Second stack: Disks, HDFS, HFiles, WA Log, MemStore, Tables]
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
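The last two tables on this slide show the key idea: each row carries only the columns it actually has. A rough sketch of that pivot in plain Python follows. It uses nested dictionaries as a stand-in for a sparse row and does not use the HBase API; the function name `to_wide_rows` is made up for the example.

```python
from collections import defaultdict

# Flat (name, site, counter) facts, as in the first table on the slide.
facts = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot facts into one sparse row per name: {row_key: {column: value}}."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        # A column exists in a row only where there is a value to store.
        table[name][site] = counter
    return dict(table)


wide = to_wide_rows(facts)
for row_key, columns in wide.items():
    print(row_key, columns)
```

Jane's row has no Ebay column at all, rather than an Ebay column holding a null, which is what lets a table have a very large and open-ended set of columns.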
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: WebLogs pass through FLUME into HDFS; RDBMS tables (CUSTOMERS, PRODUCTS) pass through SQOOP into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
• YARN, EC2
• Mesos – heterogeneous cluster manager
• Tachyon – in memory file system
• Spark – memory optimized distributed execution
• Spark Streaming
• MLbase, MLlib – Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); axis from $0 to $6000]
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
– 48GB RAM per node (864GB total)
– 2x6 Core CPU per node (216 total)
– 12x2TB HDD per node (216 spindles, 864 TB)
– 40Gb/s InfiniBand between nodes
– 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: Programs that evolve with "experience"
• Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: Programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
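A trend line like the one on this slide comes from an ordinary least-squares fit. The sketch below shows the calculation in plain Python; the data points are made up (they lie exactly on y = x + 1) and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations, for illustration only.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]

slope, intercept = fit_line(xs, ys)
# "Extrapolate from existing data into the future": predict beyond the observed range.
forecast = slope * 120 + intercept
print(slope, intercept, forecast)
```

Real data will not sit exactly on a line, so the fitted coefficients then minimise the squared error rather than reproduce every point.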
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
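As one example of grouping data with no labels supplied, here is a minimal k-means sketch in plain Python. The slide does not name an algorithm, so k-means is swapped in here as a common illustration; the points are made up, and the naive choice of the first k points as starting centroids is an assumption that keeps the example deterministic.

```python
def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters; no labels are supplied."""
    centroids = list(points[:k])  # naive deterministic start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Production implementations choose starting centroids more carefully and stop once assignments no longer change.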
121 Software Group
Supervised Machine Learning
[Diagram: Existing Business, Raw Data, Clean, Training Set and Validation Set, Model, Validate, Candidate Model, Production Model; New Business, New Data, Production Model, Prediction]
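The flow in the diagram (split the cleaned data, train a candidate model, validate it, then score new data) can be sketched end to end. Everything here is an assumption made for illustration: the single feature (support calls per customer), the churn labels, and the very simple threshold model are all invented, and a real project would use a proper library and far more data.

```python
import random


def split(records, validation_fraction=0.25, seed=0):
    """Hold back part of the cleaned data to validate the model."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: the midpoint between the two class means of one feature."""
    churned = [x for x, label in training if label]
    stayed = [x for x, label in training if not label]
    return (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2


def predict(threshold, x):
    return x > threshold


def accuracy(threshold, validation):
    hits = sum(predict(threshold, x) == label for x, label in validation)
    return hits / len(validation)


# Hypothetical existing business data: (support calls, customer churned?).
records = [(0, False), (1, False), (2, False), (3, False),
           (7, True), (8, True), (9, True), (10, True)]

training, validation = split(records)
model = train_threshold(training)        # candidate model
score = accuracy(model, validation)      # validate before promoting to production
print(score, predict(model, 9), predict(model, 1))  # then score new data
```

Only a model that holds up on the validation set, which it never saw during training, should go on to make predictions for new business.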
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster; targets: Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
– Mobility
– Cloud
– Hadoop
– Big Data Analytics
• Where is the data?
– Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
– Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
– Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
– SQOOP
– Hive
– Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest – a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
67 Software Group
[Diagram: Start, many parallel Map tasks, Reduce]
Map Reduce
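As a rough illustration of the Map Reduce flow pictured on this slide, the sketch below simulates the map, shuffle and reduce phases in plain Python on a single machine. The word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions chosen for illustration; they are not part of Hadoop's API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster each line could be mapped on a different node;
    # here the map calls simply run one after another.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
```

The map step has no shared state, which is what lets a real framework spread it across many nodes before a much smaller reduce step.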
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, Mapper (×8), Scan, Sort, Mapper (×4), Aggregate, Reduce, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract, Load, Transform, Data Warehouse, Utilize]
[Diagram labels, Schema on Read: Data, Load, Hadoop, Analyse, Cleanse, Code, Utilize]
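A minimal sketch of the contrast, assuming comma-separated records with hypothetical fields `name`, `site` and `counter`: with schema on write, records are cleansed and structured before they are stored; with schema on read, raw lines are stored untouched and the structure is applied only when a query runs. The sample lines and function names are invented for illustration.

```python
# Illustrative raw input; the third line does not fit the schema.
RAW_LINES = ["dick,ebay,507018", "jane,google,716426", "malformed line"]


def parse(line):
    """Apply the schema (name, site, counter) to one raw line; None if it does not fit."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return {"name": parts[0], "site": parts[1], "counter": int(parts[2])}


def load_schema_on_write(lines):
    """Schema on write: cleanse and structure records before they are stored."""
    return [row for row in map(parse, lines) if row is not None]


def load_schema_on_read(lines):
    """Schema on read: store the raw lines as they are; nothing is rejected at load time."""
    return list(lines)


def query_total(store, structured):
    """Sum the counter column; a raw store is parsed only now, at read time."""
    rows = store if structured else [r for r in map(parse, store) if r is not None]
    return sum(row["counter"] for row in rows)


warehouse = load_schema_on_write(RAW_LINES)
raw_store = load_schema_on_read(RAW_LINES)
```

Both stores answer the query the same way, but the raw store still holds the malformed line, so a later job with a different schema could still make use of it.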
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; worker nodes, each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: Hadoop client (Java, Pig, Hive), Resource Manager, Node Manager (×3), Container (×3), Application Master]
77 Software Group
Tez¹
¹ Hindi for “fast”
[Diagram labels: HDFS; Map and Reduce stages; Job 1, Job 2, Job 3; HDFS; Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram labels, Oracle database: ASM, Datafiles, Buffer Cache, Tables, Redo, Log Buffer, Disks]
[Diagram labels, HBase: HDFS, HFiles, MemStore, Tables, WA Log, Disks]
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
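The wide-row layout on this slide can be sketched as a nested dictionary: one entry per row key, holding only the columns that row actually has. This is plain Python used as an analogy, not the HBase client API; the function name `to_wide_rows` is an assumption.

```python
from collections import defaultdict

# The (name, site, counter) facts from the slide's first table.
FACTS = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(facts):
    """Pivot facts into one sparse row per name: {row_key: {column: value}}."""
    rows = defaultdict(dict)
    for name, site, counter in facts:
        rows[name][site] = counter
    return dict(rows)


table = to_wide_rows(FACTS)

if __name__ == "__main__":
    for row_key, columns in table.items():
        print(row_key, columns)
```

Each row carries its own set of columns, so Jane's row has no Ebay entry at all rather than an empty cell.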
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: web logs are loaded into HDFS by Flume; RDBMS tables (CUSTOMERS, PRODUCTS) are loaded into HDFS by SQOOP]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48GB RAM per node (864GB total)
  – 2x6 Core CPU per node (216 total)
  – 12x2TB HDD per node (216 spindles, 864 TB)
  – 40Gb/s Infiniband between nodes
  – 10Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with a fitted trend line, f(x) = 0.971521231456065 x + 0.71906459527154; both axes run from about 0 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
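A trend line like the one on this slide can be fitted with ordinary least squares. The sketch below is a minimal pure-Python version; the sample points and the names `fit_line` and `predict` are invented for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations, roughly following y = x.
xs = [10, 20, 30, 40, 50]
ys = [12, 19, 33, 38, 52]
model = fit_line(xs, ys)

if __name__ == "__main__":
    print(model, predict(model, 60))
```

Extrapolating beyond the observed range, as `predict(model, 60)` does, is exactly the step that makes this "predictive" and also the step most sensitive to a poor fit.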
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
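One common way to group data without predefined classes is k-means. The sketch below is a tiny one-dimensional version, assuming hypothetical basket totals and caller-supplied starting centroids; it is meant only to show the assign-then-update loop, not a production clustering routine.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means on one-dimensional data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical basket totals: a few small shops and a few large ones.
basket_totals = [4, 5, 6, 48, 50, 52]
centroids, clusters = kmeans_1d(basket_totals, centroids=[0, 100])

if __name__ == "__main__":
    print(centroids, clusters)
```

No labels are supplied anywhere: the two groups emerge from the data alone, which is the sense in which clustering works "without a pre-existing classification scheme".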
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Validate, Model, Candidate Model, Training Set, Validation Set, Production Model, New Data, New Business, Existing Business, Prediction]
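The train, validate, predict cycle named in this slide's labels can be sketched with a deliberately simple classifier. The nearest-centroid model, the churn data and every name below are assumptions made for illustration; any real model would slot into the same three steps.

```python
def train(rows):
    """Learn one centroid (mean feature value) per label from (feature, label) pairs."""
    sums, counts = {}, {}
    for feature, label in rows:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, feature):
    """Classify new data by the nearest learned centroid."""
    return min(model, key=lambda label: abs(feature - model[label]))


def accuracy(model, rows):
    """Validate: fraction of held-out rows the candidate model labels correctly."""
    hits = sum(1 for feature, label in rows if predict(model, feature) == label)
    return hits / len(rows)


# Hypothetical data: (support calls last month, did the customer churn?)
training_set = [(0, "stay"), (1, "stay"), (2, "stay"),
                (7, "churn"), (8, "churn"), (9, "churn")]
validation_set = [(1, "stay"), (3, "stay"), (6, "churn"), (10, "churn")]

candidate = train(training_set)            # candidate model from the training set
score = accuracy(candidate, validation_set)  # checked against the validation set

if __name__ == "__main__":
    print(candidate, score, predict(candidate, 5))
```

Keeping the validation rows out of training is the point of the split: the score then says something about data the model has not seen.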
122 Software Group
Inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security
  • Vulnerability
  • Penetration Detection
• Fraud Detection
• CRM
  • Churn
  • Defaults
• Medical
  • Risk analysis
  • Diagnosis
  • Prognosis
• Game optimization
• Advertising
  • Targeting
  • Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with the JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with the NAME NODE and SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE), RESOURCE MANAGER, APPLICATION MASTER, three NODE MANAGERs each with a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram: chains of MAP and REDUCE stages between HDFS reads and writes, grouped as Job 1, Job 2 and Job 3 in one case and as a single Job 1 in the other]
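The contrast in this diagram can be imitated in memory. In the sketch below, two map-reduce stages are run first as separate jobs with the intermediate result written to a store, then as one pipeline with no intermediate store. The `map_reduce` helper, the `fake_hdfs` dict standing in for HDFS, and the log records are hypothetical; this illustrates the idea only and does not use any Hadoop or Tez API.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one map-reduce stage in memory and return the reduced results."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group mapped values by key
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: count hits per site from (user, site) log records.
def hits_mapper(record):
    user, site = record
    yield site, 1


def hits_reducer(site, ones):
    return site, sum(ones)


# Stage 2: group sites by their hit count.
def by_count_mapper(record):
    site, count = record
    yield count, site


def by_count_reducer(count, sites):
    return count, sorted(sites)


log = [("Dick", "Ebay"), ("Jane", "Google"),
       ("Dick", "Google"), ("Jane", "Facebook")]

# Chained jobs: job 1 output is stored before job 2 starts.
fake_hdfs = {}
fake_hdfs["job1_output"] = map_reduce(log, hits_mapper, hits_reducer)
chained = map_reduce(fake_hdfs["job1_output"], by_count_mapper, by_count_reducer)

# Single pipeline: stage 1 feeds stage 2 directly.
pipelined = map_reduce(map_reduce(log, hits_mapper, hits_reducer),
                       by_count_mapper, by_count_reducer)
```

Both routes produce the same answer; the difference is only whether the intermediate result is materialized between stages.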
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: side-by-side comparison. Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
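The last two tables show the wide, sparse layout: one row per name, with a column only for the sites that name actually visited. A minimal sketch of that pivot, using the values from the slide and a plain dict of dicts as a stand-in for an HBase row (the function name `to_wide_rows` is made up, and the site names are written with the dots the transcript appears to have lost):

```python
from collections import defaultdict

# (name, site, counter) rows as in the first table on the slide.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot relational rows into one sparse row per name, keyed by site."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter  # the column exists only if there is a value
    return dict(wide)


wide = to_wide_rows(rows)
```

Jane's row simply has no "Ebay" column, rather than holding a null value as a fixed relational schema would.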
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase / MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); axis from $0 to $6000]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6 core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell products: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell products: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207: Surviving and thriving in the big data revolution
- 207: Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) talks to the cluster. The Job Tracker coordinates Map Reduce (distributed processing); the Name Node and Secondary Name Node manage HDFS (distributed storage). Every worker node runs both a Data Node and a Task Tracker.]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Application Master; Node Managers, each hosting a Container]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of Map and Reduce stages between HDFS reads and writes, shown once as three separate jobs (Job 1, Job 2, Job 3) and once as a single job (Job 1)]
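To make the multi-stage picture concrete, the sketch below chains two map-reduce stages in memory. It is a toy model, not Hadoop or Tez code: `map_reduce` and the sample hit log are assumptions for illustration. In classic MapReduce each stage would typically be its own job, with the intermediate output written to HDFS between jobs; here the intermediate result is just a Python list.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process map-reduce: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Hypothetical web log: one (user, site) pair per hit.
hits = [("Dick", "Ebay"), ("Dick", "Google"), ("Jane", "Google"), ("Dick", "Google")]

# Stage 1: count hits per (user, site).
stage1 = map_reduce(
    hits,
    mapper=lambda hit: [(hit, 1)],
    reducer=lambda key, values: (key, sum(values)),
)

# Stage 2: consume stage 1 output and total the hits per user.
stage2 = map_reduce(
    stage1,
    mapper=lambda row: [(row[0][0], row[1])],
    reducer=lambda key, values: (key, sum(values)),
)
```

The hand-off from `stage1` to `stage2` is the step that costs a round trip through HDFS when the stages run as separate jobs.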
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: an Oracle-style architecture (ASM, datafiles, buffer cache, redo log and log buffer, tables, disks) shown alongside the HBase equivalents (HDFS, HFiles, MemStore, write-ahead log, tables, disks)]
79 Software Group
HBase Data Model

Denormalized listing:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase rows (one row per name, one column per site, columns present only where data exists):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
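The move from normalized tables to wide, sparse rows can be sketched with plain dictionaries. This is not the HBase API: a dict simply stands in for a row whose columns exist only where there is data. The site ids below are chosen so that each counter lines up with the name and site shown in the first listing, and the function name `to_wide_rows` is hypothetical.

```python
# Lookup tables and fact rows, mirroring the slide's example data.
names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
counters = [
    (1, 1, 507018),
    (1, 2, 690414),
    (2, 2, 716426),
    (1, 3, 723649),
    (2, 3, 643261),
    (2, 4, 856767),
    (1, 5, 675230),
]


def to_wide_rows(names, sites, counters):
    """Fold (name_id, site_id, counter) facts into one sparse row per name."""
    rows = {}
    for name_id, site_id, counter in counters:
        row = rows.setdefault(name_id, {"Name": names[name_id]})
        # The site name becomes the column name; absent sites get no column.
        row[sites[site_id]] = counter
    return rows


wide = to_wide_rows(names, sites, counters)
```

Each row carries its own set of columns, so Jane's row has no Ebay column at all rather than a null.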
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase / MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only); axis $0 to $6000]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: Hadoop client (Java, Pig, Hive); Resource Manager; three Node Managers, each with a Container; Application Master]
77 Software Group
Tez (Hindi for “fast”)
[Diagram: MAP and REDUCE stages over HDFS grouped as three separate jobs (Job 1, Job 2, Job 3), shown beside the same stages over HDFS grouped as a single job (Job 1)]
78 Software Group
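HBase
A real-time database built on Hadoop
[Diagram: two storage stacks side by side. First stack labels: ASM, Datafiles, Buffer Cache, Tables, Redo, Log Buffer, Disks. Second stack labels: HDFS, HFiles, MemStore, Tables, WA Log (write-ahead log), Disks]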
79 Software Group
HBase Data Model

Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one row per name, only the columns that row needs):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
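The wide, sparse rows on this slide can be mimicked with nested dictionaries. This is only a sketch of the data shape in plain Python; it does not use any HBase client API, and the variable names are assumptions. The counter values are the ones shown on the slide.

```python
from collections import defaultdict

# Flat (name, site, counter) observations, as in the first table on the slide
observations = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

# One row per name; each row stores only the columns (sites) it actually has
rows = defaultdict(dict)
for name, site, counter in observations:
    rows[name][site] = counter

for row_key, columns in rows.items():
    print(row_key, columns)
```

Each row carries its own set of columns, so Jane's row simply has no Ebay entry instead of a NULL, and adding a new site needs no schema change.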
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram labels: WebLogs, FLUME, HDFS, SQOOP, RDBMS (CUSTOMERS, PRODUCTS)]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $ per TB (hardware only), axis $0 to $6000]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles 864 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
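The pivot on this slide, from one relational row per name and site to one wide, sparse row per name, can be sketched with plain Python dictionaries. This is only an illustration of the data model, not HBase client code; the function name to_wide_rows is hypothetical, and the data is the Name/Site/Counter table from the slide.

```python
from collections import defaultdict

# (name, site, counter) rows from the slide's relational table
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) tuples into one sparse row per name."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        # Each site becomes a column that exists only in the rows that use it
        wide[name][site] = counter
    return dict(wide)


wide_rows = to_wide_rows(rows)
for name, columns in wide_rows.items():
    print(name, columns)
```

Each row carries only the columns it needs, so Jane's row has no Ebay column at all rather than a NULL.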
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase / MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only)]
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
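The rack totals can be checked by multiplying out the per-node figures quoted on the slide. In this short sketch the constant names are hypothetical. The RAM, core and spindle totals come out as quoted, while 12 drives of 2 TB per node multiply out to 432 TB raw, so the 864 TB figure on the slide may assume something not stated here.

```python
# Per-node figures quoted on the slide for the Oracle Big Data Appliance
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6   # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE
total_cores = NODES * CORES_PER_NODE
total_spindles = NODES * DISKS_PER_NODE
raw_disk_tb = total_spindles * TB_PER_DISK  # raw capacity, before any replication

print(total_ram_gb, total_cores, total_spindles, raw_disk_tb)
```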
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with linear trend line: f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
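The first technique in the list, linear regression, can be sketched as an ordinary least-squares fit. The sample points are made up for illustration and are not the data behind the slide's chart, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample points that roughly follow y = x
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 82, 99]

slope, intercept = fit_line(xs, ys)
prediction = slope * 120 + intercept  # extrapolate beyond the observed range
print(slope, intercept, prediction)
```

The fitted line is the "model"; plugging in an x value outside the observed range is the extrapolation into the future that predictive analytics relies on.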
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
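As one possible illustration of the spam detection case, here is a tiny naive Bayes text classifier. The technique is a choice made for this example, not something the slide prescribes, and the training messages and the function names train and classify are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label; examples are (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_msgs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_msgs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("claim your cash prize", *model))
print(classify("agenda for lunch", *model))
```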
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
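Grouping without predefined labels can be sketched with k-means, one common clustering algorithm chosen here purely for illustration. The shopper data is hypothetical, and the starting centroids are picked by hand to keep the run deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical shoppers: (items per basket, average item price)
points = [(2, 50), (3, 55), (2, 60), (20, 5), (22, 4), (19, 6)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied; the two groups (small expensive baskets, large cheap baskets) emerge from the distances alone.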
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Validation Set; Model; Validate; Candidate Model; Production Model; New Data (New Business, Existing Business); Prediction]
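The workflow in the diagram (clean the raw data, split it into training and validation sets, fit a candidate model, validate it, then score new data) can be sketched end to end. Everything here is an assumption made for illustration: the churn data is invented, and the "model" is just a threshold halfway between the two class means.

```python
import random


def clean(raw):
    """Drop records with missing values."""
    return [r for r in raw if r[0] is not None and r[1] is not None]


def split(data, validation_fraction=0.25, seed=0):
    """Shuffle and split into training and validation sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Predict churn when the value is above the learned threshold."""
    return x > threshold


def accuracy(threshold, dataset):
    """Fraction of records whose prediction matches the known label."""
    return sum(predict(threshold, x) == label for x, label in dataset) / len(dataset)


# Hypothetical raw data: (monthly support calls, churned?)
raw = [(1, False), (2, False), (None, True), (9, True), (8, True), (3, False),
       (10, True), (2, False), (7, True), (1, False), (9, None), (8, True)]

data = clean(raw)
training_set, validation_set = split(data)
model = train(training_set)             # candidate model
score = accuracy(model, validation_set) # validate before promoting to production
print(score, predict(model, 6))         # score a new, unseen customer
```

The validation set is held back from training so that the accuracy figure reflects data the model has not seen.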
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
- Cloudera Enterprise
- Oracle Enterprise R
- Oracle NoSQL
- Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
Scatter plot with fitted trend line (x axis 0 to 120, y axis -20 to 120)
f(x) = 0.971521231456065 x + 0.71906459527154
- Linear regression
- Non-linear (curve fit)
- Multivariate
- Time series
- Logistic regression
- CART
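A trend line of the form f(x) = slope * x + intercept, like the one on the chart, can be obtained with ordinary least squares. The sketch below is one minimal way to do it in plain Python; the sample points are invented for illustration and are not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations (not taken from the slide).
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 80, 97]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(f"prediction at x = 120: {predict(120):.1f}")
```

The same idea extends to the other techniques in the list: each fits a model to existing data and then applies it to values not yet seen.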
119 Software Group
Classification
- Create a model that identifies/classifies new data
- Spam detection, churn risk, customer value
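To make the idea concrete, here is a minimal sketch of a classifier for churn risk. It uses a nearest-centroid rule, chosen only because it is short; the slide does not name an algorithm, and the features (monthly logins, support tickets) and all data values are invented.

```python
import math
from collections import defaultdict


def train_centroids(rows, labels):
    """Build the model: the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, row):
    """Assign new data to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


# Hypothetical labelled history: (monthly_logins, support_tickets).
rows = [(1, 5), (2, 4), (0, 6), (20, 0), (25, 1), (18, 0)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

model = train_centroids(rows, labels)
print(classify(model, (3, 4)))   # a customer who rarely logs in
print(classify(model, (22, 1)))  # an active customer
```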
120 Software Group
Clustering
- Group data without a pre-existing classification scheme
- For instance, basket analysis
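Grouping without predefined labels can be sketched with k-means, one common clustering method (the slide does not say which method it has in mind). The points below are invented; think of each as a basket's spend in two product categories.

```python
import math


def kmeans(points, k, iterations=10):
    """Very small k-means: returns (centroids, clusters).

    The first k points are used as starting centroids so the result is
    deterministic; real implementations usually pick starting points randomly.
    """
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical baskets: (spend on category A, spend on category B).
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No labels are supplied anywhere; the two groups emerge from the distances between the points alone.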
121 Software Group
Supervised Machine Learning
Diagram labels: Existing Business, Raw Data, Clean, Training Set, Validation Set, Model, Validate, Candidate Model, Production Model, New Business, New Data, Prediction
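The flow in the diagram (split existing data into a training set and a validation set, fit a candidate model, validate it, then apply it to new data) can be sketched as follows. Everything here is an assumption made for illustration: the single-feature data, the 75/25 split, and the very simple threshold model.

```python
import random


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle the cleaned data and split it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: a cut-off midway between the two class means."""
    positives = [x for x, y in training if y]
    negatives = [x for x, y in training if not y]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where the model's prediction matches the known label."""
    return sum((x > threshold) == y for x, y in rows) / len(rows)


# Hypothetical "existing business" data: (feature value, known outcome).
existing = [(x, x > 50) for x in range(0, 100, 5)]

training, validation = split(existing)
threshold = train_threshold(training)
score = accuracy(threshold, validation)
print(f"candidate threshold {threshold:.1f}, validation accuracy {score:.2f}")

# Once validated, the production model scores new, unlabelled data.
new_data = [10, 90]
predictions = [x > threshold for x in new_data]
print(dict(zip(new_data, predictions)))
```

Holding back the validation set is what lets the candidate model be checked on data it did not see during training before it is promoted to production.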
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
- Search Optimization
- Recommendation Systems
- Security: Vulnerability, Penetration Detection
- Fraud Detection
- CRM: Churn, Defaults
- Medical: Risk analysis, Diagnosis, Prognosis
- Game optimization
- Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
- Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
- Small-medium businesses need help to compete
- Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete...
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
- How could data and algorithms transform your business?
- What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
- Where is the data?
  - Start collecting now
139 Software Group
For your career
- Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
- Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
- Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – machine learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
Exadata: $4911
Hadoop: $750
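The gap in this chart can be expressed as a simple ratio. The sketch below is a minimal Python illustration that uses only the two hardware-only figures shown on the slide, and assumes they belong to the bars in the order listed (Exadata $4911, Hadoop $750). The function and constant names are illustrative, not from the talk.

```python
def cost_ratio(cost_a_per_tb: float, cost_b_per_tb: float) -> float:
    """Return how many times more A costs per terabyte than B."""
    if cost_b_per_tb <= 0:
        raise ValueError("cost_b_per_tb must be positive")
    return cost_a_per_tb / cost_b_per_tb


# Figures from the slide (USD per TB, hardware only); the bar-to-value
# mapping is an assumption based on the order of the labels.
EXADATA_PER_TB = 4911.0
HADOOP_PER_TB = 750.0

ratio = cost_ratio(EXADATA_PER_TB, HADOOP_PER_TB)
print(f"Exadata hardware costs roughly {ratio:.1f}x as much per TB as Hadoop")
```

The comparison covers hardware only, as the slide states; software licences and operating costs are outside this calculation.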
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 cores total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
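To make the "collective intelligence" definition concrete, here is a minimal Python sketch of one common crowd-driven technique: recommending items that other shoppers most often bought together. The technique, the function names and the sample baskets are assumptions chosen for illustration. The slides do not say which method any particular product uses.

```python
from collections import Counter
from itertools import permutations


def co_purchases(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        # set() ignores repeated items within a single basket
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(item, counts, n=2):
    """Return up to n items most often bought together with `item`, best first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]


# Hypothetical purchase history standing in for the "crowd"
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["bread", "butter", "milk"],
    ["milk", "cereal"],
]
counts = co_purchases(baskets)
print(recommend("bread", counts))
```

No rule about bread and butter is coded anywhere. The suggestion emerges purely from the behaviour of many shoppers, which is the sense in which such programs "seem intelligent".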
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter chart with fitted trend line: f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
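The trend line on this slide, read as roughly f(x) = 0.9715 x + 0.7191 (assuming the decimal points were lost in extraction), is the kind of result an ordinary least-squares fit produces. The Python sketch below shows the first technique in the list, simple linear regression, on made-up data. The sample points and function names are hypothetical and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical observations, not the points plotted on the slide
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 39, 62, 79, 101]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(f"prediction for x = 120: {predict(120, slope, intercept):.1f}")
```

The other techniques in the list replace the straight line with a different model but keep the same idea: fit to existing data, then extrapolate.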
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
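As an illustration of the spam-detection example, the following Python sketch trains a tiny naive Bayes text classifier and applies it to new messages. Naive Bayes is one possible classification technique chosen here for brevity. The slide does not name an algorithm, and the training messages and function names are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label). Returns (per-label word counts, label counts)."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability, using add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # class prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]
model = train(training)
print(classify("cash offer now", *model))
print(classify("monday project meeting", *model))
```

The same pattern (label historical records, fit a model, score new records) applies to the churn-risk and customer-value examples, with customer attributes in place of words.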
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
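A minimal sketch of grouping without predefined classes, using k-means in plain Python. K-means is an assumption made for illustration; the slide does not name a clustering algorithm. The customer data (visits per month, average basket value) is invented, and centroids are seeded from the first k points so that the run is repeatable.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 10), (9, 95), (2, 12), (8, 90), (1, 14), (10, 100)]
assignment, centroids = kmeans(customers, k=2)
print(assignment)
print(centroids)
```

No customer is labelled in advance. The two groups (occasional small-basket shoppers and frequent large-basket shoppers) come out of the data itself.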
121 Software Group
Supervised Machine Learning
[Flow diagram. Labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, Existing Business, New Business, New Data, Prediction]
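The stages named in this diagram (clean the raw data, split it into training and validation sets, build a candidate model, validate it, then use the production model on new data) can be sketched end to end in a few lines of Python. Everything specific here is an assumption for illustration: the one-feature threshold "model", the 0.9 accuracy bar for promotion, and the invented churn data (months since last purchase, churned or not).

```python
import random


def clean(rows):
    """Drop records with a missing feature or label."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle deterministically, then split into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        raise ValueError("training set needs both classes")
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical existing-business records: (months since last purchase, churned?)
raw = [(1, 0), (2, 0), (3, 0), (4, 0), (2, 0), (3, 0), (1, 0), (4, 0),
       (10, 1), (11, 1), (12, 1), (9, 1), (10, 1), (12, 1), (11, 1), (9, 1),
       (None, 1), (5, None)]

training_set, validation_set = split(clean(raw))
candidate = train(training_set)
score = accuracy(candidate, validation_set)

# Promote the candidate only if it validates well (0.9 is an arbitrary bar).
production_model = candidate if score >= 0.9 else None
if production_model is not None:
    print("prediction for a new customer:", production_model(11))
```

Holding back the validation set is the important design choice: the candidate is judged on records it never saw during training, which is what makes its predictions on new business credible.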
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Architecture diagram. Labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data?
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud, social and mobile
- Not all upside
- Will Big Data kill retail?
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming?
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only), axis from $0 to $6,000. Exadata: $4,911; Hadoop: $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
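The per-node figures on this slide can be multiplied out as a quick sanity check. The sketch below uses only the numbers quoted above; the constant and function names are illustrative, not part of any Oracle tooling. Twelve 2 TB disks across 18 nodes come to 432 TB of raw disk, so the 864 TB figure on the slide does not follow from the per-node numbers as listed.

```python
# Per-node figures as quoted on the slide; all names here are illustrative.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs per node
DISKS_PER_NODE = 12
TB_PER_DISK = 2


def rack_totals(nodes=NODES):
    """Multiply the per-node figures out to whole-rack totals."""
    return {
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "spindles": nodes * DISKS_PER_NODE,
        "raw_disk_tb": nodes * DISKS_PER_NODE * TB_PER_DISK,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```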
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
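As a rough illustration of collective intelligence as defined above, the sketch below ranks items by how often other shoppers bought them together with a given item. The baskets are made-up data and `also_bought` is a hypothetical helper, not something taken from the talk.

```python
from collections import Counter

# Made-up purchase history: one list of items per shopper.
BASKETS = [
    ["hadoop book", "r book"],
    ["hadoop book", "r book", "sql book"],
    ["hadoop book", "sql book"],
    ["r book", "stats book"],
    ["hadoop book", "r book"],
]


def also_bought(baskets, item, top=3):
    """Rank the items that most often share a basket with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(other for other in set(basket) if other != item)
    # Sort by count (descending), then by name, so ties come out in a fixed order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top]]


if __name__ == "__main__":
    print(also_bought(BASKETS, "hadoop book"))
```

The program has no knowledge of books. It only looks intelligent because it aggregates what the crowd already did.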
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line, x axis 0 to 120, y axis -20 to 120: f(x) = 0.971521231456065x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
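The first technique in the list, linear regression, fits a straight line f(x) = slope * x + intercept to existing points and extrapolates from it. Below is a minimal sketch using ordinary least squares in plain Python. The sample data is made up and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0, 20, 40, 60, 80, 100]
    ys = [2 * x + 1 for x in xs]   # made-up, perfectly linear data
    model = fit_line(xs, ys)
    print(model, predict(model, 120))
```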
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
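One simple way to build such a model is a nearest-centroid classifier: average the feature vectors for each known label, then give new data the label of the closest average. The sketch below applies it to spam detection. The two features (link count, exclamation-mark count) and the training rows are invented for illustration, and this is only one of many possible classifiers.

```python
import math
from collections import defaultdict

# Made-up training data: ([links, exclamation marks], label).
TRAINING = [
    ([8, 5], "spam"),
    ([6, 7], "spam"),
    ([1, 0], "ham"),
    ([0, 1], "ham"),
]


def train_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(classify(model, [5, 4]), classify(model, [1, 1]))
```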
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
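K-means is one common clustering algorithm and is used here purely as an example, since the slide does not name a method. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. In the sketch below, each point is a made-up basket described by two invented features, and the starting centroids are passed in explicitly so the run is repeatable.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            moved.append(old)  # an empty cluster keeps its previous centroid
            continue
        moved.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return moved


def k_means(points, centroids, max_iter=100):
    """Return (labels, centroids) once the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        moved = update(points, labels, centroids)
        if moved == centroids:
            break
        centroids = moved
        labels = assign(points, centroids)
    return labels, centroids


if __name__ == "__main__":
    # Made-up baskets: (grocery items, electronics items).
    baskets = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    print(k_means(baskets, [(1, 1), (8, 8)]))
```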
121 Software Group
Supervised Machine Learning
[Workflow diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data (New Business, Existing Business), Prediction]
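The workflow can be sketched end to end as follows: split labelled data into a training set and a validation set, fit a candidate model on the training set, validate it, and only then use it to predict for new data. The data (support calls per month, churned or not), the every-fourth-row split, and the threshold model are all assumptions made for illustration.

```python
# Made-up labelled history: (support calls per month, customer churned?).
ROWS = [
    (0, False), (7, True), (1, False), (8, True),
    (9, True), (2, False), (6, True), (1, False),
]


def split(rows, every=4):
    """Hold out every `every`-th row for validation; train on the rest."""
    training = [r for i, r in enumerate(rows) if i % every != every - 1]
    validation = [r for i, r in enumerate(rows) if i % every == every - 1]
    return training, validation


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    if not pos or not neg:
        raise ValueError("training set needs examples of both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Predict churn when the value is above the learned threshold."""
    return x > threshold


def accuracy(threshold, rows):
    """Fraction of rows the model labels correctly."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


if __name__ == "__main__":
    training_set, validation_set = split(ROWS)
    candidate = train(training_set)
    # Promote to production only if validation is good enough (0.9 is an arbitrary bar).
    if accuracy(candidate, validation_set) >= 0.9:
        print("prediction for a new customer with 5 calls:", predict(candidate, 5))
```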
122 Software Group
inmaps.linkedin.com
Unsupervised learning
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Architecture diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted linear trend line; x-axis 0 to 120, y-axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
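A trend line of the form f(x) = a x + b, like the one on this slide, can be fitted with ordinary least squares. The sketch below is illustrative only: the function name `fit_line` and the sample points are assumptions, not data from the talk.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 10, 20, 30], [1, 21, 41, 61])
print(f"f(x) = {slope} x + {intercept}")
```

With real, noisy data the fitted coefficients would not be round numbers, which is why the slide's trend line carries so many decimal places.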
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
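As a rough illustration of classification, the sketch below trains a tiny naive Bayes text classifier and applies it to new messages. It is one possible technique, not necessarily the one the speaker had in mind; the function names `train` and `classify` and the four labelled messages are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train(examples: List[Tuple[str, str]]):
    """Count words per label from (text, label) training pairs."""
    word_counts: Dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text: str, model) -> str:
    """Return the most probable label for new text (add-one smoothing)."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior plus the sum of smoothed log likelihoods
        score = math.log(doc_count / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages
model = train([
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report", "ham"),
])
print(classify("claim your cash prize", model))
print(classify("monday project meeting", model))
```

The same pattern (learn from labelled examples, then label new data) applies to churn risk or customer value, with numeric features in place of word counts.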
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
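To make "grouping without a pre-existing scheme" concrete, here is a minimal one-dimensional k-means sketch. K-means is a common clustering technique chosen here for illustration; the function name `kmeans_1d`, the starting centroids, and the basket totals are all assumptions.

```python
from typing import List


def kmeans_1d(values: List[float], centroids: List[float], iterations: int = 10) -> List[float]:
    """Run Lloyd's k-means on 1-D values, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid
        groups: List[List[float]] = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group
        new_centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids


# Hypothetical basket totals: no labels, two natural groups
basket_totals = [8, 10, 12, 95, 100, 105]
centers = kmeans_1d(basket_totals, centroids=[0.0, 50.0])
print(centers)
```

No labels are supplied; the two groups emerge from the data alone, which is the difference from classification.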
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data, New Business, Existing Business, Prediction]
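The stages named on this slide (training set, validation set, candidate model, production model, prediction on new data) can be sketched in a few lines. This is an assumed reading of the diagram, using a deliberately simple threshold model; every name and data point below is hypothetical.

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)


def split(data: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Deterministic split: even positions train, odd positions validate."""
    return data[0::2], data[1::2]


def train_threshold(training: List[Example]) -> float:
    """Candidate model: a threshold halfway between the two class means."""
    zeros = [x for x, y in training if y == 0]
    ones = [x for x, y in training if y == 1]
    if not zeros or not ones:
        raise ValueError("training set must contain both classes")
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold: float, x: float) -> int:
    """Label a value as 1 if it lies above the threshold, else 0."""
    return 1 if x > threshold else 0


def accuracy(threshold: float, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(threshold, x) == y for x, y in examples)
    return hits / len(examples)


# Hypothetical cleaned data
cleaned = [(1, 0), (2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1), (10, 1)]
training_set, validation_set = split(cleaned)

threshold = train_threshold(training_set)       # candidate model
score = accuracy(threshold, validation_set)     # validate before promotion
if score >= 0.9:                                # promotion bar is an assumption
    print("production prediction for new data 6.5:", predict(threshold, 6.5))
```

Holding the validation set out of training is what lets the score say something about how the model will behave on data it has not seen.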
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlexreg for Hadoop
- Toad BI Suite
- Slide 131
- Dellrsquos offering was not completehellip
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring ndash integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
-
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest – a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud, social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through ldquoBig Data analyticsrdquo
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete…
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
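As an illustration of the first technique in the list, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The function names and the sample points are assumptions made for this example; they are not the data behind the chart on this slide.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Hypothetical sample points (assumption, for illustration only).
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 39, 62, 79, 101]
slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 120)
```

The fitted slope and intercept play the same role as the two coefficients in a trend-line equation of the form f(x) = a x + b.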
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
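One very simple way to build such a model is a nearest-centroid classifier: average the known examples of each class, then give new data the label of the closest average. The sketch below assumes a made-up churn example with two hypothetical features (support calls per month, logins per week); the slide does not prescribe any particular algorithm.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(rows, labels):
    """Average the feature vectors of each class into one centroid per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, row):
    """Assign a new row to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


# Hypothetical labelled history: [support calls per month, logins per week].
rows = [[1, 10], [2, 12], [9, 1], [8, 2]]
labels = ["stay", "stay", "churn", "churn"]
model = train_centroids(rows, labels)
risk = classify(model, [7, 3])  # a new, unlabelled customer
```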
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance basket analysis
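Grouping without predefined labels can be sketched with k-means, one common clustering technique (the slide does not name a specific one). The points and the choice to seed the centroids with the first k points are assumptions made to keep the example small and repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Plain k-means; seeds centroids with the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centroids[i], p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, clusters = k_means(points, k=2)
```

No labels are supplied anywhere; the two groups emerge from the distances alone, which is the difference from the classification case.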
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data, New Business, Existing Business, Prediction]
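The flow suggested by these labels (split cleaned data, train a candidate model, validate it, then promote it to production) can be sketched as follows. Everything here is an assumption for illustration: the single-feature data, the threshold "model", and the 0.9 promotion bar are hypothetical and not taken from the slide.

```python
import random


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle cleaned rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: midpoint between the mean feature value of each class."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'x > threshold' matches the known outcome."""
    hits = sum((x > threshold) == label for x, label in rows)
    return hits / len(rows)


# Hypothetical cleaned data: (feature value, known outcome).
rows = [(x, x > 50) for x in range(0, 100, 5)]
training, validation = split(rows)
candidate = train_threshold(training)
score = accuracy(candidate, validation)
# Promote the candidate to production only if it validates well enough.
production_model = candidate if score >= 0.9 else None
```

Holding back a validation set matters because accuracy measured on the training rows alone tends to overstate how the model will behave on new data.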
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
This box will have simplified instructions about how to complete the session evaluation online
- 207 Surviving and thriving in the big data revolution
- 207 Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest - a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart with a fitted linear trend line, f(x) = 0.971521231456065 x + 0.71906459527154; axis ticks run from 0 to 120 on the x-axis and from -20 to 120 on the y-axis]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
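As a rough illustration of the first technique on this slide, a straight line of the form f(x) = slope * x + intercept can be fitted by ordinary least squares. The sketch below uses only the Python standard library; the function names and the sample points are made up for illustration and are not taken from the talk.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical sample points that lie exactly on y = 2x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 41, 81, 121, 161, 201]
model = fit_line(xs, ys)
```

Calling `predict(model, 120)` then extrapolates the fitted line beyond the observed range, which is the basic idea behind using a regression for prediction.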
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
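To make the spam-detection case concrete, here is a minimal sketch of a classifier, assuming a multinomial Naive Bayes model with add-one smoothing. That choice of algorithm, the function names, and the four training messages are all illustrative assumptions, not something the slide specifies.

```python
import math
from collections import Counter


def train(labelled_messages):
    """Count words per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in labelled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)                # prior for this label
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this example.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
```

With this toy data, `classify(model, "claim your cash prize")` comes out as spam and `classify(model, "monday meeting")` as ham; a real filter would need far more training messages.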
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
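One common way to group data without predefined classes is k-means. The sketch below is a plain implementation on 2-D points; the choice of k-means, the deterministic initialisation (the first k points), and the customer figures are assumptions made for illustration only.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; initial centroids are the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters


# Hypothetical customers: (store visits per month, average basket value).
customers = [(1, 10), (9, 95), (2, 12), (8, 100), (1, 14), (10, 90)]
centroids, clusters = kmeans(customers, k=2)
```

No labels are supplied anywhere: the two groups (occasional small-basket shoppers and frequent large-basket shoppers) emerge from the data alone.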
121 Software Group
Supervised Machine Learning
[Diagram labels: Existing Business; Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Business; New Data; Prediction]
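The workflow in this diagram can be sketched in a few lines: split cleaned historical data into a training set and a validation set, fit a candidate model, validate it, and only then use it to predict on new data. Everything specific below (the single-threshold model, the 0.9 accuracy bar, the churn data, and the function names) is a hypothetical stand-in chosen to keep the example small.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned rows and cut them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train(training_set):
    """Candidate model: a cut-off halfway between the two class means."""
    pos = [x for x, churned in training_set if churned]
    neg = [x for x, churned in training_set if not churned]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Predict churn when the feature exceeds the learned threshold."""
    return x > threshold


def accuracy(threshold, rows):
    """Fraction of rows whose prediction matches the known label."""
    return sum(predict(threshold, x) == churned for x, churned in rows) / len(rows)


# Hypothetical cleaned history: (support calls per month, churned?).
existing_business = [(calls, calls >= 5) for calls in (0, 1, 2, 3, 7, 8, 9, 10)] * 3
training_set, validation_set = split(existing_business)

candidate_model = train(training_set)
validation_accuracy = accuracy(candidate_model, validation_set)

# Promote the candidate to production only if it validates well enough.
production_model = candidate_model if validation_accuracy >= 0.9 else None
if production_model is not None:
    prediction = predict(production_model, 9)  # score a new customer
```

Keeping the validation rows out of training is the point of the split: the accuracy figure then says something about data the model has not seen.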
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase Real-Time replication
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207: Surviving and thriving in the big data revolution
- 207: Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest – a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It's not enough to lay out products on tables
- There's a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft = a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- It’s not enough to lay out products on tables
- There’s a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 1.0 Architecture
- Hadoop 2.0 YARN
- Tez1
- HBase
- HBase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through “Big Data analytics”
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell’s offering was not complete…
- Dell acquires Statsoft
- Slide 134
- Data Visualization
- Live scoring – integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase RealTime replication
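The flow named on this slide (redo logs, change data capture, a queue, then a poster that feeds Hadoop) can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the `ChangeRecord` shape, the `Poster` class, the dict standing in for an HBase table and the list standing in for batched HDFS files are hypothetical and are not SharePlex APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change, as it might be read from a redo log (hypothetical shape)."""
    op: str               # "INSERT", "UPDATE" or "DELETE"
    key: str              # primary key of the changed row
    row: Optional[dict]   # new column values; None for DELETE


class Poster:
    """Toy stand-in for the 'Hadoop Poster' box on the slide."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size
        self.hbase_table = {}   # dict standing in for an HBase table
        self.hdfs_files = []    # each element stands in for one batched HDFS file
        self._pending = []      # audit records not yet flushed

    def post(self, change: ChangeRecord) -> None:
        # Real-time path: apply the change to the key-value store immediately.
        if change.op == "DELETE":
            self.hbase_table.pop(change.key, None)
        else:
            self.hbase_table[change.key] = dict(change.row or {})
        # Batch path: keep every change as audit data and flush in batches.
        self._pending.append(change)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write the pending audit records out as one 'file'."""
        if self._pending:
            self.hdfs_files.append(list(self._pending))
            self._pending.clear()


def replicate(redo_log, poster: Poster) -> None:
    """Capture changes into a queue, then drain the queue into the poster."""
    queue = deque(redo_log)     # deque standing in for the JMS queue
    while queue:
        poster.post(queue.popleft())
    poster.flush()              # write any partial final batch


redo_log = [
    ChangeRecord("INSERT", "42", {"name": "Ann"}),
    ChangeRecord("UPDATE", "42", {"name": "Anne"}),
    ChangeRecord("DELETE", "42", None),
]
poster = Poster(batch_size=2)
replicate(redo_log, poster)
print(poster.hbase_table)       # current state after all changes
print(len(poster.hdfs_files))   # number of audit batches written
```

The sketch keeps two outputs on purpose: the key-value table only ever holds the latest state of each row, while the batched files retain every change, which is the distinction the slide draws between real-time replication and audit change data.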
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell’s offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands that mid-market customers face, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
- 207Surviving and thriving in the big data revolution
- 207Surviving and thriving in the big data revolution (2)
- Introductions
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Dell and Quest ndash a brief history
- But Seriously
- What is Big Data
- Slide 11
- Instead - the industrial Revolution of data
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Data means more
- Big Data is the culmination of cloud social and mobile
- Not all upside
- Will Big Data kill retail
- Prevalence of Showrooming
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Some novel defences
- Web analytics for retail
- Connected Store
- Slide 33
- Why showrooming
- Itrsquos not enough to lay out products on tables
- Therersquos a similar story in every industry
- The Revolution is not over yet
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Data Input
- Slide 46
- Siri
- Slide 48
- Slide 49
- Brain Control
- Slide 51
- Slide 52
- Muze
- Slide 54
- Slide 55
- The instrumented human
- The instrumented world
- All of which accelerates what we call Big Data
- Big Database technologies
- Pioneers of Big Data
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Google Software Architecture
- Map Reduce
- Multi-stage Map-Reduce
- Schema on Read vs Schema on Write
- Hadoop Open Source Map-Reduce Stack
- Hadoop at Yahoo
- Slide 72
- Slide 73
- Hadoop ecosystem
- Hadoop 10 Architecture
- Hadoop 20 YARN
- Tez1
- HBase
- Hbase Data Model
- Hive
- Slide 81
- Slide 82
- Other SQL-like Hadoop Interfaces
- Pig
- Flume and SQOOP
- Berkeley Data Analytic Stack (BDAS)
- Meanwhile back at the Death Star
- Slide 88
- Oracle Exadata (X-2)
- Economies
- Oracle Big Data Appliance
- Big Data Appliance Software
- Generating competitive advantage through "Big Data analytics"
- Collective Intelligence
- Slide 97
- Slide 98
- Slide 99
- Slide 100
- Slide 101
- Slide 102
- Slide 103
- Slide 104
- Google Flu Trends
- Slide 106
- Collective Intelligence outsmarts Artificial Intelligence
- Slide 108
- Slide 109
- Slide 110
- Slide 111
- Artificial Intelligence Strikes back
- Slide 113
- Slide 114
- Slide 115
- Slide 116
- Watson is big data AI
- Predictive Analytics
- Classification
- Clustering
- Supervised Machine Learning
- Unsupervised learning
- Slide 123
- Big Data Analytics
- Data Science is hard
- Data Scientists to the rescue
- Kitenga Analytics Suite
- Toad for Hadoop
- SharePlex® for Hadoop
- Toad BI Suite
- Slide 131
- Dell's offering was not complete...
- Dell acquires StatSoft
- Slide 134
- Data Visualization
- Live scoring - integration into operational systems
- Industry and cross-industry packaged solutions
- For your business
- For your career
- Please complete the session evaluation on the mobile app We app
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online