Cisco event 6 05 2014v3 wwt only
arthurhansen -
© 2006 Cisco Systems, Inc. All rights reserved. Cisco ConfidentialPresentation_ID 1
Leveraging Big Data to Create Value
June 5th, 2014
Agenda
12-12:30pm Registration and Lunch
12:30-12:40pm Welcome and Introductions -- Art Hansen
12:40-1:45pm Keynote Presentation -- Chris Ward, Brian Vaughan, James Bigger
1:45-2:20pm Hadoop in the Real World by MapR -- David Feldman
2:20-2:30pm Break
2:30-2:45pm Cisco Unified Computing System Rack Mount Servers for Big Data – Wade Ison
2:45-3:30pm Big Data Brainstorm Breakouts
3:30-4:30pm Refreshments, Q&A Session, and Conclusion
4:30pm Raffle Drawing for iPad
Big Data as a Competitive Strategy
Harvard’s Michael Porter:
1. Cost Leadership Strategy (Wal-Mart)
2. Differentiation Strategy (Southwest)
3. Innovation Strategy (Apple)
4. Operational Effectiveness Strategy (UPS)
5. Technology-based Competitive Strategy
What do we have that makes us different?
• Custom Apps
• Process (Workflow)
• Big Data
• People
• Culture
Big Data’s Financial Benefits
Gartner predicts that “Big Data will deliver transformational benefits to enterprises within 2 to 5 years, and by 2015 will enable enterprises adopting this technology to outperform competitors by 20% in every available financial metric.”
Goals for Today:
• Understand Big Data Major Building Blocks
• Learn the major patterns
• Understand how to introduce Big Data into the enterprise in practical ways
• Identify a solid use case for Big Data
Tips for Winning:
• High ROI in less than a year
• Must be applied to things that are important to the business
• Use of multiple patterns encouraged
• New ways of correlating data that was formerly not correlated
• Remember Big Data patterns usually require scale
WWT Big Data Leadership Team
James Bigger, Principal Consultant
20 years of management consulting and entrepreneurial experience. Expertise in financial services, insurance and telecom. Prior consulting experience with Opera Solutions and A. T. Kearney. Ph.D. in Physics from Oxford University.
Brian Vaughan, Principal Consultant
15 years of management consulting, analytics and software experience. Expertise in healthcare and insurance. Prior experience with Opera Solutions, Mitchell Madison Group and Broadlane. Ph.D. in Physics from Stanford University.
Chris Ward, Principal Consultant
20 years in management consulting and executive leadership. Expertise in retail, marketing, hospitality and financial services. Prior consulting experience with Opera Solutions and The Boston Consulting Group. BA from Princeton University, MBA from the University of Virginia Darden School of Business.
Matt DuBell, Principal Systems Engineer
Over 20 years of experience in a range of IT and security disciplines. Responsible for deploying large, secure, Hadoop-based platforms for the U.S. Government. 10 years of international experience implementing networking and virtual data center environments. Undergraduate degree from AIU.
Yoni Malchi, Engagement Manager
Over 7 years of experience in management and analytics consulting. Led engagements in telecom at Opera Solutions. Previous experience performing predictive analytics for NASA and USAF at The Aerospace Corporation. Ph.D. in Mechanical Engineering from Pennsylvania State University.
Jason Lu, Chief Scientist
18 years of analytics and software development experience. Expertise in financial services, healthcare, insurance, retail and marketing science. Prior analytics development experience at Opera Solutions, FICO and J.D. Power and Associates. Ph.D. in Physics from Stanford University.
Jamie Milne, Engagement Manager
Over 7 years of management consulting and entrepreneurial experience. Expertise in financial services, travel, and retail sectors across the US and Europe. Led Big Data strategy and analytical engagements at Opera Solutions. MSci in Astrophysics from the University of Cambridge.
Chris Infanti, Engagement Manager
Over 8 years of experience in analytics consulting and delivery management. Ran engagements in wealth management, corporate security, marketing, education and transportation at Opera Solutions and IBM Global Business Services. BS in Mathematics from Georgetown University.
Prem Jain, Principal Architect
Over 20 years of experience in the enterprise datacenter, building innovative solutions in Big Data, storage, HPC, virtualization, data migration and enterprise applications. Formerly lead architect for NetApp's Big Data solutions, and led the development of the FlexPod Select solutions. B.S. in Electrical Engineering.
Volume, Variety and Velocity of Data are Exploding
The production of data is expanding at an astonishing rate. Drivers include the switch from analog to digital technologies and the creation of structured and unstructured data by individuals and companies via social media and the Web.
• Every 60 Seconds:
− 98,000+ tweets
− 695,000 status updates
− 11 million instant messages
− 698,445 Google searches
− 168 million+ emails sent
− 1,820TB of data created
− 217 new mobile web users
• The need to process more data faster to respond to dynamic business trends has brought new requirements for database architectures
• We believe the industry stands at the cusp of the most significant revolution in database and, therefore, application architectures in the past 20 years.
[Figure: Velocity, Variety, Volume. Chart 1: Enterprise Managed Data vs. Enterprise Created Data, 2010 to 2020, in ZB (axis 0 to 40). Chart 2: Unstructured vs. Structured data storage, 2009 to 2014, in EB (axis 0 to 80).]
Source: IDC, Gartner, EMC, Worldwide File-Based Storage 2010-2014 Forecast
Vendor Landscape Is Crowded and Growing
Data Sources& Capture
IT Infrastructure
Data Management & Integration
Analytics Platforms and Solutions
Analytics Services and Support
Data Vendors Infrastructure VendorsOpen Data Platforms
Proprietary Data Platforms
Extended infrastructure + data platforms
Systems Integrators
Specialized End-to-End Solutions
Analytics Service Providers
Vertical Analytics Solutions
Key Big Data Technologies
Hadoop: Distributed File System and Processing Language
Characteristics
• Parallel storage/processing
• Flexible programming model
• Horizontal scaling
• Batch processing
Enablement / Uses
• Pre-processing of data for analytics
• ETL for transforming unstructured data to structured
• Data summarization
NoSQL: Non-relational Key-Value Database
Characteristics
• Fast read/write
• Real time query
• Horizontal scaling
• Simple programming model
• Dynamic schema
Enablement / Uses
• Real-time ingest
• Rapid retrieval
• Input to MapReduce
Columnar: Column-Oriented Analytics Database
Characteristics
• Relational
• Efficient compression
• Optimized for fast read of many/all records
Enablement / Uses
• Online Analytical Processing (OLAP)
• Data storage and retrieval for advanced analytics
In-Memory: In-Memory Database and Processing
Characteristics
• Relational
• Random Access
• Extremely Fast
Enablement / Uses
• Complex Event Processing
• Real Time Analytics
• Potential to use a common database for transactions and analytics
[Chart axis: Foundational to Emerging]
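The batch "data summarization" use listed for Hadoop can be pictured with a small single-process sketch of the map, shuffle and reduce steps. This is not Hadoop code: the function names and the sample records are assumptions made for illustration, and a real cluster would run each phase in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for site, reading in records:
        yield site, reading

def shuffle(pairs):
    """Shuffle: group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: summarize each key's values as (count, mean)."""
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}

# Hypothetical input: (site, sensor reading) pairs invented for this sketch.
records = [("site_a", 10.0), ("site_b", 4.0), ("site_a", 20.0)]
summary = reduce_phase(shuffle(map_phase(records)))
print(summary)
```

Because each key is reduced independently, the same logic can be split across machines, which is what the "horizontal scaling" characteristic refers to.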
The Big Data Software Stack
The big data ecosystem includes open source and proprietary distributions that span the stack from ingest through analytics
[Diagram: stack layers with their properties, options and example products, read against the phases Acquire, Organize, Analyze, Decide.]
USER/MACHINE WORKFLOW
ACCESS/QUERIES
• Properties: Flexible interfaces
• Options: OLAP, Natural Language, Custom Analytics, Custom APIs, SQL
• Examples of products: MicroStrategy, Business Objects, Cognos, Oracle OBIEE
ANALYTICS
• Examples of products: R, Python, SAS, SPSS
ANALYTICS DATABASE
• Properties: Flexible, Compressed, Fast Read; optimized for high-volume reads
• Options: Columnar, In Memory, Parallel RDBMS
• Examples of products: Teradata, Netezza, Greenplum, Vertica
TRANSFORM
• Properties: Parallel, Distributed
• Options: MapReduce, SQL
• Examples of products: Hadoop, Pig, Hive
FILE SYSTEM/DATABASE
• Properties: Fast, Scalable
• Options: HDFS; NoSQL (Document, Key-Value, Wide Column)
• Examples of products: Hadoop, Cassandra, HBase, MongoDB
INGEST
• Properties: Interfaces to accept data; Real Time & Batch
• Options: Batch, Streaming
• Examples of products: Sqoop, Flume, Splunk, Talend
MANAGEMENT
• Properties: Provisioning, Maintenance; Job Flow
• Examples of products: ZooKeeper, Oozie; Cloudera, Hortonworks, MapR, Pivotal HD
DATA
• Enterprise Structured: ODS, Data Warehouse
• Enterprise Unstructured: Call Center, Server Logs
• 3rd Party Web/Unstructured: Financial, Demographic
INTEGRATED OFFERINGS (open source, commercial, and open source plus solutions)
• EMC/Pivotal HD/Greenplum
• HP/Vertica/Cloudera
• Oracle Big Data, Exadata/Exalytics
• IBM InfoSphere BigInsights
• SAP HANA
• Terracotta BigMemory
Technology: Expanding the Traditional Stack
Big Data requires a technology stack that leverages existing infrastructure and introduces new technology for distributed parallel processing
TRADITIONAL RELATIONAL DATABASE STACK
• Interface: Queries (SQL)
• Database / distributed processing framework: Relational Databases
• Hardware: Monolithic Hardware (few CPUs and network computers); “Shared Disk/Memory” Architecture (centralized processing)
STACK FOR THE NEW DATA FOUNDATION
• Interface: Direct Record Access or Queries; MapReduce Programs
• Database / distributed processing framework: NoSQL Database; Parallel Relational Database; Distributed File System; High-Performance Traditional Relational Database
• Hardware: Distributed Hardware (multicore CPUs, multiple computers connected via high-performance network) with “Shared Nothing” Architecture (distributed parallel processing), alongside Monolithic Hardware (few CPUs and network computers) with “Shared Disk/Memory” Architecture (centralized processing)
Source: IDC, CSC, Gartner
Analytics: Translating Business Needs to Math
Regardless of industry, many use cases translate into a limited class of “math problems” that big-data platforms (unlike transactional platforms) are optimized to solve at scale
Business Need
• Recommendation
• Risk Scoring
• Pricing
• Capacity Planning
• Cost Reduction
• Matching
• Retrieval
Class of Analytics
• Regression
• Classification
• Clustering
• Forecasting
• Optimization
• Simulation
• Sparse Data Inference
• Anomaly Detection
• Natural Language Processing
• Intelligent Data Design
Method
• ARMA
• Decision Trees
• Genetic Algorithms
• Graph Theory
• Kalman Filter
• KNN
• Linear Regression
• Logistic Regression
• Matrix Factorization
• Monte Carlo
• Neural Networks
• Sorting
• Survival Time Analysis
• Visualization
Analytics Ready Stack: Hardware & Software
• Parallel
• Distributed
• Shared Nothing
• Columnar
• NoSQL
• In-Memory
Defining The Business Opportunity Is The Starting Point
The power of “Big Data” lies in bringing together data in a timely fashion from sources within and external to the enterprise, structured and unstructured, to create a complete view of critical business issues, thereby enabling advanced analytics to unlock key insights that drive significant business value
• Outcome: Clearly defined use cases with the potential to deliver significant value by distilling vast data into new, previously unknowable intelligence
• Analytics: Advanced machine learning techniques to analyze data and mine for insights to drive critical business decisions
• Data: Structured or unstructured, internal or external, requiring new methods of storage/integration
• Technology: Emerging/new technology stacks using scalable, distributed architectures
Telematics is Transforming Auto Insurance
Case Study: Insurance
Business Imperative
To gain profitable market share, insurance companies need to offer the lowest “risk adjusted” pricing possible to consumers
Big Data Use Case
Combine driving behavior data with actuarial data to create individualized risk models that more accurately predict claims losses, enabling risk-adjusted pricing to gain market share and increase margins
Science
• Class of Analytics: Regression, Clustering, Anomaly Detection
• Methods: KNN, Linear Regression, SVD
Data
• Sensors to capture routes, miles driven, time of day, braking patterns, driving speed
• Geospatial maps tied to database layers
Technology
• HDFS
• MapReduce
• NoSQL
• Data Warehouse
• In-database Analytics
• Data Marts
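KNN is one of the methods named for the telematics use case. As a rough sketch of how it could turn driving behavior into a predicted claims loss, the example below averages the losses of the most similar drivers. The two features (normalized miles driven and hard-braking rate), the loss figures and the function name `knn_predict` are all invented for illustration and do not come from the case study.

```python
import math

def knn_predict(train, query, k=3):
    """Average the target of the k training rows nearest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical rows: ((normalized miles driven, normalized hard-braking rate), annual claims loss)
drivers = [
    ((0.2, 0.1), 100.0),
    ((0.3, 0.2), 200.0),
    ((0.8, 0.9), 900.0),
    ((0.9, 0.8), 1000.0),
]

low_risk = knn_predict(drivers, (0.25, 0.15), k=2)   # neighbors are the two gentle drivers
high_risk = knn_predict(drivers, (0.85, 0.85), k=2)  # neighbors are the two aggressive drivers
print(low_risk, high_risk)
```

A predicted loss per driver is the kind of output that risk-adjusted pricing would then build on; a production model would use many more features and far more drivers.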
Predictive Maintenance
Case Study: Mining
Data Sources
• Data Logger, one per truck (Logs, Sensors, OEM Alarms, VIMS Service Port), collected by FTP over MESH
• Equipment Maintenance
• Dispatch & Operator
• Fuel, Oil Analysis, etc.
• Hours
1. Stratifying Alarms
− 1 Urgent Component Problem
− 2 Critical Sensor Problem
− 3 Important/Not Urgent Component/Sensor Problem
− 4 Not Important Component or Sensor Problem
− 5 Noise - Ignore
2. Major Component Failure Model(s)
− Urgent Component Problems, e.g., Engine, Transmission, Differentials, Torque Converters, Final Drives
3. Data Driven Preventative Maintenance
− Data/Analytics driven timing for preventative maintenance (e.g., oil changes) on individual Trucks
Project Scope
• 252 Trucks, 200 sensors per truck
• 7 Mine sites
• 10,000 readings/second
Data Integration
• Integrating 15+ siloed data sources in multiple file formats
• 10 Terabytes of data
• 3 year historical data ecosystem
Business Impact: Higher equipment up-time; reduced critical component failure; better preventative maintenance and increased labor productivity
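The project scope figures imply the data rates worked out below. The sketch assumes that the 10,000 readings/second figure is fleet-wide rather than per truck, which the slide does not state; the variable names are illustrative.

```python
TRUCKS = 252
SENSORS_PER_TRUCK = 200
READINGS_PER_SECOND = 10_000   # assumption: total across all trucks and mine sites
SECONDS_PER_DAY = 24 * 60 * 60

# Total sensor channels in the fleet
sensor_channels = TRUCKS * SENSORS_PER_TRUCK

# Readings collected per day at the stated rate
readings_per_day = READINGS_PER_SECOND * SECONDS_PER_DAY

# Average seconds between readings on any one channel
seconds_between_readings = sensor_channels / READINGS_PER_SECOND

print(sensor_channels, readings_per_day, seconds_between_readings)
```

Under that assumption the fleet has 50,400 sensor channels producing 864 million readings per day, roughly one reading per channel every five seconds.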
Data Warehouse Augmentation: Value Proposition
Augmenting the Data Warehouse with a less expensive Hadoop system will allow companies to free up valuable space on their DW systems to run faster queries and analysis, whilst storing large volumes of their data universe
CURRENT
Traditional Data Warehouse: Cold Data, Warm Data, Hot Data
Full Data Universe: CRM, Social Media, Billing, Web logs, Payments, Scheduling
1. A significant amount of data that may be valuable in the future is thrown out during the ETL process
2. About 50% of data that is brought into a typical Data Warehouse system is rarely accessed: Cold Data
3. About 80% of the queries and reporting performed on Hot Data do not need to be at DW speeds
PROPOSED
WWT Hadoop Appliance: Full Data Universe (CRM, Social Media, Billing, Web logs, Payments, Scheduling), Cold Data, Warm Data
Traditional Data Warehouse: Warm Data, Hot Data
1. Utilize additional Hadoop-based storage to store full data universe
− Files can be stored in natural format
2. Store Cold Data in Hadoop, taking advantage of lower cost per TB
− Teradata: $17K
− Hadoop: $2K
3. Continue to take advantage of DW agility and speed in real-time analysis and querying
Potential jumping-off point for Big Data Business Impact project
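The per-TB figures on this slide ($17K for Teradata, $2K for Hadoop) and the estimate that about 50% of warehouse data is cold allow a back-of-the-envelope saving to be computed. The sketch below assumes a hypothetical 100 TB warehouse and counts storage cost only; migration, licensing and operating costs are ignored, and the function name is illustrative.

```python
TERADATA_COST_PER_TB = 17_000   # from the slide
HADOOP_COST_PER_TB = 2_000      # from the slide
COLD_FRACTION = 0.50            # "about 50%" of warehouse data is rarely accessed

def augmentation_savings(warehouse_tb):
    """Storage cost avoided by holding cold data on Hadoop instead of the warehouse."""
    cold_tb = warehouse_tb * COLD_FRACTION
    return cold_tb * (TERADATA_COST_PER_TB - HADOOP_COST_PER_TB)

# Hypothetical warehouse size, chosen only to make the arithmetic concrete
savings = augmentation_savings(100)
print(f"${savings:,.0f}")
```

For the assumed 100 TB warehouse, 50 TB of cold data at a $15K per-TB difference comes to $750,000.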
Integrating Many Data Sources To Provide Lift
Case Study: Global Airline
• Purchase History: Profiled 100+m transactions for millions of customers
• Service History: Linked data for millions of customer interactions and service records
• Web Data: Analyzed billions of page-views for behavioral indicators
• Campaign Metadata: Extracted meaning from tens of thousands of email campaigns
• Destination Word clouds: Mapped destinations to key “feature tags” which explain selection
• Partner Hotels: Geotagged tens of thousands of partner hotels by understanding free text description
[Figure: Customer Travel Profile (ID = xxxx), Nov 2010 to Sept 2011, covering Hotel, Experience, Flight, Car Rental and Holiday. Charts plot % Uptake against % Offered (both 0% to 100%) and show % Lift.]
Social Media Analytics
Case Study: Consumer Electronics
Typically social media tools focus on monitoring past/present activity. Predictive analytics allows users to identify important threads and intervene early, shifting the focus to future activity
• Details on particular themes or attributes
• Forecasts of trends and a mechanism to intervene in attributes that are going viral
• Word cloud shows ongoing buzz and sentiment
• Tabular view shows emerging themes and sentiment, virality score and recommended time-window for action
Curriculum Management Engine
Case Study: Food Grocer
We designed a recommendation engine that generates a dynamic set of recommendations on a daily basis (over 1MM/day, from sales force handhelds, website, call centers) and that learns and adapts, through a Curriculum Management Engine, to increase its ability to change behaviors over time
Plan for Smith Household:
Total Wallet = $600
Aspiration: Achieve 60% share of wallet, up from 40%
How:
• Habituate Pizza and Ice Cream and Increase Frequency
• Move Into Dinner Entrees & Sides
• Move Into Higher Margin Breakfast Entrees
• Increase Frequency of Purchases
VISIT #1:
1. Haven’t Bought In A While
2. Others On My Route Like
3. Would You Like Another?
4. Just for You -- $1.00 Off
Household Response
VISIT #2:
1. Would You Like Another?
2. Others On My Route Like
3. No pizza; not yet consumed
4. Just For You
Nature of Recommendations
• Individuated Offers – Especially for You
• Cross-Sell/ Up-sell – Based on latent needs
• Reminders – Haven’t bought in a while
• Trials – Never tried but similar people like it
• Promotions – Being a loyal customer
Recommendations for Grocery Retailer’s Customers Delivered $100 million p.a. in EBIT
Using Internal and External Data with Advanced Analytics for Site Selection
Case Study: Retail Pharmacy
Enriched Dataset
• Comprehensive performance data
− Front store / pharmacy sales
− Customer and patient demographics
− Local area demographics
• Web Scraping and Text Analytics
− Neighborhood business profile
− Competitor performance
− Healthcare alternatives (ER, Urgent Care, PCPs)
Advanced Analytics
• Non-linear, multivariate predictive models
− Linear/Logistic Regression
− Decision Trees (CART)
− Random Forest
− Gradient Boosting Machine
− Neural Networks
• Incorporation of all data, including variables usually viewed as “qualitative”
Gravity Mapping
[Figure: Model Performance. Predicted Patient Volume vs. Actual Patient Volume (axis 0.0 to 1.8), R = 0.75.]
[Figure: Impact. Potential Volume of 0.71 for the Original Expansion Plan vs. 0.83 for the Model Recommendation, +17%.]
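The two headline numbers of this case study can be checked with a few lines of arithmetic. The sketch below recomputes the relative lift from the two potential-volume figures (0.71 and 0.83; the slide gives no units) and shows one common way a model-performance figure such as R = 0.75 is obtained, as the Pearson correlation between predicted and actual values. That the slide's R is a Pearson correlation is an assumption, and the function names are illustrative.

```python
import math

def relative_lift(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Figures from the slide: original expansion plan vs. model recommendation
lift = relative_lift(0.71, 0.83)
print(f"{lift:.1%}")  # 16.9%, which the slide rounds to +17%
```

To score a model, `pearson_r` would be called with the predicted and actual patient volumes for each site; no site-level data appears in the source, so none is invented here.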
Designing Appropriate Reference Architectures
A reference architecture is a specific set of software and hardware components that together comprise an Analytics-Ready Infrastructure
[Diagram columns: Layer, Description, Examples of Products]
USER/MACHINE WORKFLOW
• Visualization, Forecasts, Pricing, Reports, Alerts, Scores, Offers
CUSTOM ANALYTICS
• Services
• Advanced models
ANALYTICS TOOLS
• High-level programming languages with packaged analytical modules
• Can be either general purpose or industry/function specific
• Examples of products: SAS, R, Python, SPSS
ANALYTICS DATABASES
• Flexible, Compressed, Fast Read
• Columnar, In Memory, Parallel RDBMS
• Examples of products: Vertica, Greenplum, Teradata, Netezza, Exadata, SAP HANA
FILE SYSTEM/DATABASES
• Parallel, Distributed
• HDFS or NoSQL
• Examples of products: Cloudera, MapR, Hortonworks, Pivotal HD, MarkLogic, DataTactics, Oracle NoSQL
INGEST
• Interfaces to accept fast and varied data
• Examples of products: Flume, Sqoop, Talend, VelociData
“Analytics-Ready Infrastructure”
COMPUTE
• Commodity, rack mount
• Purpose built servers
• Examples of products: UCS-C240, UCS-C460, HP 380P, HP SL4540
NETWORK
• 10GbE, low latency
• Examples of products: UCS 6200, Nexus 2200, HP 5800, Dell Force10
STORAGE
• Internal JBOD, Direct Attached, Network
• Examples of products: JBOD SATA, JBOD SSD, E-Series, Isilon
DATA
• Enterprise Structured: ODS, Data Warehouse
• Enterprise Unstructured: Call Center, Server Logs
• 3rd Party Web/Unstructured: Financial, Demographic
Four Major Big Data Challenges Facing Most Companies
In our meetings with customers, four issues are consistently brought up as major challenges in creating a big data capability that can effectively support the business units
Key Big Data Challenges
Defining the business value proposition
• What problem/opportunity are we pursuing?
• What is the value that can be created?
Navigating a crowded and evolving vendor landscape
• How do we separate marketing hype from reality?
• Who should we use? Who can we trust?
Deploying new technologies and combining with existing architecture
• How do we create an effective integrated Big Data stack?
• What new technologies do we need and how do they fit together?
Organizing for success
• Where does Big Data fit?
• What belongs in the BUs vs. centralized?
• Who is responsible for data integrity?
• Where do we find the critical resources needed to deliver Big Data solutions?
Dual Approach to Delivering Big Data Solutions
WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data analytics and technology
BIG DATA BUSINESS IMPACT
Extract value from data to drive multiple Use Cases
• Strategic Roadmap
− Big Data Strategy
− Use Case Design
• Use Case PoC
− Analytics Development
− Workflow Integration
BIG DATA TECHNOLOGY OPTIMIZATION
Accomplish data tasks faster, cheaper, better
• Data Warehouse Augmentation
− ETL Offload
− Data Lake Creation
• SAP HANA Implementation
• Big Data Stack Build / Optimization
• Production Support & Sustainment
Service and Solution Offerings
Phases: Plan, Design, Pilot, Scale
Plan
• Develop a roadmap for implementing Big Data
− Use case exploration
− Data Governance, Infrastructure and Analytics ownership
• Define high impact use cases
• Design and test appropriate reference architectures
Design
• Create detailed description of selected pilot use cases
− Analytics
− Workflow integration
• Test various reference architectures
• “Stand-up” reference architecture
Pilot
• Design the pilot
− Success criteria
− Timeline
− Scope
• Identify and prepare data
• Build analytical models
• Design workflow
• Implement, manage and monitor
Scale
• Implement design changes from pilot learnings
• Invest in software development as necessary to improve UI
• Prepare ETL process for scale
• Build out infrastructure as required to support rollout
WWT Offerings (Analytics-Ready Infrastructure, Solution Development)
1. Strategic Roadmap
• Use case definition
• Organizational alignment
• Big Data Architecture high level design
2. Big Data Stack Build
• Detailed design of Big Data architecture and BOM
• Procure, configure and deploy Big Data stack
3. Proof of Concept
• POC design
• Analytical models
• Customer data loaded, processed and analyzed
4. Production Support
• Operationalizing POC
• Infrastructure Sustainment
• Training
• Ongoing support
Indicative Infrastructure
EXAMPLE STARTER KIT:
• Cisco SAP HANA Medium Appliance (2 UCS-C460)
• Big Data Solution Stack:
− 2 UCS 6296PP
− Each Big Data rack: 2 Nexus 2232PP; 8-16 HP DL380 or SL4540, UCS-C240, etc.
− Initially: 1-2 racks
− Software: MapR, E.
EXAMPLE SCALE OUT HARDWARE
• Multiple Nexus 6000/7000 Series switches
• 5-50 Big Data racks
• Cisco SAP HANA scale-out (e.g. 8-16 UCS-B200)
• Software scale-out
Advanced Technology Center (ATC)
A highly collaborative ecosystem to design, build, educate, demo & deploy advanced technology solutions for our customers & partners
Hands-on Access to over $50M in Equipment
• Point Product Demos
• Tech. Training Sessions
• EBCs / ATC Tours
• Tech Days Demos
• Customer Proof of Concepts
• Reference Arch. Dev.
• Product Training / PS
• Version Upgrade Testing
• Strategic Ref. Arch. Demo (RAD)
• Product Comparison - Func.
• Product Comparison - Perf.
• Customer Access to Lab
• Customer Environment
• Workshop Demos
• Early Field Trials / Beta Code
• Certification
ENTERPRISE NETWORKS
• Next Generation Networking
• Nexus (7K, 5K, 3K & 2K)
• Virtual Networking (Nexus 1000v)
• OTV, LISP, Fabric Path
• Layer 2 Extension
• DR/BC Networking
SECURITY
• BYOD (Bring Your Own Device) & Secure Mobility
• Jukebox
• ISE & RSA
• ASA 1000v
• VSG (Virtual Security Gateway)
• Cyber Security Solutions
COLLABORATION
• Unified Communications
• Tandberg Video
• VXI (View & XenDesktop)
• WebEx, Call Center & Collaboration Solutions
• Phones, Backpacks & Soft Phone Clients
• Telepresence & Business Video
DATA CENTER
• Vblock, FlexPod & CloudSystem Matrix
• EMC & NetApp Storage
• vSphere / XenServer
• vCloud Director
• VDI (View / XenDesktop)
• Cisco CIAC & BMC CLM
• EMC’s UIM & Cloupia
• FAST MDC (Mobile Data Center) Solutions
ATC Big Data Functions: Overview
Three functions of the ATC have been identified, which will support Sales (and other) processes
Proof of Concept
Description
• Test customer solutions prior to full onsite implementation, e.g.
− Run Use Case analytical models and architectures on Big Data machines
− Create Big Data hardware/software stack, potentially with client data
Usage
• Mid-term project basis, to provide an environment for the customer, based on a running engagement
Technology Comparison
Description
• Compare Big Data solutions to provide insight into strengths and weaknesses of each
• Run “bake-offs” to gauge how well a full solution can be solved using certain components
Usage
• To test generic POCs, may be customer-driven
• Inform Big Data Team on best solutions
Field Demo
Description
• Showcase Big Data capabilities by hosting demos of WWT PoCs and analysis
− Run Use Case analytical models and architectures on Big Data machines
Usage
• Tool for sales calls and EBCs
Big Data Environment Set-up: ATC Reference Architectures
Four analytics-ready infrastructure stacks have been developed in the ATC to showcase Big Data technologies
• Reference Architecture 1: HP Internal Local Storage
• Reference Architecture 2: UCS - NetApp Direct Attached Storage
• Reference Architecture 3: UCS - Isilon Network Storage
• Reference Architecture 4: SAP HANA
[Diagram: each architecture is drawn by layer (Data, Ingest, File System/Databases, Analytics Databases, Analytics Tools, Compute, Network, Storage), with status labels Current and In Process. Data sources: Enterprise Structured (ODS, Data Warehouse), Enterprise Unstructured (Call Center, Server Logs), 3rd Party Web/Unstructured (Financial, Demographic). Components named across the four stacks: VelociData, Splunk; Cloudera, Hortonworks, MapR, Pivotal HD, HBase, Impala, HAWQ, GemFire, SAP HANA; R, Python, Java, MicroStrategy; HP DL 380, UCS-C220M3, UCS-C240, UCS B blades; UCS 6296UP, UCS 6296, Nexus 2232PP, Nexus 2200; JBOD SATA, NetApp E5460, Isilon, Hitachi.]
First Step: Big Data Workshop