Big Data Analytics - BI Academy · Enable students to design big data system architectures and...
Big Data Analytics: Integrated Learning Concepts
Prof. Dr. Hendrik Meth, HdM Stuttgart
ABOUT ME
• 1995-2001: University of Mannheim, Diploma in Business Informatics
• 2002-2004: Icon, Karlsruhe, SAP APO & SAP BW Consulting
• 2004-2010: Bosch, Stuttgart, Business Intelligence Consulting, Product & Project Management
• 2010-2013: University of Mannheim, PhD in Business Informatics
• 2013-2016: BorgWarner ITSE, Manager Business Intelligence Competence Center
• Since 03.2016: HdM Stuttgart, Professor, Big Data & Data Science
AGENDA
• Overlapping Concepts: BI vs. Big Data Analytics
• Big Data Analytics: Teaching & Research Program
• Blended Learning: Teaching vs. Research
WHAT IS BIG DATA? The Four Vs
• Four dimensions to be differentiated: Volume, Variety, Velocity, Veracity
Source: Schroeck et al. 2012 – IBM Institute for Business Value
BIG DATA ANALYTICS - Perspectives
Technology
• Scale Up vs. Scale Out
• Commercial Software vs. Open Source
• Cloud vs. On Premise
• Hot vs. Warm vs. Cold Storage: In-Memory (Frequent Access), Flash Disk (Occasional Access), Disk (Seldom Access)
• Monolithic vs. Hybrid Architectures: Monolithic Architecture; Architecture for Structured Data; Architecture for Unstructured Data
Process & Method
• Clustering
• Association Rules
• Natural Language Processing
• Regression
• Classification
• Information Retrieval
• Software Selection
• Implementation Procedures
• Data Science Process
Organization
• Organization of Teams and Competence Centers
• Application Scenarios
BIG DATA ANALYTICS - Market and Challenges
Increasing Demand in 2014 (1) for…
• System Analysts: 90%
• Project Managers: 124%
Data Scientists' Median Salaries (2)
• Junior Level: $91k
• Management Level: $250k
Enterprises' Key Challenges - Analytical Skills (2)
• No lack of analytical skills: 57%
• Lack of analytical skills: 43%
Enterprises' Key Challenges - Cross-functional Integration (2)
• Integration of Data Scientists with "traditional data workers" successful: 27%
• Integration of Data Scientists with "traditional data workers" NOT successful: 73%
(1) http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015/#2caa75b9404a
(2) http://www.forbes.com/sites#/sites/gilpress/2015/04/30/the-supply-and-demand-of-data-scientists-what-the-surveys-say/#7e9131cd205e
BI AND BDA: Overlapping Concepts
Diagram areas: Business Intelligence, Big Data Analytics, Data Science
Diagram axis labels: Data Provisioning, Consolidation, Analytics, Data Consumption
Concepts shown: Data Integration, ETL, Dimensional Modeling, Data Warehouse, Data Mart, Information Hubs, MDX, Reporting, Ad-Hoc Reporting, Performance Management, Planning, Data Visualization, Data Mining, Text Mining, NLP, In-Memory, Distributed File Systems, MapReduce, NoSQL, Lambda Architecture, Streaming Data, Multimedia, Prediction, Machine Learning, R
TEACHING PROGRAM
BDA TEACHING PORTFOLIO
Enable students to design big data system architectures and apply techniques for the analysis of high-volume, heterogeneous data
Bachelor
• BIG DATA ANALYTICS I
• BIG DATA ANALYTICS II
• ADVANCED DATA SCIENCE
Master
• PROJECT GROUPS AND CAMPUS CHALLENGES
• DATA SCIENCE FOR ANALYSTS
Further courses shown: DATABASES, ANALYTICAL INFORMATION SYSTEMS, BUSINESS INTELLIGENCE, BUSINESS INTELLIGENCE APPLICATIONS
BDA TEACHING PORTFOLIO
BIG DATA ANALYTICS I (Bachelor)
• Introduction and Key Concepts
• Architectures and Use Cases
• Big Data Analytics (Structured Data)
• Implementation and Use of Big Data Systems
BIG DATA ANALYTICS II (Bachelor)
• Introduction and Key Concepts
• Architectures and Use Cases
• Big Data Analytics (Unstructured Data)
• Design of Big Data Systems
ADVANCED DATA SCIENCE (Bachelor)
• Advanced Use Cases, Methods and Technologies:
– Advanced Methods (Predictive Analytics, Regression Analysis, ...)
– Visual Analytics
– R
PROJECT GROUPS AND CAMPUS CHALLENGES (Master)
• Project Groups, e.g. implementing innovative artifacts based on packaged software solutions
• Campus Challenge, e.g. solving case studies defined by industry partners
DATA SCIENCE FOR ANALYSTS (Master)
• Data Science with focus on analyst / application perspective
• Introduction and Key Concepts
• Data Preparation
• Methods and Techniques
EXERCISES: Bottom-Up Structure from Foundation to Advanced Level
Courses: BIG DATA ANALYTICS I, BIG DATA ANALYTICS II, ADVANCED DATA SCIENCE
Exemplary Methods:
• Data Preparation and Exploration
• Clustering, Classification, Association Analysis
• Map-Reduce Method
• Natural Language Processing
• Time Series Analysis
• Flare Charts
• Text Mining, Web Mining, Natural Language Processing
Exemplary Technologies: [not included in transcript]
PART-TIME MASTER PROGRAM: Starting Autumn 2016
RESEARCH PROGRAM
RESEARCH DIRECTION
Vision: Design and Implementation of User-Centered Big Data Solutions
• Technology & Architecture Perspective: Scale Up vs. Scale Out, Open Source vs. Commercial Software, Cloud vs. On Premise, Monolithic vs. Hybrid Systems
• Organizational and Process Perspective: Vendor vs. Customer, Developer vs. User, Design processes vs. Implementation and Use processes, Development of algorithms vs. Application of packaged solutions
• User-Centered Design and Implementation
• Big Data Systems
EXEMPLARY RESEARCH QUESTIONS
• How can enterprises be prepared for efficient implementation and use of Big Data Systems (BDS)?
• How can BDS be implemented using a user-centered approach?
• How can existing Business Intelligence and Big Data functionality be integrated on the vendor and customer side?
BLENDING RESEARCH AND TEACHING
EXAMPLE RESEARCH SETUP & QUESTION
• Setup: Thesis projects (bachelor / master) in co-operation between university and industry partner
• Research question with practical relevance in Big Data / BI context
• General research question: How can the potential for Big Data technology investments be estimated upfront?
– Project 1: Usage-based approach
– Project 2: Experimental approach
Research Project 1 - Setup
• Specific research question: How can the potential of Big Data technology investments be estimated based on existing usage data?
• Project participants & format
– Co-operation of HS Worms and BorgWarner (Automotive Supplier)
– Bachelor Thesis Project
• Estimate the potential improvement rate resulting from an SAP HANA database migration within an SAP BW system based on SAP BW technical content (usage statistics)
Research Project 1 - Setup
• Data basis: Available usage data of the BorgWarner BI system from a 6-month time frame
• Determine overall query runtimes and individual fractions
– Database
– Application Server
– Network
– Client
• Compare results between two large applications: Sales Reporting and Supply Chain Reporting
Execution Time Query: Database + Application Server + Network + Client
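The split of query runtime into individual fractions can be sketched as follows. This is a minimal illustration only: the record layout, the component names, and all numbers in `RECORDS` are invented for the example and do not reflect the structure of SAP BW technical content or the BorgWarner data.

```python
from collections import defaultdict

# Hypothetical usage records: (application, component, seconds).
# Layout and values are assumptions made for illustration only.
RECORDS = [
    ("Sales", "database", 3.0),
    ("Sales", "application_server", 2.0),
    ("Sales", "network", 1.0),
    ("Sales", "client", 2.0),
    ("SCM", "database", 60.0),
    ("SCM", "application_server", 20.0),
    ("SCM", "network", 10.0),
    ("SCM", "client", 10.0),
]


def runtime_fractions(records):
    """Return {application: {component: share of that application's total runtime}}."""
    totals = defaultdict(float)
    per_component = defaultdict(lambda: defaultdict(float))
    for app, component, seconds in records:
        totals[app] += seconds
        per_component[app][component] += seconds
    return {
        app: {c: s / totals[app] for c, s in comps.items()}
        for app, comps in per_component.items()
    }


fractions = runtime_fractions(RECORDS)
for app, shares in fractions.items():
    print(app, {c: f"{share:.0%}" for c, share in shares.items()})
```

Summing per application before dividing keeps the shares comparable between applications with very different total runtimes.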
Research Project 1 - Results*
• SCM: Avg. Runtime 124.80 seconds; Database 63%, Client / Network / App Server 37%
• Sales: Avg. Runtime 7.71 seconds; Database 38%, Client / Network / App Server 62%
*Key Figures analyzed for all Report executions between 01/05/2014 and 19/12/2014 divided by GSM and Sales
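One way to read such figures is as an upper bound on what a faster database alone can achieve. The sketch below applies an Amdahl-style calculation; it is not taken from the thesis project. The pairing of 63% with SCM and 38% with Sales follows the order in which the values appear in the transcript (an assumption), and `ASSUMED_DB_SPEEDUP` is a purely hypothetical factor.

```python
def improved_runtime(total_seconds, db_share, db_speedup):
    """Runtime if only the database part becomes db_speedup times faster."""
    if db_speedup <= 0:
        raise ValueError("db_speedup must be positive")
    other = total_seconds * (1.0 - db_share)           # unchanged part
    database = total_seconds * db_share / db_speedup   # accelerated part
    return other + database


def improvement_rate(db_share, db_speedup):
    """Relative runtime reduction; approaches db_share as db_speedup grows."""
    return db_share * (1.0 - 1.0 / db_speedup)


ASSUMED_DB_SPEEDUP = 10.0  # hypothetical factor, not a measured value

# (average runtime in seconds, database share), paired by order of appearance
REPORTS = {"SCM": (124.80, 0.63), "Sales": (7.71, 0.38)}

for name, (avg_seconds, share) in REPORTS.items():
    new_seconds = improved_runtime(avg_seconds, share, ASSUMED_DB_SPEEDUP)
    rate = improvement_rate(share, ASSUMED_DB_SPEEDUP)
    print(f"{name}: {avg_seconds:.2f}s -> {new_seconds:.2f}s ({rate:.1%} reduction)")
```

Whatever speedup is assumed, the reduction cannot exceed the database share itself, because the client, network, and application server time stays unchanged.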
Research Project 2 - Setup
• Main research question behind the study: Can the potential performance improvements of SAP HANA be realized in a data modeling and reporting setup comparable to BorgWarner's system landscape?
• Project participants & format
– Co-operation of InES (University of Mannheim) and BorgWarner (Automotive Supplier)
– Master Thesis Project
• Compare three variants with regard to data loading / reporting performance
– Model-A: SAP BW 7.3 on relational database using LSA modeling approach
– Model-B: SAP BW 7.3 on SAP HANA database using LSA modeling approach
– Model-C: SAP BW 7.3 on SAP HANA database leveraging HANA-optimized modeling
Research Project 2 - Setup
• Create a data model similar to the existing BorgWarner environment
• Utilize real-world data from BorgWarner along three cases:
– Case A: 1 million records
– Case B: 2 million records
– Case C: 3.5 million records
• Create different types of representative queries (for reporting)
• Run 5 iterations
• Provide infrastructure at the Big Data Innovation Center Magdeburg (BW on HANA / BW on relational database) and run the evaluation in a controlled lab environment
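The experimental grid described above (three models, three cases, five iterations) can be sketched as a small timing harness. This is a hedged illustration: `run_query` is a placeholder workload, and averaging the five iterations is an assumption, since the slides do not say how the iterations were aggregated.

```python
import statistics
import time

MODELS = ["Model-A", "Model-B", "Model-C"]
CASES = {"Case A": 1_000_000, "Case B": 2_000_000, "Case C": 3_500_000}
ITERATIONS = 5


def run_query(model, record_count):
    """Placeholder workload (hypothetical).

    A real run would execute a representative report against the given
    system variant; here the arguments are ignored and a trivial sum is computed.
    """
    return sum(range(1000))


def benchmark(run, models, cases, iterations):
    """Return {(model, case): mean wall-clock seconds over `iterations` runs}."""
    results = {}
    for model in models:
        for case, record_count in cases.items():
            timings = []
            for _ in range(iterations):
                start = time.perf_counter()
                run(model, record_count)
                timings.append(time.perf_counter() - start)
            results[(model, case)] = statistics.mean(timings)
    return results


results = benchmark(run_query, MODELS, CASES, ITERATIONS)
for (model, case), mean_seconds in sorted(results.items()):
    print(f"{model} / {case}: {mean_seconds:.6f}s mean over {ITERATIONS} runs")
```

Repeating each measurement and aggregating reduces the influence of one-off effects such as cold caches, which matters when comparing in-memory and disk-based variants.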
Research Project 2 - Selected Results*
Loading Performance
Reporting Performance (simple / mid-complex queries)
* for Case C: 3.5 million records
How can this kind of data be used?
Estimate Benefits of an IT investment
• Idea: Measure the time users currently spend waiting for report results
• Estimate the share of this time that would additionally become available to analysts after HANA has been implemented
• Multiply the additional time by the average hourly rate of an analyst
• Additional time (in h) * Hourly Rate (in EUR/h) = Performance Benefit (in EUR)
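Example:
• Current waiting time: 30,000 hours
• Additional time available: 10,000 hours
• Hourly rate: 50 € / hour
• Performance benefit: 10,000 h * 50 €/h = 500,000 €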
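The calculation from the slide can be written out in a few lines. The function and variable names are illustrative; the values are the ones from the example (10,000 additional hours at 50 EUR per hour).

```python
def performance_benefit(additional_hours, hourly_rate_eur):
    """Additional time (in h) * Hourly Rate (in EUR/h) = Performance Benefit (in EUR)."""
    if additional_hours < 0 or hourly_rate_eur < 0:
        raise ValueError("hours and rate must be non-negative")
    return additional_hours * hourly_rate_eur


ADDITIONAL_HOURS = 10_000   # time freed up for analysts, from the slide example
HOURLY_RATE_EUR = 50        # average hourly rate of an analyst, from the slide example

benefit = performance_benefit(ADDITIONAL_HOURS, HOURLY_RATE_EUR)
print(f"{benefit:,.0f} EUR")  # prints "500,000 EUR"
```

The estimate is only as good as the two inputs: the additional time depends on how much of the current waiting time the migration actually removes.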
Key Takeaways
1. Business Intelligence and Big Data Analytics overlap extensively in their concepts
2. Complementary teaching programs will be needed to meet the demands of industry, science and society
3. Ample potential for blended teaching/research projects in this area
Questions?
Prof. Dr. Hendrik Meth
Faculty for Information and Communication
Business Information Systems and Digital Media
Big Data and Data Science
STUTTGART MEDIA UNIVERSITY
Nobelstrasse 10
DE-70569 Stuttgart / Germany
eMail: [email protected]