A company of Daimler AG
LECTURE @DHBW: DATA WAREHOUSE
PART LX: PROJECT MANAGEMENT
ANDREAS BUCKENHOFER, DAIMLER TSS
ABOUT ME
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
Andreas Buckenhofer
Senior DB [email protected]
Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics
ANDREAS BUCKENHOFER, DAIMLER TSS GMBH
Data Warehouse / DHBW, Daimler TSS
“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”
Data has always been my main focus throughout my long career in data integration. I work for Daimler TSS as Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.
I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.
I’m responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as an ACE Associate. I hold current certifications such as “Certified Data Vault 2.0 Practitioner (CDVP2)”, “Big Data Architect”, “Oracle Database 12c Administrator Certified Professional”, “IBM InfoSphere Change Data Capture Technical Professional”, etc.
DHBW
DOAG
Contact/Connect
As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.
Our objective: We make Daimler the most innovative and digital mobility company.
NOT JUST AVERAGE: OUTSTANDING.
INTERNAL IT PARTNER FOR DAIMLER
+ Holistic solutions according to the Daimler guidelines
+ IT strategy
+ Security
+ Architecture
+ Developing and securing know-how
+ TSS is a partner who can be trusted with sensitive data
As a subsidiary: maximum added value for Daimler
+ Market closeness
+ Independence
+ Flexibility (short decision-making process, ability to react quickly)
LOCATIONS
Daimler TSS China
Hub Beijing
10 employees
Daimler TSS Malaysia
Hub Kuala Lumpur
42 employees
Daimler TSS India
Hub Bangalore
22 employees
Daimler TSS Germany
7 locations
1000 employees*
Ulm (Headquarters)
Stuttgart
Berlin
Karlsruhe
* as of August 2017
• After the end of this lecture you will be able to
• Understand the lifecycle of DWH projects
WHAT YOU WILL LEARN TODAY
LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
[Architecture diagram] Data Warehouse with Backend and Frontend
• Data sources: internal data sources, external data sources, OLTP systems
• Staging Layer (Input Layer)
• Integration Layer (Cleansing Layer)
• Core Warehouse Layer (Storage Layer)
• Aggregation Layer
• Mart Layer (Output Layer, Reporting Layer)
• Across the layers: Metadata Management, Security, DWH Manager incl. Monitor
• Design approaches: Top Down (Inmon), Bottom Up (Kimball)
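As a rough illustration of how data might move through the layers named in the architecture above, the following Python sketch passes a few rows from staging through cleansing and core storage into a mart aggregate. It is only a minimal sketch under assumptions: the layer order, the function names, the sample rows, and the cleansing rule are hypothetical and not taken from the lecture, and the Aggregation Layer is folded into the mart step for brevity.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive from an OLTP source.
SOURCE_ROWS = [
    {"order_id": "1", "region": " south ", "amount": "100.50"},
    {"order_id": "2", "region": "North", "amount": "20"},
    {"order_id": "3", "region": "south", "amount": "not-a-number"},
]


def load_staging(rows):
    """Staging (input) layer: copy the source data unchanged."""
    return [dict(row) for row in rows]


def integrate(staged):
    """Integration (cleansing) layer: normalize values, set aside bad rows."""
    clean, rejected = [], []
    for row in staged:
        try:
            clean.append({
                "order_id": int(row["order_id"]),
                "region": row["region"].strip().lower(),
                "amount": float(row["amount"]),
            })
        except ValueError:
            rejected.append(row)
    return clean, rejected


def load_core(core, clean):
    """Core warehouse (storage) layer: keep integrated rows keyed by order_id."""
    for row in clean:
        core[row["order_id"]] = row
    return core


def build_mart(core):
    """Mart (output) layer: aggregate revenue per region for reporting."""
    revenue = defaultdict(float)
    for row in core.values():
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


staged = load_staging(SOURCE_ROWS)
clean, rejected = integrate(staged)
core = load_core({}, clean)
mart = build_mart(core)
print(mart)
print(len(rejected), "row(s) rejected during cleansing")
```

Each function only reads the output of the previous layer, which is the point of the layering: a mart can be rebuilt from the core without going back to the sources.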
Top-Down (Inmon)
• Comprehensive approach regarding available data
• Design the Core Warehouse Layer (the integrated data model) first, considering all requirements
• Design data marts afterwards
Bottom-Up (Kimball)
• Approach focusing on fast delivery of first results
• Design one data mart first
• Further marts are modeled afterwards, usually using the Kimball architecture
• Conformed dimensions integrate different data marts / fact tables
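A conformed dimension is a dimension table that several fact tables share with the same keys and the same meaning, so results from different marts can be combined. The sketch below illustrates this with Python's built-in sqlite3 module. The table names (dim_customer, fact_sales, fact_support) and all data are hypothetical examples, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conformed dimension: one customer table shared by both marts.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name TEXT
    );
    -- Fact table of a hypothetical sales mart.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        amount REAL
    );
    -- Fact table of a hypothetical support mart.
    CREATE TABLE fact_support (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        tickets INTEGER
    );

    INSERT INTO dim_customer VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
    INSERT INTO fact_support VALUES (1, 3), (2, 1), (2, 4);
""")

# Drill across: aggregate each fact table on its own, then join the
# results on the shared dimension key so that rows are not multiplied.
query = """
    SELECT d.name, s.total_amount, t.total_tickets
    FROM dim_customer AS d
    JOIN (SELECT customer_key, SUM(amount) AS total_amount
          FROM fact_sales GROUP BY customer_key) AS s
      ON s.customer_key = d.customer_key
    JOIN (SELECT customer_key, SUM(tickets) AS total_tickets
          FROM fact_support GROUP BY customer_key) AS t
      ON t.customer_key = d.customer_key
    ORDER BY d.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If each mart kept its own, differently keyed customer table, this join would not be possible without an extra mapping step.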
TOP-DOWN VS BOTTOM-UP APPROACH
TOP-DOWN VS BOTTOM-UP APPROACH: ADVANTAGES AND DISADVANTAGES
Top-Down (Inmon) | Bottom-Up (Kimball)
☺ Core Warehouse Layer is designed optimally | ☺ Early involvement of end users
☺ Data from Core Warehouse Layer is reused in many Marts | ☺ Fast results
☹ Time-consuming approach with high preparatory effort | ☹ Focus on single Marts leads to risk that the overall view is lost, esp. a properly designed Core Warehouse Layer
☹ High risk with changing requirements | ☹ Data often not reused but inconsistently copied across Marts
Both approaches have their downsides
• Top-Down takes an enormous initial effort to build the data model for the Core Warehouse Layer
• Bottom-Up is risky as the central / integrated focus is lost
→ Think big, start small
• Think big: Design a conceptual data model for the Core Warehouse Layer covering the whole enterprise
• Start small: Implement the physical data model for the Core and Mart Layers in iterations, per business department
THINK BIG, START LOCAL
• DWH is not a product
• DWH databases are more complex with different layers and data models
• Data first, code is secondary
• Data quality is a major concern
• Data integration is a challenging objective
• Business need difficult to justify quantitatively
WHAT’S DIFFERENT IN DWH PROJECTS?
WHY DO DWH PROJECTS FAIL?
AGILITY IN THE DWH: CASE STUDY@BOSCH
Source: https://www.informatik-aktuell.de/management-und-recht/projektmanagement/eine-konkrete-geschichte-der-agilitaet-im-data-warehouse.html
Define 3-5 criteria for the evaluation of an ETL tool
How does a relational DBMS (like Oracle, DB2, MS SQL Server) meet these requirements?
EXERCISE
• Supplier profile
• Support
• HW/SW requirements
• License / maintenance costs
• Usability
• Reliability
• Performance and scalability
• Multi-tenant
• Interfaces
• Scheduling
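One common way to compare candidates against criteria like these is a weighted scoring matrix. The sketch below shows the arithmetic only. The chosen subset of criteria, the weights, the candidate names, and the scores are all hypothetical and would have to be agreed on by the project team.

```python
# Hypothetical weights per criterion (assumed to sum to 1.0).
WEIGHTS = {
    "support": 0.2,
    "license_costs": 0.2,
    "usability": 0.2,
    "performance_scalability": 0.3,
    "scheduling": 0.1,
}

# Hypothetical scores from 1 (poor) to 5 (very good) for two made-up candidates.
SCORES = {
    "candidate_a": {
        "support": 4, "license_costs": 2, "usability": 5,
        "performance_scalability": 4, "scheduling": 5,
    },
    "candidate_b": {
        "support": 3, "license_costs": 5, "usability": 2,
        "performance_scalability": 4, "scheduling": 2,
    },
}


def weighted_score(scores, weights):
    """Return the sum of weight * score over all criteria."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Rank candidates from highest to lowest weighted score.
ranking = sorted(
    SCORES,
    key=lambda name: weighted_score(SCORES[name], WEIGHTS),
    reverse=True,
)
for name in ranking:
    print(name, round(weighted_score(SCORES[name], WEIGHTS), 2))
```

The result depends entirely on the weights, so fixing them before scoring the candidates helps to keep the evaluation honest.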
EXERCISE - DEFINE 5 CRITERIA FOR THE EVALUATION OF AN ETL TOOL
• RDBMSs provide many of these functionalities, but additional programming is required
• RDBMSs are often used for ETL/ELT by programming with SQL, PL/SQL, SQLT, etc.
EXERCISE - HOW DOES A RELATIONAL DBMS MEET THESE REQUIREMENTS?
ETL Tool | Manual ETL
Informatica, Talend, Oracle ODI, etc. | SQL, PL/SQL, SQLT, etc.
Separate license | No additional license
Workflow, error handling, and restart/recovery functionality included | Workflow, error handling, and restart/recovery functionality must be implemented manually
Impact analysis and where-used (lineage) functionality available | Impact analysis and where-used (lineage) functionality difficult
Faster development, easier maintenance | Slower development, more difficult maintenance
Additional (tool) know-how required | Know-how often available
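To give an idea of what implementing workflow, error handling, and restart/recovery manually can involve, the following Python sketch runs named steps in order and records finished steps in a checkpoint file, so that a rerun resumes at the failed step. It is a minimal sketch under assumptions: the function run_workflow, the step names, the JSON checkpoint format, and the simulated failure are all hypothetical. A real implementation would also have to make each step safe to repeat.

```python
import json
import os
import tempfile


def run_workflow(steps, checkpoint_path):
    """Run named steps in order, skipping steps already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for name, func in steps:
        if name in done:
            continue  # restart/recovery: completed steps are not repeated
        try:
            func()
        except Exception as exc:
            # Error handling: stop here and keep the checkpoint for a restart.
            print(f"step {name!r} failed: {exc}")
            return False
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    return True


# Demo with hypothetical steps; "transform" fails on its first call only.
log = []
attempts = {"transform": 0}


def extract():
    log.append("extract")


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated failure")
    log.append("transform")


def load():
    log.append("load")


steps = [("extract", extract), ("transform", transform), ("load", load)]
checkpoint = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")

first_run = run_workflow(steps, checkpoint)   # stops at "transform"
second_run = run_workflow(steps, checkpoint)  # resumes at "transform"
print(first_run, second_run, log)
```

In the demo, the second run skips "extract" because the checkpoint already lists it, which is the behavior an ETL tool typically provides out of the box.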
Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
THANK YOU
Feasibility study → Analysis → Design → Implementation → Test → Operations and maintenance
PROJECT PHASES: SMALL ITERATIONS INSTEAD OF LONG PHASES
Gartner:
…a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization. The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment.
DATA OPS
Source: https://blogs.gartner.com/nick-heudecker/hyping-dataops/
HYPE CYCLE FOR DATA MANAGEMENT
• DataOps is a new way of working and collaborating (as with DevOps)
• In contrast to DevOps, DataOps collaboration typically occurs between technical and non-technical staff
• Language barrier between these two parties (e.g. skills mismatch)
• A core enabler such as data literacy is therefore required
• Data literacy is the ability to understand data, to build knowledge from data, and to communicate information/meaning to others
• DataOps can't be achieved by buying tools
DATAOPS IS ABOUT ORGANIZATIONAL CHANGE
Source: https://blogs.gartner.com/nick-heudecker/hyping-dataops/
Organizational team that coordinates and standardizes DWH activities within an (end user) organization
• Define standards and create BI portfolio (e.g. which tools/products to use)
• Create DWH architecture and govern BI activities
• Establish processes for business and IT interaction
• Monitor DWH/BI market for new trends
• Determine skills and experience of business users
BICC: BI CENTER OF EXCELLENCE
4-QUADRANT MODEL (RONALD DAMHOF)
Source: http://prudenza.typepad.com/files/english---the-data-quadrant-model-interview-ronald-damhof.pdf