Post on 15-Dec-2015
© 2014 Wellesley Information Services. All rights reserved.
What you need to know to get the most out of your
Enterprise Data Warehouse
Dr. Bjarne BergComerit
2
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
• Data Integration challenges Masterdata Transaction data conversion Data cleansing
• The IT "Jail"• Creating The Support Organization• The top 10 EDW pitfalls • Wrap-up
3
We will take a look at the pros and cons of EDW architectural options, including federated, centralized, and distributed EDW models, and explore when each approach is appropriate.
See options for consolidating different master and transactional data.
Weigh your options for building a centralized or a decentralized EDW support organization.
Examine the top 10 pitfalls companies face when implementing SAP NetWeaver BW as their EDW and how to overcome them.
In this session.
4
A Quick Definition: BI Vs. Data Warehousing
Data warehousing is the act of extracting, transferring, transforming, storing and retrieval of data for reporting and analytical purposes.
Business Intelligence (BI) is a terminology for applications that uses data stores for analytical purposes.
KEY CONCEPT:BI applications are not required to run
on top of data warehouses, but the majority does
KEY CONCEPT:BI applications are not required to run
on top of data warehouses, but the majority does
5
Before you start: Know your DW Governance model
Know your organization before attempting a
centralized EDW effort - do you have enough sponsorship
to make real changes?
Know your organization before attempting a
centralized EDW effort - do you have enough sponsorship
to make real changes?
Many EDW efforts fail, due to the IT governance changes needed to be successful.
EDWs rarely succeeds in businesses modeled as federal, feudal, IT duopoly or anarchy.
6
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
A Logical Enterprise DW Architecture
Metadata
DataExtractionIntegration
andCleansingProcesses
Custom Developed Applications
DataMining
Statistical Programs
Query Access Tools
Data Resource Management and Quality Assurance
SummarizedData
SegmentedData Subsets
Functional Area
Summation
Marketingand Sales
Purchasing
CorporateInformation
Product Line
Location
PurchasingSystems
InvoicingSystems
GeneralLedger
External DataSources
Other InternalSystems
Translate
Attribute
Calculate
Derive
Summarize
Synchronize
Source Data ExtractOperationalData Store Transform
DataWarehouse BI Applications
Source: Bjarne Berg, “Introduction to Data Warehousing”
8
The Federated Data Warehouse (FDW) Architecture
Metadata
SAP BOBJ OLAP
Universes
Ad-HocWebi
OLAPAnalysis
DashboardsXcelcius
Batch reportsCrystal
Data Resource Management and Quality Assurance
Security
Training
User Support
Projects
Ad-hoc
Synchronization
IT Developed Semantic Layer
IT Support & Development
Business Driven BI Applications
IT Driven Data Warehouses
BEx Explorer
SAP BWA
Data Warehouse(s)Data Warehouse(s)
DW ODSs
DWStar-schemas
SAP BW(s)SAP BW(s)
SAP DSOs
SAPBW InfoCubes
SAP BOBJ SQL
Universes
Direct Connections
Custom and 3rd party
SAP BOBJ Data Services
BPC
External Applications
Financial Report center
Enterprise Portal
SalesReport center
ManufacturingReport center
HRReport center
Partner facingReport center
Ad-HocReport center
Customer facingReport center
Users
Employees
Customers
Partners
9
Federated Data Warehouse (FDW) Architecture
• Federated Data Warehouses are best in very large organization where development is separated by geography, organizational boundaries, or where multiple data warehouses exists due to mergers & acquisitions.
• To make FDWs successful, there needs to be a rapid convergence to standardized technologies. This include:
Same type of databases and support pack levels (costs and compatibility)Same technical platforms Hardware, Backups and Archiving (costs)Shared Portal and user interface strategy (reduced training and support)Shared security design and centralized administration (risk management)
If the data is federated you gain faster response time to business needs, can execute multiple projects in parallel, and work 24/7 across the globe. But without any standardization, it can also be very costly.
10
The Centralized Data Warehouse (CDW) Architecture
Metadata
SAP BOBJ OLAP
Universes
Ad-HocWebi
OLAPAnalysis
DashboardsXcelcius
Batch reportsCrystal
Data Resource Management and Quality Assurance
Security
Training
User Support
Projects
Ad-hoc
Synchronization
IT Developed Semantic Layer
IT Support & Development
Business Driven BI Applications
IT Driven Data Warehouses
BEx Explorer
SAP BWA
SAP BWSAP BW
SAP DSOs
SAPBW InfoCubes
SAP BOBJ SQL
Universes
Direct Connections
Custom and 3rd party
SAP BOBJ Data Services
BPC
External Applications
Financial Report center
Enterprise Portal
SalesReport center
ManufacturingReport center
HRReport center
Partner facingReport center
Ad-HocReport center
Customer facingReport center
Users
Employees
Customers
Partners
OLTP sourcesOLTP sources
SAP ECCSiebel, JDE
OracleOthers
11
Centralized Data Warehouse (CDW) Architecture
• Centralized Data Warehouses are great for small and mid-size data warehouses (less than 15-40Tb). There are great benefits in terms of the ease to mange upgrades, support packs, enforcing development standards, transport control, master data management and the overall total cost of ownership
• To make CDWs successful, there needs to be: Adequate funding of hardware, application servers, database servers Serious consideration should be made to move BI and reporting to BWA Focus on using the database capacity on storage and data loads-- not queries No direct reporting from DSOs (takes too much system resources) Broadcasting , caching and performance tuning is a dedicated support effort A plan for data partitioning and archiving needs to be in-place as soon as the
system exceeds 5-8 TB.
If the data is centralized it is faster to develop new solutions for the business and merging from different data sources are easier
12
The De-centralized Data Warehouse (DDW) Architecture
Metadata
SAP BOBJ OLAP
Universes
Ad-HocWebi
OLAPAnalysis
DashboardsXcelcius
Batch reportsCrystal
Data Resource Management and Quality Assurance
Security
Training
User Support
Projects
Ad-hoc
Synchronization
IT Developed Semantic Layer
IT Support & Development
Business Driven BI Applications
IT Driven Data Warehouses
BEx Explorer
SAP BWA
SAP BW(s)SAP BW(s)
SAP DSOs
SAPBW InfoCubes
SAP BOBJ SQL
Universes
Direct Connections
Custom and 3rd party
SAP BOBJ Data Services
BPC
External Applications
Financial Report center
Enterprise Portal
SalesReport center
ManufacturingReport center
HRReport center
Partner facingReport center
Ad-HocReport center
Customer facingReport center
Users
Employees
Customers
Partners
SAP BW(s)SAP BW(s)
SAP DSOs
SAPBW InfoCubes
13
De-centralized Data Warehouse (DDW) Architecture
• A Decentralized Data Warehouses makes sense if there are logical division between business units, geographies and little shared reporting I.e. in a conglomerate organization with diverse business units.
• The benefits of DDWs include the flexibility of the FDW with the technology standardization and lower cost of ownership of the CDW. To make DDWs successful, there needs to be:
A formal Masterdata Management (MDM) strategy with clearly defined standards
A rule based data cleaning and data integration plan for centralized reporting
A shared hardware location to keep costs lower
Tight integration with upgrades, support packs and interface standards
With DDWs there is a risk of creating stove-pipe data marts that cannot be integrated at the corporate level without very high costs.
14
Recommendations CDW, FDW and DDW Architectures
Federated Data Warehouse (FDW)
Centralized Data Warehouse (CDW)
De-centralized Data Warehouse (DDW)
Best for very large organization where development is separated by geography, organizational boundaries, or where multiple data warehouses exists due to mergers & acquisitions.
Best for small and mid-size data warehouses in organizations.
If there are logical division between business units, geographies and little shared reporting I.e. in a conglomerate organization with diverse business units.
Max. Size Virtually unlimited 40+ Tb Virtually unlimited
Use same type of databases, ETL tools and support levels (costs & compatibility)
Adequate funding of hardware, application servers, database servers
A formal Masterdata Management (MDM) strategy with clearly defined standards
Use the same O/S, Hardware, Backups and Archiving systems (costs)
Implement BWA A rule based data cleaning and data integration plan for centralized reporting
Shared Portal and user interface strategy (reduced training and support)
Use the database capacity on data loads not queries
Use a shared hardware location to keep support costs lower
Shared security design and centralized administration (information risk management)
Direct reporting from DSOs should not be allowed
Tight integration with upgrades, support packs and interface standards
Performance tuning should be a dedicated support team effort
Issues
Without any standardization, it can be very costly.
Performance can be poor. An archiving plan is essential when the system exceeds 5-8 Tb.
There is a risk of creating stove-pipe data marts that cannot be integrated at the corporate level without very high costs.
Success factors
Organization
15
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
1616
The 3-Tiers of Information Management
For all data warehouses 60-80% of the effort is to move, store, retrieve and integrate data from various source systems.
Information management is six distinct efforts. Therefore, several tools exists with different capabilities
Applications
ERP, SCM, CRM
Business Intelligence
Data Synchronization & Migration
Performance Management
Information Management
Data Federation
Data Integration
Text Analysis
Metadata Mgmt.
Masterdata Mgmt.
Data Quality
Structured UnstructuredData Data
RDBMS
ERP
RDBMS
ERP
Notes
Web
Docs
17
The BI Data Services Architecture
Data integration in an EDW can be done with ETL tools like SAP BOBJ Data Services. The tool architectural can be illustrated in terms of source data, process and target data.
ProcessData
ValidationData
CleansingData
Auditing
Data Profiling
SourceData
PeopleSoft
Oracle Apps
Data Services Engine
Siebel
SAP R/3
Oracle DB
SAP BI NetWeaver
SQL DB
DB2
XML
Files
Mainframe Excel
OthersSAP ECC
TargetDataTargetData
PeopleSoft
Oracle Apps
Siebel
SAP R/3
Oracle DB
SAP BI NetWeaver
SQL DB
DB2
XML
Files
Mainframe Excel
OthersSAP ECC
Reconciliation Between Systems
The majority of time spent on maintaining a complex EDW is the time spent on reconciliation of the data
You have to prove that the data in the warehouse is equal to the data you extracted, or your financial reporting systems will have no credibility.
You are also legally required to have a reconciliation process that can be tracked, if you use the warehouse for financial reporting to external entities.
Reconciliation Between Systems- Dashboards
Many companies invest in developing manual control queries, while others use reconciliation products that are powered by SAP NetWeaver
An example of a reconciliation Dashboard built on SAP BW. In this example:
1. A reconciliation memo was written on Feb. 1st
2. PCA reconciliation between BW and R/3 failed on Feb. 16th
2020
Interesting use for SAP NetWeaver BI
Using an ETL Tool like BOBJ Data Services you can consolidate data from many source systems, cleanse and integrate them before you send it to the EDW. This avoids complex logic.
Source systems- Oracle- JDE- Peoplesoft- Baan- Siebel- Custom- Hyperion- Other.
21
Data Cleansing Capabilities
The ValidationValidation allows you to create rules for cleaning data prior to loading it to the system. You can have a pass rule and an 'Action on Failure' that can provide complex logic.
22
Data Cleansing Capabilities
The AuditThe Auditing selection allows you to take complex actions when the data quality is poor.
You can:
1. Send an email to an administrator
2. Load the data to a table for later correction
3. Modify the data through scripts
4. Create custom functions for your own processing logic
Universal Data Cleansing: Example of Enhanced Party Masterdata
Source: SAP AG, 2009
You can also add new items such as geocodes for visualization in SAP BI I.e. maps
You can add new characteristics to the data such as:
1) Legal tax jurisdictions 2) Census track ID3) Block group ID4) Insurance rating territories5) Tax authority name6) Tax authority FIPS codes7) Longitude & Latitude8) City type9)...
GREAT FEATURE: The Census track ID allows you to analyze your customers and partners using government census information
Universal Data Cleansing: Customer Aggregating & Discovery
A common way to look at customer data is by Households instead of single records.
BOBJ DQ allows you to look at customer's addresses and create shared master records, customer mapping keys, aggregating data (i.e. aggregated sales data for the household), check "no-call" lists, examining churn (apparent customer turn-over).
You can also integrating all master data from many records into a single "super record" that contains all the unique master data you have about a single customer or partner.
Universal Data Cleansing: Data integration & BAS
The Business Address Service (BAS) feature can:
1) Use Postal reference files from 190 countries to clean address, including suggestion lists
2) Data scans and searches in SAP for duplicate records using partial user input.
26
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
27
IT cannot hold BI ‘hostage’ with long delivery times and slow responses to changing user demands.
The only way to be successful is to provide flexible data structures and cleansed, integrated data to the business and let the business groups take over the BI development.
So what is needed is a stronger emphasis on scalable, fast IT solutions and a ramp up of BI capabilities of the business units.
Separate the Data Warehouse from the BI solutions
Keeping BI front-end solutions such as Webi, Visual Composer and Analysis in the hands of IT instead of the business will create
inflexible systems that are unlikely to succeed.
28
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
BI Support Organization — Big Picture
You need to separate the operations of BI systems from the project work
If there is no support organization, the BI system quickly becomes an orphan when the project ends
Without a support org. there is a risk that future BI projects are delayed sincethe project team has to support previous projects
The BI Help Desk — Level 1 Support
The first level support should be done by Power Users in the organization
You will have to train these resources, empower them to make changes, and leverage them as much as possible, even when it is easy to “jump to solutions”
Query related support tickets from a central location/Web site should be routed to the
power users in each department.
The power user can escalate the ticket to Level- 2 support if he/she is unable to resolve it.
The BI Help Desk — Level 2 Support
The second level support is used for issues that are not related to queries, presentations, reports, and formatting
This include data loads, performance, security, availability, training schedules, etc.
This is addressed by the central support team
Some support ticket types are always routed to Level 2 support.
It is important to have a generic email address for Level 2 support that is not related to an individual. Emails to this address should not be deleted.
TrainingProject Stack
Break fix and Production stack
Break-Fix - Splitting Projects & Support Environments
By Introducing a Break-Fix (BWB) environment, the support team can correct break-fixes and move code into the Testing environment (BWQ) and Production environment (BWP) without impacting the project team
Transports can be captured in the buffer and moved to the Development environment (BWD) on a periodic basis
BWD
BWS
BWT
BWB BWQ BWP
The Break-Fix and production stack as well as the training environment is owned by the support team.
The project teams own the development and Sandbox environments (BWS and BWD).
33
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
Pitfall #1: Lack of Reasonable SLA with EDW Support Team
Some examples of reasonable performance include:
1. 90% of all queries run under 20 seconds2. System is available 98% of the time3. Data loads are available at 8am — 99% of the time4. User support tickets are answered within 30 minutes
(first response)5. User support tickets are closed within 48 hours — 95% of the time.6. System is never unavailable for more than 72 hrs — including
upgrades, service packs, and disaster recovery7. Delta backups are done each 24 cycle and system backups are
done every weekend
More EDW Pitfalls….
Pitfall #2: Jack-of-all-trades Master of none….BI is complex with many different tools and technologies. Don’t rely
on a single person with no specialized skills. Make each person responsible for a focused technology/task.
Pitfall #3: An army of ‘Architects’ who don’t understand Technology.Have one ‘architect’ – quality is more important than quantityArchitecture is technical by nature. PowerPoints only gets you a
small part of the way.The BI architect should know the technology better than anyone in
the room and be able to design solutions.
More EDW Pitfalls….
Pitfall #4: Not separating the Support Team from the Project team Keeping the ‘lights-on’ is a core focus area. Many EDWs fail because of lack of training, production and user
support, and by having nobody around to do continuous improvements.
Pitfall #5: A Firm Belief in Monolithic Data WarehousesGoogle runs on over 500,000 servers, why must your data warehouse
run on one?Divide and concur when the performance becomes a too-large
problem.You don’t need a monolithic castle, but storage & performance
More EDW Pitfalls….
Pitfall #6: Analysis Paralysis.You will never have perfect EDW requirements – get over it….The business will change and so will the BI system. Change is a sign
of success not failures (people who cares wants to make it better).Not moving forward and keep analyzing is a costly decision…
Pitfall #7: A Single User Interface will solve all my EDW problems..There are no magic bullets. Most companies need 2-3 end user tools.Start with OLAP, then continue with ad-hoc querying, and finalize with
dashboards. All other tools are great, but not a starting point.Remember you first crawled and walked before you ran.
More EDW Pitfalls….
Pitfall #8: Enforce EDW StandardsStandards are not a word document buried in a file cabinet If you allow ‘exceptions’ the standards quickly become meaningless. It costs to keep your house clean, but data management and data
integration will benefit greatly from it. Remember: “the road to hell is paved with good intentions” - unknown.
Pitfall #9: Keep Your EDW Support Team motivatedThe average application developer stays on the job for 47 months, the
average support person is only there for 25 months! It is very expensive to use the support team as a training ground for
technical staff and it hurts performance.Make the support team a ‘cool’ place to work with flexible hours and defined career paths.
Final EDW Pitfall.
Pitfall #10: Not Creating a ‘BI Technology Advisory Board’ for the EDW
Use ad-hoc best practice advise from external experts on an periodic basis.
If you are struggling with something, there are many others who have ‘cracked the nut’ already – leverage their experiences.
Attend BI conferences, take good notes and leverage the many experts at the booths, the speakers and the forums.
You are not alone, but your team needs to get ‘plugged into’ the many EDW and Technology on-line communities.
40
What We’ll Cover …
• Introduction• EDW architectural options
Federated Data Warehouse Centralized Data Warehouse Distributed Data Warehouse
•Data Integration challengesMasterdataTransaction data conversionData cleansing
•The IT "Jail"•Creating The Support Organization•The top 10 EDW pitfalls •Wrap-up
41
Resources
• Fundamentals of Data Warehouses by Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and
Panos Vassiliadis
• Implementing Enterprise Data Warehousing: A Guide for Executives by Alan Schlukbier
42
7 Key Points to Take Home
• There are more than one way to architect an EDW. However, you need to make sure your BI solution is designed, not evolutionary
• Consider FDW and DDWs when data volumes are extremely high or your company just underwent a merger or acquisition
• Make the front-end independent from the backend
• Formalize a data integration strategy with MDM and Reconsolidation as key focus areas
• Invest in people, not just technology –Great support staff is key to EDW success
• Create a BI technology advisory board and have periodic meetings
43
Your Turn!
How to contact me:Dr. Bjarne Berg
BergB@LRU.edu
Disclaimer
SAP, R/3, mySAP, mySAP.com, SAP NetWeaver®, Duet™, PartnerEdge, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP.