DWH Overview
-
8/17/2019 DWH Overview
An Introduction to Data Warehousing
• Adil Siddiqui
• [email protected]
-
Objectives
• At the end of this lesson, you will know:
  - What is Data Warehousing
  - The evolution of Data Warehousing
  - Need for Data Warehousing
  - OLTP vs Warehouse Applications
  - Data marts vs Data Warehouses
  - Operational Data Stores
  - Overview of Warehouse Architecture
-
What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
- WH Inmon
WH Inmon - Regarded As Father Of Data Warehousing
• Data stored for historical period. Data is populated in the data warehouse on daily/weekly basis depending upon the requirement.
• Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer
• Data from multiple sources is integrated for a subject
• Identical queries will give same results at different times. Supports analysis requiring historical data
-
Subject-Oriented - Characteristics of a Data Warehouse
Operational: Quotes, Leads, Orders, Prospects
Data Warehouse: Customers, Products, Regions, Time
Focus is on Subject Areas rather than Applications
-
Non-volatile - Characteristics of a Data Warehouse
Operational: replace, change, insert, delete
Data Warehouse: load, read only access
Integrated View Is The Essence Of A Data Warehouse
-
Time Variant - Characteristics of a Data Warehouse
Operational: Current value data
• time horizon: 60-90 days
• key may not have element of time
Data Warehouse: Snapshot data
• time horizon: 5-10 years
• key has an element of time
• data warehouse stores historical data
Data Warehouse Typically Spans Across Time
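The difference between current value data and snapshot data can be sketched in a few lines. This is only an illustration, not part of the original slides: the table and column names (`account_current`, `account_snapshot`, `snapshot_date`) are hypothetical, and Python's built-in `sqlite3` module stands in for both systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational style: one row per account; the key has no element of time.
conn.execute(
    "CREATE TABLE account_current (account_id INTEGER PRIMARY KEY, balance REAL)"
)
# Warehouse style: the key includes the snapshot date, so history accumulates.
conn.execute(
    "CREATE TABLE account_snapshot ("
    "account_id INTEGER, snapshot_date TEXT, balance REAL, "
    "PRIMARY KEY (account_id, snapshot_date))"
)


def record_balance(account_id, day, balance):
    # The operational table replaces the current value in place.
    conn.execute(
        "INSERT OR REPLACE INTO account_current VALUES (?, ?)",
        (account_id, balance),
    )
    # The warehouse table only loads a new dated row; old rows are not changed.
    conn.execute(
        "INSERT INTO account_snapshot VALUES (?, ?, ?)",
        (account_id, day, balance),
    )


record_balance(1, "2019-08-15", 100.0)
record_balance(1, "2019-08-16", 250.0)
record_balance(1, "2019-08-17", 175.0)

current_rows = conn.execute("SELECT * FROM account_current").fetchall()
history_rows = conn.execute(
    "SELECT snapshot_date, balance FROM account_snapshot "
    "WHERE account_id = 1 ORDER BY snapshot_date"
).fetchall()
print(current_rows)  # only the latest balance survives
print(history_rows)  # one row per snapshot date
```

Because the snapshot rows are never updated, a query restricted to a past date returns the same answer whenever it is run.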
-
Evolution of Data Warehousing
1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Focus on Reporting
-
Evolution of Data Warehousing
1985 - 1990 : Querying Era
• Ad hoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis
Focus on Online Querying
-
Evolution of Data Warehousing
1990 - 20xx : Analysis Era
• Trend Analysis
• What If?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Focus on Online Analysis
-
Need for Data Warehousing
• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports
-
OLTP vs Warehouse
Operational System | Data Warehouse
Transaction Processing | Query Processing
Time Sensitive | History Oriented
Operator View | Managerial View
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile Data | Non Volatile Data
Stores all data | Stores relevant data
Not Flexible | Flexible
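The first rows of the comparison (transaction processing versus query processing, organized by transactions versus by subject) can be illustrated with a small sketch. The `orders` table, its columns, and the two helper functions are hypothetical names invented for this example. A single in-memory SQLite database is used only to keep the sketch short, whereas in practice the two workloads run on separately configured systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "product TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme", "Widget", 120.0, "open"),
        (2, "Acme", "Gadget", 80.0, "open"),
        (3, "Globex", "Widget", 200.0, "open"),
    ],
)


def ship_order(order_id):
    # Transaction processing: change one record identified by its key.
    with conn:
        conn.execute(
            "UPDATE orders SET status = 'shipped' WHERE order_id = ?",
            (order_id,),
        )


def sales_by_customer():
    # Query processing: read a set of records and summarize them by subject.
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


ship_order(2)
print(sales_by_customer())
```

The update touches exactly one row and must be fast for many concurrent operators. The summary scans every row and is the kind of request a manager or analyst would make.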
-
Capacity Planning
[Figure: Processing Power plotted against Time of day]
Processing Load Peaks During the Beginning and End of Day
-
Examples Of Some Applications
• Target Marketing
• Market Segmentation
• Budgeting
• Credit Rating Agencies
• Financial Reporting and Consolidation
• Market Basket Analysis - POS Analysis
• Churn Analysis
• Profitability Management
• Event tracking
Manufacturers, Customers, Retailers
http://sheks/stuff/Data%20Warehousing%20Example%20-%20Amazon.htm
-
Do we need a separate database?
• OLTP and data warehousing require two very differently configured systems
• Isolation of Production System from Business Intelligence System
• Significant and highly variable resource demands of the data warehouse
• Cost of disk space no longer a concern
• Production systems not designed for query processing
-
Data Marts
• Enterprise wide data warehousing projects have a very large cycle time
• Getting consensus between multiple parties may also be di
-
Data Marts
• Subject or Application Oriented Business View of Warehouse
  - Quick Solution to a specific Business Problem
  - Finance, Manufacturing, Sales etc.
  - Smaller amount of data used for Analytic Processing
A Logical Subset of The Complete Data Warehouse
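One way to picture a data mart as a logical subset of the warehouse is a view that restricts the warehouse to a single subject and summarizes it. The sketch below is a hypothetical illustration only: the `warehouse_facts` table, the `sales_mart` view, and the sample rows are invented, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts ("
    "subject_area TEXT, region TEXT, period TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?, ?)",
    [
        ("Sales", "North", "2019-07", 100.0),
        ("Sales", "North", "2019-08", 150.0),
        ("Sales", "South", "2019-08", 90.0),
        ("Finance", "North", "2019-08", 500.0),
        ("Manufacturing", "South", "2019-08", 300.0),
    ],
)

# The mart exposes one subject only, with a smaller, summarized amount of data.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT region, SUM(amount) AS total_amount "
    "FROM warehouse_facts WHERE subject_area = 'Sales' GROUP BY region"
)

mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

The Finance and Manufacturing rows stay in the warehouse but are invisible to users of the Sales mart.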
-
Data Warehouses or Data Marts
For companies interested in changing their corporate cultures or integrating separate departments, an enterprise wide approach makes sense.
Companies that want a quick solution to a specific business problem are better served by a standalone data mart.
Some companies opt to build a warehouse incrementally, data mart by data mart.
A Logical Subset of The Complete Data Warehouse
-
Data Warehouse and Data Mart
Scope
  Data Warehouse: Application Neutral; Centralized, Shared; Cross LOB/enterprise
  Data Marts: Specific Application Requirement; LOB, department; Business Process Oriented
Data Perspective
  Data Warehouse: Historical Detailed data; Some summary
  Data Marts: Detailed (some history); Summarized
Subject
  Data Warehouse: Multiple subject areas
  Data Marts: Single Partial subject
-
Data Warehouse and Data Mart
Data Sources
  Data Warehouse: Many; Operational/External Data
  Data Marts: Few; Operational, external data
Implement Time Frame
  Data Warehouse: 9-18 months for first stage; Multiple stage implementation
  Data Marts: -3 months
Characteristics
  Data Warehouse: Flexible, extensible; Durable/Strategic; Data orientation
  Data Marts: Restrictive, non extensible; Short life/tactical; Project
-
Warehouse or Mart First?
Data Warehouse First | Data Mart First
Expensive | Relatively cheap
Large development cycle | Delivered in 6 months
Change management is di
-
OLTP Systems vs Data Warehouse
Remember: between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different
Understanding The Differences Is The Key
-
Operational Data Store - Definition
[Figure: diagram with the labels Operational, ODS, Data Warehouse, and DSS]
-
Operational Data Store - Definition
A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data
• Data stored only for current period. Old Data is either archived or moved to Data Warehouse
• Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer
• Identical queries may give different results at different times. Supports analysis requiring current data
• Data from multiple sources is integrated for a subject
-
Operational Data Store
• The ODS applies only to the world of operational systems.
• The ODS contains current valued and near current valued data.
• The ODS contains almost exclusively all detail data.
• The ODS requires a full function, update, record oriented environment.
-
Operational Data Store
• Functions of an ODS
  - Converts Data,
  - Decides Which Data of Multiple Sources Is the Best,
  - Summarizes Data,
  - Decodes/encodes Data,
  - Alters the Key Structures,
  - Alters the Physical Structures,
  - Reformats Data,
  - Internally Represents Data,
  - Recalculates Data.
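A few of the functions listed above (decoding, reformatting, and deciding which source is best) can be sketched as plain Python functions. Everything named here is an assumption made for illustration: the code table, the field names, and especially the "most recently updated wins" rule, which is only one possible way to choose between sources.

```python
from datetime import date

# Hypothetical code table: each source system encodes gender differently.
GENDER_CODES = {"M": "male", "F": "female", "1": "male", "2": "female"}


def decode_gender(raw):
    # Decodes/encodes data: map source-specific codes to one representation.
    return GENDER_CODES.get(str(raw).strip().upper(), "unknown")


def reformat_date(raw):
    # Reformats data: accept DD/MM/YYYY or ISO text and return a date object.
    if "/" in raw:
        day, month, year = (int(part) for part in raw.split("/"))
        return date(year, month, day)
    return date.fromisoformat(raw)


def pick_best(records):
    # Decides which of multiple sources is best (assumed rule: latest update).
    return max(records, key=lambda rec: rec["updated"])


def integrate(records):
    # Clean every source record, then keep the best one for the subject.
    cleaned = [
        {
            "customer_id": rec["customer_id"],
            "gender": decode_gender(rec["gender"]),
            "updated": reformat_date(rec["updated"]),
            "source": rec["source"],
        }
        for rec in records
    ]
    return pick_best(cleaned)


sources = [
    {"source": "billing", "customer_id": 7, "gender": "m", "updated": "03/08/2019"},
    {"source": "crm", "customer_id": 7, "gender": 1, "updated": "2019-08-10"},
]
best = integrate(sources)
print(best["source"], best["gender"], best["updated"])
```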
-
Different kinds of Information Needs
• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of Tuberculosis increased in last 5 years in Southern region?
-
OLTP vs ODS vs DWH
Audience
  OLTP: Operating Personnel
  ODS: Analysts
  Data Warehouse: Managers and analysts
Data access
  OLTP: Individual records, transaction driven
  ODS: Individual records, transaction or analysis driven
  Data Warehouse: Set of records, analysis driven
Data content
  OLTP: Current, real-time
  ODS: Current and near-current
  Data Warehouse: Historical
Data Structure
  OLTP: Detailed
  ODS: Detailed and lightly summarized
  Data Warehouse: Detailed and Summarized
Data organization
  OLTP: Functional
  ODS: Subject-oriented
  Data Warehouse: Subject-oriented
-
OLTP vs ODS vs DWH
Data redundancy
  OLTP: Non-redundant within system; Unmanaged redundancy among systems
  ODS: Somewhat redundant with operational databases
  Data Warehouse: Managed redundancy
Data update
  OLTP: Field by field
  ODS: Field by field
  Data Warehouse: Controlled batch
Database size
  OLTP: Moderate
  ODS: Moderate
  Data Warehouse: Large to very large
Development Methodology
  OLTP: Requirements driven, structured
  ODS: Data driven, somewhat evolutionary
  Data Warehouse: Data driven, evolutionary
Philosophy
  OLTP: Support day-to-day operation
  ODS: Support day-to-day decisions & operational activities
  Data Warehouse: Support managing the enterprise
-
Typical Data Warehouse Architecture
[Figure: architecture diagram with the following components]
• Operational Systems/Data
• Data Preparation: Select, Extract, Transform, Integrate, Maintain
• Middleware/API
• Data Warehouse, Metadata
• Data Marts
• EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining
Multi-tiered Data Warehouse without ODS
-
Typical Data Warehouse Architecture
[Figure: architecture diagram with the following components]
• Operational Systems/Data
• Data Preparation: Select, Extract, Transform, Integrate, Maintain
• ODS, Metadata
• Data Preparation: Select, Extract, Transform, Load
• Data Warehouse, Metadata
• Data Marts
Multi-tiered Data Warehouse with ODS
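The flow in the diagram (operational data, data preparation, ODS, a second preparation step, warehouse) can be sketched as a tiny pipeline. This is a simplified illustration under stated assumptions, not the architecture's actual implementation: the function names, the field names, and the use of a dict for the ODS and a list for the warehouse are all hypothetical.

```python
def extract(operational_rows, wanted_fields):
    # Select / Extract: keep only the fields the next tier needs.
    return [{f: row[f] for f in wanted_fields} for row in operational_rows]


def transform(rows):
    # Transform: standardize customer names and convert amounts to numbers.
    return [
        {"customer": row["customer"].strip().title(), "amount": float(row["amount"])}
        for row in rows
    ]


def load_ods(ods, rows):
    # The ODS is volatile and current valued: newer values overwrite older ones.
    for row in rows:
        ods[row["customer"]] = row["amount"]


def load_warehouse(warehouse, ods, load_date):
    # The warehouse is nonvolatile: each load appends dated rows only.
    for customer, amount in sorted(ods.items()):
        warehouse.append((load_date, customer, amount))


ods, warehouse = {}, []
day1 = [{"customer": " acme ", "amount": "100", "clerk": "x"}]
day2 = [{"customer": "ACME", "amount": "140", "clerk": "y"}]

for load_date, batch in [("2019-08-16", day1), ("2019-08-17", day2)]:
    load_ods(ods, transform(extract(batch, ["customer", "amount"])))
    load_warehouse(warehouse, ods, load_date)

print(ods)        # current value only
print(warehouse)  # one dated row per load
```

After two loads the ODS holds a single current value per customer, while the warehouse has kept both dated rows.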
-
Thank You