Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online...
-
date post
18-Dec-2015 -
Category
Documents
-
view
216 -
download
1
Transcript of Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online...
![Page 1: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/1.jpg)
Advanced Querying
OLAP
Data Warehousing
![Page 2: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/2.jpg)
Database Applications
• Transaction processing– Online setting– Supports day-to-day operation of business
• Decision support– Offline setting– Strategic planning (statistics)
![Page 3: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/3.jpg)
Transaction Processing
Transaction processing
• Operational setting• Up-to-date = critical
• Simple data
• Simple queries
Flight reservations
• ticket sales• do not sell a seat
twice• reservation, date,
name• Give flight details of X
List flights to Y
![Page 4: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/4.jpg)
Transaction Processing
• Database must support– simple data
• tables
– simple queries• select from where …
– consistency & integrity CRITICAL– concurrency
• Relational databases, Object-Oriented, Object-Relational
![Page 5: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/5.jpg)
Decision Support
Decision support
• Off-line setting• « Historical » data• Summarized data• Different databases
• Statistical queries
Flight company
• Evaluate ROI flights• Flights of last year• # passengers on line L• Passengers, fuel costs,
maintenance info• Average % of seats
sold/month/destination
![Page 6: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/6.jpg)
A decision support DB that is maintained separately from the organization’s operational databases.
Why Separate Data Warehouse?• High performance for both systems
– DBMS— tuned for OLTP • access methods, indexing, concurrency control, recovery
– Warehouse—tuned for OLAP • complex OLAP queries, multidimensional view, consolidation.
• Different functions and different data– Missing data: Decision support requires historical data which
operational DBs do not typically maintain– Data consolidation: DS requires consolidation (aggregation,
summarization) of data from heterogeneous sources– Data quality: different sources typically use inconsistent data
representations, codes and formats which have to be reconciled
Data Warehouse
![Page 7: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/7.jpg)
Three-Tier Architecture
DataWarehouse
ExtractTransformLoadRefresh
OLAP Engine
Monitor&
IntegratorMetadata
Data Sources Front-End Tools
Serve
Data Marts
Operational DBs
other
sources
Data Storage
OLAP Server
Analysis
Query/Reporting
Data Mining
ROLAPServer
![Page 8: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/8.jpg)
OLAP
• OLAP = OnLine Analytical Processing– Online = no waiting for answers
• OLAP system = system that supports analytical queries that are dimensional in nature.
![Page 9: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/9.jpg)
This Lecture
• Examples of decision support queries
• Data Cubes– Conceptual data model– Typical operations
• Implementation– ROLAP vs MOLAP– Indexing structures
• SQL:1999 support for OLAP
![Page 10: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/10.jpg)
Examples of Queries
• Flight company: evaluate ticket sales– give total, average, minimal, maximal amount– per date: week, month, year– by destination/source port/country/continent– by ticket type– by # of connections– …
![Page 11: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/11.jpg)
Characteristics
• One special attribute: amount measure
• Other attributes: select relevant regions dimensions
Different levels of generality (month, year, …)
hierarchies
• Measure data is summarized: sum, min, max, average
aggregations
![Page 12: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/12.jpg)
Supermarket example
• Evaluate the sales of products– Product cost in $– Customer: ID, city, state, country, – Store: chain, size, location, – Product: brand, type, …– …
• What are the measure and dimensional attributes, where are the hierarchies?
measure
Dim.hierarchies
![Page 13: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/13.jpg)
Why dimensions?
customer
store
product
Cost in $
• Multidimensional view on the data
![Page 14: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/14.jpg)
Cross Tabulation
• Cross-tabulations are highly useful– Sales of clothes JuneAugust ‘06
Blue Red Orange Total
June 51 25 158 234
July 58 20 120 198
August 65 22 51 138
Total 174 67 329 570
Product: color
Date:month,JuneAugust2006
![Page 15: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/15.jpg)
Data cubes
• Extension of Cross-Tables to multiple dimensions
• Conceptual notion
Blue Red Orange Total
June 51 25 158 234
July 58 20 120 198
August 65 22 51 138
Total 174 67 329 570
Dimensions
Data Points/1st level of aggregation
Aggregatedw.r.t. X-dim
Aggregatedw.r.t. Y-dim
Aggregatedw.r.t. X and Y
![Page 16: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/16.jpg)
Data Cubes
Date
Produ
ct
Cou
ntr
ysum
sum TV
VCRPC
1Qtr 2Qtr 3Qtr 4Qtr
Ireland
France
Germany
sum
![Page 17: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/17.jpg)
Data Cubes
• Base cuboid = n-dimensional cube with n number of dimensions
• The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid
• The lattice of cuboids forms a data cube
![Page 18: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/18.jpg)
Lattice of Cuboidsall
product, date, country
product date country
product, date product, countrydate, country
![Page 19: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/19.jpg)
Operations with Data Cubes
Scenario:
• Before starting the analysis task:– what data?
• select a few relevant dimensions• define hierarchy• aggregation functions of interest
– Pre-materialize• load data• compute counts/max, min, avg, … on beforehand
![Page 20: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/20.jpg)
Operations with Data Cubes
• What operations can you think of an analyst might find useful? (e.g., store)
![Page 21: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/21.jpg)
Operations with Data Cubes
• What operations can you think of that an analyst might find useful? (e.g., store)– only look at stores in the Netherlands– look at cities instead of individual stores– look at the cross-table for product-date– restrict analysis to 2006, product O1– go back to a finer granularity at the store level
![Page 22: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/22.jpg)
Roll-Up
• Move in one dimension from a lower granularity to a higher one– store city– cities country– product product type
![Page 23: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/23.jpg)
Drill-down
• Move in one dimension from a higher granularity to a lower one– city store– country cities– product type product
• Drill-through:– go back to the original, individual data records
![Page 24: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/24.jpg)
Pivoting
• Change the dimensions that are “displayed”; select a cross-tab.– look at the cross-table for product-date– display cross-table for date-customer
![Page 25: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/25.jpg)
Slice & dice
• Select a part of the cube by restricting one or more dimensions– restrict analysis to “city = Eindhoven”
![Page 26: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/26.jpg)
Summary of Concepts
• Cube: Multidimensional view on data– dimensional attributes– measure attribute
• Operations:– roll-up/drill-down– pivoting– slice and dice
![Page 27: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/27.jpg)
Implementation
• To make query answering more efficient: consolidate (materialize) aggregations
• Obvious implementation: multidimensional array.– Fast lookup: cell(prod. p, date d, prom. pr):
• look up index of p1, index of d, index of pr:
index = (p x D x PR) + (d x PR) + pr
![Page 28: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/28.jpg)
Implementation
• Multidimensional array– obvious problem: sparse data
can easily be solved, though.
Example:
binary search tree, key on index
hash table.
![Page 29: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/29.jpg)
Implementation
• However: very quickly people were confronted with the Data Explosion Problem
Consolidating the summaries blows up the data enormously !
Reasons are often misunderstood and confusing.
![Page 30: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/30.jpg)
Data Explosion Problem
• Why?
Suppose:– n dimensions, every dimension has d values– dn possible tuples.
– Number of cells in the cube: (d+1)n
– So, this is not the problem
![Page 31: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/31.jpg)
Data Explosion Problem
• Why?
Suppose– n dimensions, every dimension has d values– every dimension has a hierarchy
– most extreme case: binary tree
2d possibilities/dimension
![Page 32: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/32.jpg)
Data Explosion Problem
• Why?
Suppose– n dimensions, every dimension has d values– every dimension has a hierarchy– most extreme case: binary tree
2d possibilities/dimension 2n x dn cells
Only partial explanation (factor 2n comes from an extremely pathological case)
![Page 33: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/33.jpg)
Data Explosion Problem
• Why?– The problem is that most data is not dense,
but sparse.– Hence, not all dn combinations are possible.
Example: 10 dimensions with 10 values– 10 000 000 000 possibilities
Suppose « only » 1 000 000 are present
![Page 34: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/34.jpg)
Data Explosion Problem
Example: 10 dimensions with 10 values– 10 000 000 000 possibilities
Suppose « only » 1 000 000 are present
Every tuple increases count of 210 cells !
With hierarchies: effect even worse!
If every hierarchy has 5 items:
510 = 9 765 625 cells!
![Page 35: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/35.jpg)
View Selection Problem• Suffices to precompute some aggregates, and
compute others on demand.– aggregate on (item-name, color) from an aggregate
on (item-name, color, size) – For all but a few “non-decomposable” aggregates
such as median
• Several optimizations for computing multiple aggregates– Compute aggregate on (item-name, color) from an
aggregate on (item-name, color, size)
– Compute aggregates on (item-name, color, size), (item-name, color) and (item-name) in single DB sort
![Page 36: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/36.jpg)
View Selection Problemall
product, date, country
product date country
product, date product, countrydate, country
![Page 37: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/37.jpg)
View Selection Problemall
product, date, country
product date country
product, date product, countrydate, country
Which views to select:hard research problem !
![Page 38: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/38.jpg)
Implementation
Nowadays systems can be divided in three categories:– ROLAP (Relational OLAP)
• OLAP supported on top of a relational database
– MOLAP (Multi-Dimensional OLAP)• Use of special multi-dimensional data structures
– HOLAP: (Hybrid)• combination of previous two
![Page 39: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/39.jpg)
ROLAP
• Cubes can easily be represented in relational tables: special value “all”
Month Prod. Cust. PriceJan p1 c1 10Jan p2 c1 8Jan p1 c2 10Feb p1 c1 9
…all p1 c1 102Jan all c1 18Jan p1 all 1 230all all c1 4 235
…all all all 1 253 458
![Page 40: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/40.jpg)
ROLAP
• Typical database scheme:– star schema
• fact table is central• links to dimensional tables
– Extensions:• snowflake schema
– dimensions have hierarchy/extra information attached
• Star constellation– multiple star schemas sharing dimensions
![Page 41: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/41.jpg)
Example of a Star Schema
Order NoOrder No
Order DateOrder Date
Customer NoCustomer No
Customer Customer NameName
Customer Customer AddressAddress
CityCity
SalespersonIDSalespersonID
SalespersonNaSalespersonNameme
CityCity
QuotaQuota
OrderNOOrderNO
SalespersonIDSalespersonID
CustomerNOCustomerNO
ProdNoProdNo
DateKeyDateKey
CityNameCityName
QuantityQuantity
Total Price
ProductNOProductNO
ProdNameProdName
ProdDescrProdDescr
CategoryCategory
CategoryDescriptionCategoryDescription
UnitPriceUnitPrice
DateKeyDateKey
DateDate
CityNameCityName
StateState
CountryCountry
OrderOrder
CustomerCustomer
SalespersoSalespersonn
CityCity
DateDate
ProductProduct
Fact TableFact Table
![Page 42: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/42.jpg)
Example of a Snowflake Schema
Order NoOrder No
Order DateOrder Date
Customer NoCustomer No
Customer Customer NameName
Customer Customer AddressAddress
CityCity
SalespersonIDSalespersonID
SalespersonNaSalespersonNameme
CityCity
QuotaQuota
OrderNOOrderNO
SalespersonIDSalespersonID
CustomerNOCustomerNO
ProdNoProdNo
DateKeyDateKey
CityNameCityName
QuantityQuantity
Total Price
ProductNOProductNO
ProdNameProdName
ProdDescrProdDescr
CategoryCategory
CategoryCategory
UnitPriceUnitPrice
DateKeyDateKey
DateDate
MonthMonth
CityNameCityName
StateState
CountryCountry
OrderOrder
CustomerCustomer
SalespersoSalespersonn
CityCity
DateDate
ProductProduct
Fact TableFact Table
CategoryNaCategoryNameme
CategoryDeCategoryDescrscr
MontMonthh
YearYear YearYear
StateNameStateName
CountryCountry
CategoryCategory
StateState
MonthMonth
YearYear
![Page 43: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/43.jpg)
branch_keybranch_namebranch_type
time_keydayday_of_the_weekmonthquarteryear
Measures
Branch
Time
item_keyitem_namebrandtypesupplier_key
Item
location_keystreetcityProvince/streetcountry
Location
Sales Fact Table
Avg_sales
Euros_sold
Unit_sold
Location_key
Branch_key
Item_key
Time_key
shipper_keyshipper_namelocation_keyshipper_type
shipper
unit_shipped
Euros_sold
to_location
from_location
shipper_key
Item_key
Time_key
Shipping Fact TableMultiple fact tables share dimension tables
Example of Fact Constellation
![Page 44: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649d225503460f949f8896/html5/thumbnails/44.jpg)
SQL 1999 support for OLAP
• see other set of slides