Physical Design: Indexes (Part 2)

DATA WAREHOUSING Physical Design

Provide efficient access to relevant records

Based on values of particular attribute(s)

Same idea as the index in the back of a book

An index is a “thin” copy of a relation

Not all columns from the relation are included

The index is sorted in a particular way

An index supports efficient lookup

Useful when filters are selective

Avoid scanning rows that will be filtered out

Indexes are organized based on some search key

Column (or set of columns) whose values are used to access the index

Organization can be sorting or hashing

An index is built for some relation

One index entry per record in the relation

An index consists of <Value, RID> pairs

Value = value of the search key for this record

RID = record identifier

▪ Tells the DBMS where the record is stored

▪ Usually (page number, offset in page)
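The <Value, RID> organization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation, its page layout, and the names `pages` and `SortedIndex` are hypothetical, and a sorted list with binary search stands in for a real B-tree.

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation R stored as pages of records.
# A RID is (page number, offset in page).
pages = [
    [{"A": 5, "B": 10}, {"A": 3, "B": 7}],
    [{"A": 5, "B": 2}, {"A": 8, "B": 1}],
]


class SortedIndex:
    """Sorted list of (value, RID) pairs, one entry per record."""

    def __init__(self, pages, key):
        entries = sorted(
            (record[key], (page_no, offset))
            for page_no, page in enumerate(pages)
            for offset, record in enumerate(page)
        )
        # Keep values and RIDs in parallel lists so lookups can bisect on values.
        self.values = [value for value, _ in entries]
        self.rids = [rid for _, rid in entries]

    def lookup(self, value):
        """Return the RIDs of all records whose search key equals value."""
        lo = bisect_left(self.values, value)
        hi = bisect_right(self.values, value)
        return self.rids[lo:hi]


index_on_a = SortedIndex(pages, "A")
for page_no, offset in index_on_a.lookup(5):
    # Follow each RID to fetch the full record from its page.
    print(pages[page_no][offset])
```

Only the records whose RIDs come back from `lookup` are fetched; the other rows are never scanned.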

Traditional Access Methods

B-trees, hash tables, R-trees, grids, …

Popular in Warehouses

Covering indexes

Multi-column indexes

Join indexes

Bitmap indexes

Idea behind fact index: thinner version of the fact table

Index takes up less space than the fact table

Fewer I/Os required to scan it

Index has 1 index entry per fact table row

Regardless of how many columns are in the index

Sometimes an index has all the data you need

Allows an index-only query plan

Not necessary to access the actual tuples

Such an index is called a covering index

SELECT COUNT(*) FROM R WHERE A=5

Use index on A

Count number of <5,RID> entries

No need to look up records referenced by RIDs
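A minimal sketch of the index-only plan for the COUNT(*) query, assuming the index on A is held as a sorted list of search-key values with a parallel list of RIDs. The data and the names `index_values`, `index_rids`, and `count_equal` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index on R.A: sorted search-key values with their RIDs.
index_values = [3, 5, 5, 5, 8]
index_rids = [(0, 1), (0, 0), (1, 0), (2, 3), (1, 1)]


def count_equal(values, target):
    """Answer SELECT COUNT(*) FROM R WHERE A = target from the index alone."""
    # The matching <target, RID> entries are contiguous in the sorted index,
    # so their count is the width of the matching range.
    return bisect_right(values, target) - bisect_left(values, target)


print(count_equal(index_values, 5))  # prints 3
```

The RIDs are never dereferenced, which is what makes the index covering for this query.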

Multi-column indexes are very useful in data warehousing

We say such an index has a composite key

Example: B-Tree index on (A,B)

Search key is the (A,B) combination

Index entries are sorted by A value

Entries with the same A value are sorted by B value

Called a lexicographic sort

SELECT SUM(B) FROM R WHERE A=5

Our (A,B) index covers this query!

Coverage vs. size trade-off

More attributes in search key → index covers more queries

More attributes in search key → index takes up more disk space
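The lexicographic order of a composite key, and the way an (A,B) index covers the SUM(B) query, can be sketched as follows. The rows and the names `rows`, `index`, and `sum_b_where_a` are made up for illustration, and a sorted Python list stands in for the B-Tree.

```python
from bisect import bisect_left

# Hypothetical rows of R as (A, B, C) tuples; C is not part of the index.
rows = [(5, 10, "x"), (3, 7, "y"), (5, 2, "z"), (8, 1, "w"), (5, 4, "v")]

# Index entries are ((A, B), RID). Python compares tuples lexicographically,
# so sorting orders by A first and by B among entries with equal A.
index = sorted(((a, b), rid) for rid, (a, b, _c) in enumerate(rows))


def sum_b_where_a(index, a):
    """SELECT SUM(B) FROM R WHERE A = a, reading only index entries."""
    # ((a,),) sorts just before every entry whose key starts with a.
    i = bisect_left(index, ((a,),))
    total = 0
    while i < len(index) and index[i][0][0] == a:
        total += index[i][0][1]  # B is taken from the search key itself
        i += 1
    return total


print(sum_b_where_a(index, 5))  # prints 16
```

Because B is part of the search key, the base rows (and column C) are never touched. An index on A alone would locate the same entries but would have to follow each RID to read B.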

Advantages

Efficient computation of joins involving the leading index columns (or all columns)

Disadvantages

Useful only for specific join combinations

▪ For general usage, a large number of indexes must be stored

Required space may be significant

▪ Joins always involve the fact table

Base table

Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region

RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type

RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1

Query:

Get customers with Region = 'Asia' AND Type = 'Dealer'

Good if domain cardinality is small

Most useful for attributes with low or medium cardinality

▪ Not good for something like LastName

Index intersection plans with bitmap indexes are fast

Just perform bitwise AND!

Index intersection with B-Trees requires a join
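The bitwise AND can be illustrated with the customer example (Region = 'Asia' AND Type = 'Dealer'). This is a sketch under assumptions: each bit vector is held in a Python integer with bit i standing for record i+1, and `build_bitmap_index` is a hypothetical helper, not a DBMS API.

```python
# Base table from the customer example: (Cust, Region, Type), RecID 1..5.
customers = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


region_index = build_bitmap_index(customers, 1)
type_index = build_bitmap_index(customers, 2)

# Index intersection: one bitwise AND of the two bit vectors.
hits = region_index["Asia"] & type_index["Dealer"]

# Only the rows whose bit survived the AND are fetched from the base table.
matches = [customers[i][0] for i in range(len(customers)) if (hits >> i) & 1]
print(matches)  # prints ['C3']
```

With two B-Tree indexes, the same intersection would require matching two lists of RIDs against each other, which is the join mentioned above.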

Save space for low-cardinality attributes

As compared to a B-Tree or hash index

Bit vectors can be compressed

Compression pros and cons

Reduce storage space → reduce number of I/Os required

Need to compress/uncompress → increase CPU work required

Each compression scheme negotiates this trade-off differently

Operate directly on compressed bitmap → improved performance
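The slides do not name a compression scheme. As an assumption, the sketch below uses plain run-length encoding to show both sides of the trade-off: fewer stored items, extra work to decode, and one operation (counting set bits) that can run directly on the compressed form. All function names are illustrative.

```python
from itertools import groupby


def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


def count_ones(runs):
    """Count set bits on the compressed form, without decoding it."""
    return sum(length for bit, length in runs if bit == 1)


# A sparse bit vector, as produced by a low-cardinality attribute value.
bits = [0] * 12 + [1] * 3 + [0] * 9
runs = rle_encode(bits)

print(runs)              # prints [(0, 12), (1, 3), (0, 9)]
print(count_ones(runs))  # prints 3
```

Here 24 bits shrink to three runs, and `count_ones` never pays the decoding cost. A dense or rapidly alternating vector would compress poorly under this scheme, which is one reason different schemes strike the balance differently.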

Bit matrix which precomputes the join between a dimension and the fact table

one column for each dimension RID

one row for each fact table RID

cell (i,j) is 1 if fact table tuple i joins dimension tuple j, 0 otherwise
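A small sketch of such a bit matrix, assuming a hypothetical customer dimension and sales fact table. The table contents and the names `build_join_matrix` and `asia_sales` are illustrative only.

```python
# Hypothetical dimension table: position in the list is the dimension RID.
dimension = [("C1", "Asia"), ("C2", "Europe"), ("C3", "Asia")]

# Hypothetical fact table: position is the fact RID; (customer key, amount).
facts = [("C2", 100), ("C1", 250), ("C3", 75), ("C1", 40)]


def build_join_matrix(facts, dimension):
    """Cell (i, j) is 1 if fact tuple i joins dimension tuple j, else 0."""
    dim_rid = {key: j for j, (key, _attr) in enumerate(dimension)}
    matrix = [[0] * len(dimension) for _ in facts]
    for i, (foreign_key, _amount) in enumerate(facts):
        matrix[i][dim_rid[foreign_key]] = 1
    return matrix


matrix = build_join_matrix(facts, dimension)

# Total sales for customers in Asia: select the dimension columns first,
# then read the joining fact rows straight off the precomputed matrix.
asia_columns = [j for j, (_key, region) in enumerate(dimension) if region == "Asia"]
asia_sales = sum(
    amount
    for i, (_fk, amount) in enumerate(facts)
    if any(matrix[i][j] for j in asia_columns)
)
print(asia_sales)  # prints 365
```

The join itself is computed once, when the matrix is built; at query time only the matrix columns of the selected dimension tuples are consulted.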

Indexing dimensions

Attributes frequently involved in selection predicates

If domain cardinality is high, then B-tree index

If domain cardinality is low, then bitmap index

Indices for join

Indexing only foreign keys in the fact table is rarely appropriate

Star join index should be used with caution (column order issue)

Bitmapped join index is suggested (if available)

Indices for group by

Use materialized views
