Cost Based Oracle


Transcript of Cost Based Oracle

Page 1: Cost Based Oracle

Santosh Kangane – Performance DBA
santoshkangane.blogspot.in
www.linkedin.com/pub/santosh-kangane/1b/78b/6b

COST BASED ORACLE

Page 2: Cost Based Oracle

Background of CBO

Oracle Database Memory Structures

Table Scans

CPU Costing Model

Index and Clustering Factor

Dynamic Sampling and Histogram

Query transformation

Join Methodologies

Highlights

There are NO thumb rules in Oracle. Different versions of Oracle and different data patterns drive SQL performance!

This presentation introduces how the CBO works out SQL plans, which should help you find what is suitable for a given SQL statement.

How you write SQL matters!

Page 3: Cost Based Oracle

Cost Based Optimization : Evolution

• Traditional : simple counting of read requests
• System statistics (1) : accounting for size and time of read requests
• System statistics (2) : accounting for size and time of read requests and CPU costs
• System statistics (3) : accounting for size and time of read requests, CPU costs, and caching

Page 4: Cost Based Oracle

Query Execution

Page 5: Cost Based Oracle

Three different layers of operational complexity

• First : The execution plan tells you what the optimizer thinks is going to happen at run time, and produces a cost based on this model.

• Second : The execution engine starts up and executes the model dictated by the optimizer, but the actual mechanism is not always identical to the model (alternatively, the model is not a good description of what actually happens).

• Finally: there are cases where the resources required to execute the model vary dramatically with the way in which the incoming data happens to be distributed.

In other words, the optimizer’s execution plan may not be exactly the run-time execution path, and the speed of the run-time execution path may be affected by an unlucky choice of data.

Page 6: Cost Based Oracle

When CBO makes errors

• Some inappropriate assumptions are built into the cost model.
• The relevant statistics about the data distribution are available, but misleading.
• The relevant statistics about the data distribution are not available.
• The performance characteristics of the hardware are not known.
• The current workload is not known.
• There are bugs in the code.

Page 7: Cost Based Oracle

Oracle Database Memory Structure

Page 8: Cost Based Oracle

Oracle : SQL Execution activity

Page 9: Cost Based Oracle

Table Scans

1. db_file_multiblock_read_count : the number of blocks allowed in a single multiblock I/O operation. By default it is set to the maximum allowed by the operating system. Oracle calculates the I/O cost based on the MBRC and system statistics.

When Oracle executes the tablescan, how many blocks does it try to read in a multiblock read? Is it the value of MBRC, or something else?

Answer: Oracle still tries to use the actual value of db_file_multiblock_read_count, scaled up or down if we are reading from a tablespace with a nondefault block size.

If workload statistics have not been gathered, the system uses noworkload statistics.

Page 10: Cost Based Oracle

2. Full Table Scan: when performing a full table scan, Oracle reads the blocks of the table into buffers and puts them on the LRU end (instead of the MRU end) of the LRU list. A fully scanned table usually is needed only briefly, so its blocks should be aged out quickly to leave more frequently used blocks in the cache.

You can control this default behavior on a table-by-table basis. To specify that blocks of the table are to be placed at the MRU end of the list during a full table scan, use the CACHE clause when creating or altering a table or cluster.

You can specify this behavior for small lookup tables (KEEP pool) or large static historical tables (RECYCLE pool) to avoid I/O on subsequent accesses of the table.

Transactional tables MUST NOT be cached this way.

Table Scans

Page 11: Cost Based Oracle

3. Parallel Execution: (/*+ PARALLEL */ hint or ALTER TABLE <<T1>> PARALLEL)
I. The user session or shadow process takes on the role of a coordinator, often called the query coordinator, and spawns the necessary number of parallel slave processes.
II. The SQL statement is executed as a sequence of operations. While the parallel slaves are executing, the query coordinator performs any portion of the work that cannot be executed in parallel.
III. Finally, the query coordinator returns the results to the user.

Parallel scans use direct path reads to bypass the buffer cache and read blocks directly into local (PGA) memory.

A parallel query will first issue a segment checkpoint to get all dirty blocks for the segment written to disk before it reads.

This can lead to a performance problem in the rare case that combines a large data buffer, a busy OLTP system, and parallel execution for reports.

Table Scans

Page 12: Cost Based Oracle

Costing Model

Page 13: Cost Based Oracle

What is Cost ?

According to the CPU costing model:

Cost = (
        #SRds * sreadtim +
        #MRds * mreadtim +
        #CPUCycles / cpuspeed
       ) / sreadtim

where,
#SRds      - number of single block reads
#MRds      - number of multiblock reads
#CPUCycles - number of CPU cycles
sreadtim   - single block read time
mreadtim   - multiblock read time
cpuspeed   - CPU cycles per second

Translated: the cost is the time spent on single-block reads, plus the time spent on multiblock reads, plus the CPU time required, all divided by the time it takes to do a single-block read.
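The arithmetic of the formula can be sketched directly. The statistic values below are illustrative assumptions, not measurements from any real system:

```python
# A sketch of the CPU costing formula; all input values are made-up
# examples (read times in milliseconds, cpuspeed in cycles per millisecond).
def cpu_cost(srds, mrds, cpu_cycles, sreadtim, mreadtim, cpuspeed):
    """Cost = (#SRds*sreadtim + #MRds*mreadtim + #CPUCycles/cpuspeed) / sreadtim"""
    io_time = srds * sreadtim + mrds * mreadtim   # single-block + multiblock read time
    cpu_time = cpu_cycles / cpuspeed              # CPU time in the same unit
    return (io_time + cpu_time) / sreadtim        # expressed in single-block reads

# 10 single-block reads at 5 ms, 20 multiblock reads at 12 ms,
# 1,000,000 CPU cycles at 500,000 cycles/ms:
print(cpu_cost(10, 20, 1_000_000, 5.0, 12.0, 500_000))  # → 58.4
```

The final division by sreadtim is what makes the cost unit "equivalent single-block reads".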

Page 14: Cost Based Oracle

Statistics

Optimizer statistics are a collection of data that describe the database and the objects in it in more detail.

Optimizer statistics include:

Table statistics (USER_TABLES / DBA_TABLES)
• Number of rows
• Number of blocks
• Average row length

Column statistics (USER_TAB_COLUMNS / DBA_TAB_COLUMNS)
• Number of distinct values (NDV) in column
• Number of nulls in column
• Data distribution (histogram)
• Average column length

Index statistics (USER_INDEXES)
• Number of leaf blocks
• Levels
• Clustering factor

System statistics
• I/O performance and utilization
• CPU performance and utilization

See also: DBA_TAB_STATISTICS.

Page 15: Cost Based Oracle

Cardinality (Selectivity)

In an audience of 1,200 people, how many of them do you think were born in December?

Optimizer is thinking ....

user_tab_col_statistics.num_nulls     = 120
user_tab_col_statistics.num_distinct  = 12
user_tab_histograms.low_value         = 1
user_tab_histograms.high_value        = 12
user_tables.num_rows                  = 1,200
So, user_tab_col_statistics.Density   = 1/12

Cardinality = num_rows / num_distinct (if no histogram is prepared)
Cardinality = num_rows * density      (if a histogram is prepared)
Cardinality = 1200 / 12 = 100

And what if 120 of them don’t remember their birth date?

• Base selectivity = 1/12 (from density, or from 1/num_distinct)
• num_nulls = 120 and num_rows = 1200
• Adjusted selectivity = Base selectivity * (num_rows - num_nulls) / num_rows
• Adjusted selectivity = (1/12) * ((1200 - 120)/1200) = 0.075
• Adjusted cardinality = Adjusted selectivity * num_rows
• Adjusted cardinality = 0.075 * 1200 = 90
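The null-adjustment arithmetic from the birth-month example can be checked directly:

```python
# A sketch of the cardinality arithmetic from the slide: 1,200 rows,
# 12 distinct birth months, 120 NULLs (people who don't remember).
num_rows, num_distinct, num_nulls = 1200, 12, 120

base_selectivity = 1 / num_distinct                    # density = 1/12
unadjusted_cardinality = num_rows / num_distinct       # no null adjustment

adjusted_selectivity = base_selectivity * (num_rows - num_nulls) / num_rows
adjusted_cardinality = adjusted_selectivity * num_rows

print(unadjusted_cardinality, round(adjusted_cardinality, 1))  # → 100.0 90.0
```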

Page 16: Cost Based Oracle

Cardinality (Selectivity)

Basic formulae of selectivity :

• Selectivity of (predicate1 AND predicate2) = selectivity of (predicate1) * selectivity of (predicate2)
• Selectivity of (predicate1 OR predicate2) = selectivity of (predicate1) + selectivity of (predicate2) minus selectivity of (predicate1 AND predicate2) ... otherwise, you’ve counted the overlap twice
• Selectivity of (NOT predicate1) = 1 - selectivity of (predicate1)
• Range selectivity = “required range” divided by “total available range”

Cardinality : Cardinality = Selectivity * num_rows
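These combination rules can be sketched as tiny functions. Note they assume the predicates are independent, which is exactly the optimizer's built-in assumption:

```python
# A sketch of the basic selectivity combination rules (independence assumed).
def sel_and(s1, s2):
    return s1 * s2

def sel_or(s1, s2):
    return s1 + s2 - sel_and(s1, s2)   # subtract the overlap counted twice

def sel_not(s1):
    return 1 - s1

num_rows = 1200
s1, s2 = 1 / 12, 1 / 4
print(round(sel_and(s1, s2) * num_rows, 2))  # → 25.0
print(round(sel_or(s1, s2) * num_rows, 2))   # → 375.0
```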

Page 17: Cost Based Oracle

Index : B+ Tree

Cost = blevel +
       ceiling(leaf_blocks * effective index selectivity) +
       ceiling(clustering_factor * effective table selectivity)

Translated: the cost to descend from the root to the required level of the balanced B-tree, plus the number of leaf blocks to walk through to get the rowids, plus the cost of accessing the table.

B+ Tree Index : to scan an index segment, Oracle uses a B-tree traversal algorithm. The depth of the B+ tree varies from 1 to 4; it is mostly 2 or 3, and at most 4 for a larger table index.
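The index cost formula can be evaluated with sample statistics. The values below are illustrative, not from a real data dictionary (1/16 is used as the selectivity purely to keep the arithmetic exact):

```python
import math

# A worked example of the B-tree index cost formula above.
def index_cost(blevel, leaf_blocks, clustering_factor,
               index_selectivity, table_selectivity):
    return (blevel
            + math.ceil(leaf_blocks * index_selectivity)        # leaf blocks walked
            + math.ceil(clustering_factor * table_selectivity)) # table block visits

# blevel=2, 1,000 leaf blocks, clustering_factor=10,000, selectivity 1/16:
print(index_cost(2, 1000, 10000, 0.0625, 0.0625))  # → 690
```

Notice how clustering_factor dominates: the table-access term is usually the largest part of an index range scan's cost.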

Page 18: Cost Based Oracle

Index : Bit Map

Conceptualize a bitmap index as a two-dimensional array

Useful for the following scenarios:
1. Low cardinality : if the number of distinct values of a column is less than 1% of the number of rows in the table, or if the values in a column are repeated more than 100 times, then the column is a candidate for a bitmap index.
2. No or little insert/update : updating bitmap indexes takes a lot of resources. They are most suitable for materialized views or data warehouses where data is refreshed/loaded and indexed once, and thereafter used for queries only. They are not suitable for OLTP systems due to the memory and CPU resource requirements.

Page 19: Cost Based Oracle

Index: Clustering Factor

The clustering factor calculation does not consider cache hits, and hence sometimes gives unrealistic values.

Page 20: Cost Based Oracle

Index: Clustering Factor

SELECT /*+ cursor_sharing_exact
           dynamic_sampling(0)
           no_monitoring
           no_expand
           index(t,"SHIPMST_DPUDATE")
           noparallel_index(t,"SHIPMST_DPUDATE") */
       sys_op_countchg(substrb(t.rowid,1,15), &m_history) AS clf
FROM   SHIPMST t
WHERE  DPUDATE IS NOT NULL;

• &m_history represents the number of most recently visited blocks to remember. This value should be the FREELISTS value of the table, or 16 in the case of an ASSM tablespace.
• Use dbms_stats.get_index_stats and dbms_stats.set_index_stats to set the corrected value of the clustering factor.
• This should be used only for critical indexes where you find Oracle's default method giving you wrong statistics; it is not recommended to apply it to all indexes.

Page 21: Cost Based Oracle

Index: Selection

1. A range-based predicate (e.g., col1 between 1 and 3) reduces the benefit of later columns in the index. Any predicates based on columns appearing after the earliest range-based predicate are ignored when calculating the effective index selectivity (although they are still used in the effective table selectivity), and this can leave Oracle with an unreasonably high figure for the cost of that index. So place columns that usually appear with a range-based predicate toward the end of the index definition.

2. To improve the compressibility of an index, put the least selective (most repetitive) columns first.

3. Rearranging the column sequence in the index changes the clustering_factor and the index selectivity.

4. The order in which columns appear in the WHERE clause does not affect the index selectivity.

5. If an index is unique in nature, create it as a unique index instead of a normal index; otherwise the optimizer has to refer to the histogram to figure out the index uniqueness, and that changes the optimizer's access path options.

Page 22: Cost Based Oracle

Dynamic Sampling

How and when will DS be used?
During the compilation of a SQL statement, the optimizer decides whether to use DS by considering whether the available statistics are sufficient to generate a good execution plan. DS is considered if:
1. Available table statistics are not enough.
2. The statement contains a complex predicate expression and extended statistics are not available.

The OPTIMIZER_DYNAMIC_SAMPLING parameter defines the number of blocks to read for sampling. It can have a value from 0 to 10; the default is 2. The higher the value of the DS parameter, the more time it takes to compile the SQL statement.

Page 23: Cost Based Oracle

Histogram

The histogram feature in Oracle helps the optimizer determine how data is skewed (distributed) within a column.

Advantages of histograms:
1. Histograms help the Oracle optimizer choose the right access method for a table.
2. They also help the optimizer decide the correct table join order. When we join multiple tables, histograms help minimize the intermediate result set, and a smaller intermediate result set improves performance.

Page 24: Cost Based Oracle

Query transformation

1. Join elimination (JE) : a technique in which one or more tables can be eliminated from the execution plan without altering the functional behaviour.

Page 25: Cost Based Oracle

Query transformation

1. Join elimination (JE), Example 2 : elimination by constraints or references. Tables T1 and T2 have an FK constraint over T1.N1 and T2.N2, and the SQL selects data only from the T2 table. So there is no need to check whether the key exists in T1, and the join can be eliminated safely.

Page 26: Cost Based Oracle

Query transformation

2. Subquery Unnesting : subqueries can be unnested into a join. The subquery is unnested into a view and then joined to the other row sources. In this listing, a correlated subquery is moved into a view VW_SQ_1, unnested, and then joined using the hash join technique.

Oracle executes a subquery once for each distinct value from the driving table and hashes the result. For subsequent rows it uses the same results from the hash table instead of re-executing the subquery.

Page 27: Cost Based Oracle

Query transformation : Transitive Closure

Page 28: Cost Based Oracle

IN Vs EXISTS , NOT IN Vs NOT EXISTS

What to use? Why? It depends!

Why is there a myth? Up to Oracle 8i, the optimizer suffered from a cardinality approximation issue.

select * from AUDIENCE where month_no in ( 6, 7, 8);

Actual problem : internally, the optimizer converts an IN list into OR clauses. The error that 8i suffers from is that after splitting the list into separate predicates, it applies the standard algorithm for multiple disjuncts (the technical term for OR’ed predicates) and fails to process it correctly:

sel(A or B or C) = sel(A) + sel(B) + sel(C)
                 - sel(A)sel(B) - sel(B)sel(C) - sel(C)sel(A)
                 + sel(A)sel(B)sel(C)

Good news: from Oracle 9i/10g onwards this issue has been addressed! But if you have a large set of values, it is better to use a table type instead of an IN list, as a table type allows the CBO to rewrite the SQL and use a semi nested loop join for better data retrieval.

How ??? Stay tuned......
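The inclusion-exclusion arithmetic for OR'ed predicates can be sketched for any number of disjuncts (again assuming independent predicates):

```python
from itertools import combinations

# A sketch of the inclusion-exclusion rule for OR'ed predicates,
# generalized to any number of disjuncts.
def sel_disjunction(sels):
    total = 0.0
    for k in range(1, len(sels) + 1):
        sign = (-1) ** (k + 1)            # add single terms, subtract pairs, ...
        for combo in combinations(sels, k):
            term = 1.0
            for s in combo:
                term *= s                 # sel of the AND of this combination
            total += sign * term
    return total

# month_no in (6, 7, 8): three predicates, each with selectivity 1/12.
s = 1 / 12
print(round(sel_disjunction([s, s, s]), 6))  # → 0.229745
```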

Page 29: Cost Based Oracle

IN Vs EXISTS

Both SQLs give the same explain plan and elapsed time in this case... so which one, and when?

If the main body of your query is highly selective, then an EXISTS clause might be more appropriate to semi-join to the target table. If the main body of your query is not so selective (over the join predicate) and the subquery (the target of the semi-join) is more selective, then an IN clause might be more appropriate.

Good news: this rule is valid only for 8i (or earlier) systems. In 9i and later versions it doesn’t matter much; both work the same in most cases. You will still have to check specific cases.

IN and EXISTS use the semi-join technique. A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. If you have to select data from only one table and no columns from the second table, use a semi-join (IN or EXISTS) instead of a conventional join, as a semi-join searches the second table for the first occurrence of a matching value and stops there.
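Semi-join semantics can be sketched in a few lines; the customers/orders data is a made-up example, not from the slides:

```python
# A sketch of semi-join semantics: return rows from the first table that
# have at least one match in the second, never duplicating a row per match
# (a conventional join would return one row per matching pair).
def semi_join(rows, inner_rows, key, inner_key):
    inner_keys = {r[inner_key] for r in inner_rows}   # build the match set once
    return [r for r in rows if r[key] in inner_keys]  # one membership test per row

orders    = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 2}]
customers = [{"id": 1, "name": "a"}, {"id": 3, "name": "b"}]
print(semi_join(customers, orders, "id", "cust_id"))  # → [{'id': 1, 'name': 'a'}]
```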

Page 30: Cost Based Oracle

NOT IN Vs NOT EXISTS

Handling of NULL values :
• If a subquery returns NULL values, a NOT IN condition evaluates to false and returns zero rows, as it cannot compare NULL values.
• NOT EXISTS checks for the non-existence of a row, so it is able to return rows even if the subquery produces NULL values.

Anti-join :
• An “anti-join” between two tables returns rows from the first table where no matches are found in the second table.
• NOT IN and NOT EXISTS use the anti-join access path method.
• Consider NOT IN if the subquery never returns NULL values and the query might benefit from a merge or hash anti-join.
• Otherwise use NOT EXISTS.
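The NULL behaviour can be modelled with Python's None standing in for SQL NULL; this is a sketch of the semantics, not of Oracle's implementation:

```python
# A sketch of the NOT IN vs NOT EXISTS difference when the subquery
# result contains NULLs (None models SQL NULL).
def not_in(rows, subquery_values):
    if any(v is None for v in subquery_values):
        return []          # x <> NULL is UNKNOWN, so NOT IN filters out every row
    return [r for r in rows if r not in subquery_values]

def not_exists(rows, subquery_values):
    # NOT EXISTS only asks "is there a matching row?"; NULLs never match anything.
    return [r for r in rows
            if not any(v == r for v in subquery_values if v is not None)]

rows = [1, 2, 3]
print(not_in(rows, [2, None]))      # → []
print(not_exists(rows, [2, None]))  # → [1, 3]
```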

Page 31: Cost Based Oracle

Joins – Access Path Techniques

1. Nested Loop
2. Hash Join
3. Merge Join
4. SEMI Join
5. ANTI Join

How does Oracle select the access path in a conventional join?
The nested loops algorithm is desirable when the predicates on the first table are very selective and the join columns in the second table are selectively indexed.
The merge and hash join algorithms are more desirable when the predicates are not very selective, or the join columns in the second table are not selectively indexed, and when both data sets are large.

Page 32: Cost Based Oracle

Joins – Nested Loop (old mechanism)

for r1 in (select rows from table_1 where colx = {value}) loop
    for r2 in (select rows from table_2 that match current row from table_1) loop
        output values from current row of table_1 and current row of table_2
    end loop
end loop
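The pseudocode above can be written as a runnable sketch; the t1/t2 data and the filter on "x" are hypothetical illustrations:

```python
# The classic nested loop join: for each qualifying outer row, scan the
# inner row source for matches.
def nested_loop_join(t1, t2, pred, join_key):
    out = []
    for r1 in t1:                      # outer (driving) row source
        if not pred(r1):               # the "where colx = {value}" filter
            continue
        for r2 in t2:                  # probe the inner row source per outer row
            if r1[join_key] == r2[join_key]:
                out.append((r1, r2))
    return out

t1 = [{"id": 1, "x": 10}, {"id": 2, "x": 99}]
t2 = [{"id": 1, "y": "a"}, {"id": 1, "y": "b"}]
print(nested_loop_join(t1, t2, lambda r: r["x"] == 10, "id"))
```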


Page 33: Cost Based Oracle

Joins – Nested Loop (New mechanism)

Index Access

1. The mechanism finds the first row in the outer table, traverses the index, and stops in the leaf block, picking up just the relevant rowids for the inner table.

2. It repeats this for all subsequent rows in the outer table.

3. When all the target rowids have been found, the engine can sort them and then visit the inner table in a single pass, working along the length of the table just once, picking up the rows in whatever order they happen to appear.

4. optimizer_index_caching is used to adjust the cost calculation for index blocks of the inner table in nested loops and for index blocks used during an in-list iterator. It is not used in the calculation of costs for simple index unique scans or range scans into a single table.
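Steps 1 to 3 can be sketched as follows; the index and table structures here are simplified stand-ins (a dict of key to rowid lists, a dict of rowid to row), purely for illustration:

```python
# A sketch of the batched nested loop mechanism: collect matching rowids
# via the index first, sort them, then visit the inner table in one
# ordered pass instead of one random visit per outer row.
def batched_nested_loop(outer_keys, index, table):
    rowids = []
    for k in outer_keys:                      # steps 1-2: one index probe per outer row
        rowids.extend(index.get(k, []))
    results = []
    for rid in sorted(rowids):                # step 3: single ordered pass over the table
        results.append(table[rid])
    return results

index = {"a": [4, 0], "b": [2]}               # key -> list of rowids
table = {0: "row0", 2: "row2", 4: "row4"}     # rowid -> row
print(batched_nested_loop(["a", "b"], index, table))  # → ['row0', 'row2', 'row4']
```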

Page 34: Cost Based Oracle

Joins – Hash Join

1. Build table : Oracle acquires one data set and converts it into the equivalent of an in-memory single-table hash cluster (assuming there is enough memory: hash_area_size), using an internal hashing function on the join column(s) to generate the hash key.

2. Probe table : Oracle starts to acquire data from the second table, applying the same hashing function to the join column(s) as it reads each row, and checking whether it can locate a matching row in the in-memory hash cluster.

Types of Hash Join:
- Optimal Hash Join
- Onepass Hash Join
- Multipass Hash Join
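The build/probe steps above can be sketched with a Python dict standing in for the in-memory hash cluster; this corresponds to the optimal (fully in-memory) case, and the dept/emp rows are made-up sample data:

```python
from collections import defaultdict

# A sketch of a hash join: hash the (smaller) build input once, then
# probe it with one lookup per row of the second input.
def hash_join(build_rows, probe_rows, build_key, probe_key):
    buckets = defaultdict(list)
    for r in build_rows:                       # build phase
        buckets[r[build_key]].append(r)
    out = []
    for p in probe_rows:                       # probe phase
        for b in buckets.get(p[probe_key], []):
            out.append((b, p))
    return out

dept = [{"deptno": 10, "dname": "SALES"}]
emp  = [{"deptno": 10, "ename": "SMITH"}, {"deptno": 20, "ename": "JONES"}]
print(hash_join(dept, emp, "deptno", "deptno"))
```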

Page 35: Cost Based Oracle

Joins – Hash Join (Optimal hash join)

(hash_area_size := 1024 = 10% PGA size)

Cost = cost of acquiring data from the build table
     + cost of acquiring data from the probe table
     + cost of performing the hashing and matching

Page 36: Cost Based Oracle

Joins – Hash Join (Onepass hash join )

Page 37: Cost Based Oracle

Joins – Merge Join

[Diagram: Table 1 and Table 2 are each sorted, the two sorted streams are merged, and the merged result is passed to the next step.]
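The sort-merge flow can be sketched as follows: sort both inputs on the join key, then merge them in a single interleaved pass (the t1/t2 rows are made-up sample data):

```python
# A sketch of a sort-merge join: sort both row sources on the join key,
# then advance two cursors in step, emitting pairs on equal keys.
def merge_join(t1, t2, key):
    a = sorted(t1, key=lambda r: r[key])
    b = sorted(t2, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][key] < b[j][key]:
            i += 1                             # advance the smaller side
        elif a[i][key] > b[j][key]:
            j += 1
        else:
            k = j                              # emit all pairs sharing this key value
            while k < len(b) and b[k][key] == a[i][key]:
                out.append((a[i], b[k]))
                k += 1
            i += 1
    return out

t1 = [{"id": 2}, {"id": 1}]
t2 = [{"id": 1}, {"id": 2}, {"id": 2}]
print(len(merge_join(t1, t2, "id")))  # → 3
```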

Page 38: Cost Based Oracle

Conclusions

1. Understand your data
2. Data distribution is important
3. Think about your parameters
4. Choose your indexes rightly
5. Help Oracle with the truth

There are NO thumb rules. These are the different options provided by Oracle to tune SQL!

Page 39: Cost Based Oracle

References

1. Cost-Based Oracle, book by Jonathan Lewis
2. http://docs.oracle.com/cd/B10500_01/server.920/a96524/c08memor.htm
3. http://www.dba-oracle.com/art_otn_cbo.htm
4. http://docs.oracle.com/cd/B28359_01/server.111/b28318/memory.htm
5. http://blogs.oracle.com/optimizer/entry/dynamic_sampling_and_its_impact_on_the_optimizer
6. http://orainternals.wordpress.com/2010/05/01/query-transformation-part-1/
7. http://www.dbspecialists.com/files/presentations/semijoins.html

Page 40: Cost Based Oracle

Thank You !

santoshkangane.blogspot.com