Analytics ioug 2011

92
Analyzing Your Data with Analytic Functions Carl Dudley University of Wolverhampton, UK UKOUG Council Oracle ACE Director [email protected]

Transcript of Analytics ioug 2011

Page 1: Analytics ioug 2011

Analyzing Your Data with Analytic Functions

Carl Dudley
University of Wolverhampton, UK

UKOUG Council
Oracle ACE Director

[email protected]

Page 2: Analytics ioug 2011

Introduction

Working with Oracle since 1986

Oracle DBA - OCP Oracle7, 8, 9, 10

Oracle DBA of the Year – 2002

Oracle ACE Director

Regular Presenter at Oracle Conferences

Consultant and Trainer

Technical Editor for a number of Oracle texts

UK Oracle User Group Council

Member of IOUC

Day job – University of Wolverhampton, UK

Page 3: Analytics ioug 2011

Analyzing Your Data with Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 4: Analytics ioug 2011

Analytic Functions

New set of functions introduced in Oracle 8.1.6

— Analytic functions or Window functions

Intended for OLAP (Online Analytical Processing) or data warehouse purposes

Provide functionality that would otherwise require complex conventional SQL programming or other tools

Advantages

— Improved performance
  • The optimizer “understands” the purpose of the query

— Reduced dependency on report generators and client tools
— Simpler coding

Page 5: Analytics ioug 2011

Analytic Function Categories

The analytic functions fall into four categories

Ranking functions
Aggregate functions
Row comparison functions
Statistical functions

The Oracle documentation describes all of the functions

Processed as the last step before ORDER BY

— Work on the result set of the query
— Can operate on an intermediate ordering of the rows
— Actions can be based on:
  • Partitions of the result set
  • A sliding window of rows in the result set

Page 6: Analytics ioug 2011

Processing Sequence

Rows -> WHERE evaluation -> GROUPING / HAVING evaluation -> Intermediate ordering -> Analytic function -> Final ORDER BY -> Output

Analytic process: the intermediate ordering and analytic function steps

There may be several intermediate sort steps if required

Page 7: Analytics ioug 2011

The Analytic Clause

Syntax :

<function>(<arguments>) OVER(<analytic clause>)

The enclosing parentheses are required even if there are no arguments

RANK() OVER (ORDER BY sal DESC)

Page 8: Analytics ioug 2011

Sequence of Processing

Being processed just before the final ORDER BY means:

— Analytic functions are not allowed in WHERE and HAVING conditions
  • Outside the select list, they are allowed only in the final ORDER BY clause

Ordering the final result set

— OVER clause specifies sort order of result set before analytic function is computed

— Can have multiple analytic functions with different OVER clauses, requiring multiple intermediate sorts

— Final ordering does not have to match ordering in OVER clause

Page 9: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

The emp and dept Tables

emp

EMPNO ENAME  JOB       MGR  HIREDATE    SAL  COMM DEPTNO
----- ------ --------- ---- ----------- ---- ---- ------
 7934 MILLER CLERK     7782 23-JAN-1982 1300          10
 7782 CLARK  MANAGER   7839 09-JUN-1981 2450          10
 7839 KING   PRESIDENT      17-NOV-1981 5000          10
 7369 SMITH  CLERK     7902 17-DEC-1980  800          20
 7876 ADAMS  CLERK     7788 12-JAN-1983 1100          20
 7566 JONES  MANAGER   7839 02-APR-1981 2975          20
 7902 FORD   ANALYST   7566 03-DEC-1981 3000          20
 7788 SCOTT  ANALYST   7566 09-DEC-1982 3000          20
 7900 JAMES  CLERK     7698 03-DEC-1981  950          30
 7521 WARD   SALESMAN  7698 22-FEB-1981 1250  500     30
 7654 MARTIN SALESMAN  7698 28-SEP-1981 1250 1400     30
 7844 TURNER SALESMAN  7698 08-SEP-1981 1500    0     30
 7499 ALLEN  SALESMAN  7698 20-FEB-1981 1600  300     30
 7698 BLAKE  MANAGER   7839 01-MAY-1981 2850          30

dept

DEPTNO DNAME      LOC
------ ---------- --------
    10 ACCOUNTING NEW YORK
    20 RESEARCH   DALLAS
    30 SALES      CHICAGO
    40 OPERATIONS BOSTON

Page 10: Analytics ioug 2011

Example of Ranking

Ranking with ROW_NUMBER

— No handling of ties

• Rows retrieved by the query are intermediately sorted on descending salary for the analysis

SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber
      ,sal
      ,ename
FROM emp
ORDER BY sal DESC;

ROWNUMBER  SAL ENAME
--------- ---- ------
        1 5000 KING
        2 3000 SCOTT
        3 3000 FORD
        4 2975 JONES
        5 2850 BLAKE
        6 2450 CLARK
        7 1600 ALLEN
        8 1500 TURNER
        9 1300 MILLER
       10 1250 WARD
       11 1250 MARTIN
       12 1100 ADAMS
       13  950 JAMES
       14  800 SMITH

— If the final ORDER BY specifies the same sort order as the OVER clause only one sort is required

— ROW_NUMBER is different from ROWNUM

Page 11: Analytics ioug 2011

Different Sort Order in Final ORDER BY

If the OVER clause sort is different from the final ORDER BY

— An extra sort step is required

SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber
      ,sal
      ,ename
FROM emp
ORDER BY ename;

ROWNUMBER  SAL ENAME
--------- ---- ------
       12 1100 ADAMS
        7 1600 ALLEN
        5 2850 BLAKE
        6 2450 CLARK
        3 3000 FORD
       13  950 JAMES
        4 2975 JONES
        1 5000 KING
       11 1250 MARTIN
        9 1300 MILLER
        2 3000 SCOTT
       14  800 SMITH
        8 1500 TURNER
       10 1250 WARD

Page 12: Analytics ioug 2011

Multiple Functions With Different Sort Order

Multiple OVER clauses can be used

SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) sal_n
      ,sal
      ,ROW_NUMBER() OVER(ORDER BY comm DESC NULLS LAST) comm_n
      ,comm
      ,ename
FROM emp
ORDER BY ename;

Page 13: Analytics ioug 2011

RANK and DENSE_RANK

ROW_NUMBER increases even if several rows have identical values

— Does not handle ties

RANK and DENSE_RANK handle ties

— Rows with the same value are given the same rank
— After the tie value, RANK skips numbers, DENSE_RANK does not

Ranking with analytic functions performs better than conventional SQL approaches such as self-joins or correlated subqueries, because the table is not read repeatedly

Page 14: Analytics ioug 2011

RANK and DENSE_RANK (continued)

SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber
      ,RANK() OVER(ORDER BY sal DESC) rank
      ,DENSE_RANK() OVER(ORDER BY sal DESC) denserank
      ,sal
      ,ename
FROM emp
ORDER BY sal DESC
        ,ename;

ROWNUMBER RANK DENSERANK  SAL ENAME
--------- ---- --------- ---- ------
        1    1         1 5000 KING
        2    2         2 3000 FORD
        3    2         2 3000 SCOTT
        4    4         3 2975 JONES
        5    5         4 2850 BLAKE
        6    6         5 2450 CLARK
        7    7         6 1600 ALLEN
        8    8         7 1500 TURNER
        9    9         8 1300 MILLER
       10   10         9 1250 MARTIN
       11   10         9 1250 WARD
       12   12        10 1100 ADAMS
       13   13        11  950 JAMES
       14   14        12  800 SMITH

Multiple OVER clauses may be used specifying different orderings
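The tie handling of RANK and DENSE_RANK can be tried without an Oracle instance. The sketch below is an illustration only: it assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle, loads the ename and sal values from the emp table in the slides, and uses the aliases rnk and dense_rnk, which are not from the talk.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Tied salaries share a rank; RANK leaves a gap after the tie, DENSE_RANK does not.
ranked = conn.execute("""
    SELECT ename
          ,sal
          ,RANK()       OVER (ORDER BY sal DESC) AS rnk
          ,DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk
    FROM emp
    ORDER BY sal DESC, ename
""").fetchall()

for row in ranked:
    print(row)
```

FORD and SCOTT both receive rank 2; JONES then gets RANK 4 but DENSE_RANK 3, as in the slide output.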

Page 15: Analytics ioug 2011

Analytic Function in ORDER BY

Analytic functions are computed before the final ordering

— Can be referenced in the final ORDER BY clause
— An alias is used in this case

SELECT RANK() OVER(ORDER BY sal DESC) sal_rank
      ,sal
      ,ename
FROM emp
ORDER BY sal_rank
        ,ename;

SAL_RANK  SAL ENAME
-------- ---- ------
       1 5000 KING
       2 3000 FORD
       2 3000 SCOTT
       4 2975 JONES
       5 2850 BLAKE
       6 2450 CLARK
       7 1600 ALLEN
       8 1500 TURNER
       9 1300 MILLER
      10 1250 MARTIN
      10 1250 WARD
      12 1100 ADAMS
      13  950 JAMES
      14  800 SMITH

Page 16: Analytics ioug 2011

WHERE Conditions

Analytic (window) functions are computed after the WHERE condition and hence not available in the WHERE clause

SELECT RANK() OVER(ORDER BY sal DESC) rank
      ,sal
      ,ename
FROM emp
WHERE RANK() OVER(ORDER BY sal DESC) <= 5
ORDER BY rank;

WHERE RANK() OVER(ORDER BY sal DESC) <= 5
      *
ERROR at line 5:
ORA-30483: window functions are not allowed here

Page 17: Analytics ioug 2011

WHERE Conditions (continued)

Use an inline view to force the early processing of the analytic

SELECT *
FROM (SELECT RANK() OVER(ORDER BY sal DESC) rank
            ,sal
            ,ename
      FROM emp)
WHERE rank <= 5
ORDER BY rank
        ,ename;

RANK  SAL ENAME
---- ---- -----
   1 5000 KING
   2 3000 FORD
   2 3000 SCOTT
   4 2975 JONES
   5 2850 BLAKE

— Inline view is processed before the outer WHERE clause
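The same restriction and workaround can be reproduced in a small sketch. It assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle, so the error raised is SQLite's own rather than ORA-30483; the alias rnk and the variable names are illustrative.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# A window function directly in WHERE is rejected when the statement is compiled.
try:
    conn.execute(
        "SELECT ename FROM emp WHERE RANK() OVER (ORDER BY sal DESC) <= 5"
    )
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# Workaround: compute the rank in an inline view, then filter in the outer query.
top5 = conn.execute("""
    SELECT ename, sal, rnk
    FROM (SELECT ename
                ,sal
                ,RANK() OVER (ORDER BY sal DESC) AS rnk
          FROM emp)
    WHERE rnk <= 5
    ORDER BY rnk, ename
""").fetchall()

print(where_error)
print(top5)
```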

Page 18: Analytics ioug 2011

Grouping, Aggregate Functions and Analytics

Rank the departments by number of employees

SELECT deptno
      ,COUNT(*) employees
      ,RANK() OVER(ORDER BY COUNT(*) DESC) rank
FROM emp
GROUP BY deptno
ORDER BY employees
        ,deptno;

DEPTNO EMPLOYEES RANK
------ --------- ----
    10         3    3
    20         5    2
    30         6    1

Analytic functions are illegal in the HAVING clause

— The workaround is the same; use an inline view
— Ordering subclause may not reference a column alias

Page 19: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 20: Analytics ioug 2011

Partitioning

Analytic functions can be applied to logical groups within the result set rather than the full result set

— Partitions

— PARTITION BY specifies the grouping
— ORDER BY specifies the ordering within each group
— Not connected with database table partitioning

If partitioning is not specified, the full result set behaves as one partition

NULL values are grouped together in one partition, as in GROUP BY

Can have multiple analytic functions with different partitioning subclauses

... OVER(PARTITION BY mgr ORDER BY sal DESC)

Page 21: Analytics ioug 2011

Partitioning Example

Rank employees by salary within their manager

SELECT ename
      ,mgr
      ,sal
      ,RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank
FROM emp
ORDER BY mgr
        ,m_rank;

ENAME   MGR  SAL M_RANK
------ ---- ---- ------
SCOTT  7566 3000      1
FORD   7566 3000      1

ALLEN  7698 1600      1
TURNER 7698 1500      2
WARD   7698 1250      3
MARTIN 7698 1250      3
JAMES  7698  950      5

MILLER 7782 1300      1

ADAMS  7788 1100      1

JONES  7839 2975      1
BLAKE  7839 2850      2
CLARK  7839 2450      3

SMITH  7902  800      1

KING        5000      1
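A runnable sketch of the partitioned ranking follows. It assumes Python's standard sqlite3 module (SQLite 3.25 or later) in place of Oracle and copies ename, mgr and sal from the emp table in the slides; KING's NULL manager forms its own partition, as the slide output shows.

```python
import sqlite3

# (ename, mgr, sal) copied from the emp table in the slides; KING has no manager.
EMP = [
    ("KING", None, 5000), ("JONES", 7839, 2975), ("BLAKE", 7839, 2850),
    ("CLARK", 7839, 2450), ("SCOTT", 7566, 3000), ("FORD", 7566, 3000),
    ("ALLEN", 7698, 1600), ("TURNER", 7698, 1500), ("WARD", 7698, 1250),
    ("MARTIN", 7698, 1250), ("JAMES", 7698, 950), ("MILLER", 7782, 1300),
    ("ADAMS", 7788, 1100), ("SMITH", 7902, 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, mgr INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# The rank restarts at 1 inside each manager's group of employees.
rows = conn.execute("""
    SELECT ename
          ,mgr
          ,sal
          ,RANK() OVER (PARTITION BY mgr ORDER BY sal DESC) AS m_rank
    FROM emp
""").fetchall()

m_rank = {ename: rank for ename, _mgr, _sal, rank in rows}
print(m_rank)
```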

Page 22: Analytics ioug 2011

Result Sets With Different Partitioning

Rank the employees by salary within their manager, within the year they were hired, as well as overall

SELECT ename
      ,sal
      ,mgr
      ,RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank
      ,TRUNC(TO_NUMBER(TO_CHAR(hiredate,'YYYY'))) year_hired
      ,RANK() OVER(PARTITION BY TRUNC(TO_NUMBER(TO_CHAR(hiredate,'YYYY')))
                   ORDER BY sal DESC) d_rank
      ,RANK() OVER(ORDER BY sal DESC) rank
FROM emp
ORDER BY rank
        ,ename;

Page 23: Analytics ioug 2011

Result Sets With Different Partitioning (continued)

ENAME   SAL  MGR M_RANK YEAR_HIRED D_RANK RANK
------ ---- ---- ------ ---------- ------ ----
KING   5000           1       1981      1    1
FORD   3000 7566      1       1981      2    2
SCOTT  3000 7566      1       1987      1    2
JONES  2975 7839      1       1981      3    4
BLAKE  2850 7839      2       1981      4    5
CLARK  2450 7839      3       1981      5    6
ALLEN  1600 7698      1       1981      6    7
TURNER 1500 7698      2       1981      7    8
MILLER 1300 7782      1       1982      1    9
MARTIN 1250 7698      3       1981      8   10
WARD   1250 7698      3       1981      8   10
ADAMS  1100 7788      1       1987      2   12
JAMES   950 7698      5       1981     10   13
SMITH   800 7902      1       1980      1   14

Page 24: Analytics ioug 2011

Hypothetical Rank

Rank a specified hypothetical value (2999) in a group ('what-if' query)

SELECT RANK(2999) WITHIN GROUP (ORDER BY sal DESC) H_S_rank
      ,PERCENT_RANK(2999) WITHIN GROUP (ORDER BY sal DESC) PR
      ,CUME_DIST(2999) WITHIN GROUP (ORDER BY sal DESC) CD
FROM emp;

H_S_RANK         PR         CD
-------- ---------- ----------
       4 .214285714 .266666667

PR = 3/14, CD = 4/15

SELECT deptno
      ,RANK(20,'CLERK') WITHIN GROUP (ORDER BY deptno DESC,job ASC) H_D_J_rank
FROM emp
GROUP BY deptno;

DEPTNO H_D_J_RANK
------ ----------
    10          1
    20          3
    30          7

Department 10: a clerk in 20 would be higher than anyone in 10

Department 20: a clerk would be third in ascending job order in department 20 (below analysts)

Department 30: a clerk in 20 would be lower than anyone in 30 (6 employees)
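The 3/14 and 4/15 figures for the hypothetical salary 2999 can be checked with a little arithmetic. The sketch below is plain Python rather than SQL; the formulas are my reading of how the hypothetical-set functions behave (the hypothetical row is counted as one extra row in the group), and all variable names are illustrative.

```python
from fractions import Fraction

# Salaries from the emp table in the slides, highest first (ORDER BY sal DESC).
SALS = [5000, 3000, 3000, 2975, 2850, 2450, 1600,
        1500, 1300, 1250, 1250, 1100, 950, 800]
HYPOTHETICAL = 2999

n = len(SALS)                                    # 14 real rows
higher = sum(1 for s in SALS if s > HYPOTHETICAL)

# RANK: one more than the number of rows that sort ahead of the hypothetical row.
h_s_rank = higher + 1

# PERCENT_RANK: (rank - 1) / (rows including the hypothetical one - 1).
percent_rank = Fraction(h_s_rank - 1, (n + 1) - 1)

# CUME_DIST: rows at or ahead of the hypothetical row, itself included,
# divided by the row count including the hypothetical row.
at_or_ahead = sum(1 for s in SALS if s >= HYPOTHETICAL) + 1
cume_dist = Fraction(at_or_ahead, n + 1)

print(h_s_rank, percent_rank, cume_dist)
print(round(float(percent_rank), 9), round(float(cume_dist), 9))
```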

Page 25: Analytics ioug 2011

Frequent Itemsets (dbms_frequent_itemset)

Typical question

— When a customer buys product x, how likely are they to also buy product y?

SELECT CAST(itemset AS fi_char) itemset
      ,support
      ,length
      ,total_tranx
FROM TABLE(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL(
       CURSOR(SELECT TO_CHAR(sales.cust_id)
                    ,TO_CHAR(sales.prod_id)
              FROM sh.sales
                  ,sh.products
              WHERE products.prod_id = sales.prod_id
              AND products.prod_subcategory = 'Documentation')
      ,0.5
      ,2
      ,3
      ,NULL
      ,NULL));

Arguments following the cursor

• 0.5 : minimum fraction of different 'Documentation' customers having this combination
• 2 : minimum items in set
• 3 : maximum items in set
• NULL : include items
• NULL : exclude items

ITEMSET                     SUPPORT LENGTH TOTAL_TRANX
--------------------------- ------- ------ -----------
FI_CHAR('40', '41')            3692      2        6077
FI_CHAR('40', '42')            3900      2        6077
FI_CHAR('40', '45')            3482      2        6077
FI_CHAR('41', '42')            3163      2        6077
FI_CHAR('40', '41', '42')      3141      3        6077

Result columns

• SUPPORT : number of instances
• LENGTH : 2 or 3 items per set
• TOTAL_TRANX : number of different customers

Page 26: Analytics ioug 2011

Frequent Itemsets (continued)

Need to create type to accommodate the set

— Ranking functions can be applied to the itemset

CREATE TYPE fi_char AS TABLE OF VARCHAR2(100);

Itemsets containing certain items can be included/excluded

,CURSOR(SELECT * FROM table(fi_char(40,45)))   -- include any sets involving 40 or 45
,CURSOR(SELECT * FROM table(fi_char(42)))      -- exclude any sets involving 42

The total transactions (TOTAL_TRANX) is the number of different customers involved with any product within the set of products under examination

SELECT COUNT(DISTINCT cust_id)
FROM sales
WHERE prod_id BETWEEN 40 AND 45;   -- prod_ids for 'Documentation'

COUNT(DISTINCTCUST_ID)
----------------------
                  6077

Page 27: Analytics ioug 2011

Plan of Itemset Query

Only one full table scan of sales

--------------------------------------------------------------------------
| Id | Operation                   | Name                       |  Rows |
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                            |     8 |
|  1 | FIC RECURSIVE ITERATION     |                            |       |
|  2 | FIC LOAD ITEMSETS           |                            |       |
|  3 | FREQUENT ITEMSET COUNTING   |                            |     8 |
|  4 | SORT GROUP BY NOSORT        |                            |       |
|  5 | BITMAP CONVERSION COUNT     |                            |       |
|  6 | FIC LOAD BITMAPS            |                            |       |
|  7 | SORT CREATE INDEX           |                            |   500 |
|  8 | BITMAP CONSTRUCTION         |                            |       |
|  9 | FIC ENUMERATE FEED          |                            |       |
| 10 | SORT ORDER BY               |                            | 43755 |
|*11 | HASH JOIN                   |                            | 43755 |
| 12 | TABLE ACCESS BY INDEX ROWID | PRODUCTS                   |     3 |
|*13 | INDEX RANGE SCAN            | PRODUCTS_PROD_SUBCAT_IX    |     3 |
| 14 | PARTITION RANGE ALL         |                            |  918K |
| 15 | TABLE ACCESS FULL           | SALES                      |  918K |
| 16 | TABLE ACCESS FULL           | SYS_TEMP_0FD9D6605_153B1EE |       |
--------------------------------------------------------------------------

Page 28: Analytics ioug 2011

Applying Analytics to Frequent Itemsets

SELECT itemset, support, length, total_tranx, rnk
FROM (SELECT itemset, support, length, total_tranx
            ,RANK() OVER (PARTITION BY length ORDER BY support DESC) rnk
      FROM (SELECT CAST(ITEMSET AS fi_char) itemset
                  ,support
                  ,length
                  ,total_tranx
            FROM TABLE(dbms_frequent_itemset.fi_transactional
                   (CURSOR(SELECT TO_CHAR(sales.cust_id)
                                 ,TO_CHAR(sales.prod_id)
                           FROM sh.sales
                               ,sh.products
                           WHERE products.prod_id = sales.prod_id
                           AND products.prod_subcategory = 'Documentation')
                   ,0.5
                   ,2
                   ,3
                   ,NULL
                   ,NULL))))
WHERE rnk < 4;

ITEMSET                     SUPPORT LENGTH TOTAL_TRANX RNK
--------------------------- ------- ------ ----------- ---
FI_CHAR('40', '42')            3900      2        6077   1
FI_CHAR('40', '41')            3692      2        6077   2
FI_CHAR('40', '45')            3482      2        6077   3

FI_CHAR('40', '41', '42')      3141      3        6077   1

Page 29: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 30: Analytics ioug 2011

Expanding Windows

OVER (ORDER BY col_name ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Produces an expanding window. The default window setting when ORDER BY is specified is the same window expressed with RANGE instead of ROWS

[Diagram labels: Window; Partition (first) or entire result set; Partition (second)]

Page 31: Analytics ioug 2011

Sliding Windows

OVER (ORDER BY col_name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)

Produces a sliding window

[Diagram labels: Partition (first) or entire result set; Window, 5 ROWS; Partition (second); 3 ROWS]

Page 32: Analytics ioug 2011

Aggregate Functions

Aggregate functions can be used as analytic functions

— Must be followed by an OVER clause

Analytic aggregate values can be easily included within row-level reports

— Analytic functions are applied after computation of result set
— Optimizer often produces a better execution plan

Aggregate level is determined by the partitioning subclause

— Similar effect to GROUP BY clause
— If no partitioning subclause, aggregate is across the complete result set

Page 33: Analytics ioug 2011

Aggregate Functions – the OVER Clause

SELECT deptno
      ,AVG(sal)
FROM emp
GROUP BY deptno;

DEPTNO   AVG(SAL)
------ ----------
    30 1566.66667
    20       2175
    10 2916.66667

SELECT deptno
      ,AVG(sal) OVER (PARTITION BY deptno) avg_dept
      ,AVG(sal) OVER () avg_all
FROM emp;

DEPTNO   AVG_DEPT    AVG_ALL
------ ---------- ----------
    10 2916.66667 2073.21429
    10 2916.66667 2073.21429
    10 2916.66667 2073.21429
    20       2175 2073.21429
    20       2175 2073.21429
    20       2175 2073.21429
    20       2175 2073.21429
    20       2175 2073.21429
    30 1566.66667 2073.21429
    30 1566.66667 2073.21429
    30 1566.66667 2073.21429
    30 1566.66667 2073.21429
    30 1566.66667 2073.21429
    30 1566.66667 2073.21429

OVER () : no subclause

Analytic aggregates cause no reduction in rows

Could easily include row-level data, e.g. ename and sal
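To see the difference in row counts between GROUP BY and the analytic form, the two queries can be run side by side. This is a sketch that assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle; the data is copied from the emp table in the slides and the variable names are illustrative.

```python
import sqlite3

# (ename, sal, deptno) copied from the emp table in the slides.
EMP = [
    ("CLARK", 2450, 10), ("KING", 5000, 10), ("MILLER", 1300, 10),
    ("ADAMS", 1100, 20), ("FORD", 3000, 20), ("JONES", 2975, 20),
    ("SCOTT", 3000, 20), ("SMITH", 800, 20),
    ("ALLEN", 1600, 30), ("BLAKE", 2850, 30), ("JAMES", 950, 30),
    ("MARTIN", 1250, 30), ("TURNER", 1500, 30), ("WARD", 1250, 30),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# GROUP BY collapses the result to one row per department.
grouped = conn.execute(
    "SELECT deptno, AVG(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# The analytic form keeps every row and attaches both averages to it.
detailed = conn.execute("""
    SELECT ename
          ,sal
          ,deptno
          ,AVG(sal) OVER (PARTITION BY deptno) AS avg_dept
          ,AVG(sal) OVER ()                    AS avg_all
    FROM emp
    ORDER BY deptno, ename
""").fetchall()

print(len(grouped), "rows from GROUP BY;", len(detailed), "rows from OVER")
for row in detailed:
    print(row)
```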

Page 34: Analytics ioug 2011

Analytic versus Conventional SQL Performance

The requirement

— Data at different levels of grouping

ENAME   SAL DEPTNO   AVG_DEPT    AVG_ALL
------ ---- ------ ---------- ----------
CLARK  2450     10 2916.66667 2073.21429
KING   5000     10 2916.66667 2073.21429
MILLER 1300     10 2916.66667 2073.21429
JONES  2975     20       2175 2073.21429
FORD   3000     20       2175 2073.21429
ADAMS  1100     20       2175 2073.21429
SMITH   800     20       2175 2073.21429
SCOTT  3000     20       2175 2073.21429
WARD   1250     30 1566.66667 2073.21429
TURNER 1500     30 1566.66667 2073.21429
ALLEN  1600     30 1566.66667 2073.21429
JAMES   950     30 1566.66667 2073.21429
BLAKE  2850     30 1566.66667 2073.21429
MARTIN 1250     30 1566.66667 2073.21429

AVG_DEPT: average sal per department

AVG_ALL: overall average sal

Page 35: Analytics ioug 2011

Conventional SQL Performance

SELECT r.ename,r.sal,g.deptno,g.ave_dept,a.ave_all
FROM emp r
    ,(SELECT deptno,AVG(sal) ave_dept
      FROM emp
      GROUP BY deptno) g
    ,(SELECT AVG(sal) ave_all
      FROM emp) a
WHERE g.deptno = r.deptno
ORDER BY r.deptno;

----------------------------------------
| Id | Operation         | Name | Rows |
----------------------------------------
|  0 | SELECT STATEMENT  |      |   15 |
|  1 | MERGE JOIN        |      |   15 |
|  2 | SORT JOIN         |      |    3 |
|  3 | NESTED LOOPS      |      |    3 |
|  4 | VIEW              |      |    1 |
|  5 | SORT AGGREGATE    |      |    1 |
|  6 | TABLE ACCESS FULL | EMP  |   14 |
|  7 | VIEW              |      |    3 |
|  8 | SORT GROUP BY     |      |    3 |
|  9 | TABLE ACCESS FULL | EMP  |   14 |
|*10 | SORT JOIN         |      |   14 |
| 11 | TABLE ACCESS FULL | EMP  |   14 |
----------------------------------------

1M row emp table:

48.35 seconds
230790 consistent gets

Page 36: Analytics ioug 2011

Analytic Function Performance

SELECT ename,sal,deptno
      ,AVG(sal) OVER (PARTITION BY deptno) ave_dept
      ,AVG(sal) OVER () ave_all
FROM emp;

----------------------------------------
| Id | Operation         | Name | Rows |
----------------------------------------
|  0 | SELECT STATEMENT  |      |   14 |
|  1 | WINDOW SORT       |      |   14 |
|  2 | TABLE ACCESS FULL | EMP  |   14 |
----------------------------------------

1M row emp table:

21.20 seconds
76930 consistent gets

Page 37: Analytics ioug 2011

Aggregating Over an Ordered Set of Rows – Running Totals

The ORDER BY clause creates an expanding window (running total) of rows

SELECT empno
      ,ename
      ,sal
      ,SUM(sal) OVER(ORDER BY empno) run_total
FROM emp5
ORDER BY empno;

EMPNO ENAME   SAL RUN_TOTAL
----- ------ ---- ---------
 7369 SMITH   800       800
 7499 ALLEN  1600      2400
 7521 WARD   1250      3650
 7566 JONES  2975      6625
 7654 MARTIN 1250      7875
 7698 BLAKE  2850     10725
 7782 CLARK  2450     13175
 7788 SCOTT  3000     16175
 7839 KING   5000     21175
 7844 TURNER 1500     22675
 7876 ADAMS  1100     23775
 7900 JAMES   950     24725
 7902 FORD   3000     27725
 7934 MILLER 1300     29025
    :   :       :         :

---------------------------------
| Id | Operation         | Name |
---------------------------------
|  0 | SELECT STATEMENT  |      |
|  1 | WINDOW SORT       |      |
|  2 | TABLE ACCESS FULL | EMP5 |
---------------------------------

emp table of 5000 rows

0.07 seconds
33 consistent gets
No index necessary

Page 38: Analytics ioug 2011

Running Total With Conventional SQL (1)

Self-join solution

SELECT e1.empno
      ,e1.sal
      ,SUM(e2.sal)
FROM emp5 e1, emp5 e2
WHERE e2.empno <= e1.empno
GROUP BY e1.empno, e1.sal
ORDER BY e1.empno;

----------------------------------------------
| Id | Operation                   | Name    |
----------------------------------------------
|  0 | SELECT STATEMENT            |         |
|  1 | SORT GROUP BY               |         |
|  2 | MERGE JOIN                  |         |
|  3 | SORT JOIN                   |         |
|  4 | TABLE ACCESS BY INDEX ROWID | EMP5    |
|  5 | INDEX FULL SCAN             | PK_EMP5 |
|* 6 | SORT JOIN                   |         |
|  7 | TABLE ACCESS FULL           | EMP5    |
----------------------------------------------

13.37 seconds

66 consistent gets

Page 39: Analytics ioug 2011

Running Total With Conventional SQL (2)

Subquery in SELECT list solution – column expression

SELECT empno
      ,ename
      ,sal
      ,(SELECT SUM(sal) sumsal
        FROM emp5
        WHERE empno <= b.empno) a
FROM emp5 b
ORDER BY empno;

----------------------------------------------
| Id | Operation                   | Name    |
----------------------------------------------
|  0 | SELECT STATEMENT            |         |
|  1 | SORT AGGREGATE              |         |
|  2 | TABLE ACCESS BY INDEX ROWID | EMP5    |
|* 3 | INDEX RANGE SCAN            | PK_EMP5 |
|  4 | TABLE ACCESS BY INDEX ROWID | EMP5    |
|  5 | INDEX FULL SCAN             | PK_EMP5 |
----------------------------------------------

4.62 seconds

97948 consistent gets

Page 40: Analytics ioug 2011

Aggregate Functions With Partitioning

Find average salary of employees within each manager

— Use PARTITION BY to specify the grouping

SELECT ename, mgr, sal
      ,ROUND(AVG(sal) OVER(PARTITION BY mgr)) avgsal
      ,sal - ROUND(AVG(sal) OVER(PARTITION BY mgr)) diff
FROM emp;

ENAME   MGR  SAL AVGSAL DIFF
------ ---- ---- ------ ----
SCOTT  7566 3000   3000    0
FORD   7566 3000   3000    0

ALLEN  7698 1600   1310  290
WARD   7698 1250   1310  -60
JAMES  7698  950   1310 -360
TURNER 7698 1500   1310  190
MARTIN 7698 1250   1310  -60

MILLER 7782 1300   1300    0

ADAMS  7788 1100   1100    0

JONES  7839 2975   2758  217
CLARK  7839 2450   2758 -308
BLAKE  7839 2850   2758   92

SMITH  7902  800    800    0

KING        5000   5000    0

Page 41: Analytics ioug 2011

Analytics on Aggregates

SELECT deptno
      ,SUM(sal)
      ,SUM(SUM(sal)) OVER () Totsal
      ,SUM(SUM(sal)) OVER (ORDER BY deptno) Runtot_deptno
      ,SUM(SUM(sal)) OVER (ORDER BY SUM(sal)) Runtot_sumsal
FROM emp
GROUP BY deptno
ORDER BY deptno;

DEPTNO SUM(SAL) TOTSAL RUNTOT_DEPTNO RUNTOT_SUMSAL
------ -------- ------ ------------- -------------
    10     8750  29025          8750          8750
    20    10875  29025         19625         29025
    30     9400  29025         29025         18150

Analytics are processed last

RUNTOT_DEPTNO accumulates in deptno order: sum(10), + sum(20), + sum(30)

RUNTOT_SUMSAL accumulates in order of SUM(sal): sum(10), + sum(30), + sum(20)

Page 42: Analytics ioug 2011

Aggregate Functions and the WHERE clause

Analytic functions are applied after production of the complete result set

— Rows excluded by the WHERE clause are not included in the aggregate value

Include only employees whose name starts with an 'S' or 'M'

— The average is now only for those rows starting with 'S' or 'M'

SELECT ename
      ,sal
      ,ROUND(AVG(sal) OVER()) avgsal
      ,sal - ROUND(AVG(sal) OVER()) diff
FROM emp
WHERE ename LIKE 'S%'
OR ename LIKE 'M%';

ENAME   SAL AVGSAL DIFF
------ ---- ------ ----
SMITH   800   1588 -788
MARTIN 1250   1588 -338
SCOTT  3000   1588 1412
MILLER 1300   1588 -288
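The effect of the WHERE clause on the analytic average can be checked with a short sketch. It assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle, with ename and sal copied from the emp table in the slides; the average is taken over the four surviving rows only, so MARTIN's difference is 1250 - 1588 = -338.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# WHERE is evaluated first, so AVG(sal) OVER () sees only the S and M rows.
rows = conn.execute("""
    SELECT ename
          ,sal
          ,ROUND(AVG(sal) OVER ())       AS avgsal
          ,sal - ROUND(AVG(sal) OVER ()) AS diff
    FROM emp
    WHERE ename LIKE 'S%'
       OR ename LIKE 'M%'
""").fetchall()

avgsal = {ename: avg for ename, _sal, avg, _diff in rows}
diff = {ename: d for ename, _sal, _avg, d in rows}
print(avgsal)
print(diff)
```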

Page 43: Analytics ioug 2011

RATIO_TO_REPORT

Each row’s fraction of total salary can easily be found when the total salary value is available

— Example: sal/SUM(sal) OVER()
— The function RATIO_TO_REPORT performs this calculation

SELECT ename
      ,sal
      ,SUM(sal) OVER() sumsal
      ,sal/SUM(sal) OVER() ratio
      ,RATIO_TO_REPORT(sal) OVER() ratio_rep
FROM emp;

Page 44: Analytics ioug 2011

RATIO_TO_REPORT (continued)

The query on the previous slide gives this result

ENAME   SAL SUMSAL      RATIO  RATIO_REP
------ ---- ------ ---------- ----------
SMITH   800  29025 .027562446 .027562446
ALLEN  1600  29025 .055124892 .055124892
WARD   1250  29025 .043066322 .043066322
JONES  2975  29025 .102497847 .102497847
MARTIN 1250  29025 .043066322 .043066322
BLAKE  2850  29025 .098191214 .098191214
CLARK  2450  29025 .084409991 .084409991
SCOTT  3000  29025 .103359173 .103359173
KING   5000  29025 .172265289 .172265289
TURNER 1500  29025 .051679587 .051679587
ADAMS  1100  29025 .037898363 .037898363
JAMES   950  29025 .032730405 .032730405
FORD   3000  29025 .103359173 .103359173
MILLER 1300  29025 .044788975 .044788975
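The sal/SUM(sal) OVER() form of the ratio can be reproduced outside Oracle. The sketch below assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in; SQLite has no RATIO_TO_REPORT function, so only the division form is shown, and the factor 1.0 is added because SQLite would otherwise perform integer division.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Each row's share of the grand total; 1.0 * forces floating-point division.
rows = conn.execute("""
    SELECT ename
          ,sal
          ,SUM(sal) OVER ()             AS sumsal
          ,1.0 * sal / SUM(sal) OVER () AS ratio
    FROM emp
""").fetchall()

ratio = {ename: r for ename, _sal, _sumsal, r in rows}
for ename, sal, sumsal, r in rows:
    print(f"{ename:<7}{sal:>5}{sumsal:>7} {r:.9f}")
```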

Page 45: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 46: Analytics ioug 2011

Sliding Windows

The OVER clause can have a sliding window subclause

— Not permitted without ORDER BY subclause

— Specifies size of window (set of rows) to be processed by the analytic function

— Window defined relative to current row
  • Slides through result set as different rows become current

Size of window is governed by ROWS or RANGE

— ROWS
  • physical offset, a number of rows relative to the current row

— RANGE
  • logical offset, a value interval relative to value in current row

Syntax for sliding window :

— BETWEEN <starting point> AND <ending point>

Page 47: Analytics ioug 2011

Sliding Windows Example

For each employee, show the sum of the salaries of the preceding, current, and following employee (row)

— Window includes current row as well as the preceding and following ones
— Must have order subclause for “preceding” and “following” to be meaningful
— First row has no preceding row and last row has no following row

SELECT ename
      ,sal
      ,SUM(sal) OVER(ORDER BY sal DESC
                     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window
FROM emp
ORDER BY sal DESC
        ,ename;

Page 48: Analytics ioug 2011

Sliding Windows Example (continued)

ENAME   SAL SAL_WINDOW  Calculation
------ ---- ----------  ---------------
KING   5000       8000  =5000+3000
FORD   3000      11000  =5000+3000+3000
SCOTT  3000       8975  =3000+3000+2975
JONES  2975       8825  =3000+2975+2850
BLAKE  2850       8275  =2975+2850+2450
CLARK  2450       6900  =2850+2450+1600
ALLEN  1600       5550  =2450+1600+1500
TURNER 1500       4400  =1600+1500+1300
MILLER 1300       4050  =1500+1300+1250
MARTIN 1250       3800  =1300+1250+1250
WARD   1250       3600  =1250+1250+1100
ADAMS  1100       3300  =1250+1100+950
JAMES   950       2850  =1100+950+800
SMITH   800       1750  =950+800
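The three-row window can be run as a sketch. It assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle. One deliberate difference from the slide: ename is added as a tie-breaker inside the OVER clause so that the tied salaries (FORD/SCOTT, MARTIN/WARD) are windowed in a repeatable order.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Physical (ROWS) window: the previous row, the current row and the next row.
# The first and last rows simply have a two-row window.
rows = conn.execute("""
    SELECT ename
          ,sal
          ,SUM(sal) OVER (ORDER BY sal DESC, ename
                          ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sal_window
    FROM emp
    ORDER BY sal DESC, ename
""").fetchall()

sal_window = [w for _ename, _sal, w in rows]
for row in rows:
    print(row)
```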

Page 49: Analytics ioug 2011

Partitioned Sliding Windows

Partitioning can be used with sliding windows

— A sliding window does not span partitions

SELECT ename
      ,job
      ,sal
      ,SUM(sal) OVER(PARTITION BY job
                     ORDER BY sal DESC
                     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window
FROM emp
ORDER BY job
        ,sal DESC
        ,ename;

Page 50: Analytics ioug 2011

Partitioned Sliding Windows (continued)

ENAME  JOB        SAL SAL_WINDOW  Calculation
------ --------- ---- ----------  ---------------
FORD   ANALYST   3000       6000  =3000+3000
SCOTT  ANALYST   3000       6000  =3000+3000

MILLER CLERK     1300       2400  =1300+1100
ADAMS  CLERK     1100       3350  =1300+1100+950
JAMES  CLERK      950       2850  =1100+950+800
SMITH  CLERK      800       1750  =950+800

JONES  MANAGER   2975       5825  =2975+2850
BLAKE  MANAGER   2850       8275  =2975+2850+2450
CLARK  MANAGER   2450       5300  =2850+2450

KING   PRESIDENT 5000       5000  =5000

ALLEN  SALESMAN  1600       3100  =1600+1500
TURNER SALESMAN  1500       4350  =1600+1500+1250
MARTIN SALESMAN  1250       4000  =1500+1250+1250
WARD   SALESMAN  1250       2500  =1250+1250

Page 51: Analytics ioug 2011

Sliding Window With Logical (RANGE) Offset

Physical offset

— Specified number of rows

Logical offset

— A RANGE of values
  • Numeric or date
— Values in the ordering column indirectly determine number of rows in window

SELECT ename
      ,sal
      ,SUM(sal) OVER(ORDER BY sal DESC
                     RANGE BETWEEN 150 PRECEDING AND 75 FOLLOWING) sal_window
FROM emp
ORDER BY sal DESC
        ,ename;

Page 52: Analytics ioug 2011

Sliding Window With Logical (RANGE) Offset (continued)

ENAME   SAL SAL_WINDOW
------ ---- ----------
KING   5000       5000
FORD   3000       8975
SCOTT  3000       8975
JONES  2975       8975
BLAKE  2850      11825   Range for this row is 3000 to 2775
CLARK  2450       2450
ALLEN  1600       1600
TURNER 1500       3100
MILLER 1300       3800
MARTIN 1250       3800
WARD   1250       3800
ADAMS  1100       3600
JAMES   950       2050
SMITH   800       1750
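The value interval behind each RANGE window can be made explicit with a few lines of plain Python. This is a hand-written emulation for illustration, not how Oracle evaluates the window; with ORDER BY sal DESC, "150 PRECEDING" reaches up to sal + 150 and "75 FOLLOWING" reaches down to sal - 75. The function and variable names are illustrative.

```python
# (ename, sal) pairs copied from the emp table shown in the slides.
EMP = [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600), ("TURNER", 1500),
    ("MILLER", 1300), ("MARTIN", 1250), ("WARD", 1250), ("ADAMS", 1100),
    ("JAMES", 950), ("SMITH", 800),
]

PRECEDING = 150   # RANGE BETWEEN 150 PRECEDING ...
FOLLOWING = 75    # ... AND 75 FOLLOWING


def range_bounds(current_sal):
    """Return the (high, low) salary interval for a row when ordering by sal DESC."""
    return current_sal + PRECEDING, current_sal - FOLLOWING


def range_window_sum(current_sal):
    """Sum every salary whose value falls inside the current row's interval."""
    high, low = range_bounds(current_sal)
    return sum(sal for _ename, sal in EMP if low <= sal <= high)


sal_window = {ename: range_window_sum(sal) for ename, sal in EMP}

for ename, sal in EMP:
    high, low = range_bounds(sal)
    print(f"{ename:<7}{sal:>5}  range {high} to {low}  sum {sal_window[ename]}")
```

For BLAKE the interval is 3000 to 2775, which takes in FORD, SCOTT, JONES and BLAKE, giving 11825.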

Page 53: Analytics ioug 2011

UNBOUNDED and CURRENT ROW

Sliding windows have starting and ending points

— BETWEEN <starting point> AND <ending point>

Ways for specifying starting and ending points

— UNBOUNDED PRECEDING specifies the first row as starting point
— UNBOUNDED FOLLOWING specifies the last row as ending point
— CURRENT ROW specifies the current row

Create a window that grows with each row in ename order

— The RANGE clause is not necessary if a running total is required (default)

SELECT ename
      ,sal
      ,SUM(sal) OVER(ORDER BY ename
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) run_total
FROM emp
ORDER BY ename;

Page 54: Analytics ioug 2011

Keywords UNBOUNDED and CURRENT ROW (continued)

Running Total

— Produced by default 'expanding' window when window not specified

ENAME   SAL RUN_TOTAL  Explanation
------ ---- ---------  -----------
ADAMS  1100      1100  =1100
ALLEN  1600      2700  =1600+1100
BLAKE  2850      5550  =2700+2850
CLARK  2450      8000  =5550+2450
FORD   3000     11000  =8000+3000
JAMES   950     11950  =11000+950
JONES  2975     14925  =11950+2975
KING   5000     19925  =14925+5000
MARTIN 1250     21175  =19925+1250
MILLER 1300     22475  =21175+1300
SCOTT  3000     25475  =22475+3000
SMITH   800     26275  =25475+800
TURNER 1500     27775  =26275+1500
WARD   1250     29025  =27775+1250

Page 55: Analytics ioug 2011

Keywords UNBOUNDED and CURRENT ROW (continued)

Be aware of the subtle difference between RANGE and ROWS in this context
— Apparent only when adjacent rows have equal values

SELECT ename
      ,sal
      ,SUM(sal) OVER(ORDER BY sal DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) row_tot
      ,SUM(sal) OVER(ORDER BY sal DESC
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) range_tot
      ,SUM(sal) OVER(ORDER BY sal DESC) default_tot
FROM emp
ORDER BY sal DESC ,ename;

Page 56: Analytics ioug 2011

Difference between ROWS and RANGE

Ford and Scott fall within the same range - also applies to Martin and Ward

— For example Scott is included in range when the value for Ford is calculated

ENAME             SAL    ROW_TOT RANGE_TOT DEFAULT_TOT
---------- ---------- ---------- --------- -----------
KING             5000       5000      5000        5000
FORD             3000       8000     11000       11000
SCOTT            3000      11000     11000       11000
JONES            2975      13975     13975       13975
BLAKE            2850      16825     16825       16825
CLARK            2450      19275     19275       19275
ALLEN            1600      20875     20875       20875
TURNER           1500      22375     22375       22375
MILLER           1300      23675     23675       23675
MARTIN           1250      24925     26175       26175
WARD             1250      26175     26175       26175
ADAMS            1100      27275     27275       27275
JAMES             950      28225     28225       28225
SMITH             800      29025     29025       29025
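The ROWS and RANGE columns differ only on tied salaries. As a rough illustration, assuming the salaries are already sorted in descending order as on the slide, the two running totals can be reproduced in Python. The variable names are illustrative only.

```python
from itertools import accumulate

# Salaries in ORDER BY sal DESC order, as in the slide output.
SALARIES = [5000, 3000, 3000, 2975, 2850, 2450, 1600,
            1500, 1300, 1250, 1250, 1100, 950, 800]

# ROWS: a physical window that grows by exactly one row at a time.
rows_tot = list(accumulate(SALARIES))

# RANGE: a logical window holding every row whose salary is >= the current
# salary, so rows with equal salaries (peers) always share one total.
range_tot = [sum(s for s in SALARIES if s >= current) for current in SALARIES]

# Positions where the two totals disagree: the first row of each tied pair.
differs = [i for i, (r, g) in enumerate(zip(rows_tot, range_tot)) if r != g]
print(differs)
```

Only the first row of each tied pair (FORD, MARTIN) is affected, because by the second tied row the physical window has caught up with the logical one.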

Page 57: Analytics ioug 2011

Time Intervals

Sliding windows are often based on time intervals

Example: Compare the salary of each employee with the minimum and maximum salaries of the employees hired within three months of their own hiring date

SELECT ename
      ,hiredate
      ,sal
      ,MIN(sal) OVER(ORDER BY hiredate
                     RANGE BETWEEN INTERVAL '3' MONTH PRECEDING
                               AND INTERVAL '3' MONTH FOLLOWING) min
      ,MAX(sal) OVER(ORDER BY hiredate
                     RANGE BETWEEN INTERVAL '3' MONTH PRECEDING
                               AND INTERVAL '3' MONTH FOLLOWING) max
FROM emp;

Page 58: Analytics ioug 2011

Time Intervals (continued)

Sliding time window

ENAME      HIREDATE         SAL        MIN        MAX
---------- --------- ---------- ---------- ----------
SMITH      17-DEC-80        800        800       1600
ALLEN      20-FEB-81       1600        800       2975
WARD       22-FEB-81       1250        800       2975
JONES      02-APR-81       2975       1250       2975
BLAKE      01-MAY-81       2850       1250       2975
CLARK      09-JUN-81       2450       1500       2975
TURNER     08-SEP-81       1500        950       5000
MARTIN     28-SEP-81       1250        950       5000
KING       17-NOV-81       5000        950       5000
JAMES      03-DEC-81        950        950       5000
FORD       03-DEC-81       3000        950       5000
MILLER     23-JAN-82       1300        950       5000
SCOTT      09-DEC-82       3000       1100       3000
ADAMS      12-JAN-83       1100       1100       3000
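To see how the three-month window selects rows, here is a small Python sketch that emulates the interval arithmetic for the EMP data above. It is an approximation under one stated assumption: `shift_months` (a hypothetical helper) only works when the day of the month exists in the target month, which holds for this data because every hire date falls on or before the 28th.

```python
from datetime import date

# (ename, hiredate, sal) as shown on the slides.
EMP = [
    ("SMITH", date(1980, 12, 17), 800), ("ALLEN", date(1981, 2, 20), 1600),
    ("WARD", date(1981, 2, 22), 1250), ("JONES", date(1981, 4, 2), 2975),
    ("BLAKE", date(1981, 5, 1), 2850), ("CLARK", date(1981, 6, 9), 2450),
    ("TURNER", date(1981, 9, 8), 1500), ("MARTIN", date(1981, 9, 28), 1250),
    ("KING", date(1981, 11, 17), 5000), ("JAMES", date(1981, 12, 3), 950),
    ("FORD", date(1981, 12, 3), 3000), ("MILLER", date(1982, 1, 23), 1300),
    ("SCOTT", date(1982, 12, 9), 3000), ("ADAMS", date(1983, 1, 12), 1100),
]

def shift_months(d, months):
    """Move a date by whole months (assumes the day exists in the target month)."""
    index = d.year * 12 + (d.month - 1) + months
    return d.replace(year=index // 12, month=index % 12 + 1)

def min_max_within(rows, months=3):
    """For each employee, MIN and MAX salary among hires within +/- `months`."""
    result = {}
    for name, hired, _ in rows:
        low, high = shift_months(hired, -months), shift_months(hired, months)
        sals = [sal for _, h, sal in rows if low <= h <= high]
        result[name] = (min(sals), max(sals))
    return result

window = min_max_within(EMP)
print(window["CLARK"])
```

For CLARK (hired 09-JUN-81) the window runs from 09-MAR-81 to 09-SEP-81, which just includes TURNER (08-SEP-81) and explains the minimum of 1500.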

Page 59: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 60: Analytics ioug 2011

LAG and LEAD Functions

Useful for comparing values across rows
— Need to specify the count of rows which separate the target row from the current row
   • No need for self-join
— LAG provides access to a row at a given offset prior to the current position
— LEAD provides access to a row at a given offset after the current position

— offset is an optional parameter and defaults to 1
— default is an optional parameter and is the value returned if offset falls outside the bounds of the table or partition
   • In this case, NULL will be returned if no default is specified

{LAG | LEAD} ( value_expr [, offset] [, default] )
   OVER ( [query_partition_clause] order_by_clause )

Page 61: Analytics ioug 2011

LAG/LEAD Simple Example

SELECT hiredate
      ,sal AS salary
      ,LAG(sal,1) OVER (ORDER BY hiredate) AS LAG1
      ,LEAD(sal,1) OVER (ORDER BY hiredate) AS LEAD1
FROM emp;

HIREDATE      SALARY       LAG1      LEAD1
--------- ---------- ---------- ----------
17-DEC-80        800                  1600
20-FEB-81       1600        800       1250
22-FEB-81       1250       1600       2975
02-APR-81       2975       1250       2850
01-MAY-81       2850       2975       2450
09-JUN-81       2450       2850       1500
08-SEP-81       1500       2450       1250
28-SEP-81       1250       1500       5000
17-NOV-81       5000       1250        950
03-DEC-81        950       5000       3000
03-DEC-81       3000        950       1300
23-JAN-82       1300       3000       3000
09-DEC-82       3000       1300       1100
12-JAN-83       1100       3000

Comparison of salaries with those of the nearest recruits in terms of proximity of hiredates
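The offset and default parameters are easy to model outside the database. The sketch below is a plain-Python analogue of LAG and LEAD over a single ordered list. The function names mirror the SQL functions, but the code is only an illustration and assumes a non-negative offset.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each position, or `default` when out of bounds."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value `offset` rows after each position, or `default` when out of bounds."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

# Salaries in hiredate order, as in the slide output.
SAL_BY_HIREDATE = [800, 1600, 1250, 2975, 2850, 2450, 1500,
                   1250, 5000, 950, 3000, 1300, 3000, 1100]

lag1 = lag(SAL_BY_HIREDATE)
lead1 = lead(SAL_BY_HIREDATE)
lag2_zero = lag(SAL_BY_HIREDATE, offset=2, default=0)
print(lag1[:3], lead1[-3:], lag2_zero[:3])
```

`None` plays the role of NULL here: the first row has no predecessor and the last row has no successor unless a default is supplied.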

Page 62: Analytics ioug 2011

FIRST_VALUE and LAST_VALUE

Return the first or last value in a partition (based on ordering) for use as a reference point

SELECT empno, deptno, hiredate
      ,FIRST_VALUE(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) firstdate
      ,hiredate - FIRST_VALUE(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) Day_Gap
FROM emp
ORDER BY deptno, Day_Gap;

DAY_GAP: days after hiring of first employee in this department

EMPNO DEPTNO HIREDATE  FIRSTDATE DAY_GAP
----- ------ --------- --------- -------
 7782     10 09-JUN-81 09-JUN-81       0
 7839     10 17-NOV-81 09-JUN-81     161
 7934     10 23-JAN-82 09-JUN-81     228

 7369     20 17-DEC-80 17-DEC-80       0
 7566     20 02-APR-81 17-DEC-80     106
 7902     20 03-DEC-81 17-DEC-80     351
 7788     20 09-DEC-82 17-DEC-80     722
 7876     20 12-JAN-83 17-DEC-80     756

 7499     30 20-FEB-81 20-FEB-81       0
 7521     30 22-FEB-81 20-FEB-81       2
 7698     30 01-MAY-81 20-FEB-81      70
 7844     30 08-SEP-81 20-FEB-81     200
 7654     30 28-SEP-81 20-FEB-81     220
 7900     30 03-DEC-81 20-FEB-81     286

Works with partitioning and windowing subclauses

Page 63: Analytics ioug 2011

Influence of Window on LAST_VALUE

SELECT deptno, ename, sal
      ,LAST_VALUE(ename) OVER (PARTITION BY deptno ORDER BY sal
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS hsal1
      ,LAST_VALUE(ename) OVER (PARTITION BY deptno ORDER BY sal) AS hsal2
FROM emp
ORDER BY deptno,sal;

DEPTNO ENAME   SAL HSAL1      HSAL2
------ ------ ---- ---------- ----------
    10 MILLER 1300 KING       MILLER
    10 CLARK  2450 KING       CLARK
    10 KING   5000 KING       KING
    20 SMITH   800 SCOTT      SMITH
    20 ADAMS  1100 SCOTT      ADAMS
    20 JONES  2975 SCOTT      JONES
    20 FORD   3000 SCOTT      SCOTT
    20 SCOTT  3000 SCOTT      SCOTT
    30 JAMES   950 BLAKE      JAMES
    30 MARTIN 1250 BLAKE      WARD
    30 WARD   1250 BLAKE      WARD
    30 TURNER 1500 BLAKE      TURNER
    30 ALLEN  1600 BLAKE      ALLEN
    30 BLAKE  2850 BLAKE      BLAKE

HSAL2: last value in expanding window (based on range)
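The two LAST_VALUE columns can be reproduced with a short Python sketch. It assumes the rows are already sorted by deptno and sal in the order the slide shows. The order of tied rows (FORD before SCOTT, MARTIN before WARD) is taken from that output and is not something the ORDER BY itself guarantees.

```python
from itertools import groupby

# (deptno, ename, sal) sorted by deptno, sal as in the slide output.
EMP = [(10, "MILLER", 1300), (10, "CLARK", 2450), (10, "KING", 5000),
       (20, "SMITH", 800), (20, "ADAMS", 1100), (20, "JONES", 2975),
       (20, "FORD", 3000), (20, "SCOTT", 3000),
       (30, "JAMES", 950), (30, "MARTIN", 1250), (30, "WARD", 1250),
       (30, "TURNER", 1500), (30, "ALLEN", 1600), (30, "BLAKE", 2850)]

hsal1, hsal2 = {}, {}
for deptno, group in groupby(EMP, key=lambda row: row[0]):
    partition = list(group)
    for _, ename, sal in partition:
        # Window spanning the whole partition: always its last row.
        hsal1[ename] = partition[-1][1]
        # Default window: every row up to the current row and its peers
        # (rows with the same sal), so the "last" row moves as the window grows.
        in_window = [e for _, e, s in partition if s <= sal]
        hsal2[ename] = in_window[-1]

print(hsal1["MARTIN"], hsal2["MARTIN"])
```

With the default window, LAST_VALUE mostly returns the current row, which is rarely what is wanted. Stating the full window explicitly gives the partition-wide answer.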

Page 64: Analytics ioug 2011

Ignoring Nulls in First and Last Values

SELECT ename
      ,FIRST_VALUE (ename) OVER (PARTITION BY deptno ORDER BY ename) fv
      ,LAST_VALUE (ename) OVER (PARTITION BY deptno ORDER BY ename) lv
      ,comm
      ,FIRST_VALUE (comm) OVER (PARTITION BY deptno ORDER BY comm) fv_comm
      ,LAST_VALUE (comm) OVER (PARTITION BY deptno ORDER BY comm) lv_comm
      ,LAST_VALUE (comm IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY comm) lv_ignore
FROM emp
WHERE deptno = 30;

ENAME      FV         LV               COMM    FV_COMM    LV_COMM  LV_IGNORE
---------- ---------- ---------- ---------- ---------- ---------- ----------
ALLEN      ALLEN      ALLEN             300          0        300        300
BLAKE      ALLEN      BLAKE                          0                  1400
JAMES      ALLEN      JAMES                          0                  1400
MARTIN     ALLEN      MARTIN           1400          0       1400       1400
TURNER     ALLEN      TURNER              0          0          0          0
WARD       ALLEN      WARD              500          0        500        500

With IGNORE NULLS, the highest value (1400) is 'kept' for the rows whose comm is null

Page 65: Analytics ioug 2011

NTH_VALUE

SELECT deptno
      ,ename
      ,sal
      ,FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal DESC)
       - NTH_VALUE(sal,2) FROM FIRST OVER (PARTITION BY deptno ORDER BY sal DESC) t2_diff
FROM emp;

Reports difference between first and second member of each partition
— Could use FROM LAST instead of FROM FIRST

    DEPTNO ENAME       SAL T2_DIFF
---------- ---------- ---- -------
        10 KING       5000
        10 CLARK      2450    2550
        10 MILLER     1300    2550
        20 SCOTT      3000       0
        20 FORD       3000       0
        20 JONES      2975       0
        20 ADAMS      1100       0
        20 SMITH       800       0
        30 BLAKE      2850
        30 ALLEN      1600    1250
        30 TURNER     1500    1250
        30 MARTIN     1250    1250
        30 WARD       1250    1250
        30 JAMES       950    1250

0?? The default window (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) includes rows that tie with the current row. SCOTT and FORD tie at 3000, so SCOTT's window already holds a second value and the difference is 0. KING and BLAKE have no second row in their windows, so T2_DIFF is null.

SELECT deptno
      ,ename
      ,sal
      ,FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal DESC)
       - NTH_VALUE(sal,3) FROM FIRST OVER (PARTITION BY deptno ORDER BY sal DESC) t2_diff
FROM emp;

DEPTNO ENAME             SAL    T2_DIFF
------ ---------- ---------- ----------
    10 KING             5000
    10 CLARK            2450
    10 MILLER           1300       3700
    20 SCOTT            3000
    20 FORD             3000
    20 JONES            2975         25
    20 ADAMS            1100         25
    20 SMITH             800         25
    30 BLAKE            2850
    30 ALLEN            1600
    30 TURNER           1500       1350
    30 MARTIN           1250       1350
    30 WARD             1250       1350
    30 JAMES             950       1350
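The blanks and zeros in both result sets follow from the default window, which holds only the rows up to and including the current row's peers. Here is a minimal Python sketch of that behaviour for one partition at a time. `nth_value_diff` is a hypothetical helper and `None` stands in for NULL.

```python
def nth_value_diff(sals_desc, n):
    """FIRST_VALUE(sal) - NTH_VALUE(sal, n) under the default RANGE window,
    for one partition whose salaries are sorted in descending order."""
    result = []
    for current in sals_desc:
        # Default window: all rows from the start down to the current row's peers.
        window = [s for s in sals_desc if s >= current]
        if len(window) >= n:
            result.append(window[0] - window[n - 1])
        else:
            result.append(None)  # no nth row in the window yet
    return result

DEPT10 = [5000, 2450, 1300]
DEPT20 = [3000, 3000, 2975, 1100, 800]
DEPT30 = [2850, 1600, 1500, 1250, 1250, 950]

print(nth_value_diff(DEPT10, 2))
print(nth_value_diff(DEPT20, 2))
print(nth_value_diff(DEPT20, 3))
```

Department 20 shows 0 on every row for n = 2 because the tied 3000 salaries put a second value into the very first window.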

Page 66: Analytics ioug 2011

LISTAGG Function

Example - show columns in indexes in an ordered list

SELECT table_name
      ,index_name
      ,LISTAGG(column_name,';') WITHIN GROUP (ORDER BY column_position) "Column List"
FROM user_ind_columns
GROUP BY table_name ,index_name;

TABLE_NAME   INDEX_NAME         Column List
------------ ------------------ -----------------------------
EMP          EMP_PK             EMPNO
PROJ_ASST    SYS_C0011223       PROJNO;EMPNO;START_DATE
DEPT         DEPT$DIVNO_DEPTNO  DIVNO;DEPTNO
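For readers more used to procedural code, the grouping and ordered concatenation that LISTAGG performs can be sketched in Python as follows. The input rows are an assumed, unordered rendering of USER_IND_COLUMNS whose column positions are inferred from the output above. `listagg` is a hypothetical helper, not an Oracle API.

```python
from collections import defaultdict

# (table_name, index_name, column_name, column_position), deliberately unordered.
IND_COLUMNS = [
    ("PROJ_ASST", "SYS_C0011223", "START_DATE", 3),
    ("PROJ_ASST", "SYS_C0011223", "PROJNO", 1),
    ("EMP", "EMP_PK", "EMPNO", 1),
    ("DEPT", "DEPT$DIVNO_DEPTNO", "DEPTNO", 2),
    ("PROJ_ASST", "SYS_C0011223", "EMPNO", 2),
    ("DEPT", "DEPT$DIVNO_DEPTNO", "DIVNO", 1),
]

def listagg(rows, separator=";"):
    """Group by (table, index) and join column names in column_position order."""
    groups = defaultdict(list)
    for table, index, column, position in rows:
        groups[(table, index)].append((position, column))
    return {key: separator.join(column for _, column in sorted(columns))
            for key, columns in groups.items()}

column_lists = listagg(IND_COLUMNS)
for (table, index), columns in column_lists.items():
    print(table, index, columns)
```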

Page 67: Analytics ioug 2011

FIRST and LAST

Compare each employee's salary with the average salary of the first year of hirings of their department
— Must use KEEP
— Must use DENSE_RANK

SELECT empno
      ,deptno
      ,TO_CHAR(hiredate,'YYYY') Hire_Yr
      ,sal
      ,TRUNC(AVG(sal) KEEP (DENSE_RANK FIRST ORDER BY TO_CHAR(hiredate,'YYYY'))
             OVER (PARTITION BY deptno)) Avg_Sal_Yr1_Hire
FROM emp
ORDER BY deptno, empno, Hire_Yr;

EMPNO     DEPTNO HIRE_YR     SAL AVG_SAL_YR1_HIRE
----- ---------- ------- ------- ----------------
 7782         10 1981       2450             3725
 7839         10 1981       5000             3725
 7934         10 1982       1300             3725

 7369         20 1980        800              800
 7566         20 1981       2975              800
 7788         20 1982       3000              800
 7876         20 1983       1100              800
 7902         20 1981       3000              800

 7499         30 1981       1600             1566
 7521         30 1981       1250             1566
 7654         30 1981       1250             1566
 7698         30 1981       2850             1566
 7844         30 1981       1500             1566
 7900         30 1981        950             1566
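The KEEP (DENSE_RANK FIRST ...) construct does two things: it finds the lowest value of the ordering expression in each partition, then aggregates only the rows that share it. A rough Python equivalent for the data above is shown below. `avg_sal_first_year` is a hypothetical name, and integer truncation stands in for TRUNC.

```python
from collections import defaultdict

# (empno, deptno, hire_year, sal) as shown in the slide output.
EMP = [(7782, 10, 1981, 2450), (7839, 10, 1981, 5000), (7934, 10, 1982, 1300),
       (7369, 20, 1980, 800), (7566, 20, 1981, 2975), (7788, 20, 1982, 3000),
       (7876, 20, 1983, 1100), (7902, 20, 1981, 3000),
       (7499, 30, 1981, 1600), (7521, 30, 1981, 1250), (7654, 30, 1981, 1250),
       (7698, 30, 1981, 2850), (7844, 30, 1981, 1500), (7900, 30, 1981, 950)]

def avg_sal_first_year(rows):
    """Per department: truncated average salary of the earliest hiring year."""
    by_dept = defaultdict(list)
    for _, deptno, year, sal in rows:
        by_dept[deptno].append((year, sal))
    result = {}
    for deptno, pairs in by_dept.items():
        first_year = min(year for year, _ in pairs)                # DENSE_RANK FIRST
        kept = [sal for year, sal in pairs if year == first_year]  # KEEP
        result[deptno] = sum(kept) // len(kept)                    # TRUNC(AVG(sal))
    return result

first_year_avg = avg_sal_first_year(EMP)
print(first_year_avg)
```

Every row of a department then reports the same figure, which is what OVER (PARTITION BY deptno) adds in the SQL version.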

Page 68: Analytics ioug 2011

FIRST and LAST (Continued)

Compare salaries to the average of the 'LAST' department
— Note no ORDER BY inside the OVER clause
— No support for any <window> clause

SELECT empno
      ,deptno
      ,TO_CHAR(hiredate,'YYYY') Hire_Yr
      ,sal
      ,TRUNC(AVG(sal) KEEP (DENSE_RANK LAST ORDER BY deptno)
             OVER ()) AVG_SAL_LAST_DEPT
FROM emp
ORDER BY deptno, empno, Hire_Yr;

EMPNO DEPTNO HIRE_YR  SAL AVG_SAL_LAST_DEPT
----- ------ ------- ---- -----------------
 7782     10 1981    2450              1566
 7839     10 1981    5000              1566
 7934     10 1982    1300              1566
 7369     20 1980     800              1566
 7566     20 1981    2975              1566
 7788     20 1982    3000              1566
 7876     20 1983    1100              1566
 7902     20 1981    3000              1566
 7499     30 1981    1600              1566
 7521     30 1981    1250              1566
 7654     30 1981    1250              1566
 7698     30 1981    2850              1566
 7844     30 1981    1500              1566
 7900     30 1981     950              1566

Page 69: Analytics ioug 2011

Bus Times

SELECT route, stop, bus
      ,TO_CHAR(bustime,'DD-MON-YYYY HH24.MI.SS') bustime
FROM bustimes
ORDER BY route, stop, bustime;

     ROUTE     STOP      BUS BUSTIME
---------- -------- -------- --------------------
         1        1       10 01-MAR-2011 12.17.33
         1        1       30 01-MAR-2011 12.58.10
         1        1       20 01-MAR-2011 13.58.41
         1        1       40 01-MAR-2011 14.06.13
         1        1       50 01-MAR-2011 14.11.45
         1        2       10 01-MAR-2011 12.56.19
         1        2       30 01-MAR-2011 13.00.09
         1        2       40 01-MAR-2011 14.20.45
         1        2       50 01-MAR-2011 14.24.01
         1        2       20 01-MAR-2011 14.31.04
         1        3       10 01-MAR-2011 13.58.53
         1        3       40 01-MAR-2011 14.35.58
         1        3       20 01-MAR-2011 14.58.41
         1        3       50 01-MAR-2011 15.18.09
         1        3       30 01-MAR-2011 15.28.33
         1        4       10 01-MAR-2011 14.17.33
         1        4       40 01-MAR-2011 15.11.26
         1        4       30 01-MAR-2011 15.30.30
         1        4       20 01-MAR-2011 15.42.25
         1        4       50 01-MAR-2011 15.55.54
         1        5       40 01-MAR-2011 15.51.14
         1        5       50 01-MAR-2011 16.02.19
         1        5       20 01-MAR-2011 16.18.09
         1        5       10 01-MAR-2011 16.30.21
         1        5       30 01-MAR-2011 16.47.58

Times for 5 buses stopping at 5 stops on route 1

Page 70: Analytics ioug 2011

Journey Times of Buses Between Stops

SELECT route
      ,stop
      ,bus
      ,TO_CHAR(bustime,'dd/mm/yy hh24:mi:ss') bus_stop_time
      ,TO_CHAR(LAG(bustime,1) OVER (PARTITION BY bus ORDER BY route,stop,bustime)
              ,'dd/mm/yy hh24:mi:ss') prev_bus_stop_time
      ,SUBSTR(NUMTODSINTERVAL(bustime - LAG(bustime,1)
                 OVER (PARTITION BY bus ORDER BY route,stop,bustime),'DAY'),12,8) time_between_stops
      ,SUBSTR(NUMTODSINTERVAL(bustime - FIRST_VALUE(bustime)
                 OVER (PARTITION BY bus ORDER BY route,stop,bustime),'DAY'),12,8) jrny_time
FROM bustimes;

Page 71: Analytics ioug 2011

Journey Times of Buses Between Stops (cont'd)

ROUTE STOP BUS BUS_STOP_TIME     PREV_BUS_STOP_TIM TIME_BET JRNY_TIM
----- ---- --- ----------------- ----------------- -------- --------
    1    1  10 01/03/11 12:17:33                            00:00:00
    1    2  10 01/03/11 12:56:19 01/03/11 12:17:33 00:38:46 00:38:46
    1    3  10 01/03/11 13:58:53 01/03/11 12:56:19 01:02:34 01:41:20
    1    4  10 01/03/11 14:17:33 01/03/11 13:58:53 00:18:40 02:00:00
    1    5  10 01/03/11 16:30:21 01/03/11 14:17:33 02:12:48 04:12:48
    1    1  20 01/03/11 13:58:41                            00:00:00
    1    2  20 01/03/11 14:31:04 01/03/11 13:58:41 00:32:23 00:32:23
    1    3  20 01/03/11 14:58:41 01/03/11 14:31:04 00:27:37 01:00:00
    1    4  20 01/03/11 15:42:25 01/03/11 14:58:41 00:43:44 01:43:44
    1    5  20 01/03/11 16:18:09 01/03/11 15:42:25 00:35:44 02:19:28
    1    1  30 01/03/11 12:58:10                            00:00:00
    1    2  30 01/03/11 13:00:09 01/03/11 12:58:10 00:01:59 00:01:59
    1    3  30 01/03/11 15:28:33 01/03/11 13:00:09 02:28:24 02:30:23
    1    4  30 01/03/11 15:30:30 01/03/11 15:28:33 00:01:57 02:32:20
    1    5  30 01/03/11 16:47:58 01/03/11 15:30:30 01:17:28 03:49:48
    1    1  40 01/03/11 14:06:13                            00:00:00
    1    2  40 01/03/11 14:20:45 01/03/11 14:06:13 00:14:32 00:14:32
    1    3  40 01/03/11 14:35:58 01/03/11 14:20:45 00:15:13 00:29:45
    1    4  40 01/03/11 15:11:26 01/03/11 14:35:58 00:35:28 01:05:13
    1    5  40 01/03/11 15:51:14 01/03/11 15:11:26 00:39:48 01:45:01
    1    1  50 01/03/11 14:11:45                            00:00:00
    1    2  50 01/03/11 14:24:01 01/03/11 14:11:45 00:12:16 00:12:16
    1    3  50 01/03/11 15:18:09 01/03/11 14:24:01 00:54:08 01:06:24
    1    4  50 01/03/11 15:55:54 01/03/11 15:18:09 00:37:45 01:44:09
    1    5  50 01/03/11 16:02:19 01/03/11 15:55:54 00:06:25 01:50:34

Page 72: Analytics ioug 2011

Average Wait Times for a Bus

SELECT v.route
      ,v.stop
      ,v.bus
      ,v.bustime
      ,v.prev_bus_time
      ,SUBSTR(NUMTODSINTERVAL(v.numgap,'DAY'),12,8) wait_for_next_bus
      ,CASE WHEN bustime = FIRST_VALUE(bustime)
                           OVER (PARTITION BY stop ORDER BY route,stop,bustime)
            THEN SUBSTR(NUMTODSINTERVAL(AVG(v.numgap)
                           OVER (PARTITION BY stop),'DAY'),12,8)
            ELSE NULL
       END ave_wait
FROM (SELECT route
            ,stop
            ,bus
            ,TO_CHAR(bustime,'dd/mm/yy hh24:mi:ss') bustime
            ,TO_CHAR(LAG(bustime,1) OVER (PARTITION BY stop ORDER BY route,stop,bustime)
                    ,'dd/mm/yy hh24:mi:ss') prev_bus_time
            ,bustime - LAG(bustime,1)
                       OVER (PARTITION BY stop ORDER BY route,stop,bustime) numgap
      FROM bustimes) v;

Page 73: Analytics ioug 2011

Average Wait Times for a Bus (continued)

ROUTE STOP BUS BUSTIME           PREV_BUS_TIME     WAIT_FOR AVE_WAIT
----- ---- --- ----------------- ----------------- -------- --------
    1    1  10 01/03/11 12:17:33                            00:28:33
    1    1  30 01/03/11 12:58:10 01/03/11 12:17:33 00:40:37
    1    1  20 01/03/11 13:58:41 01/03/11 12:58:10 01:00:31
    1    1  40 01/03/11 14:06:13 01/03/11 13:58:41 00:07:32
    1    1  50 01/03/11 14:11:45 01/03/11 14:06:13 00:05:32
    1    2  10 01/03/11 12:56:19                            00:23:41
    1    2  30 01/03/11 13:00:09 01/03/11 12:56:19 00:03:50
    1    2  40 01/03/11 14:20:45 01/03/11 13:00:09 01:20:36
    1    2  50 01/03/11 14:24:01 01/03/11 14:20:45 00:03:16
    1    2  20 01/03/11 14:31:04 01/03/11 14:24:01 00:07:03
    1    3  10 01/03/11 13:58:53                            00:22:25
    1    3  40 01/03/11 14:35:58 01/03/11 13:58:53 00:37:05
    1    3  20 01/03/11 14:58:41 01/03/11 14:35:58 00:22:43
    1    3  50 01/03/11 15:18:09 01/03/11 14:58:41 00:19:28
    1    3  30 01/03/11 15:28:33 01/03/11 15:18:09 00:10:24
    1    4  10 01/03/11 14:17:33                            00:24:35
    1    4  40 01/03/11 15:11:26 01/03/11 14:17:33 00:53:53
    1    4  30 01/03/11 15:30:30 01/03/11 15:11:26 00:19:04
    1    4  20 01/03/11 15:42:25 01/03/11 15:30:30 00:11:55
    1    4  50 01/03/11 15:55:54 01/03/11 15:42:25 00:13:29
    1    5  40 01/03/11 15:51:14                            00:14:11
    1    5  50 01/03/11 16:02:19 01/03/11 15:51:14 00:11:05
    1    5  20 01/03/11 16:18:09 01/03/11 16:02:19 00:15:50
    1    5  10 01/03/11 16:30:21 01/03/11 16:18:09 00:12:12
    1    5  30 01/03/11 16:47:58 01/03/11 16:30:21 00:17:37

Page 74: Analytics ioug 2011

Analytic Functions

Overview of Analytic Functions

Ranking Functions

Partitioning

Aggregate Functions

Sliding Windows

Row Comparison Functions

Analytic Function Performance

Page 75: Analytics ioug 2011

Finding Holes in 'Sequences'

Sales table has 918843 rows
— Gap in prod_ids from 48 to 113

SELECT DISTINCT prod_id FROM sales ORDER BY prod_id;

PROD_ID
-------
      :
     46
     47
     48
    113
    114
    115
      :

SELECT prod_id
      ,next_prod_id
FROM (SELECT prod_id
            ,LEAD(prod_id) OVER(ORDER BY prod_id) next_prod_id
      FROM sales)
WHERE next_prod_id - prod_id > 1;

   PROD_ID NEXT_PROD_ID
---------- ------------
        48          113

Elapsed time : 3.17 secs
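The same gap test can be expressed by pairing each value with its successor, which is what LEAD does in the query. A minimal Python sketch follows. The id list is a made-up excerpt (duplicates included, as in the SALES table) and `find_gaps` is a hypothetical helper.

```python
def find_gaps(ids):
    """Return (id, next_id) pairs where consecutive sorted ids differ by more than 1."""
    ordered = sorted(ids)
    # zip(ordered, ordered[1:]) pairs every value with the one that follows it,
    # which plays the role of LEAD(prod_id) OVER (ORDER BY prod_id).
    return [(current, following)
            for current, following in zip(ordered, ordered[1:])
            if following - current > 1]

# Hypothetical excerpt of prod_id values; duplicates produce a difference of 0.
PROD_IDS = [114, 46, 48, 47, 113, 46, 115, 48, 114]

gaps = find_gaps(PROD_IDS)
print(gaps)
```

Duplicate ids never satisfy the "greater than 1" test, so each hole is reported once, without needing DISTINCT.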

Page 76: Analytics ioug 2011

Eliminating Duplicate rows

dup_emp table has 3670016 rows with unique empno values and no primary key

INSERT INTO dup_emp SELECT * FROM dup_emp WHERE empno = 1;

— dup_emp now has one extra duplicate row

Use conventional SQL to eliminate the duplicate row

DELETE FROM dup_emp y
WHERE ROWID <> (SELECT MAX(ROWID)
                FROM dup_emp
                WHERE y.empno = empno);

1 row deleted.
Elapsed: 00:01:38.76

-------------------------------------------------
| Id  | Operation          | Name    | Rows  |
-------------------------------------------------
|   0 | DELETE STATEMENT   |         | 3670K |
|   1 | DELETE             | DUP_EMP |       |
|*  2 | HASH JOIN          |         | 3670K |
|   3 | VIEW               | VW_SQ_1 | 3670K |
|   4 | SORT GROUP BY      |         | 3670K |
|   5 | TABLE ACCESS FULL  | DUP_EMP | 3670K |
|   6 | TABLE ACCESS FULL  | DUP_EMP | 3670K |
-------------------------------------------------

Page 77: Analytics ioug 2011

Eliminating Duplicate rows (continued)

Use the ROW_NUMBER ranking function to eliminate the same duplicate row more efficiently
— ROW_NUMBER requires an ORDER BY clause, so NULL is used as a dummy

DELETE FROM dup_emp
WHERE ROWID IN (SELECT rid
                FROM (SELECT ROWID rid
                            ,ROW_NUMBER() OVER (PARTITION BY empno ORDER BY NULL) rnk
                      FROM dup_emp)
                WHERE rnk > 1);

1 row deleted.
Elapsed: 00:00:19.61

---------------------------------------------------------
| Id  | Operation                   | Name     | Rows  |
---------------------------------------------------------
|   0 | DELETE STATEMENT            |          |     1 |
|   1 | DELETE                      | DUP_EMP  |       |
|   2 | NESTED LOOPS                |          |     1 |
|   3 | VIEW                        | VW_NSO_1 | 3670K |
|   4 | SORT UNIQUE                 |          |     1 |
|*  5 | VIEW                        |          | 3670K |
|   6 | WINDOW SORT                 |          | 3670K |
|   7 | TABLE ACCESS FULL           | DUP_EMP  | 3670K |
|   8 | TABLE ACCESS BY USER ROWID  | DUP_EMP  |     1 |

Similar story with index on empno
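The logic of the ROW_NUMBER approach is to number the rows within each empno and mark everything after the first. The Python sketch below illustrates this on a tiny stand-in for dup_emp. The rowid strings are invented placeholders and `rowids_to_delete` is a hypothetical helper.

```python
from collections import defaultdict

# (rowid, empno): invented rowids, empno 1 appears twice.
DUP_EMP = [("AAA1", 1), ("AAA2", 2), ("AAA3", 3), ("AAA4", 1)]

def rowids_to_delete(rows):
    """Rowids of every row numbered 2 or higher within its empno partition."""
    seen = defaultdict(int)
    doomed = []
    for rowid, empno in rows:
        seen[empno] += 1       # ROW_NUMBER() OVER (PARTITION BY empno ...)
        if seen[empno] > 1:    # WHERE rnk > 1
            doomed.append(rowid)
    return doomed

doomed = rowids_to_delete(DUP_EMP)
survivors = [row for row in DUP_EMP if row[0] not in doomed]
print(doomed, survivors)
```

Which of the duplicates survives is arbitrary, just as it is with ORDER BY NULL in the SQL version. Only the count per empno matters.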

Page 78: Analytics ioug 2011

Analytic Function Performance

Example based on sales table in sh schema
— 918843 rows, 72 different prod_ids

PROD_ID    CUST_ID TIME_ID   CHANNEL_ID   PROMO_ID QUANTITY_SOLD AMOUNT_SOLD
------- ---------- --------- ---------- ---------- ------------- -----------
     46      11702 15-FEB-98          3        999             1       24.92
    125        942 27-MAR-98          3        999             1       16.86
     46       6406 17-JUL-98          2        999             1       24.83
    127       4080 11-SEP-98          3        999             1       38.14
     14      19810 20-JUL-98          3        999             1     1257.35
    123       3076 24-OCT-98          3        999             1       64.38
     48      11403 28-OCT-98          2        999             1       12.95
    148       6453 27-MAR-99          2        999             1       20.25
    119        609 27-NOV-99          4        999             1        6.54
     30       4836 13-DEC-99          2        999             1       10.15
     31       1698 17-FEB-00          3        999             1        9.47
    119      22354 09-FEB-00          2        999             1        7.75
    114       6609 01-JUN-00          3        999             1       21.06
     21       8539 28-AUG-00          3        999             1      1097.9
    143      11073 14-JAN-01          3        999             1       21.59
    119       2234 18-FEB-01          3        999             1        7.51
     43        488 25-JUN-01          3        999             1       47.63
     27       1577 17-SEP-01          4        999             1       46.16
      :          : :                  :          :             :           :

Page 79: Analytics ioug 2011

Analytic Function Performance - Scenario

Number of times products are on order

SELECT prod_id
      ,COUNT(*)
FROM sh.sales
GROUP BY prod_id;

PROD_ID   COUNT(*)
------- ----------
     22       3441
     25      19557
     30      29282
     34      13043
     42      12116
     43       8340
    123        139
    129       7557
    138       5541
     13       6002
     28      16796
    116      17389
    120      19403
      :          :

Page 80: Analytics ioug 2011

nth Best Product – "Conventional" SQL Solution

Find nth ranked product in terms of numbers of orders for each product

SELECT prod_id
      ,ycnt
FROM (SELECT prod_id
            ,COUNT(*) ycnt
      FROM sh.sales y
      GROUP BY prod_id) v
WHERE &position - 1 = (SELECT COUNT(*)
                       FROM (SELECT COUNT(*) zcnt
                             FROM sh.sales z
                             GROUP BY prod_id) w
                       WHERE w.zcnt > v.ycnt);

Value supplied for &position: 5

PROD_ID       YCNT
------- ----------
     33      22768

Elapsed: 00:00:24.09

Page 81: Analytics ioug 2011

"Conventional" SQL Solution - Trace

----------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows | Cost |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |   72 |  134 |
|*  1 | FILTER                       |                |      |      |
|   2 | VIEW                         |                |   72 |   67 |
|   3 | HASH GROUP BY                |                |   72 |   67 |
|   4 | PARTITION RANGE ALL          |                | 918K |   29 |
|   5 | BITMAP CONVERSION COUNT      |                | 918K |   29 |
|   6 | BITMAP INDEX FAST FULL SCAN  | SALES_PROD_BIX |      |      |
|   7 | SORT AGGREGATE               |                |    1 |      |
|   8 | VIEW                         |                |    4 |   67 |
|*  9 | FILTER                       |                |      |      |
|  10 | SORT GROUP BY                |                |    4 |   67 |
|  11 | PARTITION RANGE ALL          |                | 918K |   29 |
|  12 | BITMAP CONVERSION TO ROWIDS  |                | 918K |   29 |
|  13 | BITMAP INDEX FAST FULL SCAN  | SALES_PROD_BIX |      |      |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM (SELECT COUNT(*) "ZCNT" FROM "SH"."SALES" "Z"
               GROUP BY "PROD_ID" HAVING COUNT(*)>:B1) "W")=4)
   9 - filter(COUNT(*)>:B1)

Statistics
----------------------------------------------------------
       8468 consistent gets
         72 sorts (memory)

Page 82: Analytics ioug 2011

nth Best Product – "Failed" SQL Solution

Find nth ranked product in terms of numbers of orders for each product
— The inline view alias v cannot be queried as a table in the subquery's FROM clause

SELECT prod_id
      ,ycnt
FROM (SELECT prod_id
            ,COUNT(*) ycnt
      FROM sh.sales y
      GROUP BY prod_id) v
WHERE &position - 1 = (SELECT COUNT(*)
                       FROM (SELECT ycnt FROM v) w
                       WHERE w.ycnt > v.ycnt);

*
ERROR at line 8:
ORA-04044: procedure, function, package, or type is not allowed here

Page 83: Analytics ioug 2011

nth Best Product – Factored Subquery Solution

Find nth ranked product in terms of numbers of orders for each product

WITH v AS (SELECT prod_id
                 ,COUNT(*) ycnt
           FROM sh.sales y
           GROUP BY prod_id)
SELECT prod_id
      ,ycnt
FROM v
WHERE &position - 1 = (SELECT COUNT(*)
                       FROM (SELECT ycnt FROM v) w
                       WHERE w.ycnt > v.ycnt);

Value supplied for &position: 5

PROD_ID       YCNT
------- ----------
     33      22768

Elapsed: 00:00:00.07

Page 84: Analytics ioug 2011

Factored Subquery Solution - Trace

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name                       | Rows | Cost |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                            |    1 |   71 |
|   1 | TEMP TABLE TRANSFORMATION    |                            |      |      |
|   2 | LOAD AS SELECT               |                            |      |      |
|   3 | HASH GROUP BY                |                            |   72 |   67 |
|   4 | PARTITION RANGE ALL          |                            | 918K |   29 |
|   5 | BITMAP CONVERSION COUNT      |                            | 918K |   29 |
|   6 | BITMAP INDEX FAST FULL SCAN  | SALES_PROD_BIX             |      |      |
|*  7 | FILTER                       |                            |      |      |
|   8 | VIEW                         |                            |   72 |    2 |
|   9 | TABLE ACCESS FULL            | SYS_TEMP_0FD9D661A_14D8441 |   72 |    2 |
|  10 | SORT AGGREGATE               |                            |    1 |      |
|* 11 | VIEW                         |                            |   72 |    2 |
|  12 | TABLE ACCESS FULL            | SYS_TEMP_0FD9D661A_14D8441 |   72 |    2 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter( (SELECT COUNT(*) FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0" "PROD_ID","C1" "YCNT"
               "SYS"."SYS_TEMP_0FD9D661A_14D8441" "T1") "V" WHERE "YCNT">:B1)=4)
  11 - filter("YCNT">:B1)

Statistics
----------------------------------------------------------
        355 consistent gets
          0 sorts (memory)

Page 85: Analytics ioug 2011

nth Best Product – Analytic Function Solution

Find nth ranked product in terms of numbers of orders for each product

SELECT prod_id
      ,vcnt
FROM (SELECT prod_id
            ,vcnt
            ,RANK() OVER (ORDER BY vcnt DESC) rnk
      FROM (SELECT prod_id
                  ,COUNT(*) vcnt
            FROM sh.sales z
            GROUP BY z.prod_id)) qry
WHERE qry.rnk = &position;

Value supplied for &position: 5

PROD_ID       VCNT
------- ----------
     33      22768

Elapsed: 00:00:00.01
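To make the ranking step concrete, here is a small Python sketch of RANK() over order counts. It uses only the partial list of counts shown on the earlier scenario slide, so its answer (product 28) is for that excerpt and differs from the full-table result of product 33. `nth_ranked` is a hypothetical helper.

```python
# prod_id -> number of orders, taken from the partial listing on the scenario slide.
COUNTS = {22: 3441, 25: 19557, 30: 29282, 34: 13043, 42: 12116, 43: 8340,
          123: 139, 129: 7557, 138: 5541, 13: 6002, 28: 16796,
          116: 17389, 120: 19403}

def nth_ranked(counts, position):
    """Products whose RANK() OVER (ORDER BY count DESC) equals `position`.

    RANK semantics: tied counts share a rank and the following ranks are skipped.
    """
    ordered = sorted(counts.values(), reverse=True)
    # index() finds the first occurrence, i.e. how many counts are strictly
    # greater; adding 1 turns that into the rank.
    ranks = {pid: ordered.index(cnt) + 1 for pid, cnt in counts.items()}
    return sorted(pid for pid, rnk in ranks.items() if rnk == position)

fifth = nth_ranked(COUNTS, 5)
print(fifth)
```

Because ties share a rank, a position can return several products or none at all. Replacing RANK with DENSE_RANK or ROW_NUMBER in the SQL changes that behaviour.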

Page 86: Analytics ioug 2011

Analytic Function Solution - Trace

--------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |   72 |  105 |
|*  1 | VIEW                         |                |   72 |  105 |
|*  2 | WINDOW SORT PUSHED RANK      |                |   72 |  105 |
|   3 | HASH GROUP BY                |                |   72 |  105 |
|   4 | PARTITION RANGE ALL          |                | 918K |   29 |
|   5 | BITMAP CONVERSION COUNT      |                | 918K |   29 |
|   6 | BITMAP INDEX FAST FULL SCAN  | SALES_PROD_BIX |      |      |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QRY"."RNK"=5)
   2 - filter(RANK() OVER ( ORDER BY COUNT(*) DESC )<=5)

Statistics
----------------------------------------------------------
        116 consistent gets
          1 sorts (memory)

Page 87: Analytics ioug 2011

Analytic Function Performance

Defining the PARTITION BY and ORDER BY clauses on indexed columns generally gives the best performance, because the index can supply the rows in the required order
— For example, a composite index on (deptno, hiredate) columns will prove effective

Analytic functions still provide acceptable performance in the absence of indexes, but the rows must then be sorted on the partition and order by columns
— If the query contains multiple analytic functions, sorting and partitioning on two different columns should be avoided if neither of them is indexed

Page 88: Analytics ioug 2011

Performance

Hiding analytics in views can prevent the use of indexes
— SUM(sal) has to be computed across all rows before the filter on empno can be applied

CREATE OR REPLACE VIEW vv AS
SELECT e.*
      ,SUM(sal) OVER (PARTITION BY deptno) Deptno_Sum_Sal
FROM emp e;

SELECT * FROM vv WHERE empno = 7900;

EMPNO ENAME JOB    MGR HIREDATE   SAL COMM DEPTNO DEPTNO_SUM_SAL
----- ----- ----- ---- --------- ---- ---- ------ --------------
 7900 JAMES CLERK 7698 03-DEC-81  950          30           9400

--------------------------------------------
| Id  | Operation          | Name | Rows |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |   14 |
|*  1 | VIEW               | VV   |   14 |
|   2 | WINDOW SORT        |      |   14 |
|   3 | TABLE ACCESS FULL  | EMP  |   14 |
--------------------------------------------

SELECT * FROM emp WHERE empno = 7900;

------------------------------------------------------------
| Id  | Operation                    | Name         | Rows |
------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    1 |
|   1 | TABLE ACCESS BY INDEX ROWID  | EMP          |    1 |
|*  2 | INDEX UNIQUE SCAN            | SYS_C0017750 |    1 |
------------------------------------------------------------

Page 89: Analytics ioug 2011

Steamy Windows

SELECT empno, ename, sal, deptno
      ,SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
FROM emp
ORDER BY deptno, sal;

     EMPNO ENAME             SAL     DEPTNO     SUMSAL
---------- ---------- ---------- ---------- ----------
      7934 MILLER           1300         10       1300
      7782 CLARK            2450         10       3750
      7839 KING             5000         10       8750
      7369 SMITH             800         20        800
      7876 ADAMS            1100         20       1900
      7566 JONES            2975         20       4875
      7788 SCOTT            3000         20      10875
      7902 FORD             3000         20      10875
      7900 JAMES             950         30        950
      7654 MARTIN           1250         30       3450
      7521 WARD             1250         30       3450
      7844 TURNER           1500         30       4950
      7499 ALLEN            1600         30       6550
      7698 BLAKE            2850         30       9400

Default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Page 90: Analytics ioug 2011

Steamy Windows (continued)

SELECT empno, ename, sal, deptno
      ,SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
FROM emp
WHERE ename LIKE '%M%'
ORDER BY deptno ,sal;

     EMPNO ENAME             SAL     DEPTNO     SUMSAL
---------- ---------- ---------- ---------- ----------
      7934 MILLER           1300         10       1300
      7369 SMITH             800         20        800
      7876 ADAMS            1100         20       1900
      7900 JAMES             950         30        950
      7654 MARTIN           1250         30       2200

SELECT *
FROM (SELECT empno, ename, sal, deptno
            ,SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
      FROM emp)
WHERE ename LIKE '%M%'
ORDER BY deptno ,sal;

     EMPNO ENAME             SAL     DEPTNO     SUMSAL
---------- ---------- ---------- ---------- ----------
      7934 MILLER           1300         10       1300
      7369 SMITH             800         20        800
      7876 ADAMS            1100         20       1900
      7900 JAMES             950         30        950
      7654 MARTIN           1250         30       3450

In the second result, the total for MARTIN includes WARD, who is in department 30 and has a salary of 1250, which is within the RANGE with MARTIN. The outer WHERE clause is applied only after the analytic function has been computed over all the rows.
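The difference between the two queries is purely one of ordering: filter then aggregate, or aggregate then filter. The Python sketch below reproduces both results for the EMP data. `running_sum` is a hypothetical helper that mimics the default RANGE window.

```python
# (ename, deptno, sal) for all fourteen EMP rows.
EMP = [("MILLER", 10, 1300), ("CLARK", 10, 2450), ("KING", 10, 5000),
       ("SMITH", 20, 800), ("ADAMS", 20, 1100), ("JONES", 20, 2975),
       ("SCOTT", 20, 3000), ("FORD", 20, 3000),
       ("JAMES", 30, 950), ("MARTIN", 30, 1250), ("WARD", 30, 1250),
       ("TURNER", 30, 1500), ("ALLEN", 30, 1600), ("BLAKE", 30, 2850)]

def running_sum(rows):
    """SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) with the default window:
    every row in the same department whose sal is <= the current sal."""
    return {name: sum(s for _, d, s in rows if d == dept and s <= sal)
            for name, dept, sal in rows}

def has_m(name):
    return "M" in name  # ename LIKE '%M%'

# WHERE in the same query block: rows are filtered before the window is built.
filtered_first = running_sum([row for row in EMP if has_m(row[0])])

# WHERE outside an inline view: the window sees all rows, filtering comes last.
filtered_last = {name: total for name, total in running_sum(EMP).items()
                 if has_m(name)}

print(filtered_first["MARTIN"], filtered_last["MARTIN"])
```

Only MARTIN changes, because WARD is the one excluded row that ties with or precedes a surviving row in its partition.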

Page 91: Analytics ioug 2011

In the Final Analysis

So we have discussed

The ranking of data using analytic functions

Partitioning datasets from queries

Using aggregate functions in analytic scenarios

How to apply sliding windows to query results

Comparing values across rows

Performance characteristics

Page 92: Analytics ioug 2011

Analytic Functions

Carl Dudley
University of Wolverhampton, UK

UKOUG Council
Oracle ACE Director

[email protected]