Kenneth Naim Consultant Session #303

50
Kenneth Naim Consultant Session #303 Powerful and Efficient SQL Features (that are rarely used)

description

Powerful and Efficient SQL Features (that are rarely used). Kenneth Naim Consultant Session #303. Speaker Qualifications. Kenneth Naim – Oracle Database / Data Warehouse Consultant. - PowerPoint PPT Presentation

Transcript of Kenneth Naim Consultant Session #303

Page 1: Kenneth Naim Consultant Session #303

Kenneth NaimConsultant

Session #303

Powerful and Efficient SQL Features

(that are rarely used)

Page 2: Kenneth Naim Consultant Session #303

Speaker Qualifications

Kenneth Naim – Oracle Database / Data Warehouse Consultant

• Architected and coded the ETL for several data warehouses for insurance and points processing companies.

• Wrote proprietary batch (sub-ledger) interfaces for financial applications.

• Tuned dozens of batch & OLTP processes by writing better and more creative sql.

Page 3: Kenneth Naim Consultant Session #303

Creative problem solving

• Creative solutions are created when tools, knowledge and imagination are combined.

Page 4: Kenneth Naim Consultant Session #303

SQL Features

• Ordering of nulls• Regular Expressions• Random Number Generator (dbms_random)• Case Statements• Nullif• Coalesce• Analytical Functions (Ntile, Row_Number,

Rank/Drank, Lead/Lag etc.)• Sub-queries

Page 5: Kenneth Naim Consultant Session #303

SQL Features Cont’d

• Inline Views• Order By clause within an Inline View• Merge Statements• Updateable Views• Full Outer Joins• Partitioned Outer Joins• Set Operators (Union, Minus, Intersect)• Hierarchical Queries (Connect By)

Page 6: Kenneth Naim Consultant Session #303

SQL Features Cont’d

• Global Temporary Tables • External Tables• Model Clause• Multi-table inserts• Hints

Page 7: Kenneth Naim Consultant Session #303

Standard SQL Clauses

select to_char(sum(b.sal),'9,9999.00'), dname

from dept a,

emp b

where a.deptno=b.deptno

and loc!='BOSTON'

group by a.dname

having sum(b.sal)>9000order by dname;

Page 8: Kenneth Naim Consultant Session #303

Ordering of Nulls (8i)

• Null sort as the last/highest value when sorting in ascending order

• Prior to Oracle 8i changing how Nulls sorted required use of nvl and/or decode functions.

• Nulls can now be explicitly set to sort first or last by specifying “nulls first|last”

select * from scott.emporder by comm nulls first;

Page 9: Kenneth Naim Consultant Session #303

Regular Expressions

Four regular expression functions exist in SQL and work in a similar manor to their non regular expression counterpart.• regexp_like• regexp_instr• regexp_substr• regexp_replace

Page 10: Kenneth Naim Consultant Session #303

Regular Expressions – Regexp_like

Regular expression based (regex) functions significantly simplified the task of data validation and cleansing. Identifying zip codes or phone numbers containing non-digits for example. Regexp_like differs from the like operator as the pattern is specified within function.

1. replace(

translate(postal_code,'1234567890~', '~~~~~~~~~~x'),

'~',null) is not null

2. regexp_like(postal_code,'[^[:digit:]]')

Page 11: Kenneth Naim Consultant Session #303

Regular Expressions – Cont’d

update hr.locations

set postal_code =

regexp_replace(postal_code,'[^[:digit:]]',null)where regexp_like(postal_code,'[^[:digit:]]')

Page 12: Kenneth Naim Consultant Session #303

Dbms_random – Random number generator

Typically data sets are ordered explicitly or implicitly however there are times when an explicit random ordering can be beneficial.

• Content or ad selection for website display.• Picking winners from a list of entries for a

promotion or contest.• Control and experimental group selection for

marketing materials.

Page 13: Kenneth Naim Consultant Session #303

Dbms_random – Content Rotation

• To prevent the same content from being displayed for customers that returned to a website multiple times per day dbms_random was used to provide a “fresh” list.

select content_id, ...

from tables...

where conditions...

order by prioriry asc, dbms_random.random;

Page 14: Kenneth Naim Consultant Session #303

Dbms_random – Contest winner Selection

To pick a set of winner for various promotions we used a query that returned all eligible entries and sorted them randomly. We then picked the first n number of entries required as winners and alternates.

insert into entries (entrant_id, entry_number)

select entrant_id, rownum

from (select member_id entrant_id

from tables...

where conditions....

order by dbms_random.random);

Page 15: Kenneth Naim Consultant Session #303

Dbms_random – Control Group Selection

To test the effectiveness of different marketing materials members needed to be randomly separated into four equally sized groups.

1. trunc(dbms_random.value(1,5))

• Produced the correct group numbers however the sets sizes were not exactly equal as the distribution was truly random.

2. ntile(4) over (order by dbms_random.random)

• Ntile, an analytical function breaks apart sets into smaller set differing in count by at most one.

Page 16: Kenneth Naim Consultant Session #303

Case Statements (8i)

• Case Statements within sql statements allow the incorporation of If/Then/Else logic into SQL that normally would be written in a procedural language.

select a.*, case when comm is null or comm=0 then 'None' when comm between 300 and 499 then 'Basic' when comm between 500 and 1199 and deptno=20 then 'Mid-Level' when comm> 1200 then 'High' else 'Unknown' end commfrom scott.emp a;

Page 17: Kenneth Naim Consultant Session #303

Case Statements Cont’d

• Criteria within each when statement can be completely different.

• Can be used to filter data in the Select clause allowing the combination of two similar queries with differing filters to be executed in one pass rather than two or more.

select sum(case when dname ='ACCOUNTING' then sal else null end) tot_acct_sal,

sum(case when loc ='DALLAS' then sal else null end) tot_dallas_sal

from scott.emp a, scott.dept b where a.deptno=b.deptno;

Page 18: Kenneth Naim Consultant Session #303

Specialized Case Statements

• Coalescecoalesce(primary_rate_code, secondary_rate_code, tertiary_rate_code)case when expr1 is not null then expr1 when expr2 is not null then expr2 when expr3 is not null then expr3 else null end

• Nullifnullif(expr1,expr2)

case when expr1 = expr2 then null else expr1 end

Page 19: Kenneth Naim Consultant Session #303

Analytical Functions (8.1.6)

– Allow each row to access data present in the entire set or specified subset.

– Process the result set after the Where, Group by, and Having clauses have been applied but prior to the Order by clause.

– Are permitted in the Select, Group by and Order by clauses.– Can nest and be in nested in regular functions but cannot be

nested in/by another analytical function. Inline views must be used for this purpose as well as filtering on them.

Page 20: Kenneth Naim Consultant Session #303

Analytical Functions

– Can be defined to process the entire set or a sub-set by using the Partition By clause

– Are useful for • Aggregating data within a subset or a moving window (50-

day moving average).• Ranking data within a sub-set (rank of members by points

earned within each metro area).• Comparing data from the current row against a previous

and following row (days between current & previous sale) eliminating the need for a self join.

• Comparing data from the current row against an aggregate (each policies premium as a percent of all premium collected).

Page 21: Kenneth Naim Consultant Session #303

Analytical Functions Cont’d

lgl_enty_id

policy_id

eff_dt end_dt rngrp group_policy_i

d

1 456 1/1/2000 12/31/2000

1 1 456

1 456 1/1/2001 12/31/2001

1 456

1 640 2/1/2001 1/31/2003 3 3 640

1 720 2/15/2003 2/14/2004 4 4 720

1 810 2/15/2004 2/14/2005 4 720

2 310 6/1/2002 5/31/2003 1 1 310

2 367 7/1/2003 6/30/2004 2 2 367

2 367 7/1/2004 6/30/2005 2 367

2 975 7/1/2005 6/30/2006 2 367

Page 22: Kenneth Naim Consultant Session #303

Analytical Functions Cont’d

case when row_number () over (partition by lgl_enty_id order by eff_dt)=1 then 1 when eff_dt-lag(end_dt, 1, end_dt) over (partition by lgl_enty_id order by eff_dt) >1 then row_number () over (partition by lgl_enty_id order by eff_dt) --lapsedelse null -- current policy end rn

max(rn) over (partition by lgl_enty_id order by eff_dt) grp

first_value(policy_id) over (partition by lgl_enty_id, grp order by plcy_prd_eff_dt) group_policy_id

Page 23: Kenneth Naim Consultant Session #303

Sub-queries

Sub-queries are nested select statements that can be used anywhere a literal can be used except– Default values of a column– Returning clause– Function based index expressions– Check constraints– When conditions of triggers– Group By, Having and Connect By clauses

Page 24: Kenneth Naim Consultant Session #303

Sub-queries

• Can be nested 255 levels deep• Must be scalar, return only one finite value when used

with operators that can only accept one value (=, >, <, +, -, *, /, like etc.)

• Are most commonly used an In clause within a dml statement and can return multiple values.

select policy_id, premium_period_eff_dt, premium_period_end_dtfrom premium_periodswhere (policy_id, policy_period_eff_dt) in (select policy_id, policy_period_eff_dt from policy_periods where premium_type='a');

Page 25: Kenneth Naim Consultant Session #303

With Clause – Sub-query Factoring (9ir2)

Sub-query factoring benefits• Materializes the sub-query so the result set can be reused

without multiple executions of the sub-query• Eliminates sub-query repetition enhancing code

maintainability.• Shortens complex code increasing readability.with sub_query_alias_1 as (first select statement goes here), sub_query_alias_2 as (second select statement goes here), sub_query_alias_3 as (third select statement goes here)select * from table_1 a, sub_query_alias_1 b, sub_query_alias_2 c where a.col1=b.col1 and b.col2 =c.col2 and b.col3 in (select d.col3 from sub_query_alias_3)

Page 26: Kenneth Naim Consultant Session #303

With Clause Example

with q as (select policy_period_key from policies a, calendar_premium b where a.policy_key= b.policy_key and policy_period_start_date between to_date('1-jan-2004') and to_date('1-jan-2007')-1/86400 and pl_team in ('A','B') group by policy_period_key having sum(premium) >=50000)select claim_number, nvl(total_incurred,0), nvl(total_paid,0)from (select claim_number, sum(incurred_loss_amount) total_incurred from claims a, incurred_losses b where a.claim_key=b.claim_key and policy_period_key in (select policy_period_key from q)) full outer join (select claim_number, sum(paid_loss_amount) total_paid from claims a, paid_losses b where a.claim_key=b.claim_key and policy_period_key in (select policy_period_key from q)) y using claim_number;

Page 27: Kenneth Naim Consultant Session #303

Inline Views

Inline views are• Sub-queries in the From clause of a select

statement or Using clause of the merge statement and are used in place of tables

• Useful to aggregate and/or filter data prior to being joined.

• Used to avoid joining issues such as fan and chasm traps when dealing with multiple fact style tables.

Page 28: Kenneth Naim Consultant Session #303

Order by in an inline view (8i)

• Pagination queries were very difficult to write prior to 8i and we routinely done on client side.

• Nesting an inline view with an order by reduces overhead on the sever and client side as only the required data needs to be processed and sent over the network.

select ename, sal

from (select ename, sal, rownum rn

from (select ename, sal from scott.emp order by sal))

where rn between 5 and 10;

Page 29: Kenneth Naim Consultant Session #303

Merge Statements – aka Upsert (9i)

• Allow the manipulation of data residing in one table based upon whether data exists in another set (table/ view/ in-line view).

• Data could only be updated if matching data existed and insert if the matching data did not exist.(9i)

• Almost any combination of inserts, updates, and deletes are allowed. (10g)

• The When Matched and When Not Matched clauses are optional.(10g)

• A Where clause can be applied to both match clauses.(10g)

Page 30: Kenneth Naim Consultant Session #303

Merge Statements Cont’d

merge into policy_lists xusing (select a.plcy_id, a.plcy_prd_eff_dt, a.prem_typ_cd

from policy_period_premium_drv a where a.prem_typ_cd='a' and a.rt_elem_typ_cd='eap') y

on (x.plcy_id=y.plcy_id and x.plcy_prd_eff_dt=y.plcy_prd_eff_dt)when matched then update set x.prem_typ_cd=y.prem_typ_cdwhen not matched then insert (plcy_id, plcy_prd_eff_dt, prem_typ_cd)

values (y.plcy_id, y.plcy_prd_eff_dt, y.prem_typ_cd) where (plcy_prd_eff_dt>’15-may-2003’ );

Page 31: Kenneth Naim Consultant Session #303

Updateable Views

Defined and inline views can be updated as long as only data from one base table is updated and the view does not contain any of the following clauses.– Aggregate Functions (SUM, AVG, COUNT, MAX, MIN)

– Set Operators – Distinct– Group by – Order by– Connect by

Page 32: Kenneth Naim Consultant Session #303

Updateable Views Cont’d

update (

select empno, a.deptno, dname, comm

from scott.emp a, scott.dept b

where a.deptno=b.deptno)

set comm=comm * 1.1where dname='SALES';

Page 33: Kenneth Naim Consultant Session #303

Full Outer Joins (9i)

Policy Id

Category

Balance

Policy Id

Category

Txn Amount

1 A 30.00 1 A 20.00

1 B 45.00 1 C 75.00

1 C 75.00 1 D 65.00

1 E 28.00 1 F 25.00

1 F 16.00 1 G 12.00

Set A Set B

Page 34: Kenneth Naim Consultant Session #303

Full Outer Joins Cont’d

Result Set

Policy Id

Category

Balance

Txn Amount New Txn Amount

1 A 30.00 20.00 10.00

1 B 45.00 45.00

1 D 65.00 -65.00

1 E 28.00 28.00

1 F 16.00 25.00 -9.00

1 G 12.00 -12.00

Page 35: Kenneth Naim Consultant Session #303

Full Outer Joins Cont’dwith q as (select policy_id from policy_list where change_date between trunc(sysdate-1) and trunc(sysdate)-1/86400)select policy_id, category, a.balance, b.transaction_amount, nvl(a.balance,0)- nvl(b.transaction_amount,0) new_transaction_amountfrom (select policy_id, category, balance from source_tables where policy_id in (select policy_id from q)) a full outer join (select policy_id, category, sum(transaction_amount) transaction_amount from warehouse_tables where policy_id in (select policy_id from q)) b using (policy_id, category)where nvl(balance,0)- nvl(transaction_amount,0) <>0;

Page 36: Kenneth Naim Consultant Session #303

Partitioned Outer Joins

Product

Sales Quarter

Total Quantity Sold

13 2005Q1 25

14 2005Q4 17

14 2005Q3 44

13 2005Q4 36

13 2005Q3 22

Product Sales Quarter

Total Quantity Sold

13 2005Q1 25

13 2005Q2 0

13 2005Q3 22

13 2005Q4 36

14 2005Q1 0

14 2005Q2 0

14 2005Q3 44

14 2005Q4 17

Page 37: Kenneth Naim Consultant Session #303

Partitioned Outer Joins Cont’d

• Cartesion Product Versionselect product, sales_quarter, nvl(total_quantity_sold,0) total_quantity_soldfrom (select product, sales_quarter from (select year_quarter sales_quarter from days where end_of_quarter='Yes' and year=2005) a, (select product from products where product in (13,14)) b) x, (select product, sales_quarter, sum(quantity_sold) total_quantity_sold from sales group by product, sales_quarter) y where x.product=y.product(+)and x.sales_quarter=y.sales_quarter (+)

Page 38: Kenneth Naim Consultant Session #303

Partitioned Outer Joins Cont’d

• Partitioned Outer Join Versionselect a.product, b.sales_quarter, nvl(total_quantity_sold,0) total_quantity_sold

from (select product, sales_quarter, sum(quantity_sold) total_quantity_sold

from sales group by product, sales_quarter) b partiton by (product) right outer join

(select year_quarter sales_quarter from days where end_of_quarter='Yes' and year=2005) a on b.sales_quarter=a.sales_quarter;

Page 39: Kenneth Naim Consultant Session #303

Hierarchical Queries (Connect By)

• Connect By allows the creation of tree relationships within sql.

• In 10g the connect by feature was enhanced to include:

– Connect_by_root – returns the root of the hierarchy for the current row.

– Connect_by_iscycle – flag indicating that the current record is part of an infinite loop ( A-B-C-A).

– Connect_by_isleaf – flag identifying that the current row does not have children rows.

– Nocycle – prevents the query from throwing an error if it identifies an infinite loop.

Page 40: Kenneth Naim Consultant Session #303

Hierarchical Queries Cont’d

Connect By allows the creation of tree relationships within sql.

select empno, ename, job, level, substr(sys_connect_by_path (ename, '>‘) ,2) path, connect_by_iscycle cyclefrom scott.emp astart with job = 'PRESIDENT'connect by nocycle prior empno = mgr;

Page 41: Kenneth Naim Consultant Session #303

Hierarchical Queries (Connect By)

empno ename job level path leaf cycle

7839 KING PRESIDENT 1 KING 0 0

7566 JONES MANAGER 2 KING>JONES 0 0

7788 SCOTT ANALYST 3 KING>JONES>SCOTT 0 0

7876 ADAMS CLERK 4 KING>JONES>SCOTT>ADAMS 1 0

7902 FORD ANALYST 3 KING>JONES>FORD 0 0

7369 SMITH CLERK 4 KING>JONES>FORD>SMITH 1 0

7698 BLAKE MANAGER 2 KING>BLAKE 0 0

7499 ALLEN SALESMAN 3 KING>BLAKE>ALLEN 1 0

7521 WARD SALESMAN 3 KING>BLAKE>WARD 1 0

7654 MARTIN SALESMAN 3 KING>BLAKE>MARTIN 1 0

7844 TURNER SALESMAN 3 KING>BLAKE>TURNER 1 0

7900 JAMES CLERK 3 KING>BLAKE>JAMES 1 0

7782 CLARK MANAGER 2 KING>CLARK 0 0

7934 MILLER CLERK 3 KING>CLARK>MILLER 1 0

Page 42: Kenneth Naim Consultant Session #303

Set Operations

• Union • Union all• Minus • Intersect

Page 43: Kenneth Naim Consultant Session #303

Set Operation example

select * from policies x, (select plcy_id, plcy_prd_eff_dt, 'e' prem_typ_cd from policy_coverage_premium a where plcy_prd_eff_dt <= c_start_of_day_to_process and prem_typ_cd = 'a' and empl_id_crea = 99

UNION select plcy_id, plcy_prd_eff_dt, 'e' prem_typ_cd from policy_premium_prd_prem_drv a where prem_typ_cd = 'e' and plcy_prd_eff_dt <= c_start_of_day_to_process and plcy_prem_prd_prem_drv_eff_dtm between

c_start_of_day_to_process and c_end_of_day_to_process) UNION select agre_id, b.plcy_prd_eff_dt, 'e' prem_typ_cd from la_agreement_rule_asgn_hist b where b.la_agre_rl_asgn_hist_crea_dtm between

c_start_of_day_to_process and c_end_of_day_to_process and plcy_prd_eff_dt <= c_start_of_day_to_process) y

where x.plcy_id=y.plcy_id and x.plcy_prd_eff_dt=y.plcy_prd_eff_dt and x.prem_typ_cd=y.prem_typ_cd;

Page 44: Kenneth Naim Consultant Session #303

Global Temporary Tables

• Have a persistent data dictionary definition.• Has table level options for data to only persist for the duration of a

session or a transaction.• Is stored in memory and uses temporary segments once it gets too

large.• Act like normal tables and can be queried, built into views, stored

procedures etc.• Can have indexes and constraints.• Useful for holding intermediate results that need to be accessed by

multiple queries that do not need persistent storage.

Page 45: Kenneth Naim Consultant Session #303

Global Temporary Tables Cont’d

• Data can only be been seen by the session that inserted it even after it has been committed.

• Export/import/Datapump will only extract and import the table definition and not any data.

• Data is cleaned up when the session ends gracefully or not.• Can speed up long running queries as they act like in-memory

arrays.• Can prevent poor execution plans for complex SQL statements by

breaking the process into smaller pieces but without the redo/undo overhead of normal table and are particularly useful for anti-joins.

Page 46: Kenneth Naim Consultant Session #303

External Tables

• SQL based interface for SQL*Loader• Have a persistent data dictionary definition with much of the same

information that is contained within a SQL*Loader control file and column definitions.

• Can be used for manipulating, aggregating, cleansing and filtering data prior without the redo/undo overhead prior to being inserted into normal tables.

• Cannot have a storage clause as the data is stored in a text file on operating system.

• Cannot have indexes and constraints.• Data cannot be inserted, deleted or updated.

Page 47: Kenneth Naim Consultant Session #303

Items Learned in this Session

• SQL has many powerful and under-utilized functions and features.

• Tools, Knowledge and Imagination are required for creative problem solving.

• Use of Case statements and Analytical functions can convert most painful procedural code into super fast efficient SQL statements.

Page 48: Kenneth Naim Consultant Session #303

Favorite Quotes

“I have no particular talent. I am merely inquisitive.”

“Imagination is more important than knowledge.”

“It’s not that I am smart, it’s just that I stay with problems longer.”

--Albert Einstein

Page 49: Kenneth Naim Consultant Session #303

Questions?

• Questions? Riddles? Comments?

Page 50: Kenneth Naim Consultant Session #303

Thank You

• Please complete your evaluation forms– Kenneth Naim– Powerful and Efficient SQL Features

Session #303– Email incase of additional questions

[email protected]