OLAP Functions Support in Informix 12.1
Bingjie Miao, IBM
Agenda
• What is OLAP
• OLAP functions in Informix
  – the OVER clause
  – supported OLAP functions
• Questions?
What is OLAP?
• On-Line Analytical Processing
• Commonly used in Business Intelligence (BI) tools
  – ranking products, salesmen, items, etc.
  – exposing trends in sales from historic data
  – testing business scenarios (forecast)
  – sales breakdown or aggregates on multiple dimensions (Time, Region, Demographics, etc.)
OLAP Functions in Informix
• Supports a subset of commonly used OLAP functions
• Enables more efficient query processing from BI tools such as Cognos
Example query with group by
select customer_num, count(*)
from orders
where customer_num <= 110
group by customer_num;

customer_num (count(*))
         101          1
         104          4
         106          2
         110          2

4 row(s) retrieved.
Example query with OLAP function
select customer_num, ship_date, ship_charge,
       count(*) over (partition by customer_num)
from orders
where customer_num <= 110;

customer_num ship_date  ship_charge (count(*))
         101 05/26/2008      $15.30          1
         104 05/23/2008      $10.80          4
         104 07/03/2008       $5.00          4
         104 06/01/2008      $10.00          4
         104 07/10/2008      $12.20          4
         106 05/30/2008      $19.20          2
         106 07/03/2008      $12.30          2
         110 07/06/2008      $13.80          2
         110 07/16/2008       $6.30          2

9 row(s) retrieved.
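The difference between the two queries can be reproduced outside Informix. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions), not Informix itself, and loads the nine rows shown in the slide output; the column types are an assumption.

```python
import sqlite3

# Rows copied from the slide output; column types are assumed.
orders = [
    (101, "05/26/2008", 15.30), (104, "05/23/2008", 10.80),
    (104, "07/03/2008", 5.00),  (104, "06/01/2008", 10.00),
    (104, "07/10/2008", 12.20), (106, "05/30/2008", 19.20),
    (106, "07/03/2008", 12.30), (110, "07/06/2008", 13.80),
    (110, "07/16/2008", 6.30),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer_num int, ship_date text, ship_charge real)")
conn.executemany("insert into orders values (?, ?, ?)", orders)

# GROUP BY collapses each customer to one row.
grouped = conn.execute(
    "select customer_num, count(*) from orders "
    "where customer_num <= 110 group by customer_num order by customer_num"
).fetchall()

# The OLAP form keeps every detail row and attaches the per-customer count.
windowed = conn.execute(
    "select customer_num, ship_date, count(*) over (partition by customer_num) "
    "from orders where customer_num <= 110 order by customer_num"
).fetchall()

print(len(grouped), len(windowed))  # prints: 4 9
```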
Where do OLAP functions fit?
Order of query processing (first to last):
• Joins, group by, having, aggregation
• OLAP functions
• Final order by
OLAP functions as predicates
• Use a derived table query block to compute the OLAP function first

select * from
  (select customer_num, ship_date, ship_charge,
          count(*) over (partition by customer_num) as cnt
   from orders
   where customer_num <= 110)
where cnt >= 3;
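A minimal sketch of the derived-table pattern, run against SQLite through Python's sqlite3 module rather than Informix (SQLite 3.25 or later assumed). The rows are the customer numbers and ship charges from the earlier slide output; the table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer_num int, ship_charge real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(101, 15.30), (104, 10.80), (104, 5.00), (104, 10.00), (104, 12.20),
     (106, 19.20), (106, 12.30), (110, 13.80), (110, 6.30)],
)

# The inner query block computes the count per customer; the outer block
# can then filter on it like an ordinary column.
busy_customers = conn.execute(
    "select * from ("
    "  select customer_num, ship_charge,"
    "         count(*) over (partition by customer_num) as cnt"
    "  from orders where customer_num <= 110"
    ") where cnt >= 3 order by ship_charge"
).fetchall()

for row in busy_customers:
    print(row)
```

Only the four orders of customer 104 survive the filter, because it is the only customer with three or more orders.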
OLAP function example
• Running 3-month average sales for a particular product during a particular period

select product_name,
       avg(sales) over (partition by region
                        order by year, month
                        rows between 1 preceding and 1 following)
from total_sales
where product_id = 105
  and year between 2001 and 2010;
The over() Clause
olap_func(arg) over (partition by clause
                     order by clause
                     window frame clause)
• Defines the “domain” of OLAP function calculation
  – partition by: divide into partitions
  – order by: ordering within each partition
  – window frame: sliding window within each partition
  – all clauses optional
Partition By
sum(x) over (partition by a, b
             order by c, d
             rows between 2 preceding and 2 following)

Partitions: (a=1, b=1), (a=2, b=2), (a=1, b=2), (a=2, b=1)
Order By
sum(x) over (partition by a, b
             order by c, d
             rows between 2 preceding and 2 following)

Rows of partition a=1, b=2, ordered by c, d:
c=1, d=1
c=1, d=2
c=1, d=3
c=2, d=2
c=2, d=4
c=3, d=1
c=4, d=1
c=4, d=2
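Window Frame
sum(x) over (partition by a, b
             order by c, d
             rows between 2 preceding and 2 following)

Rows of the partition, ordered by c, d (the frame slides over them, covering the current row plus 2 rows before and 2 rows after):
c=1, d=1
c=1, d=2
c=1, d=3
c=2, d=2
c=2, d=4
c=3, d=1
c=4, d=1
c=4, d=2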
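The three slides above can be traced with a small sketch. It runs the same sum(x) expression in SQLite through Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The (c, d) pairs are the ones shown for partition a=1, b=2; the x values 1 through 8 and the extra single-row partition are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int, c int, d int, x int)")

# (c, d) pairs from the slide; x = 1..8 is an assumed payload.
cd_pairs = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4), (3, 1), (4, 1), (4, 2)]
conn.executemany(
    "insert into t values (1, 2, ?, ?, ?)",
    [(c, d, i + 1) for i, (c, d) in enumerate(cd_pairs)],
)
# Assumed second partition (a=2, b=1) to show that frames never cross partitions.
conn.execute("insert into t values (2, 1, 1, 1, 100)")

frame_sums = [
    row[0]
    for row in conn.execute(
        "select sum(x) over (partition by a, b order by c, d"
        "                    rows between 2 preceding and 2 following) "
        "from t order by a, b, c, d"
    )
]
print(frame_sums)  # prints: [6, 10, 15, 20, 25, 30, 26, 21, 100]
```

The first and last rows of a partition see a shorter frame because there are no rows before or after them to include.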
Partition By
• Divides the result set of a query into partitions for computing an OLAP function
• If the partition by clause is not specified, then the entire result set is a single partition

max(salary) over (partition by dept_id)
sum(sales) over (partition by region)
avg(price) over ()
Order By
• Ordering within each partition
• Required for some OLAP functions
  – ranking, window frame clause
• Supports ASC/DESC, NULLS FIRST/NULLS LAST

rank() over (partition by dept order by salary desc)
dense_rank() over (order by total_sales nulls last)
Window Frame
• Defines a sliding window within a partition
• OLAP function value computed from rows in the sliding window
• Order by clause is required
Physical vs. Logical Window Frame
• Physical window frame
  – ROWS keyword
  – count offset by position
  – fixed window size
  – order by one or more column expressions
• Logical window frame
  – RANGE keyword
  – count offset by value
  – window size may vary
  – order by single column (numeric, date or datetime type)
Window Frame Examples
avg(price) over (order by year, day rows between 6 preceding and current row)
count(*) over (order by ship_date range between 2 preceding and 2 following)

• Current row can be physically outside the window
avg(sales) over (order by month range between 3 preceding and 1 preceding)
sum(sales) over (order by month rows between 2 following and 5 following)
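To make the ROWS versus RANGE distinction concrete, here is a pure-Python sketch that emulates frame selection; it is an illustration of the semantics described above, not how Informix implements them. The helper names and the month values are hypothetical, and an ascending order by on a single numeric key is assumed.

```python
def rows_frame(keys, i, start, end):
    """Indexes covered by 'ROWS between start and end' for row i.

    Offsets count positions: -2 means 2 preceding, +2 means 2 following.
    """
    lo = max(0, i + start)
    hi = min(len(keys) - 1, i + end)
    return list(range(lo, hi + 1))


def range_frame(keys, i, start, end):
    """Indexes covered by 'RANGE between start and end' for row i.

    Offsets are applied to the value of the sort key, so the number of
    rows in the frame depends on the data.
    """
    lo, hi = keys[i] + start, keys[i] + end
    return [j for j, key in enumerate(keys) if lo <= key <= hi]


months = [1, 2, 3, 5, 6, 9]  # hypothetical sort keys with gaps, ascending

physical = rows_frame(months, 3, -2, 2)         # 2 preceding .. 2 following
logical = range_frame(months, 3, -2, 2)         # key 5 -> values 3..7
before_only = range_frame(months, 3, -3, -1)    # 3 preceding .. 1 preceding
ahead_only = rows_frame(months, 0, 2, 5)        # 2 following .. 5 following

print(physical, logical, before_only, ahead_only)
```

For the row with month 5, the physical frame always spans five positions, while the logical frame holds only the rows whose month lies between 3 and 7. In the last two calls the current row is not part of its own frame.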
Order By – Special Semantics
• “cumulative” semantics in the absence of a window frame clause
  – for OLAP functions that allow a window frame clause
  – equivalent to “ROWS between unbounded preceding and current row”

select sales, sum(sales) over (order by quarter)
from sales
where year = 2012

sales (sum)
  120   120
  135   255
  127   382
  153   535
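The running total in this slide can be checked with a short sketch in SQLite via Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The table layout and the quarter numbers 1 through 4 are assumptions. Because the quarter values are distinct, the query without a frame clause and the query with the explicit ROWS frame return the same result here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (year int, quarter int, sales int)")
conn.executemany(
    "insert into sales values (2012, ?, ?)",
    [(1, 120), (2, 135), (3, 127), (4, 153)],  # quarter numbers are assumed
)

# Order by without a frame clause: cumulative semantics.
implicit = conn.execute(
    "select sales, sum(sales) over (order by quarter) "
    "from sales where year = 2012 order by quarter"
).fetchall()

# The same query with the frame written out explicitly.
explicit = conn.execute(
    "select sales, sum(sales) over (order by quarter"
    "  rows between unbounded preceding and current row) "
    "from sales where year = 2012 order by quarter"
).fetchall()

print(implicit)  # prints: [(120, 120), (135, 255), (127, 382), (153, 535)]
```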
Supported OLAP Functions
• Ranking functions
  – RANK, DENSE_RANK (DENSERANK)
  – PERCENT_RANK, CUME_DIST, NTILE
  – LEAD, LAG
• Numbering functions
  – ROW_NUMBER (ROWNUMBER)
• Aggregate functions
  – SUM, COUNT, AVG, MIN, MAX
  – STDEV, VARIANCE, RANGE
  – FIRST_VALUE, LAST_VALUE
  – RATIO_TO_REPORT (RATIOTOREPORT)
Ranking Functions
• Partition by clause is optional
• Order by clause is required
• Window frame clause is NOT allowed
• Duplicate value handling is different between rank() and dense_rank()
  – both give the same rank to all duplicates
  – after duplicates, rank() “skips” the ranks already covered by the duplicates, while dense_rank() uses the next consecutive rank
RANK vs DENSE_RANK
select emp_num, sales,
       rank() over (order by sales) as rank,
       dense_rank() over (order by sales) as dense_rank
from sales;

emp_num  sales  rank  dense_rank
    101  2,000     1           1
    102  2,400     2           2
    103  2,400     2           2
    104  2,500     4           3
    105  2,500     4           3
    106  2,650     6           4
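A minimal sketch reproducing the table above in SQLite through Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The aliases rnk and dense_rnk are chosen here to stay clear of function names; the table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (emp_num int, sales int)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(101, 2000), (102, 2400), (103, 2400), (104, 2500), (105, 2500), (106, 2650)],
)

ranking = conn.execute(
    "select emp_num, sales,"
    "       rank() over (order by sales) as rnk,"
    "       dense_rank() over (order by sales) as dense_rnk "
    "from sales order by emp_num"
).fetchall()

for row in ranking:
    print(row)
```

After the two-way tie at 2,400, rank() jumps to 4 while dense_rank() continues with 3.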
PERCENT_RANK and CUME_DIST
• Calculate ranking information as a percentile
• Return a value between 0 and 1

select emp_num, sales,
       percent_rank() over (order by sales) as per_rank,
       cume_dist() over (order by sales) as cume_dist
from sales;

emp_num  sales  per_rank    cume_dist
    101  2,000         0  0.166666667
    102  2,400       0.2  0.500000000
    103  2,400       0.2  0.500000000
    104  2,500       0.6  0.833333333
    105  2,500       0.6  0.833333333
    106  2,650       1.0  1.000000000
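The values in this table follow from the usual definitions: percent_rank is (rank - 1) / (rows in partition - 1), and cume_dist is (rows with a value less than or equal to the current one) / (rows in partition). The sketch below checks them in SQLite via Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed), with an assumed table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (emp_num int, sales int)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(101, 2000), (102, 2400), (103, 2400), (104, 2500), (105, 2500), (106, 2650)],
)

percentiles = conn.execute(
    "select emp_num,"
    "       percent_rank() over (order by sales),"
    "       cume_dist() over (order by sales) "
    "from sales order by emp_num"
).fetchall()

# Round in Python so the printed values line up with the slide.
per_rank = [round(p, 4) for _, p, _ in percentiles]
cume_dist = [round(c, 4) for _, _, c in percentiles]
print(per_rank)
print(cume_dist)
```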
NTILE
• Divides the ordered data set into N tiles, where N is given by the expression.
• The number of tiles must be an exact numeric value with scale zero
NTILE Example
select name, salary,
       ntile(5) over (partition by dept order by salary)
from employee;

name    salary  (ntile)
John    35,000        1
Jack    38,400        1
Julie   41,200        2
Manny   45,600        2
Nancy   47,300        3
Pat     49,500        4
Ray     51,300        5
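A small sketch of the same query in SQLite through Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The slide does not show the dept column, so all seven employees are assumed to belong to one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (name text, dept text, salary int)")
conn.executemany(
    "insert into employee values (?, 'A', ?)",  # single department 'A' is assumed
    [("John", 35000), ("Jack", 38400), ("Julie", 41200), ("Manny", 45600),
     ("Nancy", 47300), ("Pat", 49500), ("Ray", 51300)],
)

tiles = conn.execute(
    "select name, ntile(5) over (partition by dept order by salary) "
    "from employee order by salary"
).fetchall()

print([tile for _, tile in tiles])  # prints: [1, 1, 2, 2, 3, 4, 5]
```

Seven rows do not divide evenly into five tiles, so the first two tiles receive two rows each and the remaining three receive one.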
LEAD and LAG
LEAD(expr, offset, default)
LAG(expr, offset, default)
• Gives LEAD/LAG value of the expression at the specified offset
• offset is optional, defaults to 1 if not specified
• default is optional, NULL if not specified
  – default used when offset goes beyond current partition boundary
• NULL handling
  – RESPECT NULLS (default)
  – IGNORE NULLS
LEAD/LAG Example
select name, salary,
       lag(salary) over (partition by dept order by salary),
       lead(salary, 1, 0) over (partition by dept order by salary)
from employee;

name    salary   (lag)  (lead)
John    35,000          38,400
Jack    38,400  35,000  41,200
Julie   41,200  38,400  45,600
Manny   45,600  41,200  47,300
Nancy   47,300  45,600  49,500
Pat     49,500  47,300  51,300
Ray     51,300  49,500       0
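The same query can be tried in SQLite via Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). As in the slide, lag() has no default and returns NULL for the first row, while lead() is given a default of 0 for the last row. The single department is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (name text, dept text, salary int)")
conn.executemany(
    "insert into employee values (?, 'A', ?)",  # single department 'A' is assumed
    [("John", 35000), ("Jack", 38400), ("Julie", 41200), ("Manny", 45600),
     ("Nancy", 47300), ("Pat", 49500), ("Ray", 51300)],
)

neighbours = conn.execute(
    "select name, salary,"
    "       lag(salary) over (partition by dept order by salary),"
    "       lead(salary, 1, 0) over (partition by dept order by salary) "
    "from employee order by salary"
).fetchall()

print(neighbours[0])   # prints: ('John', 35000, None, 38400)
print(neighbours[-1])  # prints: ('Ray', 51300, 49500, 0)
```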
LEAD/LAG NULL handling
select price,
       lag(price ignore nulls, 1) over (order by day),
       lead(price, 1) ignore nulls over (order by day)
from stock_price;

price  (lag)  (lead)
18.25         18.37
18.37  18.25  19.03
       18.37  19.03
       18.37  19.03
19.03  18.37  18.59
18.59  19.03  18.21
18.21  18.59
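The IGNORE NULLS behaviour can be emulated in a few lines of plain Python. This is a sketch of the semantics only, with hypothetical helper names, and it assumes the price column in day order is 18.25, 18.37, NULL, NULL, 19.03, 18.59, 18.21, which is an inference from the flattened slide output rather than something the slide states.

```python
def lag_ignore_nulls(values, i):
    """Nearest non-NULL value before position i, or None if there is none."""
    for j in range(i - 1, -1, -1):
        if values[j] is not None:
            return values[j]
    return None


def lead_ignore_nulls(values, i):
    """Nearest non-NULL value after position i, or None if there is none."""
    for j in range(i + 1, len(values)):
        if values[j] is not None:
            return values[j]
    return None


# Assumed price series in day order; None stands for NULL.
prices = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]

lags = [lag_ignore_nulls(prices, i) for i in range(len(prices))]
leads = [lead_ignore_nulls(prices, i) for i in range(len(prices))]

print(lags)
print(leads)
```

With RESPECT NULLS, the row for 19.03 would instead see a NULL as its lag value, since the row directly before it has no price.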
Numbering Functions
• Partition by clause and order by clause are optional
• Window frame clause is NOT allowed
• Provides sequential row number to result set
  – regardless of duplicates when order by is specified
ROW_NUMBER Example
select row_number() over (order by sales), emp_num, sales
from sales;

(row_number)  emp_num  sales
           1      101  2,000
           2      102  2,400
           3      103  2,400
           4      104  2,500
           5      105  2,500
           6      106  2,650
Aggregate Functions
• Partition by, order by and window frame clauses are all optional
  – window frame clause requires order by clause
• All currently supported aggregate functions
  – SUM, COUNT, MIN, MAX, AVG, STDEV, RANGE, VARIANCE
• New aggregate functions
  – FIRST_VALUE/LAST_VALUE
  – RATIO_TO_REPORT
Aggregate Function Example
select price,
       avg(price) over (order by day rows between 1 preceding and 1 following)
from stock_price;

price  (avg)
18.25  18.31
18.37  18.31
       18.37
       19.03
19.03  18.81
18.59  18.61
18.21  18.40
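The averages in this slide are consistent with a seven-day price series that contains two NULL prices (18.25, 18.37, NULL, NULL, 19.03, 18.59, 18.21); that reading is an inference from the flattened output. Under that assumption, the sketch below reproduces the column in SQLite via Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The day numbers 1 through 7 are also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock_price (day int, price real)")
# Assumed series; None is stored as NULL and is skipped by avg().
prices = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]
conn.executemany(
    "insert into stock_price values (?, ?)",
    [(day, price) for day, price in enumerate(prices, start=1)],
)

moving_avg = [
    round(avg, 2)
    for (avg,) in conn.execute(
        "select avg(price) over (order by day"
        "                        rows between 1 preceding and 1 following) "
        "from stock_price order by day"
    )
]
print(moving_avg)
```

The frame is three rows wide by position, but avg() divides by the number of non-NULL prices in it, which is why the third and fourth rows simply echo their one available neighbour.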
DISTINCT handling
• DISTINCT is supported, however DISTINCT is mutually exclusive with order by clause or window frame clause

select emp_id, manager_id,
       count(distinct manager_id) over (partition by department)
from employee;

emp_id  manager_id  (count)
   101         103        3
   102         103        3
   103         100        3
   104         110        3
   105         110        3
FIRST_VALUE and LAST_VALUE
• Gives FIRST/LAST value of current partition
• NULL handling
  – RESPECT NULLS (default)
  – IGNORE NULLS
FIRST_VALUE/LAST_VALUE Example
select price,
       price - first_value(price) over (partition by year order by day) as diff_price
from stock_price;

price  diff_price
18.25           0
18.37        0.12
19.03        0.78
18.59        0.34
18.21       -0.04
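A minimal sketch of the price-difference query in SQLite via Python's sqlite3 module (not Informix; SQLite 3.25 or later assumed). The five prices come from the slide; the year value and the day numbers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock_price (year int, day int, price real)")
conn.executemany(
    "insert into stock_price values (2012, ?, ?)",  # year and day values are assumed
    [(1, 18.25), (2, 18.37), (3, 19.03), (4, 18.59), (5, 18.21)],
)

diffs = [
    round(diff, 2)
    for (diff,) in conn.execute(
        "select price - first_value(price) over (partition by year order by day) "
        "from stock_price order by day"
    )
]
print(diffs)  # prints: [0.0, 0.12, 0.78, 0.34, -0.04]
```

Every row is compared with the first price of its year, so the result reads as the change since the start of the partition.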
RATIO_TO_REPORT
• Computes the ratio of current value to sum of all values in current partition or window frame.
select emp_num, sales, ratio_to_report(sales) over (partition by year order by sales) from sales;
RATIO_TO_REPORT Example
select year, sales, ratio_to_report(sales) over (partition by year) from sales;
year  sales  (ratio_to_report)
1998   2400             0.2308
1998   2550             0.2452
1998   2650             0.2548
1998   2800             0.2692
1999   2450             0.2311
1999   2575             0.2429
1999   2725             0.2571
1999   2850             0.2689
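SQLite has no RATIO_TO_REPORT function, so the sketch below spells out the definition given above as sales divided by sum(sales) over the same partition. It runs through Python's sqlite3 module (SQLite 3.25 or later assumed) with an assumed table layout, and it is an equivalent expression for checking the numbers, not the Informix function itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (year int, sales int)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(1998, 2400), (1998, 2550), (1998, 2650), (1998, 2800),
     (1999, 2450), (1999, 2575), (1999, 2725), (1999, 2850)],
)

ratios = [
    (year, sales, round(ratio, 4))
    for year, sales, ratio in conn.execute(
        # Multiply by 1.0 to force floating-point division.
        "select year, sales,"
        "       sales * 1.0 / sum(sales) over (partition by year) "
        "from sales order by year, sales"
    )
]
for row in ratios:
    print(row)
```

Within each year the ratios add up to 1, since every row is divided by the same partition total.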
Nested OLAP Functions
• An OLAP function can be nested inside another OLAP function

select emp_id, salary,
       salary - first_value(salary) over (order by rank() over (order by salary)) as diff_salary
from employee;

select sum(ntile(10) over (order by salary)) over (partition by department)
from employee;
OLAP functions and IWA
• Queries containing OLAP functions can be accelerated by Informix Warehouse Accelerator (IWA)
• IWA processes the majority of the query block
  – scan, join, group by, having, aggregation
• Informix server processes OLAP functions based on query result from IWA
For more information
• Links to OLAP function in Informix 12.1 documentation
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2583.htm
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.acc.doc%2Fids_acc_queries1.htm