Post on 03-Feb-2021
Pivot Tables
The PIVOT relational operator (available on some SQL platforms/servers) lets us write cross-tabulation queries over tuples in tabular layout. It takes data held in separate rows, aggregates it, and converts it into columns.
Pivot Tables – Motivation (1)
Consider the table customers:
Cust_id  Cust_name  State_code  Times_purchased
1        John       CT          1
2        Mary       NY          10
3        Alfredo    NJ          2
4        Ana        NY          4
...      ...        ...         ...
If we want to know how many customers from each state bought something once, twice, three times and so on, a regular SQL query would be:
select state_code, times_purchased, count(*) cnt
from customers
group by state_code, times_purchased;
Pivot Tables – Motivation (2)
The result of that query would be:
State_code  Times_purchased  cnt
CT          0                90
CT          1                165
CT          2                179
...         ...              ...
NY          1                33048
This is the information we need, but it is a little hard to read. A crosstab with the purchase counts arranged vertically and the states horizontally would be preferable:
Times_purchased  CT   NY     NJ  ...
0                90   0      35  ...
1                165  33048  20  ...
2                179  219    37  ...
3                ...
Pivot Tables: another example
The following tuples:
order_id  customer_ref  product_id
50001     SMITH         10
50002     SMITH         20
50003     ANDERSON      30
50004     ANDERSON      40
50005     JONES         10
50006     JONES         20
50007     SMITH         20
50008     SMITH         10
50009     SMITH         20
Can be shown as:
customer_ref  10  20  30
ANDERSON      0   0   1
JONES         1   1   0
SMITH         2   3   0
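The crosstab above can be reproduced without a PIVOT operator. The sketch below is an illustration only: it uses Python's built-in sqlite3 module, and because SQLite has no PIVOT clause it substitutes conditional aggregation (one SUM(CASE ...) per heading) for it. The column aliases p10, p20 and p30 and the function name are assumptions made for this example.

```python
import sqlite3

# Rows copied from the order table above.
ORDERS = [
    (50001, "SMITH", 10), (50002, "SMITH", 20), (50003, "ANDERSON", 30),
    (50004, "ANDERSON", 40), (50005, "JONES", 10), (50006, "JONES", 20),
    (50007, "SMITH", 20), (50008, "SMITH", 10), (50009, "SMITH", 20),
]


def crosstab_orders():
    """Count orders per customer, one column per product id (10, 20, 30)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_ref TEXT, product_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    # Conditional aggregation stands in for PIVOT (COUNT ... FOR product_id IN (10, 20, 30)).
    rows = conn.execute(
        """
        SELECT customer_ref,
               SUM(CASE WHEN product_id = 10 THEN 1 ELSE 0 END) AS p10,
               SUM(CASE WHEN product_id = 20 THEN 1 ELSE 0 END) AS p20,
               SUM(CASE WHEN product_id = 30 THEN 1 ELSE 0 END) AS p30
        FROM orders
        GROUP BY customer_ref
        ORDER BY customer_ref
        """
    ).fetchall()
    conn.close()
    return rows


for row in crosstab_orders():
    print(row)
```

As in the slide, product 40 is simply not listed, so ANDERSON's second order does not appear in any column.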
PIVOT clause – syntax (1)
SELECT * FROM
  ( SELECT column1, …, columnj FROM tables WHERE conditions )
PIVOT
  ( aggregate_function(columnj)
    FOR columnj IN ( expr1, expr2, ... expr_n ) | subquery )
ORDER BY expression [ ASC | DESC ];
PIVOT clause – syntax (2)
Where:
aggregate_function can be a function such as SUM, COUNT, MIN, MAX or AVG.
IN ( expr1, expr2, ... expr_n ) is a list of values of columnj to pivot into headings in the cross-tabulation query. Each distinct value will be shown as a separate column.
subquery can be used instead of a list of values.
PIVOT clause – Application (1)
select * from
  ( select times_purchased times, state_code from customers )
pivot
  ( count(state_code) for state_code in ('NY','CT','NJ','FL','MO') )
order by times;
times  NY     CT   NJ  FL  MO
0      16601  90   35  0   0
1      33048  165  20  0   0
2      33151  179  37  0   0
3      32978  173  0   0   0
4      33109  173  0   1   0
Searching with PIVOT clause (1)
EMP table:
EMPNO  ENAME  JOB        MGR   HIREDATE   SAL   COMM  DEPTNO
7839   KING   PRESIDENT        17-NOV-81  5000        10
7698   BLAKE  MANAGER    7839  01-MAY-81  2850        30
7782   CLARK  MANAGER    7839  09-JUN-81  2450        10
7566   JONES  MANAGER    7839  02-APR-81  2975        20
...    ...    ...        ...   ...        ...   ...   ...
Question: For each job, display the salary totals in a separate column for each department.
Searching with PIVOT clause (2)
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40) );
JOB        10    20      30    40
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600
The list of values for deptno is hard-coded in this example (10, 20, 30, 40).
Searching with PIVOT clause (2)
Alternatively, an inline view may be used to obtain the same result:
select * from (SELECT deptno, job, sal from EMP)
PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40) );
JOB        10    20      30    40
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600
Searching with PIVOT clause (3)
Groupings are affected if a pivot query runs over a larger set of columns, because every column that is neither aggregated nor pivoted becomes part of the implicit grouping. For example:
SELECT * from EMP PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40) );
Here deptno is still the pivot column, but the remaining columns include a superkey of EMP, so every row forms its own group and the pivot is effectively useless (results on the next slide).
Searching with PIVOT clause (4)
EMPNO  ENAME   JOB        MGR   HIREDATE  COMM  10    20      30    40
7654   MARTIN  SALESMAN   7698  28/09/81  1400                1375
7698   BLAKE   MANAGER    7839  01/05/81                      3135
7934   MILLER  CLERK      7782  23/01/82        1430
7521   WARD    SALESMAN   7782  22/02/81  500                 1375
7566   JONES   MANAGER    7698  02/04/81              3272.5
7844   TURNER  SALESMAN   7839  08/09/81  0                   1650
7900   JAMES   CLERK      7698  03/12/81                      1045
7839   KING    PRESIDENT        19/04/87        5500
7876   ADAMS   CLERK      7788  23/05/87              1210
7902   FORD    ANALYST    7566  03/12/81              3300
...    ...     ...        ...   ...       ...   ...   ...     ...   ...
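The grouping effect shown above can be checked on a small scale. The following sketch is an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module, emulates PIVOT with conditional aggregation and an explicit GROUP BY, and loads only the four EMP rows from the EMP table excerpt (so the salaries are the ones listed there, not the totals in the result slides). The names pivot_grouped_by, d10, d20 and d30 are made up for the example.

```python
import sqlite3

# Four EMP rows taken from the EMP table excerpt: (empno, ename, job, sal, deptno).
EMP_ROWS = [
    (7839, "KING", "PRESIDENT", 5000, 10),
    (7698, "BLAKE", "MANAGER", 2850, 30),
    (7782, "CLARK", "MANAGER", 2450, 10),
    (7566, "JONES", "MANAGER", 2975, 20),
]

# One aggregate per pivoted department; stands in for PIVOT (SUM(sal) FOR deptno IN (...)).
PIVOT_COLUMNS = """
    SUM(CASE WHEN deptno = 10 THEN sal END) AS d10,
    SUM(CASE WHEN deptno = 20 THEN sal END) AS d20,
    SUM(CASE WHEN deptno = 30 THEN sal END) AS d30
"""


def pivot_grouped_by(group_cols):
    """Emulate a pivot whose implicit grouping columns are group_cols."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, sal REAL, deptno INTEGER)"
    )
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", EMP_ROWS)
    sql = (
        f"SELECT {group_cols}, {PIVOT_COLUMNS} FROM emp "
        f"GROUP BY {group_cols} ORDER BY {group_cols}"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


narrow = pivot_grouped_by("job")               # like pivoting (deptno, job, sal)
wide = pivot_grouped_by("empno, ename, job")   # like pivoting a wider column set

print(len(narrow), "rows when grouping by job only")
print(len(wide), "rows when the key column empno is part of the grouping")
```

With job as the only grouping column the three managers collapse into one row; once empno is included, each employee keeps a row of its own with a single populated department column.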
Searching with PIVOT clause (5)
Question: For ANALYST, CLERK and SALESMAN, display the salary totals in a separate column for each department.
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40) )
where job in ('ANALYST', 'CLERK', 'SALESMAN');
JOB       10    20    30    40
CLERK     1430  2090  1045
SALESMAN              6160
ANALYST         6600
Searching with PIVOT clause (6)
Aliases can be used:
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) as salaries for deptno in (10 as Dep10, 20 as Dep20, 30 as Dep30, 40 as Dep40) )
where job in ('ANALYST', 'CLERK', 'SALESMAN');
JOB       Dep10_salaries  Dep20_salaries  Dep30_salaries  Dep40_salaries
CLERK     1430            2090            1045
SALESMAN                                  6160
ANALYST                   6600
Searching with PIVOT clause (7)
Pivoting multiple aggregates:
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) as sum, count(sal) as cnt for deptno in (10 as D10, 20 as D20, 30 as D30) );
JOB        D10_sum  D10_cnt  ...  D30_sum  D30_cnt
CLERK      1430     1        ...  1045     1
SALESMAN            0        ...  6160     4
PRESIDENT  5500     1        ...           0
MANAGER    2695     1        ...  3135     1
ANALYST             0        ...           0
Searching with PIVOT clause (8)
Or, pivoting on a combination of columns:
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) as sum, count(sal) as cnt
        for (deptno, job) in ((30, 'SALESMAN') as d30_sls, (30, 'MANAGER') as d30_mgr, (30, 'CLERK') as d30_clk) );
D30_SLS_SUM  D30_SLS_CNT  D30_MGR_SUM  D30_MGR_CNT  ...
6160         4            3135         1            ...
PIVOTing an Unknown Domain of Values (1)
By default, the pivot syntax does not support a dynamic list of values in the pivot_in_clause. Using a subquery instead of a hard-coded list of values in the pivot_in_clause generates an error:
SELECT * FROM emp PIVOT ( SUM(sal) AS salaries FOR deptno IN (SELECT deptno FROM dept) );
PIVOTing an Unknown Domain of Values (2)
A possible workaround (with Oracle):
select * from (SELECT deptno, job, sal from EMP)
PIVOT XML ( SUM(sal) for deptno in (any) );
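JOB      DEPTNO_XML
ANALYST  <PivotSet><item><column name = "DEPTNO">20</column><column name = "SUM(SAL)">6600</column></item></PivotSet>
MANAGER  ....
...      ...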
It implies extra work to read the information from the XML format!
PIVOTing an Unknown Domain of Values (3)
Another workaround (with Oracle SQL*Plus):
column namelist new_value nlist noprint;
/* first obtain a string with the list of distinct values of deptno */
select wm_concat(''''||deptno||'''') namelist
from (select distinct deptno from emp)
connect by nocycle deptno = prior deptno
group by level;
WITH pivot_data AS (SELECT deptno, job, sal from EMP)
select * from pivot_data
PIVOT ( SUM(sal) for deptno in (&nlist) );
Here &nlist is a substitution variable containing the string '10','20','30' (results on the next slide).
PIVOTing an Unknown Domain of Values (4)
JOB        10    20      30
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600
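The idea behind the workarounds above (first discover the distinct pivot values, then generate the query text from them) can be sketched in a host language as well. This is a minimal illustration, not the SQL*Plus script itself: it uses Python's sqlite3 module, emulates PIVOT with generated SUM(CASE ...) columns because SQLite has no PIVOT clause, and loads only the four rows from the EMP table excerpt. The function name dynamic_pivot is an assumption for this example.

```python
import sqlite3

# (job, sal, deptno) for the four rows in the EMP table excerpt.
EMP_ROWS = [
    ("PRESIDENT", 5000, 10),
    ("MANAGER", 2850, 30),
    ("MANAGER", 2450, 10),
    ("MANAGER", 2975, 20),
]


def dynamic_pivot():
    """Build the pivot column list from the data instead of hard-coding it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (job TEXT, sal INTEGER, deptno INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP_ROWS)

    # Step 1: discover the domain of the pivot column at run time.
    deptnos = [
        row[0]
        for row in conn.execute("SELECT DISTINCT deptno FROM emp ORDER BY deptno")
    ]

    # Step 2: generate one aggregate column per discovered value.
    # int() keeps anything non-numeric out of the generated SQL text.
    columns = ", ".join(
        f'SUM(CASE WHEN deptno = {int(d)} THEN sal END) AS "{int(d)}"'
        for d in deptnos
    )

    # Step 3: run the generated query.
    cursor = conn.execute(
        f"SELECT job, {columns} FROM emp GROUP BY job ORDER BY job"
    )
    headers = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return headers, rows


headers, rows = dynamic_pivot()
print(headers)
for row in rows:
    print(row)
```

Generating SQL text from data is only reasonable when the values are validated first, which is why the sketch forces each department number through int().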
UnPIVOT – turning pivot tables into rows (1)
SELECT ... FROM ...
UNPIVOT [INCLUDE|EXCLUDE NULLS]
  ( unpivot_clause unpivot_for_clause unpivot_in_clause )
WHERE ...
unpivot_clause: specifies a name for the column that holds the unpivoted measure values.
unpivot_for_clause: specifies a name for the column that holds the names of the columns being unpivoted.
unpivot_in_clause: contains the list of pivoted columns (not values) to be unpivoted.
UnPIVOT – turning pivot tables into rows (2)
CREATE VIEW pivoted_data AS
SELECT * FROM (SELECT deptno, job, sal FROM emp)
PIVOT ( SUM(sal) FOR deptno IN (10 AS d10_sal, 20 AS d20_sal, 30 AS d30_sal, 40 AS d40_sal) );
select * from pivoted_data;
JOB        D10_sal  D20_sal  D30_sal  D40_sal
CLERK      1430     2090     1045
SALESMAN                     6160
PRESIDENT  5500
MANAGER    2695     3272.5   3135
ANALYST             6600
UnPIVOT – turning pivot tables into rows (3)
SELECT * FROM pivoted_data
UNPIVOT ( Deptsal FOR saldesc IN (d10_sal, d20_sal, d30_sal, d40_sal) );
JOB        SALDESC  DEPTSAL
CLERK      D10_SAL  1430
CLERK      D20_SAL  2090
CLERK      D30_SAL  1045
SALESMAN   D30_SAL  6160
PRESIDENT  D10_SAL  5500
MANAGER    D10_SAL  2695
MANAGER    D20_SAL  3272.5
MANAGER    D30_SAL  3135
ANALYST    D20_SAL  6600
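Where UNPIVOT is not available, the same reshaping can be approximated with one SELECT per pivoted column, glued together with UNION ALL. The sketch below is an illustration only: it runs on SQLite through Python's sqlite3 module, stores the pivoted_data values shown above in an ordinary table rather than a view, and the include_nulls flag is a hypothetical stand-in for the INCLUDE NULLS / EXCLUDE NULLS option.

```python
import sqlite3

# Pivoted rows as shown for pivoted_data; None marks an empty cell.
PIVOTED = [
    ("CLERK", 1430, 2090, 1045, None),
    ("SALESMAN", None, None, 6160, None),
    ("PRESIDENT", 5500, None, None, None),
    ("MANAGER", 2695, 3272.5, 3135, None),
    ("ANALYST", None, 6600, None, None),
]
COLUMNS = ["d10_sal", "d20_sal", "d30_sal", "d40_sal"]


def unpivot(include_nulls=False):
    """Turn each pivoted column into (job, saldesc, deptsal) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pivoted_data "
        "(job TEXT, d10_sal REAL, d20_sal REAL, d30_sal REAL, d40_sal REAL)"
    )
    conn.executemany("INSERT INTO pivoted_data VALUES (?, ?, ?, ?, ?)", PIVOTED)

    # One branch per column named in the unpivot_in_clause.
    branches = []
    for col in COLUMNS:
        branch = (
            f"SELECT job, '{col.upper()}' AS saldesc, {col} AS deptsal "
            f"FROM pivoted_data"
        )
        if not include_nulls:
            # Mirrors EXCLUDE NULLS, which is Oracle's default behaviour.
            branch += f" WHERE {col} IS NOT NULL"
        branches.append(branch)

    sql = " UNION ALL ".join(branches) + " ORDER BY job, saldesc"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in unpivot():
    print(row)
```

Excluding nulls gives the nine rows of the slide (in a different order, since the sketch sorts by job); including them yields one row per job and column.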
UnPIVOT – other uses (1)
Since columns in the unpivot_in_clause must all be of the same datatype, this would cause an error:
SELECT empno, job, unpivot_col_name, unpivot_col_value FROM emp UNPIVOT (unpivot_col_value FOR unpivot_col_name IN (ename, deptno, hiredate));
UnPIVOT – other uses (2)
A workaround (in Oracle) consists of datatype conversion:
WITH emp_data AS (
  SELECT empno, job, ename, TO_CHAR(deptno) as deptno, TO_CHAR(hiredate) as hiredate FROM emp )
SELECT empno, job, unpivot_col_name, unpivot_col_value
FROM emp_data
UNPIVOT ( unpivot_col_value FOR unpivot_col_name IN (ename, deptno, hiredate) );
(results on the next slide)
UnPIVOT – other uses (3)
EMPNO  JOB       UNPIVOT_COL_NAME  UNPIVOT_COL_VALUE
7369   CLERK     ENAME             SMITH
7369   CLERK     DEPTNO            20
7369   CLERK     HIREDATE          17/12/1980
7499   SALESMAN  ENAME             ALLEN
7499   SALESMAN  DEPTNO            30
7499   SALESMAN  HIREDATE          20/02/1981
...    ...       ...               ...