![Page 1: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/1.jpg)
QUERY BY EXCEL
A. Witkowski, S. Bellamkonda, T. Bozkaya, B. A. Naimat, L. Sheng,
S. Subramanian, A. Waingold
Oracle Corporation
![Page 2: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/2.jpg)
Spreadsheets
Spreadsheets are established analytical tools:
– Attractive user interface
– Easy to use computational model
– Interactivity for what-if analysis
But they do not offer:
– Scalability
– Parallelization
– A unified view of the data model
![Page 3: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/3.jpg)
Our proposal
QUERY BY EXCEL (QBX) combines:
– Presentational, interactive modeling power of Excel (spreadsheet tools)
– Computational power and scalability of RDBMS via analytical extensions
![Page 4: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/4.jpg)
QBX – How it works
Analyst builds a model using Excel. The model is translated to SQL and stored in relational views.
Analysts designate areas in Excel as relational sources (RTables). An RTable can be transformed into another RTable using Excel operations corresponding to Outer Join, Selection, Projection and Aggregation. Analyst does not write any SQL during this process.
Analysts write Excel formulas on samples of relational sources that fit in a spreadsheet. The translated SQL works on the whole data for scalability.
Business Reporting tools can access the relational views for consolidation.
![Page 5: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/5.jpg)
Analytic SQL Extensions for QBX
– SQL MODEL (Witkowski et al., SIGMOD 2003)
– SQL PIVOT (Cunningham et al., VLDB 2004)
![Page 6: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/6.jpg)
QBX Architecture
[Figure: architecture diagram. Labels: Excel Analyst; EXCEL; QBX (RDBMS Interaction & Modeling, Persistence, Excel -> SQL Translation); RDBMS (Database Schema, QBX generated SQL Objects); RDBMS Application; RDBMS User.]
![Page 7: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/7.jpg)
QBX Metadata
Cells(eid, sheet, row, col, x, f)

| | A | B | C | D |
|---|---|---|---|---|
| 1 | | | Sale | Diff |
| 2 | | | 10.00 | |
| 3 | | | 12.00 | =C3-C2 |

For this Excel spreadsheet, we store five rows in the Cells table: C1, C2, C3, D1, D3.

| eid | sheet | row | col | x | f |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 3 | 'Sale' | |
| 1 | 1 | 1 | 4 | 'Diff' | |
| 1 | 1 | 2 | 3 | '10.00' | |
| 1 | 1 | 3 | 3 | '12.00' | |
| 1 | 1 | 3 | 4 | | '=C3-C2' |

RTables(eid, RTable, sheet, row, col, sample, RTableView, …)
Excels(eid, name, owner, ExcelBinary, SQLView)
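The mapping from a sheet to Cells(eid, sheet, row, col, x, f) rows can be sketched in Python. This is only an illustration, not the QBX implementation: the function names are hypothetical, and it assumes that text starting with "=" is stored in f and everything else in x.

```python
import re

def cell_ref_to_row_col(ref):
    """Convert an A1-style reference such as 'C3' to (row, col), both 1-based."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col

def to_cells_rows(eid, sheet, contents):
    """Turn {ref: text} into Cells tuples (eid, sheet, row, col, x, f).

    Assumption: text starting with '=' goes to f, anything else to x.
    """
    rows = []
    for ref, text in contents.items():
        row, col = cell_ref_to_row_col(ref)
        if text.startswith("="):
            rows.append((eid, sheet, row, col, None, text))
        else:
            rows.append((eid, sheet, row, col, text, None))
    # Order by (row, col) so the output is deterministic.
    return sorted(rows, key=lambda r: (r[2], r[3]))

# The five non-empty cells of the example sheet.
sheet = {"C1": "Sale", "D1": "Diff", "C2": "10.00", "C3": "12.00", "D3": "=C3-C2"}
cells = to_cells_rows(1, 1, sheet)
for r in cells:
    print(r)
```

Empty cells produce no row, which is why the 3 x 4 example grid yields only five Cells rows.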
![Page 8: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/8.jpg)
QBX Infrastructure
Interaction and Modeling Component (VBA add-on)
– Menu interface (QBX)
– QBX->RTables manages RTables (import, add column, save as relational view, ...)
– QBX->Spreadsheet translates Excel to SQL, saves and loads it.
Persistence Component (VBA add-on)
Translation Component
![Page 9: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/9.jpg)
Excel to SQL Translation
– Fix Frame Translation
– Table Translation
– Unified Translation
![Page 10: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/10.jpg)
Fix Frame Translation

| | A | B | C |
|---|---|---|---|
| 1 | 1 | =A1+A2 | |
| 2 | 2 | =A3+1 | |
| 3 | 3 | =sum(A1:A3) | |

SELECT sheet, row, col, x FROM cells
MODEL DBY (sheet,row,col) MEA (x)
RULES AUTOMATIC ORDER
( x[1,1,2] = x[1,1,1] + x[1,2,1],      -- B1=A1+A2
  x[1,2,2] = x[1,3,1] + 1,             -- B2=A3+1
  x[1,3,2] = sum(x)[1,1<=row<=3,1]     -- B3=sum(A1:A3)
);
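A rough Python sketch of what fix frame translation with AUTOMATIC ORDER amounts to: each formula cell becomes one rule, and a rule is evaluated only after the rules it depends on. The function name and the rule representation are assumptions made for this illustration. The rule for C1 is hypothetical (it is not on the slide) and is listed first on purpose to show that declaration order does not matter.

```python
def evaluate_automatic_order(values, rules):
    """values: {(row, col): number}; rules: {(row, col): (dependencies, function)}.

    A rule runs only after every rule it depends on has run.
    """
    result = dict(values)
    done = set()
    in_progress = set()

    def compute(cell):
        if cell in done or cell not in rules:
            return  # already computed, or a plain value cell
        if cell in in_progress:
            raise ValueError(f"circular reference at {cell}")
        in_progress.add(cell)
        deps, fn = rules[cell]
        for dep in deps:
            compute(dep)
        result[cell] = fn(*(result[dep] for dep in deps))
        in_progress.discard(cell)
        done.add(cell)

    for cell in rules:
        compute(cell)
    return result

# Column A holds 1, 2, 3; column B holds the three formulas from the slide.
values = {(1, 1): 1, (2, 1): 2, (3, 1): 3}
rules = {
    (1, 3): ([(1, 2), (3, 2)], lambda b1, b3: b1 + b3),       # hypothetical C1=B1+B3
    (1, 2): ([(1, 1), (2, 1)], lambda a1, a2: a1 + a2),       # B1=A1+A2
    (2, 2): ([(3, 1)], lambda a3: a3 + 1),                    # B2=A3+1
    (3, 2): ([(1, 1), (2, 1), (3, 1)], lambda *xs: sum(xs)),  # B3=sum(A1:A3)
}
grid = evaluate_automatic_order(values, rules)
print(grid[(1, 2)], grid[(2, 2)], grid[(3, 2)], grid[(1, 3)])
```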
![Page 11: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/11.jpg)
Fix Frame Translation - VLOOKUP - HLOOKUP
We use REFERENCE SQL MODEL:
VLOOKUP(key, (<rs,cs>, <re,ce>), col):
REFERENCE vlookup_ref ON
( SELECT k.x key, v.x value
FROM cells k, cells v
WHERE k.col=cs AND v.col=cs+col-1 AND
k.row >= rs AND k.row <= re AND
v.row=k.row )
DIMENSION BY (key) MEASURES (value)
![Page 12: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/12.jpg)
Fix Frame Translation-VLOOKUP - HLOOKUP
EXAMPLE: A3 = VLOOKUP(C3, A1:B4, 2)

| | A | B | C | D |
|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 |
| 2 | 11 | 6 | 7 | 8 |
| 3 | 9 | 10 | 11 | 12 |
| 4 | 13 | 14 | 15 | 16 |
| 5 | 17 | 18 | 19 | 20 |
| 6 | | | | |

The lookup key is cell C3.

SELECT row, col, x FROM cells
MODEL
  REFERENCE vlookup_ref ON
    (SELECT k.x key, v.x value
     FROM cells k, cells v
     WHERE k.col = 1 AND v.col = 2 AND
           k.row >= 1 AND k.row <= 4 AND
           v.row = k.row)
    DIMENSION BY (key) MEASURES (value)
  MAIN DIMENSION BY (row, col) MEASURES (x)
  RULES
  ( x[3,1] = vlookup_ref.value[ x[3,3] ] );
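The reference model can be read as a key-to-value map built from the lookup range. The Python sketch below mirrors the REFERENCE subquery under two assumptions: the match is exact (as in the SQL equality lookup, unlike Excel's default approximate match), and the first matching row wins when keys repeat. The function name is hypothetical.

```python
def build_vlookup_ref(cells, rs, cs, re_, col):
    """Mirror the REFERENCE subquery: key from column cs, value from
    column cs + col - 1, restricted to rows rs..re_.

    cells maps (row, col) -> value.
    """
    ref = {}
    for row in range(rs, re_ + 1):
        key = cells.get((row, cs))
        if key is not None and key not in ref:  # keep the first match
            ref[key] = cells.get((row, cs + col - 1))
    return ref

# The 5 x 4 grid from the example slide, columns A..D as 1..4.
grid_rows = [
    [1, 2, 3, 4],
    [11, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20],
]
cells = {(r, c): v
         for r, line in enumerate(grid_rows, start=1)
         for c, v in enumerate(line, start=1)}

# VLOOKUP(C3, A1:B4, 2): rows 1..4, key column A (1), value column B (2).
vlookup_ref = build_vlookup_ref(cells, rs=1, cs=1, re_=4, col=2)
looked_up = vlookup_ref.get(cells[(3, 3)])  # key is C3
print(looked_up)
```

With the slide's data, C3 holds 11, which matches A2, so the lookup returns B2.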
![Page 13: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/13.jpg)
Table Translation
Table Translation creates named, protected regions within Excel called RTables.
We remember associated metadata for RTable regions (PK, PK-FK constraints, etc.).
A direct RTable represents either:
– An RDBMS table (entire table or sample)
– An RDBMS view
(Direct RTables can be created through the QBX menu.)
A derived RTable represents the result of relational operations on other RTables.
![Page 14: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/14.jpg)
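RTable Example

| | A | B | C | D | … | I | J |
|---|---|---|---|---|---|---|---|
| 1 | FACT | | | | | TIME_D | |
| 2 | City | Prod | Month | Sale | | Month | Year |
| 3 | LA | tv | M1.00 | 10.00 | | M1.00 | Y.00 |
| 4 | LA | radio | M2.00 | 12.00 | | M2.00 | Y.00 |
| 5 | LA | tv | M1.01 | 14.00 | | | |
| 6 | LA | radio | M2.01 | 16.00 | | PROD_D | |
| 7 | Boston | tv | M1.00 | 20.00 | | Prod | Categ |
| 8 | Boston | radio | M2.00 | 22.00 | | tv | Video |
| 9 | Boston | tv | M1.01 | 24.00 | | radio | Audio |
| 10 | Boston | radio | M2.01 | 26.00 | | | |
| 11 | | | | | | REGION_D | |
| 12 | | | | | | City | State |
| 13 | | | | | | LA | CA |
| 14 | | | | | | Boston | MA |

RTable regions:
– FACT(A2:D10)
– TIME_D(I2:J4)
– PROD_D(I7:J9)
– REGION_D(I12:J14)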
![Page 15: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/15.jpg)
Table Translation - Operations
Inter-column calculations
– Adding a new (calculated) column to an RTable
Projection
Joining of RTables
– The closest Excel operation to join is HLOOKUP/VLOOKUP, which is similar to relational OUTER JOIN.
– Steps (R1 LEFT OUTER JOIN R2 ON R1.col1 = R2.col2):
  1. A new column is added.
  2. The new column is populated with VLOOKUP(R1.Col1, R2, R2.Col2).
Aggregation
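The two join steps can be sketched in Python as adding a lookup column to every row of R1. This is an illustration only: the function name is hypothetical, the key of R2 is assumed to be unique, and a missing match yields None (where Excel would show #N/A), which is what makes the result a left outer join. The "dvd" row is made up for the example to show an unmatched key.

```python
def add_lookup_column(r1, r2, r1_key, r2_key, r2_value, new_name):
    """Step 1: add a new column; step 2: fill it with a VLOOKUP-style
    exact-match lookup of r1[r1_key] in r2. Unmatched rows get None."""
    index = {}
    for row in r2:
        index.setdefault(row[r2_key], row[r2_value])  # first match wins
    return [dict(row, **{new_name: index.get(row[r1_key])}) for row in r1]

fact = [
    {"city": "LA", "prod": "tv", "sale": 10.0},
    {"city": "LA", "prod": "radio", "sale": 12.0},
    {"city": "LA", "prod": "dvd", "sale": 5.0},  # hypothetical row with no match
]
prod_d = [
    {"prod": "tv", "categ": "Video"},
    {"prod": "radio", "categ": "Audio"},
]

joined = add_lookup_column(fact, prod_d, "prod", "prod", "categ", "categ")
for row in joined:
    print(row)
```

Every row of R1 survives the operation, so the row count of the result equals that of R1 as long as the R2 key is unique.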
![Page 16: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/16.jpg)
Inter-column Calculations
Computations involving columns of the same row. EX: A1 = B1+D1

| | A | B | C | D |
|---|---|---|---|---|
| 1 | 10 | 8 | | 2 |
| 2 | 15 | 9 | | 6 |
| 3 | 18 | 2 | | 16 |
| … | … | … | | … |

MODEL
DBY (row,col) MEASURES (x)
RULES
( x[ANY, 1] = x[cv(row),2] + x[cv(row),4] )
![Page 17: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/17.jpg)
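Table Translation – Join Example

| | A | B | C | D | E | F | G | … | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | FACT | | | | | | | | TIME_D | |
| 2 | City | Prod | Month | State | Categ | Year | Sale | | Month | Year |
| 3 | LA | tv | M1.00 | CA | video | Y.00 | 10.00 | | M1.00 | Y.00 |
| 4 | LA | radio | M2.00 | CA | audio | Y.00 | 12.00 | | M2.00 | Y.00 |
| 5 | LA | tv | M1.01 | CA | video | Y.01 | 14.00 | | | |
| 6 | LA | radio | M2.01 | CA | audio | Y.01 | 16.00 | | PROD_D | |
| 7 | Boston | tv | M1.00 | MA | video | Y.00 | 20.00 | | Prod | Categ |
| 8 | Boston | radio | M2.00 | MA | audio | Y.00 | 22.00 | | tv | Video |
| 9 | Boston | tv | M1.01 | MA | video | Y.01 | 24.00 | | radio | Audio |
| 10 | Boston | radio | M2.01 | MA | audio | Y.01 | 26.00 | | | |

E3=VLOOKUP(B3,I8:J9,2)
F3=VLOOKUP(C3,I3:J4,2)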
![Page 18: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/18.jpg)
Join SQL
SELECT f.city, f.prod, f.month, g.state, p.categ, t.year, sale,
row_number() over (order by city, prod, month) rn
FROM
  fact f left outer join time_d t on f.month = t.month
         left outer join prod_d p on f.prod = p.prod
         left outer join geog_d g on f.city = g.city
ORDER BY city NULLS LAST, prod NULLS LAST,
month NULLS LAST;
![Page 19: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/19.jpg)
Table Translation - Aggregation
Aggregation in Excel can be done through the DATA PIVOTTABLE operation.
This corresponds to (in RDBMS):
– Aggregation via SQL GROUP BY operator
– Aggregation via SQL PIVOT operator
![Page 20: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/20.jpg)
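Aggregation – SQL GROUP BY

| | L | M | N |
|---|---|---|---|
| 1 | AGG_Q | | |
| 2 | State | Year | Total |
| 3 | CA | Y.00 | 22.00 |
| 4 | | Y.01 | 30.00 |
| 5 | | | 52.00 |
| 6 | MA | Y.00 | 42.00 |
| 7 | | Y.01 | 50.00 |
| 8 | | | 92.00 |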
SELECT state, year, sum(amt) amt,
row_number() over (order by state,year) rn
FROM
  fact f left outer join time_d t on f.month = t.month
         left outer join prod_d p on f.prod = p.prod
         left outer join geog_d g on f.city = g.city
GROUP BY
GROUPING SETS ((state,year),(state))
ORDER BY state NULLS LAST,
year NULLS LAST;
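To make the effect of GROUPING SETS ((state,year),(state)) concrete, here is a small Python sketch that reproduces the AGG_Q rows from the joined sample data. The function name is hypothetical. A column left out of a grouping set appears as None (standing in for SQL NULL), and None sorts after every real value to mimic NULLS LAST.

```python
from collections import defaultdict

def grouping_sets_sum(rows, grouping_sets, all_keys, measure):
    """Sum `measure` once per grouping set; keys outside the set become None."""
    out = []
    for keys in grouping_sets:
        totals = defaultdict(float)
        for row in rows:
            group = tuple(row[k] if k in keys else None for k in all_keys)
            totals[group] += row[measure]
        out.extend(group + (total,) for group, total in totals.items())
    # ORDER BY ... NULLS LAST: None sorts after every real value.
    out.sort(key=lambda r: tuple((v is None, v or "") for v in r[:len(all_keys)]))
    # Append rn, the position in the ordered result.
    return [r + (rn,) for rn, r in enumerate(out, start=1)]

# The eight joined fact rows from the sample, reduced to the needed columns.
joined = [
    {"state": "CA", "year": "Y.00", "sale": 10.0},
    {"state": "CA", "year": "Y.00", "sale": 12.0},
    {"state": "CA", "year": "Y.01", "sale": 14.0},
    {"state": "CA", "year": "Y.01", "sale": 16.0},
    {"state": "MA", "year": "Y.00", "sale": 20.0},
    {"state": "MA", "year": "Y.00", "sale": 22.0},
    {"state": "MA", "year": "Y.01", "sale": 24.0},
    {"state": "MA", "year": "Y.01", "sale": 26.0},
]

agg_q = grouping_sets_sum(
    joined,
    grouping_sets=[("state", "year"), ("state",)],
    all_keys=("state", "year"),
    measure="sale",
)
for row in agg_q:
    print(row)
```

The per-state subtotal rows (year is None) land at rn 3 and rn 6, directly after the detail rows of their state.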
![Page 21: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/21.jpg)
Translation of Fix Frame Operations on RTables
Excel computation is possible once we map relational data to 2-D form. This is called linearization.
– Assignment Linearization
– Reference Linearization
![Page 22: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/22.jpg)
Reference Linearization
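| | L | M | N | O |
|---|---|---|---|---|
| 1 | AGG_Q | | | |
| 2 | State | Year | Total | Ratio |
| 3 | CA | Y.00 | 22.00 | =N3/N5 |
| 4 | | Y.01 | 30.00 | =N4/N5 |
| 5 | | | 52.00 | =N5/N5 |
| 6 | MA | Y.00 | 42.00 | =N6/N8 |
| 7 | | Y.01 | 50.00 | =N7/N8 |
| 8 | | | 92.00 | =N8/N8 |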
SELECT row, col, x FROM cells
MODEL
REFERENCE r ON
( SELECT rn, state, time, total FROM RT )
DIMENSION BY (rn)
MEASURES (state, time, total)
DIMENSION BY (row, col) MEASURES (x)
( x[3, 15] = r.total[1] / r.total[3], -- =N3/N5
x[4, 15] = r.total[2] / r.total[3], -- =N4/N5
x[5, 15] = r.total[3] / r.total[3], -- =N5/N5
x[6, 15] = r.total[4] / r.total[6], -- =N6/N8
x[7, 15] = r.total[5] / r.total[6], -- =N7/N8
x[8, 15] = r.total[6] / r.total[6] -- =N8/N8
);
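The rewriting of cell references into reference-model lookups can be sketched as follows. This is a hypothetical helper, not the QBX translator. It assumes the first row of the RTable region is the header (so the first data row has rn = 1) and that the formula contains only plain A1-style references.

```python
import re

def col_to_index(letters):
    """Column letters to a 1-based index: A -> 1, O -> 15, AA -> 27."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def linearize_reference(ref, region_top_left, column_names, ref_name="r"):
    """Rewrite a reference inside an RTable region as ref_name.column[rn].

    Assumption: the region's first row is the header row.
    """
    pattern = r"([A-Z]+)([0-9]+)"
    top_letters, top_row = re.fullmatch(pattern, region_top_left).groups()
    letters, row = re.fullmatch(pattern, ref).groups()
    offset = col_to_index(letters) - col_to_index(top_letters)
    rn = int(row) - int(top_row)
    if not 0 <= offset < len(column_names) or rn < 1:
        raise ValueError(f"{ref} is outside the RTable data region")
    return f"{ref_name}.{column_names[offset]}[{rn}]"

def linearize_formula(formula, region_top_left, column_names):
    """Replace every A1-style reference in an Excel formula."""
    return re.sub(
        r"[A-Z]+[0-9]+",
        lambda m: linearize_reference(m.group(0), region_top_left, column_names),
        formula.lstrip("="),
    )

columns = ["state", "time", "total"]  # measures of the reference model
print(linearize_formula("=N3/N5", "L2", columns))
print(linearize_formula("=N6/N8", "L2", columns))
```

With the region starting at L2, N3 maps to total at rn 1 and N8 to total at rn 6, matching the rules on the slide.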
![Page 23: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/23.jpg)
Relative Referencing to RTables
Introducing a new lookup function for referencing values in RTables:
RTLOOKUP(RTREGION, COL, {PKEYS})

| | L | M | N | O |
|---|---|---|---|---|
| 1 | AGG_Q | | | |
| 2 | State | Year | Total | Ratio |
| 3 | CA | Y.00 | 22.00 | 0.42 |
| 4 | | Y.01 | 30.00 | 0.58 |
| 5 | | | 52.00 | 1.00 |
| 6 | MA | Y.00 | 42.00 | 0.46 |
| 7 | | Y.01 | 50.00 | 0.54 |
| 8 | | | 92.00 | 1.00 |

O3 = N3/rtlookup(L2:N8,3,L3,NULL)
O4 = N4/rtlookup(L2:N8,3,L3,NULL)
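A minimal Python sketch of the idea behind RTLOOKUP(RTREGION, COL, {PKEYS}): find the row whose key columns equal the given values and return the requested column. The semantics here are an assumption for illustration: the key columns are taken to be the leading columns of the region, COL is 1-based, and a NULL key (None) matches the subtotal row.

```python
def rtlookup(rtable, col, *pkeys):
    """Return column `col` (1-based) of the first row whose leading
    len(pkeys) fields equal pkeys; None if there is no such row."""
    for row in rtable:
        if row[:len(pkeys)] == pkeys:
            return row[col - 1]
    return None

# AGG_Q data rows: (state, year, total); year None marks the state subtotal.
agg_q = [
    ("CA", "Y.00", 22.0),
    ("CA", "Y.01", 30.0),
    ("CA", None, 52.0),
    ("MA", "Y.00", 42.0),
    ("MA", "Y.01", 50.0),
    ("MA", None, 92.0),
]

# Ratio = Total / (subtotal of the same state), as in the O-column formulas.
ratios = [round(total / rtlookup(agg_q, 3, state, None), 2)
          for state, year, total in agg_q]
print(ratios)
```

Because the lookup is keyed by state rather than by cell position, the same formula works for every row and survives rows being added to the RTable.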
![Page 24: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/24.jpg)
Relative Referencing to RTables
CREATE VIEW AGG_Q AS
SELECT state, year, ratio, sum(amt) total,
       row_number() over (order by state, year) rn
FROM
  fact f left outer join time_d t on f.month = t.month
         left outer join prod_d p on f.prod = p.prod
         left outer join geog_d g on f.city = g.city
GROUP BY
  GROUPING SETS ((state,year),(state))
MODEL DBY (state, year) MEA (total, 0 ratio)
( ratio[ANY,ANY] = total[CV(state),CV(year)] / total[CV(state), null] )
ORDER BY state nulls last, year nulls last;
![Page 25: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/25.jpg)
Optimizations
Collapsing of Equivalent Rules
For Loops vs. Existential Form
Existing optimization of SQL Model functionality
– Rule pruning
– Filter pushdown
– Others…
![Page 26: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/26.jpg)
Optimizations- Collapsing of Rules
| | A | B | C | D |
|---|---|---|---|---|
| 1 | 10 | 8 | | 2 |
| 2 | 15 | 9 | | 6 |
| 3 | 18 | 2 | | 16 |
| 4 | … | … | | … |
| 5 | | | | |

A1=B1+D1

MODEL
DBY(row,col) MEA(x)
RULES
(
x[1,1] = x[1,2] + x[1,4],
x[2,1] = x[2,2] + x[2,4],
…
x[20,1] = x[20,2] + x[20,4]
);

==

MODEL
DBY(row,col) MEA(x)
RULES
( x[for row from 1 to 20,1] = x[CV(row),2] + x[CV(row),4] );
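The collapsing step can be sketched in Python as run detection: rules that differ only in their row number and cover consecutive rows are merged into one FOR-loop rule. The rule representation and function name are assumptions for this illustration, and every rule is taken to be a sum of same-row cells, as in the slide's example.

```python
def collapse_rules(rules):
    """rules: list of (target_row, target_col, [(src_row, src_col), ...]),
    each meaning target = sum of sources. Returns MODEL rule strings."""
    def shape(rule):
        row, col, sources = rule
        if any(src_row != row for src_row, _ in sources):
            return None  # sources on other rows: leave the rule as it is
        return (col, tuple(src_col for _, src_col in sources))

    ordered = sorted(rules, key=lambda r: (r[1], r[0]))
    out = []
    i = 0
    while i < len(ordered):
        j = i
        s = shape(ordered[i])
        # Extend the run while the next rule has the same shape on the next row.
        while (s is not None and j + 1 < len(ordered)
               and shape(ordered[j + 1]) == s
               and ordered[j + 1][0] == ordered[j][0] + 1):
            j += 1
        first, last = ordered[i][0], ordered[j][0]
        col, sources = ordered[i][1], ordered[i][2]
        if j > i:
            rhs = " + ".join(f"x[CV(row),{c}]" for _, c in sources)
            out.append(f"x[for row from {first} to {last},{col}] = {rhs}")
        else:
            rhs = " + ".join(f"x[{r},{c}]" for r, c in sources)
            out.append(f"x[{first},{col}] = {rhs}")
        i = j + 1
    return out

# Twenty rules A1=B1+D1 ... A20=B20+D20, as produced by fix frame translation.
rules = [(row, 1, [(row, 2), (row, 4)]) for row in range(1, 21)]
collapsed = collapse_rules(rules)
print(collapsed)
```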
![Page 27: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/27.jpg)
Optimizations- Collapsing of Rules
[Figure: compilation time versus number of rules. X-axis: number of rules (500, 1000, 2500, 5000, 10000); y-axis: compilation time (0 to 45).]
![Page 28: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/28.jpg)
Optimization – For Loops vs Existential Rules
| | A | B | C | D |
|---|---|---|---|---|
| 1 | 10 | 8 | | 2 |
| 2 | 15 | 9 | | 6 |
| 3 | 18 | 2 | | 16 |
| 4 | … | … | | … |
| 5 | | | | |

A1=B1+D1

MODEL
DBY(row,col) MEA(x)
RULES
( x[for row from 1 to 20,1] = x[CV(row),2] + x[CV(row),4] );

VS

MODEL
DBY(row,col) MEA(x)
RULES
( x[1<=row<=20,1] = x[CV(row),2] + x[CV(row),4] );
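The difference between the two rule forms can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not a description of the SQL MODEL engine: the FOR-loop form is modeled as one direct lookup per target row, and the existential form as a single scan over all existing cells with the predicate applied to each. Function names are hypothetical.

```python
def apply_for_loop(cells, first, last):
    """FOR-loop form: visit each target row directly, one lookup per row."""
    out = dict(cells)
    for row in range(first, last + 1):
        out[(row, 1)] = cells[(row, 2)] + cells[(row, 4)]
    return out

def apply_existential(cells, first, last):
    """Existential form: scan every existing cell and update those that
    satisfy first <= row <= last and col = 1."""
    out = dict(cells)
    for (row, col) in cells:
        if col == 1 and first <= row <= last:
            out[(row, 1)] = cells[(row, 2)] + cells[(row, 4)]
    return out

# A 1000-row sheet with columns A, B and D populated (made-up values).
cells = {}
for row in range(1, 1001):
    cells[(row, 1)] = 0
    cells[(row, 2)] = row
    cells[(row, 4)] = 2 * row

by_loop = apply_for_loop(cells, 1, 20)
by_scan = apply_existential(cells, 1, 20)
print(by_loop == by_scan, by_loop[(20, 1)])
```

In this simplified picture the lookup cost grows with the number of cells modified while the scan cost grows with the total number of cells, which is the trade-off the two forms expose.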
![Page 29: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/29.jpg)
Optimization – For Loops vs Existential Rules
[Figure: lookup time / scan time versus percentage of cells modified. X-axis: percentage of cells modified (0.1, 0.5, 1, 5, 10, 20, 40); y-axis: lookup time / scan time (0 to 1.6). Labels: Use For Loops, Use scan.]
![Page 30: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/30.jpg)
Conclusion
Our goal is to translate Excel computation to SQL so that business models built in Excel can be stored and queried in RDBMS.
We proposed translation techniques for expressing Excel computation in RDBMS SQL using new analytic extensions.
We proposed representation techniques for relational data in Excel by using Rtables and described how Excel operations on RTables can be simulated in SQL.
We discussed how this proposed system would fit into our RDBMS SQL execution engine and benefit from all its capabilities and optimizations.
![Page 31: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/31.jpg)
What is ahead?
Excel
– Pivoting and advanced filtering turned out to be essential, but a few more relational-friendly extensions would go a long way, particularly in simulating joins and window function computations.
SQL
– RDBMS SQL needs to be extended to cover the functionality provided in Excel, particularly financial functions.
![Page 32: QUERY BY EXCEL A.Witkowski, S. Bellamkonda, T. Bozkaya, B.A. Naimat, L. Sheng, S. Subramanian, A. Waingold Oracle Corporation.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f145503460f94c2830a/html5/thumbnails/32.jpg)
Questions & Answers