SQL Top-N and Pagination Pattern (IOUG) - Whitepaper

Track: Database

SQL TOP-N AND PAGINATION PATTERN

Maxym Kharchenko

Session # 403

ABSTRACT

Pagination (of which top-N is a special use case) is a very common SQL query technique. It deals with extracting a limited number of “most interesting” records from a potentially large qualifying result set. While pagination requests seem simple (and the next generation of the ORACLE database makes them even simpler), executing these queries efficiently requires some groundwork in both schema design and query coding.

Learner will be able to:

- Design efficient and well-performing top-N and pagination SQL queries.
- Design database objects (e.g. indexes or additional columns) to support efficient pagination.
- Spot potential problems with pagination queries and address them.

TARGET AUDIENCE

DBAs and developers who design and tune SQL queries will benefit from this whitepaper. Attendees are expected to be familiar with (ORACLE) SQL query syntax and be able to interpret and understand SQL plans and execution statistics.

EXECUTIVE SUMMARY

“Top-N” queries and their close cousins, “pagination” queries, are a special class of SQL queries that have a unique requirement: do NOT return all the data that qualifies! Instead, these queries impose an additional “data window” restriction that is designed to return “no more than N” most interesting records to the user. While this additional restriction seems simple, it takes effort to implement it so that top-N and pagination queries are efficient every time they run. Making top-N and pagination queries efficient is the focus of this whitepaper.

BACKGROUND

Most SQL queries are designed to answer user questions completely. That is, whatever restrictions a user puts in a WHERE clause fully define the data that the user gets in the end, be it 10 records, 1000 records or 10 million records.

However, in some cases, knowing (and extracting) a full set of data may be overkill.

Think of a typical search query on websites such as google.com or reddit.com. In these searches, users usually request searches on very generic keywords, such as “LOLcats” or “snow chains”, that can result in thousands or millions of data hits. Now a website search engineer has a few problems to contend with while deciding how to display the data that qualified.

1. First of all, it would be blatantly impossible to “fit” all the qualified data in one screen. Even though computer monitors are getting increasingly larger, they are nowhere near the capacity to hold millions of “items of interest”.

2. Displaying everything wouldn’t be a very wise thing to do anyway, because users (or, at least, human users) do not have the mental capacity to grasp “millions of items”. According to medical research, after a few seconds of focused attention, “it is likely that a person will look away, return to a previous task, or think about something else”.

These are, obviously, major issues, and the very common technique to address them is called “pagination”, where query results are organized and presented in “pages”: a user immediately sees the first “page” that contains the most relevant data (we’ll talk about what this means shortly) and is given the ability to look at other “pages” (aka: “paginate”) to see progressively less interesting data.

As pagination queries are very likely supported by some sort of back end database (such as: ORACLE), let’s look at the problem from a query designer perspective.

One obvious thing is the definition of “most interesting data”. This implies that the data must be ordered or pre-ordered (from the most interesting to the least interesting) at some point during query execution.

Another obvious thing is that “fact based WHERE restrictions” are not the whole story here. While it is possible to extract all the qualified data by SQL and then order and select most interesting records by some other means outside the database, it would obviously be very inefficient.

So, our first requirement is to do ordering and selection in the database: design SQL queries that will return only the limited subset of “the most interesting data” (“Give me the top 10 results”, aka: “top-N queries”) or only the limited subsets of “a bit less interesting data” (“Give me only the results from 20 to 30”, aka: “pagination queries”).

Our second requirement is to make top-N and pagination queries consistently efficient. In a nutshell, such a query:

- Has to be fast (duh!)
- Its timing has to be constant, regardless of whether 10 records or 10 million records qualified through the WHERE clause
- Its timing cannot depend on whether it is “page 1” or “page 1000” in a range of most interesting records

Writing efficient pagination queries is a challenge (in other words, it is very easy to write an inefficient pagination query). Fortunately, there are a number of query techniques that can help us meet our efficiency goals and it is these techniques I will focus on in this whitepaper.

http://en.wikipedia.org/wiki/Attention_span

TECHNICAL DISCUSSIONS AND EXAMPLES

SETUP

For these exercises, I’m going to use 2011 US census data, which you can download here:

http://www.census.gov/popest/data/cities/totals/2011/files/SUB-EST2011_AL_MO.csv
http://www.census.gov/popest/data/cities/totals/2011/files/SUB-EST2011_MT_WY.csv

CREATE DIRECTORY zips_dir AS 'directory path';

GRANT read, write ON DIRECTORY zips_dir TO your_user;

-- External table over the first census file
CREATE TABLE ext_sub_est2011_al_mo (
  SUMLEV varchar2(100),
  STATE varchar2(2),
  COUNTY varchar2(4),
  PLACE varchar2(100),
  COUSUB varchar2(100),
  CONCIT varchar2(100),
  NAME varchar2(100),
  STNAME varchar2(100),
  CENSUS2010POP varchar2(100),
  ESTIMATESBASE2010 varchar2(100),
  POPESTIMATE2010 varchar2(100),
  POPESTIMATE2011 varchar2(100)
)
organization external (
  type oracle_loader
  default directory zips_dir
  access parameters (
    records delimited by newline
    skip 2
    fields terminated by ','
    missing field values are null
  )
  location ('SUB-EST2011_AL_MO.csv')
)
reject limit unlimited
/

select count(1) from ext_sub_est2011_al_mo
/

drop table ext_sub_est2011_mt_wy
/

-- External table over the second census file
create table ext_sub_est2011_mt_wy (
  SUMLEV varchar2(100),
  STATE varchar2(2),
  COUNTY varchar2(4),
  PLACE varchar2(100),
  COUSUB varchar2(100),
  CONCIT varchar2(100),
  NAME varchar2(100),
  STNAME varchar2(100),
  CENSUS2010POP varchar2(100),
  ESTIMATESBASE2010 varchar2(100),
  POPESTIMATE2010 varchar2(100),
  POPESTIMATE2011 varchar2(100)
)
organization external (
  type oracle_loader
  default directory zips_dir
  access parameters (
    records delimited by newline
    skip 2
    fields terminated by ','
    missing field values are null
  )
  location ('SUB-EST2011_MT_WY.csv')
)
reject limit unlimited
/

select count(1) from ext_sub_est2011_mt_wy
/

drop table cities
/

-- pctfree 99 keeps blocks almost empty, so this small data set occupies a large segment
create table cities (
  name not null,
  state not null,
  population not null
)
pctfree 99 pctused 1
as
select name, stname, to_number(census2010pop)
from ext_sub_est2011_al_mo
where regexp_like(census2010pop, '\d+')
  and name <> stname
  and name NOT LIKE '%County'
/

insert /*+ append */ into cities (
  select name, stname, to_number(census2010pop)
  from ext_sub_est2011_mt_wy
  where regexp_like(census2010pop, '\d+')
    and name <> stname
    and name NOT LIKE '%County'
)
/

commit;

select count(1) from cities
/

-- The external tables are no longer needed once CITIES is loaded
drop table ext_sub_est2011_al_mo
/

drop table ext_sub_est2011_mt_wy
/

exec dbms_stats.gather_table_stats(user, 'CITIES');

In the end, you should have the CITIES table with this simple structure:

SQL> @desc CITIES

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                               VARCHAR2(100)
 STATE                                              VARCHAR2(100)
 POPULATION                                         NUMBER

And a fair size for our purposes:

SQL> SELECT segment_name, segment_type, round(bytes/1024/1024/1024, 2) as size_gb
     FROM dba_segments
     WHERE segment_name='CITIES';

SEGMENT_NAME                   SEGMENT_TYPE             SIZE_GB
------------------------------ -------------------- -----------
CITIES                         TABLE                         .4

SQL> SELECT count(1) FROM cities;

  COUNT(1)
----------
     75727

NAÏVE TOP-N

We will begin our quest with this simple top-N query:

GIVE ME THE TOP 5 MOST POPULOUS CITIES IN THE UNITED STATES.

Despite its simplicity, this is a stumbling block for many developers (which makes it a good interview question :-) ). For many, the first approach to this query looks like:

SELECT name, population
FROM cities
WHERE rownum <= 5
ORDER BY population DESC
/

NAME            POPULATION
--------------- ----------
Alabaster city       30352
Adamsville city       4522
Abbeville city        2688
Addison town           758
Akron town             356

--------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     5 |   115 |       | 15503   (1)| 00:03:22 |
|   1 |  SORT ORDER BY      |        |     5 |   115 |  2576K| 15503   (1)| 00:03:22 |
|*  2 |   COUNT STOPKEY     |        |       |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CITIES | 81698 |  1835K|       | 14950   (1)| 00:03:15 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM<=5)

Statistics
----------------------------------------------------------
          6  consistent gets

The query is blazingly fast (only 6 consistent gets), but … can you notice anything wrong with the results?


Of course! As much as I love “Alabaster City”, I doubt that it is the most populous city in the United States.

What happened here can be explained by the fact that in ORACLE SQL, the WHERE condition is evaluated before ORDER BY. Thus, the WHERE rownum <= 5 restriction is applied first (taking the first 5 records that the scan happens to find, here from the first data block) and only then are these “random results” sorted.

This is obviously not what we wanted.
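The difference between the two evaluation orders can be modeled outside the database. The following is a minimal Python sketch, not ORACLE code: the list stands in for rows in storage order, the populations are taken from the outputs in this paper, and the function names are made up for illustration.

```python
# Rows in "storage order": the first three are what a scan happens to find first.
cities = [
    ("Alabaster city", 30352),
    ("Adamsville city", 4522),
    ("Abbeville city", 2688),
    ("New York city", 8175133),
    ("Los Angeles city", 3792621),
    ("Chicago city", 2695598),
]


def limit_then_sort(rows, n):
    """Models WHERE rownum <= n followed by ORDER BY: cut first, sort afterwards."""
    first_n = rows[:n]
    return sorted(first_n, key=lambda r: r[1], reverse=True)


def sort_then_limit(rows, n):
    """Models sorting the whole set first and only then keeping n rows."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


naive = limit_then_sort(cities, 3)
correct = sort_then_limit(cities, 3)
print(naive[0][0])    # Alabaster city
print(correct[0][0])  # New York city
```

Only the second ordering of operations can return the true top-N, which is why the limit has to be applied after the sort.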

CORRECT TOP-N

Since WHERE is processed before ORDER BY, we have to modify our query to get correct results.

In many databases, top-N requests can be coded quite simply. For example, this is a top-N query in MySQL:

SELECT name, population
FROM cities
ORDER BY population DESC
LIMIT 5;

In MongoDB (mongo shell):

db.cities.find({}, {name: 1, population: 1, _id: 0})
  .sort({population: -1})
  .limit(5)

And in ORACLE:

SELECT name, population
FROM cities
ORDER BY population DESC
FETCH FIRST 5 ROWS ONLY;

The ORACLE query actually does not look too bad. Unfortunately, this syntax is only available in the (as yet unreleased) ORACLE 12c. In ORACLE 11g and before, the top-N syntax is more complicated and consists of two queries:

- the inner query that does the ordering
- the outer query that does the limiting

SELECT * FROM (
  SELECT name, population
  FROM cities
  ORDER BY population DESC
) WHERE rownum <= 5;

CORRECT TOP-N QUERY: EXECUTION

Let’s execute this query and see how it performs:

set timi on
set autotrace on

SELECT * FROM (
  SELECT name, population
  FROM cities
  ORDER BY population DESC
) WHERE rownum <= 5
/

NAME               POPULATION
------------------ ----------
New York city         8175133
Los Angeles city      3792621
Los Angeles city      3792621
Chicago city          2695598
Chicago city (pt.)    2695598

Elapsed: 00:00:04.44

------------------------------------------------------------------------------------------
| Id  | Operation               | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |        |     5 |   325 |       | 14975   (1)| 00:03:15 |
|*  1 |  COUNT STOPKEY          |        |       |       |       |            |          |
|   2 |   VIEW                  |        | 86662 |  5501K|       | 14975   (1)| 00:03:15 |
|*  3 |    SORT ORDER BY STOPKEY|        | 86662 |  5501K|  6488K| 14975   (1)| 00:03:15 |
|   4 |     TABLE ACCESS FULL   | CITIES | 86662 |  5501K|       | 13646   (1)| 00:02:58 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)

The results look right, but wow! – 4.4 seconds is A LOT of time for this query. The reason for that becomes clear if we look at the execution plan – a full table scan is being performed to get the results!

Logically, what is happening is this:

We have 5 records that we are really looking for here and they are scattered in some random places in the table segment. ORACLE thus must scan the entire segment to find them.

But wait! Perhaps we were just unlucky. What if the records that we searched for were found in the first few blocks that we searched, something like this:

Would it make a difference? Let’s find out:

CREATE TABLE ordered_cities pctfree 99 pctused 1 AS
  SELECT * FROM cities ORDER BY population DESC
/

SELECT * FROM (
  SELECT name, population
  FROM ordered_cities
  ORDER BY population DESC
) WHERE rownum <= 5
/

Elapsed: 00:00:06.49

--------------------------------------------------------------------------------------------------
| Id  | Operation               | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                |     5 |   325 |       | 14579   (1)| 00:03:10 |
|*  1 |  COUNT STOPKEY          |                |       |       |       |            |          |
|   2 |   VIEW                  |                | 66254 |  4205K|       | 14579   (1)| 00:03:10 |
|*  3 |    SORT ORDER BY STOPKEY|                | 66254 |  4205K|  4968K| 14579   (1)| 00:03:10 |
|   4 |     TABLE ACCESS FULL   | ORDERED_CITIES | 66254 |  4205K|       | 13562   (1)| 00:02:57 |
--------------------------------------------------------------------------------------------------
There is no difference! The problem here is that even though all the results ARE in the first searched block, ORACLE does not know that.

Data in a regular (heap) ORACLE table may or may not be sorted, but the important point is that the sort order is not guaranteed. Thus ORACLE can never be sure that there are no “better qualifying” records somewhere down the road, and so it has to scan everything and then sort everything.

Obviously the results would be drastically different if we could be sure that the records are truly sorted. If only there was a storage object in ORACLE that would enforce this …

“GUARANTEED ORDER” DATA STRUCTURE (AKA: “AN INDEX”)

The storage object that we are looking for is called an index. Let’s build one on the CITIES.POPULATION column and see what happens:

SQL> CREATE INDEX i_pop ON cities(population) pctfree 99;

Index created.

SQL> SELECT * FROM (
       SELECT name, population
       FROM cities
       ORDER BY population DESC
     ) WHERE rownum <= 5
     /

NAME               POPULATION
------------------ ----------
New York city         8175133
Los Angeles city      3792621
Los Angeles city      3792621
Chicago city (pt.)    2695598
Chicago city          2695598

Elapsed: 00:00:00.02

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     5 |   325 |    22   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |        |       |       |            |          |
|   2 |   VIEW                        |        |    10 |   650 |    22   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| CITIES | 75727 |  1626K|    22   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN DESCENDING| I_POP  |    10 |       |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         12  consistent gets

Huge difference: only 12 consistent gets, and the elapsed time (0.02 seconds versus 4.44 seconds) is two orders of magnitude smaller!

Note: don’t be alarmed by INDEX FULL SCAN in the plan. Combined with COUNT STOPKEY, it behaves like a range scan: ORACLE reads the index from its high end and stops after the first few entries. The important metric here is the number of consistent gets, and you can easily verify what is happening by running a SQL trace.

WHY INDEXES WORK

There are 3 reasons:

1) Because the index is sorted, all the data that we are looking for is co-located, so ORACLE only needs to scan a few blocks to get all the results.

2) More importantly, because of the order guarantee, ORACLE can stop after reading 5 records from the index, as there cannot be any better-qualified data left.

3) Finally, notice that the SORT ORDER BY operation is gone, and this is not a small thing by itself.
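The effect of the order guarantee can be illustrated with a small model. This is a Python sketch under simplifying assumptions, not a description of ORACLE internals: a plain list plays the heap table, a pre-sorted list plays the index, the populations are random rather than census data, and "examined" counts rows rather than blocks.

```python
import random


def top_n_heap(rows, n):
    """Unordered heap: every row has to be examined, then everything is sorted."""
    examined = len(rows)
    result = sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    return result, examined


def top_n_index(index, n):
    """Index sorted ascending by population: walk it backwards, stop after n entries."""
    result = []
    examined = 0
    for entry in reversed(index):
        examined += 1
        result.append(entry)
        if len(result) == n:
            break  # the order guarantee means nothing better can follow
    return result, examined


random.seed(1)
rows = [(f"city_{i}", random.randint(1, 10_000_000)) for i in range(75_727)]
index = sorted(rows, key=lambda r: r[1])  # built once, like CREATE INDEX

heap_result, heap_examined = top_n_heap(rows, 5)
index_result, index_examined = top_n_index(index, 5)
print(heap_examined, index_examined)
```

Both functions return the same five populations, but the heap version touches every row while the index version touches only five entries and never sorts.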

UNCERTAIN NATURE OF FILTERING

Let’s try a more elaborate top-N query by asking for the most populous cities, located in Florida.

GIVE ME THE TOP 5 MOST POPULOUS CITIES IN FLORIDA.

We are going to use the same index on POPULATION.

SELECT * FROM (
  SELECT name, population
  FROM cities c
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 5
/

5 rows selected.

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     5 |   325 |  1015   (0)| 00:00:14 |
|*  1 |  COUNT STOPKEY                |        |       |       |            |          |
|   2 |   VIEW                        |        |    10 |   650 |  1015   (0)| 00:00:14 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| CITIES |  1485 | 44550 |  1015   (0)| 00:00:14 |
|   4 |     INDEX FULL SCAN DESCENDING| I_POP  |   510 |       |   512   (0)| 00:00:07 |
----------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter("STATE"='Florida')

SELECT * FROM (
  SELECT name, population
  FROM cities c
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 200
/

200 rows selected.

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |    10 |   650 |  1015   (0)| 00:00:14 |
|*  1 |  COUNT STOPKEY                |        |       |       |            |          |
|   2 |   VIEW                        |        |    10 |   650 |  1015   (0)| 00:00:14 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| CITIES |  1485 | 44550 |  1015   (0)| 00:00:14 |
|   4 |     INDEX FULL SCAN DESCENDING| I_POP  |   510 |       |   512   (0)| 00:00:07 |
----------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=200)
   3 - filter("STATE"='Florida')


I think you can clearly see a problem here. We now have to filter in the index (in other words, we have to read through some junk and throw it away). It might be ok for a small data window (depending on the data), but as pagination windows get larger, the problem becomes increasingly worse.

A typical scenario here is that your query will return a few rows, “freeze”, return a few rows again etc. Clearly, it is not a good situation to be in.

MULTICOLUMN INDEXES

There is a trick that we can do here and it is related to the fact that we have an equality condition on STATE.

Consider a multicolumn index on (STATE, POPULATION). It is ordered on STATE and also on STATE+POPULATION, but is not ordered on POPULATION directly.

However, if we “fix the STATE with equality” (state = ‘Florida’), we now have an effective subindex, which IS ordered by POPULATION for Florida. And our top-N becomes efficient again.
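The “subindex” idea can be sketched with sorted Python lists standing in for the two indexes. Everything here is an illustrative assumption: the rows are invented, the function names are hypothetical, and a real B-tree locates the end of the Florida range by descending a few branch blocks rather than by a binary search over a list.

```python
from bisect import bisect_right

# Invented sample rows: (state, population, name).
rows = [
    ("Alabama", 900, "A1"), ("Alabama", 200, "A2"),
    ("Florida", 800, "F1"), ("Florida", 300, "F2"), ("Florida", 500, "F3"),
    ("Georgia", 700, "G1"), ("Georgia", 100, "G2"),
    ("Texas", 950, "T1"), ("Texas", 600, "T2"),
]

idx_pop = sorted(rows, key=lambda r: r[1])                # index on (population)
idx_state_pop = sorted(rows, key=lambda r: (r[0], r[1]))  # index on (state, population)
state_pop_keys = [(r[0], r[1]) for r in idx_state_pop]


def top_n_filtering(state, n):
    """Walk the population index from the top, throwing away other states."""
    found, examined = [], 0
    for entry in reversed(idx_pop):
        examined += 1
        if entry[0] == state:
            found.append(entry)
            if len(found) == n:
                break
    return found, examined


def top_n_composite(state, n):
    """Jump just past the state's last entry, then walk backwards inside its slice."""
    pos = bisect_right(state_pop_keys, (state, float("inf")))
    found, examined = [], 0
    while pos > 0 and len(found) < n:
        pos -= 1
        entry = idx_state_pop[pos]
        if entry[0] != state:
            break  # left the state's slice
        examined += 1
        found.append(entry)
    return found, examined


filtered, filtered_reads = top_n_filtering("Florida", 2)
ranged, ranged_reads = top_n_composite("Florida", 2)
print(filtered_reads, ranged_reads)  # 6 2
```

Both approaches return the same two Florida rows, but the single-column index reads and discards entries from other states, while the composite index reads only what it returns.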

SQL> CREATE INDEX i_state_pop ON cities (state, population) pctfree 99;

Index created.

SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 5
/

----------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |     5 |   325 |    22   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |             |       |       |            |          |
|   2 |   VIEW                         |             |    10 |   650 |    22   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | CITIES      |  1485 | 44550 |    22   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| I_STATE_POP |    10 |       |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   4 - access("STATE"='Florida')


TRIPS TO THE TABLE

One final consideration for efficient top-N is trips to the table and it is a big one!

Consider that top-N queries are, essentially, index range scans. A typical index range scan is executed in 3 steps:

1) Descend through the index structure to the beginning of the range.
2) Traverse leaf blocks in the range until the (top-N) condition is satisfied.
3) If additional filtering (or additional data) is needed that is not in the index, make a trip to the table for each index entry.

Let’s look at the costs of range scan, assuming our window size is 500 records (where rownum <= 500)

Step 1 is usually very lightweight as ORACLE indexes are shallow. Typically we’ll read 3-4 blocks here.

Step 2 is heavier, but usually not by much, as all the data we are interested in is (hopefully) close together and on top of that, index entries are usually pretty small and you can pack a lot of them in one block. We are probably looking at something like 5-10 logical reads here (assuming index is well packed and we do not filter a lot in the index itself).


But step 3 is different. Since we need a separate trip to the table for every index entry that qualifies, we are looking at 500 separate trips to the table (and logical reads). If we are lucky, and index clustering factor is small (that is: table and index are ordered the same way), it is likely that the actual number of blocks being read would be fairly small (we still have to perform 500 separate logical reads, but since we will be reusing blocks a lot, this might not be too bad). If we are unlucky and the table and index are out of sync as far as ordering is concerned, we will have to read A LOT of table blocks. The really bad part here is that unless your data is pretty small (or, alternatively, memory is very large) it is likely that significant portion of table blocks will NOT be cached and thus we are slowing ourselves down even further.

Think of it this way: in our (otherwise efficient) index range scan for 500 records, 500 out of 1002 logical reads (or ~50%) come from step 3. This, by the way, is a highly skewed result, caused by the fact that for this exercise we built indexes with PCTFREE 99. The typical ratio for step 3 is much worse, usually as high as 75-95% of all the reads.
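The arithmetic behind these ratios can be written down as a rough estimate. The Python sketch below is a back-of-the-envelope model, not a measurement: the function name is hypothetical, and the index depth of 2 and one entry per leaf block are assumptions chosen because they reproduce the 1002 reads quoted above.

```python
import math


def range_scan_reads(window, entries_per_leaf, index_depth):
    """Rough logical-read estimate for a top-N index range scan.

    Step 1: descend the index (index_depth reads).
    Step 2: read enough leaf blocks to hold `window` entries.
    Step 3: one trip to the table per index entry.
    """
    step1 = index_depth
    step2 = math.ceil(window / entries_per_leaf)
    step3 = window
    total = step1 + step2 + step3
    return {"step1": step1, "step2": step2, "step3": step3,
            "total": total, "step3_share": step3 / total}


# Assumption: PCTFREE 99 leaves about one entry per leaf block, index depth is 2.
sparse = range_scan_reads(window=500, entries_per_leaf=1, index_depth=2)

# Assumption: a well-packed index with 50 entries per leaf block and depth 3,
# i.e. 3 reads for step 1 and 10 reads for step 2.
dense = range_scan_reads(window=500, entries_per_leaf=50, index_depth=3)

print(sparse["total"], round(sparse["step3_share"], 2))
print(dense["total"], round(dense["step3_share"], 2))
```

Under these assumptions the sparse index spends about half of its reads on table trips, and once the index is packed tightly the table trips account for nearly all of the work.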

SELECT * FROM (
  SELECT name, population
  FROM cities c
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 500
/

500 rows selected.

----------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |    10 |   650 |    22   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |             |       |       |            |          |
|   2 |   VIEW                         |             |    10 |   650 |    22   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | CITIES      |  1485 | 44550 |    22   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| I_STATE_POP |    10 |       |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------




COVERING INDEXES


The good part here is that Step 3 is entirely optional. As long as all the data that is required by the query is already in the index, we do not need it.

SQL> CREATE INDEX i_state_pop_c on cities (state, population, name) pctfree 99;

Index created.

SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 5
/

-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |     5 |   325 |    13   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |               |       |       |            |          |
|   2 |   VIEW                        |               |    10 |   650 |    13   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN DESCENDING| I_STATE_POP_C |  1485 | 44550 |    13   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------




As predicted, having all the data required by the query in the same index pays off: the TABLE ACCESS BY INDEX ROWID step is gone from the plan and the estimated cost drops from 22 to 13.
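The saving from a covering index can be modeled by counting trips to the table. The Python sketch below is a toy model: the rowids, rows and function name are invented, and a dictionary lookup stands in for TABLE ACCESS BY INDEX ROWID.

```python
# Invented table: rowid -> (name, state, population).
table = {
    101: ("F1", "Florida", 800),
    102: ("F2", "Florida", 300),
    103: ("F3", "Florida", 500),
    104: ("G1", "Georgia", 700),
}

# Non-covering index entries: (state, population, rowid).
idx_plain = sorted((s, p, rid) for rid, (_, s, p) in table.items())
# Covering index entries: (state, population, name, rowid).
idx_covering = sorted((s, p, n, rid) for rid, (n, s, p) in table.items())


def top_n(index, state, n, covering):
    """Return (results, table_trips) for the n largest populations in `state`."""
    results, table_trips = [], 0
    for entry in reversed(index):
        if entry[0] != state:
            continue  # a real range scan would not visit these entries at all
        if covering:
            name, population = entry[2], entry[1]   # answered from the index
        else:
            name, _, population = table[entry[2]]   # trip to the table by rowid
            table_trips += 1
        results.append((name, population))
        if len(results) == n:
            break
    return results, table_trips


plain_rows, plain_trips = top_n(idx_plain, "Florida", 2, covering=False)
cover_rows, cover_trips = top_n(idx_covering, "Florida", 2, covering=True)
print(plain_trips, cover_trips)  # 2 0
```

The results are identical, but the covering index never touches the table, so step 3 disappears entirely.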

IDEAL TOP-N

Paraphrasing a well known data modeling motto, the ideal top-N query is achieved when you:

1) Use the index
2) Make the best index
3) And read only from the index

LESS THAN IDEAL TOP-N

Having an ideal situation to support your top-N queries (or anything) is, well, ideal. Unfortunately, it does not always happen in the real world.

There are several cases where a top-N or pagination scenario becomes less than ideal, and I’m going to talk about 4 notable ones:

1) Effect of query conditions
2) Effect of DESC/ASC
3) Effect of deletes/updates
4) Driver “technicalities”

EFFECT OF QUERY CONDITIONS

Here is a simple test case. Let’s say that you have a table with an ACTIVE column, which can only take 2 values: 'Y' or 'N'. Suppose you need to get the first 5 'ACTIVE' records (ordered by, say, a sequence).

Would it matter if we select these records as ACTIVE = 'Y' or as ACTIVE != 'N'?

CREATE TABLE t (n, active NOT NULL CHECK (active IN ('Y', 'N')))
PCTFREE 99 PCTUSED 1
AS
SELECT level, CASE WHEN 0 = mod(level, 10) THEN 'Y' ELSE 'N' END
FROM dual
CONNECT BY level <= 10000
/

Table created.

SQL> CREATE INDEX t_i ON t(active, n) PCTFREE 99;

Index created.

SELECT * FROM (
  SELECT * FROM t
  WHERE active = 'Y'
  ORDER BY n
) WHERE rownum <= 5
/

5 rows selected.

Elapsed: 00:00:00.01

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |    80 |    13   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY     |      |       |       |            |          |
|   2 |   VIEW             |      |    10 |   160 |    13   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| T_I  |    10 |    60 |    13   (0)| 00:00:01 |
---------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - access("ACTIVE"='Y')

SELECT * FROM (
  SELECT * FROM t
  WHERE active != 'N'
  ORDER BY n
) WHERE rownum <= 5
/

5 rows selected.

Elapsed: 00:00:00.01

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     5 |    80 |   444   (1)| 00:00:06 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      |  1000 | 16000 |   444   (1)| 00:00:06 |
|*  3 |    SORT ORDER BY STOPKEY|      |  1000 |  6000 |   444   (1)| 00:00:06 |
|*  4 |     TABLE ACCESS FULL   | T    |  1000 |  6000 |   443   (1)| 00:00:06 |
--------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)
   4 - filter("ACTIVE"<>'N')



It matters a great deal to ORACLE. An equality condition usually means a very specific and narrow range of records that the optimizer can access and read directly, while an inequality means “anything but that range”, which usually translates into reading the entire data set and filtering out unneeded data. The latter means that if we order by a column that comes after the filtered column in the index, we have to read everything and sort. You can argue that in this case ACTIVE != 'N' is equivalent to ACTIVE = 'Y' due to the way that we defined the table, but the ORACLE optimizer does not see it that way (yet).

Bottom line: if in doubt, always choose equality.

EFFECT OF DESC/ASC

Let’s say we need to select top 10 most populous cities in all states from Florida forwards, or STATE >= ‘Florida’ (which, admittedly is an unusual request, but anything to prove a point).

SQL> CREATE INDEX i_s_pop ON cities(state, population) PCTFREE 99;

Index created.

SELECT * FROM (
  SELECT * FROM cities
  WHERE state >= 'Florida'
  ORDER BY state, population DESC
) WHERE rownum <= 10
/

10 rows selected.

Elapsed: 00:00:04.69

Execution Plan
----------------------------------------------------------
Plan hash value: 2951925630

------------------------------------------------------------------------------------------
| Id  | Operation               | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |        |    10 |  1170 |       | 14132   (1)| 00:03:04 |
|*  1 |  COUNT STOPKEY          |        |       |       |       |            |          |
|   2 |   VIEW                  |        | 60040 |  6860K|       | 14132   (1)| 00:03:04 |
|*  3 |    SORT ORDER BY STOPKEY|        | 60040 |  1758K|  2368K| 14132   (1)| 00:03:04 |
|*  4 |     TABLE ACCESS FULL   | CITIES | 60040 |  1758K|       | 13646   (1)| 00:02:58 |
------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   4 - filter("STATE">='Florida')


What happened? Why is there a full table scan here?

Because we are using DESC on POPULATION only, our index data in the range STATE >= 'Florida' is no longer in the requested order. Yes, it is ordered by population (individually) for 'Florida' and it is ordered by population (individually) for 'Georgia', but the combined range, read in either direction, does not deliver STATE ascending together with POPULATION descending.

Yet, if we remove the DESC requirement on population, our range is ordered by STATE, POPULATION again.

SELECT * FROM (
  SELECT * FROM cities
  WHERE state >= 'Florida'
  ORDER BY state, population
) WHERE rownum <= 10
/

10 rows selected.

Elapsed: 00:00:00.43

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |    10 |  1170 |    26   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |         |       |       |            |          |
|   2 |   VIEW                        |         |    12 |  1404 |    26   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| CITIES  |    12 |   360 |    26   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | I_S_POP |       |       |    14   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=10)
   4 - access("STATE">='Florida')

[Figure: index entries ordered by state (AK, AL, … FL, GA, HI, … MA, … WA), shown for the two queries: one labelled “+ SORT”, the other “NO SORT”]


Lesson to learn here: watch out for DESC/ASC and build indexes appropriately. In this case, an index on: (STATE, POPULATION DESC) would help the query.
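The ordering argument can be checked with a few tuples. This Python sketch uses invented rows and models a descending index column by negating the population in the sort key; it only illustrates ordering, not how ORACLE stores descending indexes.

```python
# Invented rows: (state, population).
rows = [
    ("Florida", 300), ("Florida", 800),
    ("Georgia", 100), ("Georgia", 700),
    ("Hawaii", 400), ("Hawaii", 900),
]

# What ORDER BY state, population DESC has to return.
wanted = [
    ("Florida", 800), ("Florida", 300),
    ("Georgia", 700), ("Georgia", 100),
    ("Hawaii", 900), ("Hawaii", 400),
]

# Index on (state, population), both ascending, read in each direction.
idx_asc = sorted(rows)
forward = list(idx_asc)             # state ASC,  population ASC
backward = list(reversed(idx_asc))  # state DESC, population DESC

# Index on (state, population DESC), modeled by negating the population.
idx_mixed = sorted(rows, key=lambda r: (r[0], -r[1]))

print(forward == wanted, backward == wanted, idx_mixed == wanted)  # False False True
```

Neither scan direction of the all-ascending index yields the requested order, so a sort is unavoidable; the mixed-order index delivers it directly.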

EFFECT OF DELETES OR UPDATES

Even if indexes and top-N queries are efficient originally, the inevitable wear and tear with time cuts into it. The very common factor that affects efficiency of top-N and pagination queries is the effect of deletes and updates.

Let’s look at DELETE operation first as it is the simplest of the two.

When a record gets deleted, its entry is marked ‘empty’ in both table and all corresponding index structures. Empty space in a (heap) table can be reused immediately (with some caveats on block PCTUSED etc) by ANY new record that gets inserted into the table. Not so with an index, since index is a structure that enforces order. Empty space in an index can get reused only if new data ‘fits’ within that space (by order). Thus, index ‘holes’ are much more difficult to fill in, unless you are constantly reinserting the same data.

As for UPDATEs, they actually work very differently in indexes and tables.

In tables, an update modifies the target record in place and, unless something drastic happens, the record will likely stay in the same data block.

It is an entirely different story in an index: since the index structure is ordered, an updated index entry HAS TO move to the new place that fits the sort order, very likely in a different block.

Thus an UPDATE of an indexed column is really DELETE+INSERT in the index, which leaves a ‘hole’ in the original index block, just like a DELETE would.

Bottom line is: as more and more ‘holes’ accumulate in an index, top-N queries become progressively less efficient.
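The cost of ‘holes’ can be pictured with a simplified model. In the Python sketch below, a list holds index entries in descending order and None marks an entry that was deleted (or moved away by an update); the values and the function name are invented, and real index blocks are of course more involved.

```python
def top_n_with_holes(leaf_entries, n):
    """Walk index entries from the top; holes (None) are visited but return nothing."""
    found, visited = [], 0
    for entry in leaf_entries:
        visited += 1
        if entry is None:  # space left behind by a DELETE or by an UPDATE's delete half
            continue
        found.append(entry)
        if len(found) == n:
            break
    return found, visited


# Invented populations in descending index order.
fresh = [900, 800, 700, 600, 500, 400, 300, 200, 100]
# The same index after the 4 largest entries were deleted or moved elsewhere.
worn = [None, None, None, None, 500, 400, 300, 200, 100]

fresh_rows, fresh_visited = top_n_with_holes(fresh, 3)
worn_rows, worn_visited = top_n_with_holes(worn, 3)
print(fresh_visited, worn_visited)  # 3 7
```

The query still returns three rows, but it has to wade through the holes at the head of the range first, and that overhead grows with every entry removed from the top.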

Let’s look at the example.

SQL> CREATE TABLE cities2 (name, state, population, budget_surplus)
     PCTFREE 99 PCTUSED 1
     AS SELECT name, state, population, 'Y' FROM cities
/

Table created.

SQL> CREATE INDEX i2_pop ON cities2(budget_surplus, population, name) PCTFREE 99
/

Index created.

SELECT * FROM (
  SELECT name, population
  FROM cities2
  WHERE budget_surplus='Y'
  ORDER BY population DESC
) WHERE rownum <= 5
/

5 rows selected.

Elapsed: 00:00:00.00

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     5 |   325 |    14   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |        |       |       |            |          |
|   2 |   VIEW                        |        |    12 |   780 |    14   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN DESCENDING| I2_POP | 75720 |  1774K|    14   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - access("BUDGET_SURPLUS"='Y')

Statistics
----------------------------------------------------------
          7  consistent gets

-- This statement, by the way, is executed very inefficiently, ignoring top-N
-- optimization. I haven’t been able to figure it out yet
UPDATE cities2 SET budget_surplus='N'
WHERE rowid IN (
  SELECT r FROM (
    SELECT rowid r
    FROM cities2
    WHERE budget_surplus='Y'
    ORDER BY population DESC
  ) WHERE rownum <= 200
);

200 rows updated.

commit;

SELECT * FROM (
  SELECT name, population
  FROM cities2
  WHERE budget_surplus='Y'
  ORDER BY population DESC
) WHERE rownum <= 5
/

5 rows selected.

Elapsed: 00:00:00.01

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     5 |   325 |    14   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |        |       |       |            |          |
|   2 |   VIEW                        |        |    12 |   780 |    14   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN DESCENDING| I2_POP | 75720 |  1774K|    14   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - access("BUDGET_SURPLUS"='Y')


During the first execution of the statement, we are looking at the top 5 most populous cities that have a budget surplus. The execution is very efficient because we can quickly find the records.

We then update 200 records so that they drop out of the range (BUDGET_SURPLUS changes from 'Y' to 'N'), which leaves 200 holes in the index.

When we re-run our top-5 query now we suddenly have to read a lot more blocks (through holes) to get to the data that we need.
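
One way to see such holes directly, sketched here under the assumption that you can afford to validate the index (ANALYZE ... VALIDATE STRUCTURE blocks DML on the table while it runs), is to compare deleted and total leaf entries in INDEX_STATS. The pct_deleted alias is made up for illustration:

```sql
-- Populates INDEX_STATS for the current session (one row, last index analyzed)
ANALYZE INDEX i2_pop VALIDATE STRUCTURE;

-- LF_ROWS counts all leaf entries, including the deleted ones in DEL_LF_ROWS
SELECT name,
       lf_blks,
       lf_rows,
       del_lf_rows,
       ROUND(100 * del_lf_rows / NULLIF(lf_rows, 0), 1) AS pct_deleted
FROM index_stats;
```

This gives a whole-index picture only. It does not tell you whether the holes sit at the start of the particular range that a top-N query reads, which is what actually hurts.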

You might point out that empty blocks would eventually get reused by ORACLE, which would resolve this issue, and you would, of course, be right. However, there are a couple of caveats here.

First of all, only the blocks that are completely empty will get reused. In this (very artificial) case, most of them are, but in a real world where deletes and updates tend to be more random, even one valid key entry remaining in the block will prevent it from being reused.

Also, even if the block is completely empty, it can be reused only after we insert a new key to the index AND ORACLE happens to pick up this block from the free list chain. In other words, if the data is fairly static, completely empty blocks can stay “inside the range” for quite a while.

So, what is the solution?

Well, even though it pains me to say it, you can see how an index rebuild (or coalesce) will actually help here by packing up index entries and removing holes. The problem, of course, is that rebuilds or coalesces are BIG operations that can run for hours (days?) on real production databases and will affect the system pretty significantly.
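
For reference, a minimal sketch of the two maintenance options mentioned above, applied to the example index. Both are standard ALTER INDEX clauses; the ONLINE option assumes an edition that supports online rebuilds:

```sql
-- Merge adjacent, sparsely filled leaf blocks in place
ALTER INDEX i2_pop COALESCE;

-- Or rebuild the whole index; ONLINE keeps the table available for DML
ALTER INDEX i2_pop REBUILD ONLINE;
```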

Is there a better way? Can we resolve the problem in minutes rather than hours?

There might be a better way if we “narrow down” the fix to a specific subrange of the index (i.e. data for “only one customer”) and are willing to tolerate a bit of dead space in the index.

Here is how it works: we need to “version” our index sub trees. I.e. if originally, we had an index on:

(BUDGET_SURPLUS, POPULATION)

“Versioned” index looks like:

(BUDGET_SURPLUS, VERSION, POPULATION)

SQL> ALTER TABLE cities2 ADD (version number DEFAULT 0 NOT NULL);

Table altered.

SQL> CREATE INDEX i_pop_v ON cities2 (budget_surplus, version, population) pctfree 99;

Index created.

-- Let’s run the original query, slightly modified to accept version=0
SQL> SELECT * FROM (
       SELECT name, population
       FROM cities2
       WHERE budget_surplus='Y'
         AND version=0
       ORDER BY population DESC
     ) WHERE rownum <= 5
/

Elapsed: 00:00:00.01

------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     5 |   325 |    12   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |         |       |       |            |          |
|   2 |   VIEW                         |         |    11 |   715 |    12   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | CITIES2 |   757 | 24224 |    12   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| I_POP_V |     4 |       |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   4 - access("BUDGET_SURPLUS"='Y' AND "VERSION"=0)


-- Let’s now run the update that screws things up
SQL> UPDATE cities2 SET budget_surplus='N'
     WHERE rowid IN (
       SELECT r FROM (
         SELECT rowid r
         FROM cities2
         WHERE budget_surplus='Y'
         ORDER BY population DESC
       ) WHERE rownum <= 200
     );

200 records updated.

SQL> commit;

Commit complete.

-- And verify that the problem with over-reading the range appears
SELECT * FROM (
  SELECT name, population
  FROM cities2
  WHERE budget_surplus='Y'
    AND version=0
  ORDER BY population DESC
) WHERE rownum <= 5
/

Elapsed: 00:00:00.09





-- And it does: we are now reading 212 blocks instead of 14
-- Let’s “fix” the problem with a version:

-- This query affects the entire table, but it is not a requirement
-- We can easily construct better WHERE conditions in real life
SQL> UPDATE cities2 SET version=1 WHERE budget_surplus='Y' AND version=0;

-- And now we use version 1, where index data is already compacted
SQL> SELECT * FROM (
       SELECT name, population
       FROM cities2
       WHERE budget_surplus='Y'
         AND version=1
       ORDER BY population DESC
     ) WHERE rownum <= 5
/

Elapsed: 00:00:00.18





As you can see, the newly “versioned” data is compact enough that reads return to (almost) the pre-update level, so this process is a great way to resolve the “hole” problem for a specific part of the index (i.e. “only one customer” or “a few orders”). It can normally be done much faster than, say, an index rebuild and, of course, it is plain DML, so it can be done online.

The one thing you probably noticed is that we supplied version number in our query explicitly. How did we know it? In some cases you just do, or, we can keep version numbers elsewhere, in a simpler table or entirely outside ORACLE.

It is also a fairly lightweight modification to extract the current version number using a SELECT max() query, i.e.:

SELECT max(version)
FROM cities2
WHERE budget_surplus='Y'
/
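
As a sketch of how the two pieces could be combined into a single statement (this is an illustration, not a query from the original test run), the version lookup can be placed in a scalar subquery. It assumes that all qualifying rows were moved to the newest version together, so no current data is left behind under an older version number:

```sql
SELECT * FROM (
  SELECT name, population
  FROM cities2
  WHERE budget_surplus = 'Y'
    -- Look up the current version instead of hard-coding it
    AND version = (SELECT max(version)
                   FROM cities2
                   WHERE budget_surplus = 'Y')
  ORDER BY population DESC
) WHERE rownum <= 5;
```

It is worth checking the plan for the subquery as well: the lookup only stays cheap if it is answered from the (BUDGET_SURPLUS, VERSION, POPULATION) index rather than by scanning the table.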

PAGINATION

If an ORACLE 11g top-N query looks a bit ugly, a traditional 11g pagination query looks uglier still.

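-- rn is assigned in the intermediate query: ROWNUM is computed before
-- ORDER BY, so numbering rows inside the ordered query is not reliable
SELECT * FROM (
  SELECT a.*, rownum AS rn FROM (
    SELECT name, population
    FROM cities
    WHERE state='Florida'
    ORDER BY population DESC
  ) a
  WHERE rownum <= 20
) WHERE rn > 10
/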

Notice that we are now dealing with 3 queries:

1) The inner query that restricts and orders data

2) The intermediate query that restricts the upper bound of pagination window (WHERE rownum <= 20)

3) The outer query that restricts the lower bound of pagination window (WHERE rn > 10)
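
The same three-level structure can be written once for any page. The following is a minimal sketch in which the bind variables :page_no (1-based) and :page_size are assumptions introduced for illustration; rn is numbered in the intermediate query so that it reflects the sorted order:

```sql
SELECT * FROM (
  SELECT a.*, rownum AS rn FROM (
    -- 1) Inner query: restrict and order
    SELECT name, population
    FROM cities
    WHERE state = 'Florida'
    ORDER BY population DESC
  ) a
  -- 2) Intermediate query: upper bound of the window
  WHERE rownum <= :page_no * :page_size
)
-- 3) Outer query: lower bound of the window
WHERE rn > (:page_no - 1) * :page_size;
```

With :page_no = 2 and :page_size = 10 the bounds evaluate to rownum <= 20 and rn > 10, i.e. the second page of ten records.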

Let’s look through a couple of data pages to see how this query executes.



-- Same query for the next page; rn is again assigned in the intermediate query
SELECT * FROM (
  SELECT a.*, rownum AS rn FROM (
    SELECT name, population
    FROM cities
    WHERE state='Florida'
    ORDER BY population DESC
  ) a
  WHERE rownum <= 30
) WHERE rn > 20
/




Notice a curious thing here: as we move along to later windows, even though our window size remains the same (that is: we are getting 10 records every time), we have to read more and more data to get them.

This is because, with this type of pagination query, ORACLE does not know exactly where the window starts. It only knows where the start of “all windows” is (as defined by WHERE conditions).

In other words, if we need to access page 1, this query is super efficient, as ORACLE descends to the actual window start and reads only page 1 records. When we need page 2, ORACLE is a bit less efficient: it will still descend to the start of page 1, read the leaf blocks for page 1 and page 2 and then throw away the blocks for page 1. When we need page 5000, well, you can see where I am going with this.

Because of this waste effect, this type of pagination query is sometimes called “dumb pagination”.

SMART PAGINATION

How can we make pagination query efficient every time?

Remember that the reason “dumb pagination” gets inefficient is that ORACLE does not know where the actual data page (that we are retrieving) starts and has to resort to counting records to find it. What if we could somehow supply “page start” information to ORACLE?

This is easier to do than you think, once you realize that one almost never requests an individual page (say, page 20) in the middle of the range, but rather pages are accessed in succession: first page 1, then page 2, page 3 etc.

The great insight here is that previous page “knows” where it ends (and thus, the next one begins) and we can supply this information back to the database.

SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state='Florida'
  ORDER BY population DESC
) WHERE rownum <= 5
/

NAME                 POPULATION
-------------------- ----------
Jacksonville city        821784
Jacksonville city        821784
Miami city               399457
Miami city               399457
Tampa city               335709

Statistics
----------------------------------------------------------
         12  consistent gets

Notice that ‘Tampa city’ is the last item on this page and, more importantly, that it has POPULATION=335709. Let’s feed this information back to the database and request a 2nd page:

SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state='Florida'
    AND population < 335709
  ORDER BY population DESC
) WHERE rownum <= 5
/

NAME                 POPULATION
-------------------- ----------
St. Petersburg city      244769
St. Petersburg city      244769
Orlando city             238300
Orlando city             238300
Hialeah city             224669

------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     5 |   325 |    23   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |         |       |       |            |          |
|   2 |   VIEW                         |         |    10 |   650 |    23   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | CITIES  |    61 |  1830 |    23   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| I_S_POP |    10 |       |    13   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   4 - access("STATE"='Florida' AND "POPULATION"<335709)


Notice that we are reading essentially the same number of blocks to answer the “page 2” query as the “page 1” query (the 1 additional read in this case is an artifact of the somewhat random data layout caused by PCTFREE 99).

And the reads will stay constant for all subsequent pages. You can easily prove it to yourself by supplying a random number to compare with POPULATION.

Of course, there is a slight problem with the “page 1” query (“what should be the MAX population +1”), but it is easily solvable if you know your data and get a little creative. For example, I’m fairly certain that we do not have any cities in the US exceeding 1 billion inhabitants.

POPULATION in this case is what is sometimes called a “pagination token”, and the process of applying tokens is known as “tokenized pagination” or, to put it simply, “smart pagination”.
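
In application code the token would normally be passed as a bind variable rather than a literal. A minimal sketch, where :last_population and :page_size are bind names made up for illustration:

```sql
-- :last_population is the POPULATION of the last row on the previous page.
-- For the first page, pass a value above any real one (e.g. 1000000000).
SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state = 'Florida'
    AND population < :last_population
  ORDER BY population DESC
) WHERE rownum <= :page_size;
```

The application only has to remember the POPULATION of the last row it displayed and send it back with the next request.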

One common question that many people ask is: “what happens if I have duplicate entries in my pagination tokens”, i.e. what would happen if 10 cities have the same exact population and it just happens to be at the edge of a page?

There are several workarounds here:

1) First of all, you can select a bigger range, change < to <= and add some additional logic to the application

2) Or, you can simply choose a guaranteed unique “sequence based” column from the table (surrogate primary keys are a prime example here) and use it as a pagination token.
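
A variation that combines both ideas, offered only as a sketch and not taken from the original tests, is to keep POPULATION as the sort key and use a unique column as a tie-breaker. The CITY_ID column below is hypothetical (the example CITIES table is not shown to have one), as are the bind names:

```sql
-- Token is the pair (population, city_id) of the last row on the previous page
SELECT * FROM (
  SELECT name, population, city_id
  FROM cities
  WHERE state = 'Florida'
    AND (population < :last_population
         OR (population = :last_population AND city_id < :last_city_id))
  ORDER BY population DESC, city_id DESC
) WHERE rownum <= :page_size;
```

For this to read only one page worth of entries, the supporting index would presumably need to cover (STATE, POPULATION, CITY_ID), and because the OR condition can change how the index range is accessed, the plan should be verified before relying on it.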

TOP-N WITH JOINS

Even though so far we’ve only looked at “single table” Top-N and pagination queries, the pattern can also be applied to joins, although you have to be very careful how joins are executed. Let’s look at an example.

First of all, we’ll construct a “second table”. Let’s not be too sophisticated here and just make up something usable for an example from the data that we already have, i.e. STATES table.

CREATE TABLE states(state NOT NULL, capital NOT NULL)
  pctfree 99 pctused 1
AS SELECT state, max(name) FROM cities GROUP BY state
/

Table created.

-- And add an index to look for the state

CREATE INDEX s_idx ON states(state, capital) pctfree 99
/

Index created.

Let’s now run our top-N query that will select top-5 cities in Florida AND their state capital.

SELECT * FROM (
  SELECT /*+ leading(c) use_nl(s) */
         c.name as city, c.state, c.population, s.capital
  FROM cities c, states s
  WHERE c.state = s.state
    AND c.state='Florida'
  ORDER BY c.state, c.population DESC
) WHERE rownum <= 5
/

CITY                 STATE           POPULATION CAPITAL
-------------------- --------------- ---------- ------------------------------
Jacksonville city    Florida             821784 Zolfo Springs town
Jacksonville city    Florida             821784 Zolfo Springs town
Miami city           Florida             399457 Zolfo Springs town
Miami city           Florida             399457 Zolfo Springs town
Tampa city           Florida             335709 Zolfo Springs town

5 rows selected.

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |     5 |   845 |    23   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |               |       |       |            |          |
|   2 |   VIEW                         |               |    10 |  1690 |    23   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                |               |    10 |   550 |    23   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| I_STATE_POP_C |  1485 | 44550 |    13   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN           | S_IDX         |     1 |    25 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=5)
   4 - access("C"."STATE"='Florida')
   5 - access("S"."STATE"='Florida')


And, apart from the fact that state capital of Florida is apparently “Zolfo Springs Town”, I think you can agree that this query was very efficient.

There are a number of requirements for successful top-N queries with joins that you need to follow and here are some of the main ones:

1) All the sorting has to come from one table (more precisely: one index). This means that in ORDER BY there can only be columns from the join leading table (the only exception to that is the join columns themselves)

2) You have to use NESTED LOOPS join type. It is the only join type that can stop after reading N rows that qualify.

3) Indexes on a leading table must be built as: <Index filters (WHERE)>,<Order By>,<Join columns>,<Other baggage, i.e. select>

4) Indexes on “table 2” can be built a few different ways as long as they work efficiently with NESTED LOOPS coming from “table 1” (in our example, SELECT state, capital FROM states WHERE state=:x, which requires index on either <State> or, more efficiently on <State>,<Capital>)
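
To make requirement 3 concrete, here is a sketch of how the column order maps onto the example query. The index name i_state_pop_name is made up for illustration; the definition of the I_STATE_POP_C index that appears in the plan is not part of this section, so this is not necessarily the same index:

```sql
CREATE INDEX i_state_pop_name ON cities (
  state,       -- WHERE filter (and, in this example, also the join column)
  population,  -- ORDER BY column
  name         -- "baggage": selected column, avoids visiting the table
);
```

The existing S_IDX index on STATES(state, capital) already matches requirement 4: the nested loop probes it by STATE and reads CAPITAL from the index itself.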
