Top-N queries provide a method for limiting the number of rows returned from ordered sets of data. They are extremely useful when you want to return the top or bottom "N" number of rows from a set or when you are paging through data. This article presents several methods to implement Top-N queries.
Related articles.
- Analytic Functions
- RANK, DENSE_RANK, FIRST and LAST Analytic Functions
- FIRST_VALUE and LAST_VALUE Analytic Functions
- LAG and LEAD Analytic Functions
- LISTAGG Analystic Function in 11g Release 2
Setup
First we must create and populate a test table.
DROP TABLE rownum_order_test; CREATE TABLE rownum_order_test ( val NUMBER ); INSERT ALL INTO rownum_order_test INTO rownum_order_test SELECT level FROM dual CONNECT BY level <= 10; COMMIT;
The following query shows we have 20 rows with 10 distinct values.
SELECT val FROM rownum_order_test ORDER BY val; VAL ---------- 1 1 2 2 3 3 4 4 5 5 6 VAL ---------- 6 7 7 8 8 9 9 10 10 20 rows selected. SQL>
What not to do!
The following example shows a common trap people fall into when they don't understand the way the
ROWNUM
pseudocolumn and ORDER BY
clause interact. Let's assume we wanted to return the top 5 values in the ID column. We might decide to order the data by descending ID and pick off the first five rows. That sounds correct, so we go ahead and issue the following query.SELECT val FROM rownum_order_test WHERE rownum <= 5 ORDER BY val DESC; VAL ---------- 5 4 3 2 1 5 rows selected. SQL>
That didn't do what we wanted!
The problem is that the
ROWNUM
assignment is performed prior to the ORDER BY
operation, resulting in potentially random data being returned.Inline View and ROWNUM
The classic Top-N style query uses an ordered inline view to force the data into the correct order, then uses the
ROWNUM
check to limit the data returned.SELECT val FROM (SELECT val FROM rownum_order_test ORDER BY val DESC) WHERE ROWNUM <= 5; VAL ---------- 10 10 9 9 8 5 rows selected. SQL>
As the data is in the desired order before the
ROWNUM
check is performed, we get the result we wanted. Notice that we asked for 5 rows and we got five, even though there is a second row with the value "8".
We can return the 5 smallest values by altering the
ORDER BY
clause to ascending.SELECT val FROM (SELECT val FROM rownum_order_test ORDER BY val) WHERE rownum <= 5; VAL ---------- 1 1 2 2 3 5 rows selected. SQL>
This method can also be used for paging through data, like paged web reports.
SELECT val FROM (SELECT val, rownum AS rnum FROM (SELECT val FROM rownum_order_test ORDER BY val) WHERE rownum <= 8) WHERE rnum >= 4; VAL ---------- 2 3 3 4 4 5 rows selected. SQL>
This looks like it might not perform well, but we can see from the execution plan that Oracle can push the predicates down into the inline views to make them much more efficient, so this is the best way to page through data using SQL.
Execution Plan ---------------------------------------------------------- Plan hash value: 2927523340 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8 | 208 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 8 | 208 | 4 (25)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 100 | 1300 | 4 (25)| 00:00:01 | |* 4 | SORT ORDER BY STOPKEY| | 100 | 1300 | 4 (25)| 00:00:01 | | 5 | TABLE ACCESS FULL | ROWNUM_ORDER_TEST | 100 | 1300 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">=4) 2 - filter(ROWNUM<=8) 4 - filter(ROWNUM<=8)
WITH Clause and ROWNUM
The previous example can be rewritten to use a
WITH
clause in place of the inline view.WITH ordered_query AS (SELECT val FROM rownum_order_test ORDER BY val DESC) SELECT val FROM ordered_query WHERE rownum <= 5; VAL ---------- 10 10 9 9 8 5 rows selected. SQL>
RANK
The
RANK
analytic function assigns a sequential rank for each distinct value in the specified window.SELECT val FROM (SELECT val, RANK() OVER (ORDER BY val DESC) AS val_rank FROM rownum_order_test) WHERE val_rank <= 5; VAL ---------- 10 10 9 9 8 8 6 rows selected. SQL>
At first glance this looks like there may be a problem, but displaying the rank information shows us what is happening.
SELECT val, val_rank FROM (SELECT val, RANK() OVER (ORDER BY val DESC) AS val_rank FROM rownum_order_test) WHERE val_rank <= 5; VAL VAL_RANK ---------- ---------- 10 1 10 1 9 3 9 3 8 5 8 5 6 rows selected. SQL>
From this we can see that duplicate rows are assigned the same rank, followed by a skip in the sequence to keep the rank consistent. Similar to Olympic medal places. This means the
RANK
function doesn't give us the "top N rows" or the "top N distinct values". The number of rows returned is dependent on the number of duplicates in the data.DENSE_RANK
The
DENSE_RANK
analytic function is similar to the RANK
analytic function in that it assigns a sequential rank for each distinct value in the specified window. The difference being the ranks are compacted, so there are no gaps.SELECT val FROM (SELECT val, DENSE_RANK() OVER (ORDER BY val DESC) AS val_rank FROM rownum_order_test) WHERE val_rank <= 5; VAL ---------- 10 10 9 9 8 8 7 7 6 6 10 rows selected. SQL>
Displaying the rank information shows us what is happening.
SELECT val, val_rank FROM (SELECT val, DENSE_RANK() OVER (ORDER BY val DESC) AS val_rank FROM rownum_order_test) WHERE val_rank <= 5; VAL VAL_RANK ---------- ---------- 10 1 10 1 9 2 9 2 8 3 8 3 7 4 7 4 6 5 6 5 10 rows selected. SQL>
Once again, duplicate values are assigned the same rank, but there is no gap in the rank sequence. As a result
DENSE_RANK
always gives us a "top N distinct values" result.ROW_NUMBER
The
ROW_NUMBER
analytic function is similar to the ROWNUM
pseudocolumn in that it assigns a unique number for each row returned, but like all analytic functions its action can be limited to a specific window of data in the result set and based on the order of data in that window. In this simple example using a window of the whole result set it functions the same as the ROWNUM
psuedocolumn.SELECT val FROM (SELECT val, ROW_NUMBER() OVER (ORDER BY val DESC) AS val_row_number FROM rownum_order_test) WHERE val_row_number <= 5; VAL ---------- 10 10 9 9 8 5 rows selected. SQL>
For more information see:
- Analytic Functions
- RANK, DENSE_RANK, FIRST and LAST Analytic Functions
- FIRST_VALUE and LAST_VALUE Analytic Functions
- LAG and LEAD Analytic Functions
- LISTAGG Analystic Function in 11g Release 2
- ROWNUM Pseudocolumn
- RANK
- DENSE_RANK
- subquery_factoring_clause (WITH clause)
- ROW_NUMBER
Credit goes to the below website(s):
http://www.oracle-base.com/articles/misc/top-n-queries.php
Select the Nth highest value from a table
select level, max(sal) from emp
where level=2
connect by prior sal > sal
group by level;
Select the Nth highest value from a table
select level, max(sal) from emp
where level=2
connect by prior sal > sal
group by level;
Select the Nth lowest value from a table
select level, min(sal) from emp
where level=2
connect by prior sal < sal
group by level;
Thanks for your information. Please keep posting.
ReplyDeleteOracle Apps online Training
MS Dynamics AX online Training