Pages

OracleEBSpro is purely for knowledge sharing and learning purpose, with the main focus on Oracle E-Business Suite Product and other related Oracle Technologies.

I'm NOT responsible for any damages in whatever form caused by the usage of the content of this blog.

I share my Oracle knowledge through this blog. All my posts in this blog are based on my experience, reading oracle websites, books, forums and other blogs. I invite people to read and suggest ways to improve this blog.


Wednesday, March 6, 2013

Introduction to 8i analytic functions


Analytic functions were introduced in Release 2 of 8i and simplify greatly the means by which pivot reports and OLAP queries can be computed in straight, non-procedural SQL. Prior to the introduction of analytic functions, complex reports could be produced in SQL by complex self-joins, sub-queries and inline-views but these were resource-intensive and very inefficient. Furthermore, if a question to be answered was too complex, it could be written in PL/SQL, which by its very nature is usually less efficient than a single SQL statement.
There are three types of SQL extensions that fall under the banner of "analytic functions" though the first could be said to provide "analytic functionality" rather than actually be analytic functions:
  • new grouping of resultsets through extensions to the GROUP BY clause (ROLLUP and CUBE);
  • the new analytic functions themselves; and
  • TOP-N analysis (largely enabled by the analytic functions).
Each of these will be dealt with in turn.

grouping extensions (rollup and cube)

Extensions to the GROUP BY clause enable pre-computed resultsets, summaries and aggregations to be supplied from within the Oracle server itself, rather than by a tool such as SQL*Plus. An example follows.
Example One: Sums the salaries by job and then sub-totals each department (similar to a break sum report in SQL*Plus) and then the entire salary column.
SELECT deptno
,      job
,      SUM(sal)
FROM   emp
GROUP  BY ROLLUP(deptno,job);

    DEPTNO JOB         SUM(SAL)
---------- --------- ----------
        10 CLERK           1300
        10 MANAGER         2450
        10 PRESIDENT       5000
        10                 8750
        20 ANALYST         6000
        20 CLERK           1900
        20 MANAGER         2975
        20                10875
        30 CLERK            950
        30 MANAGER         2850
        30 SALESMAN        5600
        30                 9400
                          29025
Example Two: Sums the salaries by job and then sub-totals each department and each job type (similar to a break sum report in SQL*Plus) and then the entire salary column. This will provide sub-totals for all columns within the GROUP BY clause.
SELECT deptno
,      job
,      SUM(sal)
FROM   emp
GROUP  BY CUBE(deptno,job);

    DEPTNO JOB         SUM(SAL)
---------- --------- ----------
        10 CLERK           1300
        10 MANAGER         2450
        10 PRESIDENT       5000
        10                 8750
        20 ANALYST         6000
        20 CLERK           1900
        20 MANAGER         2975
        20                10875
        30 CLERK            950
        30 MANAGER         2850
        30 SALESMAN        5600
        30                 9400
           ANALYST         6000
           CLERK           4150
           MANAGER         8275
           PRESIDENT       5000
           SALESMAN        5600
                          29025
Example Three: Using the GROUPING function to "label" the sub-total rows (i.e. determine which rows represented rollups). The GROUPING function returns a value of 1 if the current row is a row representing an aggregated rollup group (such as the sub-total rows) or zero if the row is one of the "source" records itself.
SELECT DECODE(GROUPING(deptno),1,'All Departments',deptno) AS deptno
,      DECODE(GROUPING(job),1,'Job Total',job) AS job
,      SUM(sal)
FROM   emp
GROUP  BY CUBE(deptno,job);

DEPTNO                                   JOB         SUM(SAL)
---------------------------------------- --------- ----------
10                                       CLERK           1300
10                                       MANAGER         2450
10                                       PRESIDENT       5000
10                                       Job Total       8750
20                                       ANALYST         6000
20                                       CLERK           1900
20                                       MANAGER         2975
20                                       Job Total      10875
30                                       CLERK            950
30                                       MANAGER         2850
30                                       SALESMAN        5600
30                                       Job Total       9400
All Departments                          ANALYST         6000
All Departments                          CLERK           4150
All Departments                          MANAGER         8275
All Departments                          PRESIDENT       5000
All Departments                          SALESMAN        5600
All Departments                          Job Total      29025

analytic functions

There are 33 new analytic functions, though it is likely that most users will concentrate on a small number of these and very few will use the statistical capabilities. They are:
  • AVG, MIN, MAX, COUNT, SUM
  • LAG, LEAD
  • FIRST_VALUE, LAST_VALUE
  • ROW_NUMBER
  • RANK, DENSE_RANK, PERCENT_RANK
  • NTILE, CUME_DIST, RATIO_TO_REPORT
  • CORR
  • COVAR_POP
  • VARIANCE, VAR_POP, VAR_SAMP
  • STTDEV, STTDEV_POP, STDDEV_SAMP
  • REGR_COUNT, REGR_P2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SXY, REGR_SYY, REGR_SLOPE, REGR_INTERCEPT
The way analytic functions work is to manipulate data contained within returning resultsets. This means they can process, merge and compute against data that has already been fetched from a query and partition and order the resultset into groups while at the same time returning the entire resultset without GROUP BY clauses. There follows a range of examples that demonstrate the analytic functions that, in my opinion, will be used most commonly. The first is a simple running total of salaries within each of an organization's departments.
Example Four: Calculate a running salary total for each department as new employees were hired.
SELECT empno
,      deptno
,      sal
,      SUM(sal) OVER
          (PARTITION BY deptno
           ORDER BY ename ASC NULLS LAST) AS department_running_total
,      ROW_NUMBER() OVER
          (PARTITION BY deptno
           ORDER BY ename ASC NULLS LAST) AS employees_in_running_total
FROM   emp
ORDER  BY
       deptno
,      ename;

ENAME          DEPTNO        SAL DEPARTMENT_RUNNING_TOTAL EMPLOYEES_IN_RUNNING_TOTAL
---------- ---------- ---------- ------------------------ --------------------------
CLARK              10       2450                     2450                          1
KING               10       5000                     7450                          2
MILLER             10       1300                     8750                          3

ADAMS              20       1100                     1100                          1
FORD               20       3000                     4100                          2
JONES              20       2975                     7075                          3
SCOTT              20       3000                    10075                          4
SMITH              20        800                    10875                          5

ALLEN              30       1600                     1600                          1
BLAKE              30       2850                     4450                          2
JAMES              30        950                     5400                          3
MARTIN             30       1250                     6650                          4
TURNER             30       1500                     8150                          5
WARD               30       1250                     9400                          6
This simple example demonstrates the power of analytic functions - they can split resultsets into working groups to compute, order and aggregate data. The above example would be considerably more complex with standard SQL, and would require something like three scans of the EMP table instead of the one scan with the above example.
Analytic functions are invoked using the OVER() clause. This also enables Oracle to distinguish between PL/SQL functions and analytic functions that share the same name such as AVG, MIN and MAX.
There are three components to the OVER clause:
  • PARTITION clause, by which the resultset can be broken into groups, such as departments in the example above. Without this the entire resultset is treated as a single partition;
  • ORDER BY clause, by which the resultset or partition group can be ordered. This is optional for some analytic functions but mandatory for those which need to access rows either side of the current row, such as LAG and LEAD; and
  • RANGE or ROWS clause (AKA windowing), by which the function can be made to include rows or values around the current row in its calculations. RANGE windows work on values and ROWS windows work on records, such as either X rows on each side of the current row or all rows preceding the current row, within the current partition.
The PARTITION and ORDER BY clauses are demonstrated in the first example above. The resultset was partitioned into the individual departments in the organization. Within each department, the data was ordered by ename (using default criteria (ASC and NULLS LAST). No RANGE clause was added which means that we used the default of RANGE UNBOUNDED PRECEDING, which means include all the preceding records in the current partition in the calculation for the current row. The easiest way to understand analytic functions and windowing is by examples which demonstrate the each of the three components to the OVER() clause.
Example Five: Find the average salary by department and compare each employees' salaries to the department average.
SELECT deptno
,      ename
,      sal
,      ROUND(average_sal_dept,0) AS average_sal_dept
,      ROUND(sal - average_sal_dept,0) AS sal_variance
FROM  (SELECT deptno
       ,      ename
       ,      sal
       ,      AVG(sal) OVER
                 (PARTITION BY deptno) AS average_sal_dept
       FROM   emp);

    DEPTNO ENAME             SAL AVERAGE_SAL_DEPT SAL_VARIANCE
---------- ---------- ---------- ---------------- ------------
        10 CLARK            2450             2917         -467
        10 KING             5000             2917         2083
        10 MILLER           1300             2917        -1617
        20 SMITH             800             2175        -1375
        20 ADAMS            1100             2175        -1075
        20 FORD             3000             2175          825
        20 SCOTT            3000             2175          825
        20 JONES            2975             2175          800
        30 ALLEN            1600             1567           33
        30 BLAKE            2850             1567         1283
        30 MARTIN           1250             1567         -317
        30 JAMES             950             1567         -617
        30 TURNER           1500             1567          -67
        30 WARD             1250             1567         -317
Example Six: Determine the order by which employees joined their respective departments. Also include the employees who preceded and succeeded them.
SELECT deptno
,      ename
,      hiredate
,      LAG(ename,1,NULL) OVER
          (PARTITION BY deptno
           ORDER BY hiredate ASC NULLS LAST) AS previous_employee_ename
,      LEAD(ename,1,NULL) OVER
           (PARTITION BY deptno
            ORDER BY hiredate ASC NULLS LAST) AS next_employee_ename
FROM   emp
ORDER  BY
       deptno;

    DEPTNO ENAME      HIREDATE    PREVIOUS_EMPLOYEE_ENAME        NEXT_EMPLOYEE_ENAME
---------- ---------- ----------- ------------------------------ ---------------------
        10 CLARK      09-JUN-1981                                KING
        10 KING       17-NOV-1981 CLARK                          MILLER
        10 MILLER     23-JAN-1982 KING
        20 SMITH      17-DEC-1980                                JONES
        20 JONES      02-APR-1981 SMITH                          FORD
        20 FORD       03-DEC-1981 JONES                          SCOTT
        20 SCOTT      09-DEC-1982 FORD                           ADAMS
        20 ADAMS      12-JAN-1983 SCOTT
        30 ALLEN      20-FEB-1981                                WARD
        30 WARD       22-FEB-1981 ALLEN                          BLAKE
        30 BLAKE      01-MAY-1981 WARD                           TURNER
        30 TURNER     08-SEP-1981 BLAKE                          MARTIN
        30 MARTIN     28-SEP-1981 TURNER                         JAMES
        30 JAMES      03-DEC-1981 MARTIN
Note: LAG() and LEAD() provide access to rows around the current row, something that was very difficult to achieve prior to 8.1.6. The functions take 3 parameters - expression to be returned from the LAG/LEAD row, the LAG/LEAD offset from the current row, and the value to be returned if the offset is beyond the partition window.
Example Seven: Determine the proportion of each department's salary taken up by its individual employees.
SELECT deptno
,      ename
,      sal
,      dept_sal
,      ROUND(employees_dept_ratio*100,2) AS emps_proportion
FROM  (SELECT deptno
       ,      ename
       ,      sal
       ,      SUM(sal) OVER
                 (PARTITION BY deptno) AS dept_sal
       ,      RATIO_TO_REPORT(sal) OVER
                 (PARTITION BY deptno) AS employees_dept_ratio
       FROM   emp)
ORDER  BY
       deptno;
       
    DEPTNO ENAME             SAL   DEPT_SAL EMPS_PROPORTION
---------- ---------- ---------- ---------- ---------------
        10 CLARK            2450       8750              28
        10 KING             5000       8750           57.14
        10 MILLER           1300       8750           14.86
        20 SMITH             800      10875            7.36
        20 ADAMS            1100      10875           10.11
        20 FORD             3000      10875           27.59
        20 SCOTT            3000      10875           27.59
        20 JONES            2975      10875           27.36
        30 ALLEN            1600       9400           17.02
        30 BLAKE            2850       9400           30.32
        30 MARTIN           1250       9400            13.3
        30 JAMES             950       9400           10.11
        30 TURNER           1500       9400           15.96
        30 WARD             1250       9400            13.3
Example Eight: RANGE windowing. Determine the first and last employee to be employed within 50 days of the current employees' hiredate.
SELECT ename
,      hiredate
,      FIRST_VALUE(ename) OVER
          (ORDER BY hiredate ASC NULLS LAST
           RANGE BETWEEN 50 PRECEDING AND 50 FOLLOWING) AS first_employee
,      LAST_VALUE(ename) OVER
          (ORDER BY hiredate ASC NULLS LAST
           RANGE BETWEEN 50 PRECEDING AND 50 FOLLOWING) AS last_employee
FROM   emp
ORDER  BY
       hiredate;
       
ENAME      HIREDATE    FIRST_EMPL LAST_EMPLO
---------- ----------- ---------- ----------
SMITH      17-DEC-1980 SMITH      SMITH
ALLEN      20-FEB-1981 ALLEN      JONES
WARD       22-FEB-1981 ALLEN      JONES
JONES      02-APR-1981 ALLEN      BLAKE
BLAKE      01-MAY-1981 JONES      CLARK
CLARK      09-JUN-1981 BLAKE      CLARK
TURNER     08-SEP-1981 TURNER     MARTIN
MARTIN     28-SEP-1981 TURNER     KING
KING       17-NOV-1981 MARTIN     JAMES
FORD       03-DEC-1981 KING       JAMES
JAMES      03-DEC-1981 KING       JAMES
MILLER     23-JAN-1982 MILLER     MILLER
SCOTT      09-DEC-1982 SCOTT      ADAMS
ADAMS      12-JAN-1983 SCOTT      ADAMS
Example Nine: ROWS windowing. Determine who was recruited two employees before and three after the current employee (note what happens when employees share hiredates).
SELECT ename
,      hiredate
,      FIRST_VALUE(ename) OVER
          (ORDER BY hiredate ASC NULLS LAST
           ROWS 2 PRECEDING) AS two_employees_back
,      FIRST_VALUE(ename) OVER
          (ORDER BY hiredate DESC NULLS LAST
           ROWS 3 PRECEDING) AS three_employees_forward
FROM   emp
ORDER  BY
       hiredate;
       
ENAME      HIREDATE    TWO_EMPLOYEES_BACK        THREE_EMPLOYEES_FORWARD
---------- ----------- ------------------------- -------------------------
SMITH      17-DEC-1980 SMITH                     JONES
ALLEN      20-FEB-1981 SMITH                     BLAKE
WARD       22-FEB-1981 SMITH                     CLARK
JONES      02-APR-1981 ALLEN                     TURNER
BLAKE      01-MAY-1981 WARD                      MARTIN
CLARK      09-JUN-1981 JONES                     KING
TURNER     08-SEP-1981 BLAKE                     FORD
MARTIN     28-SEP-1981 CLARK                     JAMES
KING       17-NOV-1981 TURNER                    MILLER
JAMES      03-DEC-1981 MARTIN                    ADAMS
FORD       03-DEC-1981 KING                      SCOTT
MILLER     23-JAN-1982 JAMES                     ADAMS
SCOTT      09-DEC-1982 FORD                      ADAMS
ADAMS      12-JAN-1983 MILLER                    ADAMS

top-n queries

These refer to ranked sets of data and are quite commonly requested, such as "Who are our top-n spending customers", "who are our top-n earners" and so on. Oracle 8i provides two ways of providing answers to TOP-N queries; either by the introduction of ORDER BY in in-line views or by analytic function. Prior to these methods being available, TOP-N queries could only be achieved using complex SQL, utilizing self-joins and subqueries.
At its simplest, TOP-N queries can be answered using an ORDER BY in an in-line view and then limiting the number of rows selected from the view. An example follows.
Example Ten: Who were the first three recruits to our organization?
SELECT ROWNUM AS rank
,      ename
,      hiredate
FROM  (SELECT ename
       ,      hiredate
       FROM   emp
       ORDER  BY
              hiredate ASC NULLS LAST)
WHERE  ROWNUM <= 3;

      RANK ENAME      HIREDATE
---------- ---------- -----------
         1 SMITH      17-DEC-1980
         2 ALLEN      20-FEB-1981
         3 WARD       22-FEB-1981
There is ambiguity to the above question, especially in using the above methodology. For example, if five people were recruited on the same day, then how would this question be answered? The ORDER BY in-line view method would generate the employees in no particular order and the stopkey would stop returning rows at record three.
To remove this ambiguity, analytic functions can help. For the following example, I've updated five employees to have the earliest date.
Example Eleven: Who were the first three recruits to our organization?
SELECT hire_rank
,      ename
,      hiredate
FROM  (SELECT ename
       ,      hiredate
       ,      RANK() OVER
                  (ORDER BY HIREDATE ASC NULLS LAST) AS hire_rank
       FROM   emp)
WHERE  hire_rank <= 3;

HIRE_RANK  ENAME      HIREDATE
---------- ---------- -----------
         1 SMITH      01-JAN-1951
         1 ALLEN      01-JAN-1951
         1 WARD       01-JAN-1951
         1 JONES      01-JAN-1951
         1 MARTIN     01-JAN-1951
Technically, this has not answered the actual question but has instead expanded it. In using the RANK() analytic function, the query has returned all the people who joined the organization on the same day, even though this is more than the three people asked for. Note the use of RANK() rather than the DENSE_RANK() function. The RANK() function skips ranking numbers, such that the sixth employee to be returned would be given a rank of 6. DENSE_RANK() would assign the sixth person a rank of 2 as this works on distinct values, rather than rows.
Example Twelve: Determine the ranking of each employees' salary within their departments and within the company as a whole.
SELECT deptno
,      ename
,      sal
,      DENSE_RANK() OVER
          (PARTITION BY deptno
           ORDER BY sal DESC NULLS LAST) AS dept_ranking
,      DENSE_RANK() OVER
          (ORDER BY sal DESC NULLS LAST) AS company_ranking
FROM   emp
ORDER  BY
       deptno;
       
    DEPTNO ENAME             SAL DEPT_RANKING COMPANY_RANKING
---------- ---------- ---------- ------------ ---------------
        10 KING             5000            1               1
        10 CLARK            2450            2               5
        10 MILLER           1300            3               8
        20 SCOTT            3000            1               2
        20 FORD             3000            1               2
        20 JONES            2975            2               3
        20 ADAMS            1100            3              10
        20 SMITH             800            4              12
        30 BLAKE            2850            1               4
        30 ALLEN            1600            2               6
        30 TURNER           1500            3               7
        30 WARD             1250            4               9
        30 MARTIN           1250            4               9
        30 JAMES             950            5              11
Note: The use of DENSE_RANK() and opposed to RANK() means that no rank numbers are skipped. For example, in department 20, SCOTT and FORD have the same salary so they share a dense_rank of 1, while JONES (next highest) has the dense_rank of 2. With RANK(), JONES would be ranked 3, as rank is relative to the number of rows, so the RANK() for SCOTT, FORD and JONES would be 1,1,3 respectively.

analytic functions and pl/sql

In 8i, the analytic functions cannot be used in PL/SQL (in 9i, this is overcome as the SQL and PL/SQL parsers are merged). To work around this, compile your cursor SELECT as a view and SELECT from the view instead of the underlying table in the PL/SQL block or alternatively, use Native Dynamic SQL to send your query to the SQL parser instead of compiling it with the PL/SQL parser (though be careful of this approach if the resulting SQL statement is to be executed many times - NDS necessitates one soft-parse of a statement per execution and as a result can be more "expensive" than the static SQL that could be achieved via the view method.

summary

This introduction demonstrates the power and relative simplicity of the analytic functions. They provide an easy mechanism to compute resultsets that, before 8i, were inefficient, impractical and, in some cases, impossible in "straight SQL".
To the uninitiated, the syntax can at first appear cumbersome, but once you have one or two examples under your belt, you will actively seek opportunities to use them. In addition to their flexibility and power, they are also extremely efficient - this can easily be demonstrated using SQL_TRACE and comparing the analytic functions' performance to the SQL statements that would have been required in the days before 8.1.6. Try it for yourself...

further reading

For an in-depth discussion of analytic functions, see Tom Kyte's "Expert One-on-One Oracle" where he gives Analytic Functions an entire chapter. Also see the February 2002 Oracle magazine for details on 9i extensions to the capabilities. And, of course, the various online documentation stored at OTN, tahiti.oracle.com etc.

Credit goes to the below website(s):

No comments:

Post a Comment