Oracle 10g has introduced an extremely useful new group function, COLLECT. This function enables us to aggregate data into a collection, retaining multiple records of data within a single row (like a nested table). One of the main benefits of this function is that it makes "string aggregation" (one of the web's most-requested Oracle technique) very simple. This article will introduce the COLLECT function and then demonstrate how it can be used to aggregate multiple records into a single value (a technique known as "string aggregation").
an overview of the collect function
We'll start by demonstrating the COLLECT function. We'll run a simple query against the ubiquitous EMP table to collect the names of all employees by department.
SQL> SELECT deptno 2 , COLLECT(ename) AS emps 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO EMPS ---------- ------------------------------------------------------------------------------------ 10 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('CLARK', 'KING') 20 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD') 30 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES') 40 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('MILLER') 4 rows selected.
Something looks a little unusual here, but ignoring the strange identifier for a moment, we can see that the COLLECT function has aggregated the employee names per department as requested.
system-generated types
Moving on to the strange identifier in the example output, we can see that Oracle has created a collection type to support the COLLECT function. The behaviour is different between 10g releases 1 and 2, so we'll investigate each separately.
10g release 1
For our example EMP query above (executed in a 10.1 database), Oracle has created a supporting type named "SYSTPXeCjDqbWSqWrshgYrRPR4Q==". We can find this in the dictionary as follows.
SQL> SELECT owner 2 , typecode 3 FROM all_types 4 WHERE type_name = 'SYSTPXeCjDqbWSqWrshgYrRPR4Q==';
OWNER TYPECODE ------------------------------ ------------------------------ SYS COLLECTION 1 row selected.
It appears as though Oracle generates a supporting collection type every time it hard-parses a SQL statement that uses the COLLECT function. In Oracle 10.1, the type is created in the SYS schema. If we try to use this new type as follows, we'll find that we cannot.
SQL> SELECT * 2 FROM TABLE( 3 sys."SYSTPXeCjDqbWSqWrshgYrRPR4Q=="('A','B','C') );
sys."SYSTPXeCjDqbWSqWrshgYrRPR4Q=="('A','B','C') ) * ERROR at line 3: ORA-00904: "SYS"."SYSTPXeCjDqbWSqWrshgYrRPR4Q==": invalid identifier
In addition to the burden this must place on the parsing process, we might also be concerned about the number of system-generated types that might start appearing in our database (for reasons unknown, some DBAs and developers are worried about this sort of thing). If we flush the shared pool or even bounce the database, the type persists, even though it no longer supports a cached SQL statement. Oracle Support's official line on this (which used to be "bounce the database to remove the type") is that SMON cleans up unused types "after a period". Recent experience suggests that this period can be anything up to 24 hours after the bounce, so in an online database with a large shared pool, these types could stick around for some considerable period. Whether we choose to worry about this or not is another matter entirely!
10g release 2
In Oracle 10g Release 2, the type is created in the schema that parses the SQL statement, as we can see in the following example (we must first repeat the original EMP query to generate a type).
SQL> SELECT deptno 2 , COLLECT(ename) AS emps 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO EMPS ---------- ------------------------------------------------------------------------------------ 10 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('CLARK', 'KING') 20 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD') 30 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES') 40 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('MILLER') 4 rows selected.
SQL> SELECT owner 2 , typecode 3 FROM all_types 4 WHERE type_name = 'SYSTPo3itZvoiRAyeH+f5LKv6+Q==';
OWNER TYPECODE ------------------------------ ------------------------------ SCOTT COLLECTION 1 row selected.
The fact that the parsing schema owns the collection type in Oracle 10.2 means that we can use these types if we wish, as follows.
SQL> SELECT * 2 FROM TABLE( 3 "SYSTPo3itZvoiRAyeH+f5LKv6+Q=="('A','B','C') );
COLUMN_VAL ---------- A B C 3 rows selected.
Furthermore, removing this system-generated type is much more simple in 10.2 than in 10.1. Firstly, because we own the type, we can simply drop it as follows (assuming that the SQL statement that generated it is no longer required).
SQL> DROP TYPE "SYSTPo3itZvoiRAyeH+f5LKv6+Q==";
Type dropped.
Bear in mind, however, that this type is supporting a SQL cursor. Therefore, if we do decide to drop the system-generated type as above, the underlying SQL cursor will be removed from the shared pool. Therefore, a re-run of the original SQL statement will need to be hard-parsed and a new type will be created accordingly.
Alternatively, a bounce of the database will drop the type immediately (rather than at some point during the next 24 hours, which is the 10.1 behaviour). Despite this, we are probably best to leave the SQL cursor to age out of the shared pool naturally, leaving SMON to clean up at a later stage.
using our own collection types
It is possible to use our own collection types with COLLECT. The CAST function (available at least as far back as Oracle 8.0 and possibly further) can be used to turn the results of the COLLECT into a type of our choosing. Note that this doesn't stop Oracle creating system-generated types to support the SQL statement, but it does make the results easier to work with.
In the following example, we'll create a standard VARCHAR2 collection type and CAST the results of our collected employee names.
SQL> CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000); 2 /
Type created.
SQL> SELECT deptno 2 , CAST(COLLECT(ename) AS varchar2_ntt) AS emps 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO EMPS ---------- --------------------------------------------------------------------- 10 VARCHAR2_NTT('CLARK', 'KING') 20 VARCHAR2_NTT('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD') 30 VARCHAR2_NTT('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES') 40 VARCHAR2_NTT('MILLER') 4 rows selected.
Note that if you are casting collections of numbers, Oracle can be particularly fussy about precisions and scales, as the following example demonstrates. We'll create a general collection of number and then attempt to cast a collection of employee salaries (the EMP.SAL column is defined as NUMBER(7,2)).
SQL> CREATE OR REPLACE TYPE number_ntt AS TABLE OF NUMBER; 2 /
Type created.
SQL> SELECT deptno 2 , CAST(COLLECT(sal) AS number_ntt) AS sals 3 FROM emp 4 GROUP BY 5 deptno;
, CAST(COLLECT(sal) AS number_ntt) AS sals * ERROR at line 2: ORA-22814: attribute or element value is larger than specified in type
This is rather a confusing problem, as the unconstrained NUMBER type should easily incorporate a NUMBER(7,2). To wrap this up, however, there are two simple solutions. We can either make the collected column fit the type or the type fit the column, as shown below. First we'll make the column fit the type.
SQL> SELECT deptno 2 , CAST(COLLECT(CAST(sal AS NUMBER)) AS number_ntt) AS sals 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO SALS ---------- ------------------------------------------------------------------------- 10 NUMBER_NTT(2450, 5000) 20 NUMBER_NTT(800, 2975, 3000, 1100, 3000) 30 NUMBER_NTT(1600, 1250, 1250, 2850, 1500, 950) 40 NUMBER_NTT(1300) 4 rows selected.
Secondly we'll make the type fit the column.
SQL> CREATE TYPE number_7_2_ntt AS TABLE OF NUMBER(7,2); 2 /
Type created.
SQL> SELECT deptno 2 , CAST(COLLECT(sal) AS number_7_2_ntt) AS sals 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO SALS ---------- ------------------------------------------------------------------------- 10 NUMBER_7_2_NTT(2450, 5000) 20 NUMBER_7_2_NTT(800, 2975, 3000, 1100, 3000) 30 NUMBER_7_2_NTT(1600, 1250, 1250, 2850, 1500, 950) 40 NUMBER_7_2_NTT(1300) 4 rows selected.
string aggregation using collect
We'll now see a practical demonstration of what the COLLECT function can be used for. One of the FAQs of Oracle developer forums is how to aggregate multiple strings into a single value. From releases of Oracle 8.0 onwards, there have been numerous methods for doing this. The most well-known method undoubtedly utilises Tom Kyte's "STRAGG" user-defined aggregate function. The STRAGG function is popular because it is extremely easy to use and faster than any pre-9i method. The COLLECT function, when combined with a function to turn the elements of a collection into a string, is faster still. We'll look at this below.
We'll continue with our standard VARCHAR2_NTT collection type, but we'll also require a "collection-to-string" function as follows.
SQL> CREATE FUNCTION to_string ( 2 nt_in IN varchar2_ntt, 3 delimiter_in IN VARCHAR2 DEFAULT ',' 4 ) RETURN VARCHAR2 IS 5 6 v_idx PLS_INTEGER; 7 v_str VARCHAR2(32767); 8 v_dlm VARCHAR2(10); 9 10 BEGIN 11 12 v_idx := nt_in.FIRST; 13 WHILE v_idx IS NOT NULL LOOP 14 v_str := v_str || v_dlm || nt_in(v_idx); 15 v_dlm := delimiter_in; 16 v_idx := nt_in.NEXT(v_idx); 17 END LOOP; 18 19 RETURN v_str; 20 21 END to_string; 22 /
Function created.
Now we are ready to demonstrate string aggregation using the COLLECT function. We'll again collect the employee names per department, but this time we will display them in a comma-delimited string.
SQL> SELECT deptno 2 , TO_STRING(CAST(COLLECT(ename) AS varchar2_ntt)) AS emps 3 FROM emp 4 GROUP BY 5 deptno;
DEPTNO EMPS ---------- -------------------------------------------------- 10 CLARK,KING,MILLER 20 SMITH,JONES,SCOTT,ADAMS,FORD 30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
Now we can compare the performance of this method to the STRAGG implementation. We'll start by building a larger dataset to work with. We'll create a table with four sets of DBA_OBJECTS data.
SQL> CREATE TABLE t 2 AS 3 SELECT MOD(ROWNUM,100) AS id 4 , CAST('A' AS VARCHAR2(1)) AS val 5 FROM dba_objects 6 , TABLE(varchar2_ntt('A','B','C','D'));
Table created.
SQL> SELECT COUNT(*) FROM t;
COUNT(*) ---------- 193900 1 row selected.
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'T');
PL/SQL procedure successfully completed.
Now we have almost 200K rows to work with, we'll aggregate the VAL column into a delimited string, using STRAGG. We'll use a TIMER package for wall-clock timings and autotrace for statistics.
SQL> set autotrace traceonly statistics SQL> exec timer.snap();
PL/SQL procedure successfully completed.
SQL> SELECT id 2 , STRAGG(val) AS vals 3 FROM t 4 GROUP BY 5 id; 100 rows selected.
Statistics ---------------------------------------------------------- 221 recursive calls 9 db block gets 551 consistent gets 395 physical reads 0 redo size 5213 bytes sent via SQL*Net to client 388 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 3 sorts (memory) 1 sorts (disk) 100 rows processed
SQL> exec timer.show('STRAGG');
[STRAGG] 7.20 seconds PL/SQL procedure successfully completed.
Now we have a rough timing for STRAGG, we can move onto the COLLECT function. The syntax is not quite as simple as STRAGG, as we've seen, but the time-savings are significant.
SQL> exec timer.snap();
PL/SQL procedure successfully completed.
SQL> SELECT id 2 , TO_STRING(CAST(COLLECT(val) AS varchar2_ntt)) AS vals 3 FROM t 4 GROUP BY 5 id; 100 rows selected.
Statistics ---------------------------------------------------------- 4441 recursive calls 111 db block gets 2519 consistent gets 109 physical reads 22040 redo size 5213 bytes sent via SQL*Net to client 388 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 104 sorts (memory) 0 sorts (disk) 100 rows processed
SQL> exec timer.show('COLLECT');
[COLLECT] 1.21 seconds PL/SQL procedure successfully completed.
We can see that this is significantly faster. Yet we might also notice that many of the statistics are showing considerably more work being performed by Oracle in support of COLLECT. Most interestingly the number of recursive calls and sorts are much higher than for the STRAGG method. How can the COLLECT function be faster? The answer is not displayed by autotrace; it is context-switching. In the STRAGG implementation, there's a context-switch for every value being aggregated (in our example, roughly 193,000). Yet in the COLLECT example, we are only context-switching 100 times (once for every call to TO_STRING). As we know from the 8i days when BULK COLLECT was making headway, context-switching penalties can be high and we can see this once again.
further reading
For a good summary of common string-aggregation techniques, see this article by Tim Hall and this article by William Robertson.
For a copy of STRAGG, see this thread on Ask Tom. This thread also contains a CONCAT_ALL function by James Padfield which is essentially a re-factored STRAGG but allowing slightly more flexibility with delimiters. For a copy of the TIMER function used in the examples in this article, see the Utilities page on this site.
The Credit goes to the below website(s):
This comment has been removed by the author.
ReplyDeleteoracle fusion Cloud HCM online training at erptree.com is worlds best online training center.
ReplyDeletewe have excelent knowledge sharing Platform. we have almost all country students as our
subscribers for online course.We have 10+ years of experience we can serve various ascent people.
we have user friendly website where you will be provided with all the required details and Self-paced
DEMO videos.we have our branches in pune, gurgaon, noida, india, usa, uk, uae, oracle fusion hcm training, fusion Procurement training, fusion hcm, scm training