tag:blogger.com,1999:blog-27375325773368368172008-09-03T21:12:14.718+01:00LishblogApplication Development with Oracle and JavaMark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comBlogger19125tag:blogger.com,1999:blog-2737532577336836817.post-42722398898121667562008-07-01T20:44:00.004+01:002008-06-30T19:58:53.683+01:00JDBC Batch Updates with OracleThe JDBC batch update feature allows multiple SQL statements to be submitted to Oracle at the same time. This can have a significant impact on performance by reducing network round trips. <br /><br />With Oracle, there are two batch update techniques; standard JDBC batch updates and Oracle batch updates. We will look at both of these and demonstrate the performance benefits of each.<br /><br />This simple table is used for our tests.<blockquote><pre>CREATE TABLE batch_test<br /> ( id NUMBER, <br /> name VARCHAR2(50)<br /> );</pre></blockquote>Table created.<h2 style="font-size:100%;color:#2D6E89">Without Batching</h2>First we will execute an INSERT statement ten thousand times without batching. This will give us a baseline to compare with the batched examples later on. <br /><br />Autocommit has been disabled (as it always should be) and the updates are committed explicitly after one hundred rows.<blockquote style="margin:1em 0px"><pre>public static void testNoBatching() throws SQLException {</pre><pre> Connection con = ConnectionFactory.getNewConnection();<br /> PreparedStatement pstmt = <br /> con.prepareStatement("INSERT INTO batch_test (id, name) " +<br /> "VALUES (?, rpad('x', 50, 'x'))");</pre><pre> long start = System.currentTimeMillis();<br /> for (int i = 1; i <= 10000; i++) {<br /> pstmt.setInt(1, i);<br /> pstmt.executeUpdate();<br /> if (i % 100 == 0) {<br /> con.commit();<br /> }<br /> }</pre><pre> System.out.printf("Inserting 10,000 rows took %4.2f seconds\n", <br /> (System.currentTimeMillis() - start) / 1000f);<br /> <br /> pstmt.close();<br /> con.close();<br />}</pre><pre>------------------------------------------------------------------------------</pre><pre>Inserting 10,000 rows took 12.83 seconds</pre></blockquote><br />This test uses a PreparedStatement and bind variables to ensure that the SQL is executed as efficiently as possible. Even so, without batching, the inserts took nearly 13 seconds.<h2 style="font-size:100%;color:#2D6E89">Standard JDBC Batch Updates</h2>With standard JDBC batch updates, the SQL statements are batched and submitted explicitly.<blockquote style="margin:1em 0px"><pre>public static void testStandardBatching() throws SQLException {</pre><pre> Connection con = ConnectionFactory.getNewConnection();<br /> PreparedStatement pstmt = <br /> con.prepareStatement("INSERT INTO batch_test (id, name) " +<br /> "VALUES (?, rpad('x', 50, 'x'))");</pre><pre> if (!con.getMetaData().supportsBatchUpdates()) {<br /> throw new SQLException("Batch updates not supported");<br /> }</pre><pre> long start = System.currentTimeMillis();<br /> for (int i = 1; i <= 10000; i++) {<br /> pstmt.setInt(1, i);<br /> <strong>pstmt.addBatch();</strong><br /> if (i % 100 == 0) {<br /> <strong>int[] updateCounts = pstmt.executeBatch();</strong><br /> con.commit();<br /> }<br /> }</pre><pre> System.out.printf("Inserting 10,000 rows took %4.2f seconds\n", <br /> (System.currentTimeMillis() - start) / 1000f);</pre><pre> pstmt.close();<br /> con.close();<br />}</pre><pre>------------------------------------------------------------------------------</pre><pre>Inserting 10,000 rows took 0.40 seconds</pre></blockquote>The addBatch method adds an SQL statement to the current batch. At an appropriate point the batch is submitted to Oracle by calling the executeBatch method. Here we submit the batch after one hundred inserts, just before we commit. executeBatch returns an array of integers which contains the number of rows affected by each statement in the batch.<br /><br />This time, the entire test took less than half a second, a fraction of the time taken to perform the original test.<h2 style="font-size:100%;color:#2D6E89">Oracle Batch Updates</h2>Oracle update batching takes a different approach. The batch size is set on the OraclePreparedStatement object and SQL statements are implicitly batched by the JDBC driver until the batch size is reached. At this point the SQL is automatically submitted to Oracle.<blockquote style="margin:1em 0px"><pre>public static void testOracleBatching() throws SQLException {</pre><pre> Connection con = ConnectionFactory.getNewConnection();<br /> PreparedStatement pstmt = <br /> con.prepareStatement("INSERT INTO batch_test (id, name) " +<br /> "VALUES (?, rpad('x', 50, 'x'))");</pre><pre> <br /> <strong>((OraclePreparedStatement)pstmt).setExecuteBatch (100);</strong> <br /> long start = System.currentTimeMillis();<br /> for (int i = 1; i <= 10000; i++) {<br /> pstmt.setInt(1, i);<br /> int updateCount = pstmt.executeUpdate();<br /> if (i % 100 == 0) {<br /> con.commit();<br /> }<br /> }</pre><pre> System.out.printf("Inserting 10,000 rows took %4.2f seconds\n", <br /> (System.currentTimeMillis() - start) / 1000f);</pre><pre> pstmt.close();<br /> con.close();<br />}</pre><pre>------------------------------------------------------------------------------</pre><pre>Inserting 10,000 rows took 0.39 seconds</pre></blockquote>This time we implement batching with a single call to the setExecuteBatch method on OraclePreparedStatement. The update count returned from executeUpdate will return zero if the call is adding the statement to the batch, or the total number of rows affected by all the update statements if the batch is being submitted to Oracle. Once again, with the batch size set to one hundred, the ten thousand rows are inserted in less than half a second.<h2 style="font-size:100%;color:#2D6E89">Summary</h2>These examples show that there can be a significant improvement in performance when batch updates are used rather than single statement updates. In our tests, execution times were reduced from over twelve seconds to less than half a second when batching was introduced.<br /><br />Of course, not every application is suited to batch updates but, when used appropriately, the performance benefits can be considerable.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-90662439410043339362008-06-30T19:18:00.006+01:002008-07-14T19:12:03.479+01:00Using Byte and Char with Character DatatypesCharacter datatypes such as CHAR, VARCHAR2 and CLOB can be defined using byte or character length semantics as follows.<blockquote><pre>CREATE TABLE t(x VARCHAR2(10), <br /> y VARCHAR2(20 BYTE), <br /> z VARCHAR2(30 CHAR));</pre><pre>Table created.</pre></blockquote>If CHAR or BYTE is not specified then the value defined by the NLS_LENGTH_SEMANTICS parameter is used as the default. We can check the current value of this parameter as follows.<blockquote><pre>SELECT value<br /> FROM v$nls_parameters<br /> WHERE parameter = 'NLS_LENGTH_SEMANTICS';</pre><pre>VALUE <br />---------------<br />BYTE</pre><pre>1 rows selected</pre></blockquote>In the above example, table t was created with columns x and y with maximum lengths of 10 and 20 bytes respectively and column z with a maximum length of 30 characters.<br /><br />But what does this actually mean?<h2 style="font-size:100%;color:#2D6E89">Character Set</h2>During database creation, the character set to be used by CHAR, VARCHAR2 and CLOB datatypes is specified. These are some of the values available for selection.<pre><br /> US7ASCII ASCII 7-bit American (SB)<br /> WE8ISO8859P1 ISO 8859-1 West European 8-bit (SB)<br /> EE8ISO8859P2 ISO 8859-2 East European 8-bit (SB)<br /> JA16VMS JVMS 16-bit Japanese (MB)<br /> KO16KSCCS KSCCS 16-bit Korean (MB)<br /> AL16UTF16 Unicode 3.2 UTF-16 Universal character set (MB) <br /> AL32UTF8 Unicode 3.2 UTF-8 Universal character set (MB)</pre><pre><br /> SB = Single Byte<br /> MB = Multi Byte</pre>The character set for a database can be determined with the following SQL.<blockquote><pre>SELECT value<br /> FROM v$nls_parameters<br /> WHERE parameter = 'NLS_CHARACTERSET';</pre><pre>VALUE <br />---------------<br />WE8ISO8859P1</pre><pre>1 rows selected</pre></blockquote><p>When using a single byte character set such as WE8ISO8859P1, a character is always held in a single byte so specifying the BYTE or CHAR qualifier has no effect on storage allocation. However, with a multi-byte character set such as EE8ISO8859P2, a single character may require up to 4 bytes of storage. In this case it may be useful to specify the length of the column in characters and let Oracle figure out how many bytes to allocate. <br /><br />Consideration must be given to the maximum number of bytes that can be allocated. When using BYTE semantics the maximum is 2000 for CHAR datatypes, and 4000 for VARCHAR2. However, when using the CHAR length qualifier the maximum size that can be specified may be reduced in order to accommodate the character set.<h2 style="font-size:100%;color:#2D6E89">National Character Set</h2>The Unicode or national character set is also specified when creating the database. For example.</p><pre><br /> AL32UTF8 UTF-8<br /> AL16UTF16 UTF-16</pre><br />This setting is used by the Unicode datatypes NCHAR, NVARCHAR2 and NCLOB and the size of the column is always specified in characters.<br /><br />Use the following SQL to determine the national character set details for a database.<blockquote><pre>SELECT VALUE<br /> FROM nls_database_parameters<br /> WHERE parameter = 'NLS_NCHAR_CHARACTERSET';</pre><pre>VALUE <br />--------------- <br />UTF8</pre><pre>1 rows selected</pre></blockquote>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-50485620714174715822008-06-30T19:09:00.004+01:002008-06-30T19:15:41.800+01:00Fast Dual in OracleOracle 10g introduces FAST DUAL. Now, when the the DUAL table is accessed, no logical I/O is performed which improves performance. <br /><br />We can demonstrate this by <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html"> generating a million rows</a> using the DUAL table.<blockquote><pre>set timi on<br />set autot trace stat</pre><pre>SELECT rownum FROM dual CONNECT BY LEVEL <= 1000000;</pre><pre>1000000 rows selected.</pre><pre>Elapsed: 00:00:08.07</pre><pre>Statistics<br />----------------------------------------------------------<br /> 0 recursive calls<br /> 0 db block gets<br /> 0 consistent gets<br /> 0 physical reads<br /> 0 redo size<br /> 14380219 bytes sent via SQL*Net to client<br /> 733838 bytes received via SQL*Net from client<br /> 66668 SQL*Net roundtrips to/from client<br /> 1 sorts (memory)<br /> 0 sorts (disk)<br /> 1000000 rows processed</pre></blockquote>The statistics trace shows that 1 million rows have been processed without any logical I/O being performed.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-53012947318398016652008-06-14T19:48:00.002+01:002008-06-30T18:53:05.079+01:00Oracle Quick Tip: Divide a result set into groups using NTILEThe <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions103.htm#SQLRF00680"><code>NTILE</code></a> analytic function divides a result set into a specified number of groups or buckets.<br /><br />The following SQL takes a <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html">generated result set </a>and, using <code>NTILE</code>, adds a column containing three distinct values with an equal number of rows for each value.<blockquote><pre>SELECT id, val, ntile(3) OVER(ORDER BY id) nt<br />FROM (<br /> SELECT rownum id, 'xyz' || rownum val<br /> FROM dual<br /> CONNECT BY LEVEL <=12<br />)<br />ORDER BY id;</pre><pre>ID VAL NT<br />-------- ------------ ---------<br />1 xyz1 1<br />2 xyz2 1<br />3 xyz3 1<br />4 xyz4 1<br />5 xyz5 2<br />6 xyz6 2<br />7 xyz7 2<br />8 xyz8 2<br />9 xyz9 3<br />10 xyz10 3<br />11 xyz11 3<br />12 xyz12 3</pre><pre>12 rows selected</pre></blockquote>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-80268639845202078602008-06-13T21:03:00.014+01:002008-06-17T19:57:37.239+01:00Oracle Flashback TableThe <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_9012.htm#SQLRF01802"><code>FLASHBACK TABLE</code> </a>command can be used to restore an earlier state of a table in the event of human or application error.<h2 style="font-size:100%;color:#2D6E89">Restoring a dropped table</h2>The first use of <code>FLASHBACK TABLE</code> is to retrieve a table from the recycle bin after it has been dropped.<blockquote><pre>DROP TABLE dept;</pre><pre>Table dropped.</pre><br/><pre>SELECT *<br />FROM dept;</pre><pre>SQL Error: ORA-00942: table or view does not exist</pre><br/><pre>SELECT object_name, original_name<br />FROM recyclebin;</pre><pre>OBJECT_NAME ORIGINAL_NAME <br />------------------------------ -------------------------------- <br />BIN$QzJJ24I1hx7gQAAK+wFg5Q==$0 DEPT</pre><pre>1 rows selected</pre><br/><pre><STRONG>FLASHBACK TABLE dept TO BEFORE DROP;</STRONG></pre><pre>Flashback complete.</pre><br/><pre>SELECT *<br />FROM dept;</pre><pre>DEPTNO DNAME LOC <br />---------------------- -------------- ------------- <br />10 ACCOUNTING NEW YORK <br />20 RESEARCH DALLAS <br />30 SALES CHICAGO <br />40 OPERATIONS BOSTON</pre><pre>4 rows selected</pre></blockquote>The <code>RENAME TO</code> clause can be used to give the table a new name during the restore. This is useful if a new table has been created with the same name since the old one was deleted.<blockquote><pre>FLASHBACK TABLE dept TO BEFORE DROP RENAME TO dept_before_drop;</pre><pre>Flashback complete.</pre></blockquote>Any indexes or triggers associated with the table will still have their recycled names and will need to be re-instated manually. Also, views and procedures which rely on the table will have been invalidated and will need to be recompiled.<h2 style="font-size:100%;color:#2D6E89">Restoring a table to an earlier state</h2><code>FLASHBACK TABLE</code> can also be used to revert the state of a table back to a specified time or SCN.<br /><br />Let's see this in action using the EMP table in the Oracle demo schema.<blockquote><pre>SELECT deptno, count(*)<br />FROM emp<br />GROUP BY deptno;</pre><pre>DEPTNO COUNT(*) <br />------------ --------------<br />10 3 <br />20 5 <br />30 6</pre><pre>3 rows selected</pre></blockquote>Suppose we wish to delete all entries for department 10 but we mistype the SQL and commit before we notice our mistake.<blockquote><pre>DELETE FROM emp<br />WHERE deptno >= 10;</pre><pre>14 rows deleted</pre><br/><pre>COMMIT;</pre><pre>Commit complete.</pre><br/><pre>SELECT deptno, count(*)<br />FROM emp<br />GROUP BY deptno;</pre><pre>DEPTNO COUNT(*) <br />------------ --------------</pre><pre>0 rows selected</pre></blockquote>All the rows have been deleted and the transaction committed. But fear not, flashback technology was introduced to deal with just this kind of problem. We can restore the table back to the state it was in, say, one minute ago. Before we can do this, however, we must ensure that row movement is enabled on the table.<blockquote><pre>ALTER TABLE emp ENABLE ROW MOVEMENT;</pre><pre>Table altered.</pre><br/><pre><strong>FLASHBACK TABLE emp TO TIMESTAMP systimestamp - INTERVAL '1' MINUTE;</strong></pre><pre>Flashback complete.</pre><br/><pre>SELECT deptno, count(*)<br />FROM emp<br />GROUP BY deptno;</pre><pre>DEPTNO COUNT(*) <br />------------ --------------<br />10 3 <br />20 5 <br />30 6</pre><pre>3 rows selected</pre></blockquote>The table has been 'rewound' to a prior point in time. All the rows that existed on the table one minute ago have been restored. It is as if the delete never happened.<br /><br />Alternatively, we could restore to a particular SCN. To use this method we must know the exact SCN we wish to restore to, so we will need to record the value before the update is attempted.<blockquote><pre>SELECT current_scn<br />FROM v$database;</pre><pre>CURRENT_SCN <br />---------------------- <br />3779819703</pre><pre>1 rows selected</pre><br/><pre>DELETE FROM emp<br />WHERE deptno >= 10;</pre><pre>14 rows deleted</pre><br/><pre>COMMIT;</pre><pre>COMMIT succeeded.</pre><br/><pre><strong>FLASHBACK TABLE emp TO SCN 3779819703;</strong></pre><pre>FLASHBACK TABLE succeeded.</pre><br/><pre>SELECT deptno, count(*)<br />FROM emp<br />GROUP BY deptno;</pre><pre>DEPTNO COUNT(*) <br />------------ --------------<br />10 3 <br />20 5 <br />30 6</pre><pre>3 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Restore Point</h2>Oracle 10g Release 2 introduced the concept of restore points. A restore point is a name which is associated with the SCN of the database at the time the restore point was created. We can reference a restore point name in the <code>FLASHBACK TABLE</code> command.<br /><br />For example.<blockquote><pre>CREATE RESTORE POINT before_changes;</pre><pre>Restore point created.</pre></blockquote>If we run in to any problems during our updates, we can simply rewind one or more tables back to the restore point.<blockquote><pre>DELETE FROM emp<br />WHERE deptno >= 10;</pre><pre>14 rows deleted</pre><br/><pre>COMMIT;</pre><pre>COMMIT succeeded.</pre><br/><pre><strong>FLASHBACK TABLE emp TO RESTORE POINT before_changes;</strong></pre><pre>FLASHBACK TABLE succeeded.</pre><br/><pre>SELECT deptno, count(*)<br />FROM emp<br />GROUP BY deptno;</pre><pre>DEPTNO COUNT(*) <br />------------ --------------<br />10 3 <br />20 5 <br />30 6</pre><pre>3 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Restrictions</h2>There are are some rules and restrictions to be aware of when using <code>FLASHBACK TABLE</code>.<UL><LI>This feature is only available with Oracle Enterprise Edition.<br /><LI>You will need the appropriate privileges (such as FLASHBACK ANY TABLE) to execute these commands. See the <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_9012.htm#SQLRF01802">Oracle documentation </a>for more details.<br /><LI>Oracle may restore rows to a different location and allocate new rowids. Any references to the original rowids will no longer be valid.<br /><LI>There must be sufficient redo data available in the database to restore the data. This can be guaranteed for a predetermined period of time if the <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/initparams251.htm#REFRN10225"><code>UNDO_RETENTION</code></a> initialization parameter is used.<br /><LI>You cannot restore a table to an earlier state across any DDL operations that change the structure of the table (including <code>TRUNCATE</code>).<br /><LI>Restore points were only introduced at 10g Release 2.<br /></UL>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-34864636857561796012008-06-13T20:51:00.005+01:002008-06-17T19:44:34.063+01:00Precision and scale do not specify column size<p>Precision and scale do not determine the physical storage allocation for a number, only the maximum size of the value that can be stored in it. A NUMBER column will occupy as much or as little space as it needs to, up to a maximum of 22 bytes. The physical size will grow and shrink depending on the size of the number to be held. In this sense, it behaves in a similar way to a VARCHAR2 datatype which also dynamically allocates enough bytes to hold the value up to the maximum specified size.<br /><br />To demonstrate this we will use the <a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions217.htm#SQLRF06162"><code>VSIZE</code></a> function which returns the number of bytes in the internal representation of a column.</p><blockquote><pre>CREATE TABLE t (<br /> num_col NUMBER(16)<br />);</pre><pre>CREATE TABLE succeeded.</pre><br/><pre>INSERT INTO t (num_col) <br />VALUES (1234567890);</pre><pre>1 rows inserted</pre><br/><pre>SELECT num_col, VSIZE(num_col) <br />FROM t;</pre><pre>NUM_COL VSIZE(NUM_COL) <br />---------------------- ---------------------- <br />1234567890 6</pre><pre>1 rows selected</pre></blockquote>If we change the value we see that the internal size, the number of bytes required to hold our number, has changed.<blockquote><pre>UPDATE t <br />SET num_col = 1;</pre><pre>1 rows updated</pre><br/><pre>SELECT num_col, VSIZE(num_col) <br />FROM t;</pre><pre>NUM_COL VSIZE(NUM_COL) <br />---------------------- ---------------------- <br />1 2</pre><pre>1 rows selected</pre></blockquote>So when we specify a precision and scale for a number we are not telling Oracle to reserve a fixed number of bytes for every number in the column. We are only providing the maximum size and the number of decimal places. Oracle will store the number in the minimum number of bytes necessary to hold the value.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-10188363088375165082008-06-13T19:59:00.007+01:002008-06-30T20:13:07.661+01:00Oracle Quick Tip: Setting the Session Date FormatThe date or timestamp format for the current session can be altered with the following commands.<blockquote><pre>ALTER SESSION SET NLS_DATE_FORMAT = [format model];<br />ALTER SESSION SET NLS_TIMESTAMP_FORMAT = [format model];</pre></blockquote>The current settings for the session can seen with this SQL.<blockquote><pre>SELECT *<br />FROM v$nls_parameters<br />WHERE parameter IN ('NLS_DATE_FORMAT', 'NLS_TIMESTAMP_FORMAT');</pre><pre>PARAMETER VALUE <br />--------------------- -----------------------------<br />NLS_DATE_FORMAT DD-MON-RR<br />NLS_TIMESTAMP_FORMAT DD-MON-RR HH24.MI.SSXFF</pre><pre>2 rows selected</pre></blockquote>This can be seen when a date and timestamp are displayed.<blockquote><pre>SELECT sysdate "Date", to_timestamp(sysdate) "Timestamp" FROM dual;</pre><pre>Date Timestamp <br />------------- -----------------------------------<br />13-JUN-08 13-JUN-08 22.29.27.000000000</pre><pre>1 rows selected</pre></blockquote>To change the format, issue the <code>ALTER SESSION</code> command and specify the new format model.<blockquote><pre>ALTER SESSION SET nls_date_format = 'dd-mon-yyyy hh24:mi:ss';</pre><pre>Session altered.</pre><br/><pre>ALTER SESSION SET nls_timestamp_format = 'dd-mon-yy hh:mi:ss';</pre><pre>Session altered.</pre></blockquote>The date and timestamp are now displayed with the revised format.<blockquote><pre>SELECT sysdate "Date", to_timestamp(sysdate) "Timestamp" FROM dual;</pre><pre>Date Timestamp <br />--------------------- -----------------------------------<br />13-jun-2008 22:31:21 13-jun-08 10:31:21</pre><pre>1 rows selected</pre></blockquote>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-84211835364310441232008-05-20T21:08:00.009+01:002008-06-17T19:48:01.379+01:00Oracle Pipelined FunctionsIn a <a href="http://blog.lishman.com/2008/05/oracle-table-functions.html">previous article </a>we saw that a collection can be returned from a PL/SQL function and used in a query as if it were a table.<br /><br />Rows from a collection returned by a table function can also be pipelined. This means that the rows are returned to the client as they are produced, rather than all at once when the function has run to completion. This can improve performance, especially with large result sets, because the client can start to process the results without having to wait for the function to finish.<h2 style="font-size:100%;color:#2D6E89">CSV to Column</h2>We will develop a simple pipelined function to read data from an <a href="http://blog.lishman.com/2008/03/oracle-external-tables.html">external table </a>which contains the following contact data.<blockquote><pre>"mrblond@abc.com","m","25"," movies, music"<br />"mrsred@jkl.com","f","33","cooking , reading, yoga, travel"<br />"mrpink@def.com","m","31"," movies, reading,sport"<br />"mrsbrown@ghi.com","f","22","cycling, swimming"</pre></blockquote>The function will pivot the comma separated list of interests and present them in a column which contains a single interest like so.<blockquote><pre>mrblond@abc.com m 25 movies<br />mrblond@abc.com m 25 music<br />mrsbrown@ghi.com f 22 cycling<br />mrsbrown@ghi.com f 22 swimming<br />mrpink@def.com m 31 movies<br />mrpink@def.com m 31 reading<br />mrpink@def.com m 31 sport</pre></blockquote>First we create a directory object and an <a href="http://blog.lishman.com/2008/03/oracle-external-tables.html">external table </a>to access the contact details in the flat file.<blockquote><pre>CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';</pre><pre>Directory created.</pre><br/><pre>CREATE TABLE contact_ext(<br /> email VARCHAR2(100),<br /> sex CHAR(1),<br /> age NUMBER,<br /> interests VARCHAR2(500)<br /> )<br /> ORGANIZATION EXTERNAL<br /> (DEFAULT DIRECTORY tmp_dir <br /> ACCESS PARAMETERS(<br /> records delimited BY newline <br /> fields terminated BY ',' <br /> optionally enclosed BY '"') <br /> LOCATION('contacts.txt')<br />);</pre><pre>Table created.</pre></blockquote>Next we create a contact object to represent the new table structure and a nested table of this type.<blockquote><pre>CREATE OR REPLACE type contact_type AS OBJECT<br />(email VARCHAR2(100), <br /> sex char(1), <br /> age NUMBER, <br /> interest VARCHAR(50));</pre><pre>Type created.</pre><br/><pre>CREATE OR REPLACE TYPE contact_table_type AS<br /> TABLE OF contact_type;</pre><pre>Type created.</pre></blockquote>The table function is then created using the nested table as the return type. The <code>PIPELINED</code> keyword indicates that the function will return rows iteratively.<blockquote><pre>CREATE OR REPLACE FUNCTION contact_list <br />RETURN contact_table_type <strong>PIPELINED</strong><br />AS<br /> l_interests dbms_utility.uncl_array;<br /> l_count NUMBER;<br />BEGIN</pre><pre> FOR rec IN<br /> (SELECT *<br /> FROM contact_ext)<br /> LOOP<br /> dbms_utility.comma_to_table(list => rec.interests, <br /> tablen => l_count, <br /> tab => l_interests);</pre><pre> FOR i IN 1 .. l_count<br /> LOOP<br /> <strong>PIPE ROW</strong> (CONTACT_TYPE(rec.email, <br /> rec.sex,<br /> rec.age,<br /> TRIM(l_interests(i))));<br /> END LOOP;<br /> END LOOP;<br />END;</pre><pre>Function created.</pre></blockquote>We take each CSV list of values from the interests column of <code>CONTACT_EXT</code> and convert it into a nested table using the <code>COMMA_TO_TABLE</code> procedure of the <code>DBMS_UTILITY</code> package. We then iterate through the nested table and return a series of <code>CONTACT_TYPE</code>s using the <code>PIPE ROW</code> command.<br /><br />This procedure can now be used as a table in a query.<blockquote><pre>SELECT * FROM TABLE (contact_list);</pre><pre>EMAIL SEX AGE INTEREST <br />----------------------- ------- ------ -------------<br />mrblond@abc.com m 25 movies<br />mrblond@abc.com m 25 music<br />mrsbrown@ghi.com f 22 cycling<br />mrsbrown@ghi.com f 22 swimming<br />mrpink@def.com m 31 movies<br />mrpink@def.com m 31 reading<br />mrpink@def.com m 31 sport<br />mrsred@jkl.com f 33 cooking<br />mrsred@jkl.com f 33 reading<br />mrsred@jkl.com f 33 yoga<br />mrsred@jkl.com f 33 travel</pre><pre>11 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Column to CSV</h2>Now let's do this in reverse. Suppose we are given the data in the above format and we need to return each email address on a separate line with the interests shown as a comma separated list.<br /><br />We could use the <code>SYS_CONNECT_BY_PATH</code> function to provide an <a href="http://blog.lishman.com/2008/05/how-to-pivot-column-to-csv-list-in.html">SQL only solution</a> but for the purposes of this exercise we will develop a bespoke PL/SQL pipelined function.<br /><br />First, we create a table to work with.<blockquote><pre>CREATE TABLE contact<br />AS SELECT * FROM TABLE (contact_list);</pre><pre>Table created.</pre></blockquote>This time, the function will take the contents of the table column and build up a comma separated string of values for each email address. The <code><a href="http://blog.lishman.com/2008/04/lead-and-lag-analytical-functions-in.html">LEAD</a></code> function helps us to identify when an email is about to change by looking at the next email address in the result set. Once again, rows are returned as they are produced using the <code>PIPE ROW</code> command.<blockquote><pre>CREATE OR REPLACE<br />FUNCTION contact_csv <br />RETURN contact_table_type <strong>PIPELINED</strong><br />AS<br /> l_interests VARCHAR2(1000);<br /> l_count NUMBER;<br />BEGIN</pre><pre> FOR row IN<br /> (SELECT <br /> email, <br /> LEAD(email, 1) OVER (ORDER BY email) AS next_email,<br /> sex, <br /> age, <br /> interest<br /> FROM contact<br /> ORDER BY email)<br /> LOOP</pre><pre> l_interests := l_interests || ',' || row.interest;</pre><pre> IF (row.email != row.next_email OR<br /> row.next_email IS NULL) THEN<br /> <strong>PIPE ROW</strong> (CONTACT_TYPE(row.email, <br /> row.sex,<br /> row.age,<br /> LTRIM(l_interests,',')));<br /> l_interests := null;<br /> END IF;<br /> END LOOP;<br />END;</pre><pre>Function created.</pre></blockquote>We can confirm the results using the following SQL.<blockquote><pre>SELECT * <br />FROM TABLE(contact_csv);</pre><pre>EMAIL SEX AGE INTEREST <br />---------------------- ------- ------- ------------------------------<br />mrblond@abc.com m 25 movies,music<br />mrpink@def.com m 31 movies,reading,sport<br />mrsbrown@ghi.com f 22 cycling,swimming<br />mrsred@jkl.com f 33 cooking,reading,yoga,travel</pre><pre>4 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Summary</h2>Oracle pipelined functions are table functions which return data iteratively. This allows the client to start processing the output from the function call before the entire collection has been populated. <a href="http://blog.lishman.com/2008/05/oracle-table-functions.html"></a>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-59873887873376060212008-05-20T20:40:00.010+01:002008-06-13T19:47:05.087+01:00How to pivot a column to a CSV list in Oracle<p>Suppose we have a column of data that we wish to display as a CSV list. One solution is to develop a <a href="http://blog.lishman.com/2008/05/oracle-pipelined-functions.html">pipelined table function </a>in PL/SQL, but how can we achieve the same results using just SQL? This article shows how the Oracle <code>SYS_CONNECT_BY_PATH</code> function can be used to solve the problem.<h2 style="font-size:100%;color:#2D6E89">Example</h2>Using the employee table from the Oracle demo schema, we can easily display a list of employee names for each department.</p><blockquote><pre>SELECT deptno, ename<br />FROM emp<br />ORDER BY deptno;</pre><pre>DEPTNO ENAME <br />------------- ---------- <br />10 CLARK <br />10 KING <br />10 MILLER <br />20 SMITH <br />20 ADAMS <br />20 FORD <br />20 SCOTT <br />20 JONES <br />30 ALLEN <br />30 BLAKE <br />30 MARTIN <br />30 JAMES <br />30 TURNER <br />30 WARD</pre><pre>14 rows selected</pre></blockquote>But suppose we wish to format the names as a comma separated list as follows.<blockquote><pre>DEPTNO EMPLOYEES<br />------------ ------------------------------------<br />10 CLARK,KING,MILLER<br />20 SMITH,ADAMS,FORD,SCOTT,JONES<br />30 ALLEN,BLAKE,MARTIN,JAMES,TURNER,WARD</pre><pre>3 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">SYS_CONNECT_BY_PATH</h2>The technique described here uses the <code>SYS_CONNECT_BY_PATH</code> function to provide the CSV column. Let's remind ourselves of what <code>SYS_CONNECT_BY_PATH</code> actually does by displaying a a hierarchy of employees.<blockquote><pre>SELECT <br /> empno,<br /> mgr,<br /> level,<br /> SYS_CONNECT_BY_PATH(ename, '/') path <br /> FROM emp<br /> START WITH ename ='KING'<br /> CONNECT BY PRIOR empno = mgr;</pre><pre>EMPNO MGR LEVEL PATH<br />------- ---------- --------- --------------------------<br />7839 1 /KING<br />7566 7839 2 /KING/JONES<br />7788 7566 3 /KING/JONES/SCOTT<br />7876 7788 4 /KING/JONES/SCOTT/ADAMS<br />7902 7566 3 /KING/JONES/FORD<br />7369 7902 4 /KING/JONES/FORD/SMITH<br />7698 7839 2 /KING/BLAKE<br />7499 7698 3 /KING/BLAKE/ALLEN<br />7521 7698 3 /KING/BLAKE/WARD<br />7654 7698 3 /KING/BLAKE/MARTIN<br />7844 7698 3 /KING/BLAKE/TURNER<br />7900 7698 3 /KING/BLAKE/JAMES<br />7782 7839 2 /KING/CLARK<br />7934 7782 3 /KING/CLARK/MILLER</pre><pre>14 rows selected</pre></blockquote><code>SYS_CONNECT_BY_PATH</code> returns the path of a column value from root to node, with column values separated by a '/' (or whichever character we specify) for each row returned by <code>CONNECT BY</code> condition. <br /><br />Now this function is only valid in hierarchical queries and unfortunately, in our example, there is no natural hierarchy in our data that we can use. Therefore, we must manufacture one using analytic functions.<br /><br /><h2 style="font-size:100%;color:#2D6E89">Analytic Functions</h2>Let's add a couple more columns to our original query.<blockquote><pre>SELECT deptno,<br /> ename,<br /> row_number() over(PARTITION BY deptno ORDER BY deptno) rnum,<br /> count(*) over(PARTITION BY deptno) tot<br />FROM emp;</pre><pre>DEPTNO ENAME RNUM TOT <br />------------ ---------- ------ ----- <br />10 CLARK 1 3 <br />10 KING 2 3 <br />10 MILLER 3 3 <br />20 SMITH 1 5 <br />20 ADAMS 2 5 <br />20 FORD 3 5 <br />20 SCOTT 4 5 <br />20 JONES 5 5 <br />30 ALLEN 1 6 <br />30 BLAKE 2 6 <br />30 MARTIN 3 6 <br />30 JAMES 4 6 <br />30 TURNER 5 6 <br />30 WARD 6 6</pre><pre>14 rows selected</pre></blockquote>The RNUM column creates an artificial hierarchy within department which allows us to identify the parent using rnum – 1. TOT tells us the total number of employees for the current department. We use this to identify the employees who are at the bottom of the hierarchy (where rnum=tot).<br /><br /><h2 style="font-size:100%;color:#2D6E89">Complete Solution</h2>So, putting all this together, we get the following SQL.<blockquote><pre>SELECT <br /> deptno,<br /> LTRIM(SYS_CONNECT_BY_PATH(ename, ','), ',') employees<br /> FROM (<br /> SELECT deptno,<br /> ename,<br /> row_number() over(PARTITION BY deptno ORDER BY deptno) rnum,<br /> count(*) over(PARTITION BY deptno) tot<br /> FROM emp<br /> )<br />WHERE rnum=tot<br />START WITH rnum = 1 <br />CONNECT BY PRIOR rnum = rnum -1 AND PRIOR deptno = deptno;</pre><pre>DEPTNO EMPLOYEES<br />------------ ------------------------------------<br />10 CLARK,KING,MILLER<br />20 SMITH,ADAMS,FORD,SCOTT,JONES<br />30 ALLEN,BLAKE,MARTIN,JAMES,TURNER,WARD</pre><pre>3 rows selected</pre></blockquote>We create an arbitrary hierarchy of employees within department and select only those employees at the bottom of the hierarchy. Then we use <code>SYS_CONNECT_BY_PATH</code> with a comma delimiter to format the employee names as a CSV list. <code>LTRIM</code> removes the first comma from our list. <br /><br />Clearly, this is not the most intuitive SQL statement ever written. However, we could use this statement to create a view which would simplify access to the data. Alternatively, we could write a bespoke <a href="http://blog.lishman.com/2008/05/oracle-pipelined-functions.html">pipelined table function </a>to provide a programmatic solution.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-86537823727531644102008-05-20T20:34:00.004+01:002008-05-20T20:38:36.019+01:00Oracle Quick Tip: Querying Buffer Cache UsageThis SQL statement shows which objects in our schema are using the most blocks in the data buffer cache.<blockquote><pre>SELECT object_name, object_type type, COUNT(*) blocks<br />FROM v$bh<br /> JOIN user_objects ON object_id = objd<br />GROUP BY object_name, object_type<br />ORDER BY blocks DESC;</pre><pre>OBJECT_NAME TYPE BLOCKS <br />----------------------------- -------- ------<br />MGMT_METRICS_RAW_PK INDEX 1960<br />MGMT_METRICS_1HOUR_PK INDEX 1183<br />MGMT_METRICS_1DAY_PK INDEX 349<br />MGMT_METRICS TABLE 149<br />MGMT_SYSTEM_PERF_LOG_IDX_01 INDEX 62<br />MGMT_CURRENT_METRICS_PK INDEX 29<br />MGMT_SYSTEM_PERFORMANCE_LOG TABLE 21<br />MGMT_TARGETS TABLE 14<br />MGMT_METRIC_COLLECTIONS TABLE 14<br />MGMT_JOB_EMD_STATUS_QUEUE TABLE 14</pre></blockquote>To display all objects, regardless of schema, use <code>DBA_OBJECTS</CODE> instead of <code>USER_OBJECTS</code>.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-34266377589989372562008-05-20T19:59:00.012+01:002008-06-13T19:25:02.789+01:00Oracle Table FunctionsAn Oracle table function returns a collection that can be queried as if it were a table. <br /><br />For example.<blockquote><pre>SELECT * FROM TABLE(my_function);</pre></blockquote>The <code>TABLE</code> keyword informs Oracle that the collection produced by <code>my_function</code> should be treated as a table in this query.<h2 style="font-size:100%;color:#2D6E89">Example</h2>To demonstrate this, let's create a simple function which generates a list of dates for a specified range, which could be used as follows.<blockquote><pre>SELECT column_value gen_date<br />FROM TABLE (date_list('1-jan-08', '5-jan-08'));</pre><pre>gen_date <br />----------------<br />01-jan-08<br />02-jan-08<br />03-jan-08<br />04-jan-08<br />05-jan-08</pre><pre>5 rows selected</pre></blockquote>Incidentally, <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html">this article</a> provides a simple solution for generating rows, including dates in a range, using the <code>CONNECT BY LEVEL</code> clause.<h2 style="font-size:100%;color:#2D6E89">The DATE_LIST function</h2>First, we create a collection to be used as the return type from the function. Here we use a nested table but a <code>VARRAY</code> would also be valid.<blockquote><pre>CREATE TYPE interval_tab<br /> AS TABLE OF date;<br />/</pre><pre>Type created.</pre></blockquote>Now we can go ahead and create the <code>date_list</code> function.<blockquote><pre>CREATE OR REPLACE FUNCTION date_list(p_start_date IN DATE, <br /> p_end_date IN DATE) <br />RETURN interval_tab AS<br /> l_interval interval_tab := interval_tab();<br /> l_days binary_integer := p_end_date - p_start_date + 1;<br />BEGIN</pre><pre>IF (l_days < 0) THEN<br /> RETURN null;<br /> END IF;</pre><pre>l_interval.extend(l_days);</pre><pre>FOR i IN 1 .. l_days<br /> LOOP<br /> l_interval(i) := p_start_date + i -1;<br /> END LOOP;</pre><pre>RETURN l_interval;<br />END;<br />/</pre><pre>Function created.</pre></blockquote>This function populates a nested table by adding an incrementing value to the start date. <h2 style="font-size:100%;color:#2D6E89">Using the table function</h2>One use for this function would be to generate a list of dates to be used in an outer join. For example, the following query displays the number of schema objects created on each day of a specified date range.<blockquote><pre>SELECT to_char(column_value, 'dd-Mon-yy') created, nvl(cnt, 0) cnt<br />FROM TABLE(date_list('1-feb-08', '5-feb-08'))<br />LEFT OUTER JOIN<br /> (SELECT TRUNC(created, 'dd') created, COUNT(*) cnt<br /> FROM all_objects<br /> GROUP BY TRUNC(created, 'dd'))<br />ON created = column_value<br />ORDER BY column_value;</pre><pre>CREATED CNT <br />--------- --------<br />01-Feb-08 4 <br />02-Feb-08 0 <br />03-Feb-08 0 <br />04-Feb-08 1 <br />05-Feb-08 2</pre><pre>5 rows selected</pre></blockquote>Using the <code>date_list</code> function and an outer join we can display a row per day, including those days that have no entries in the <code>ALL_OBJECTS</code> table.</br></br>Oracle table functions can also be <a href="http://blog.lishman.com/2008/05/oracle-pipelined-functions.html">pipelined</a> which allows the data to be returned iteratively as it is produced, rather than all at once when the function has run to completion.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-84983369344694763462008-04-15T18:33:00.020+01:002008-06-13T19:44:02.618+01:00LEAD and LAG Analytical Functions in OracleThe <code>LEAD</code> and <code>LAG</code> analytical functions provide access to more than one row of a table at the same time, without the need for a self join. <code>LEAD</code> allows access to a row at a physical offset beyond the current row and <code>LAG</code> allows access to a row at a physical offset prior to the current row.<br /><br />Here are a couple of examples which show <code>LEAD</code> and <code>LAG</code> in action.<h4>Upper and Lower Bounds</h4>The <code>SALGRADE</code> table, supplied as part of the Oracle demo schema, holds lower and upper salary values for a grade as follows.<blockquote><pre>SELECT *<br />FROM salgrade<br />ORDER BY grade;</pre><pre>GRADE LOSAL HISAL <br />----------- ------------ ---------- <br />1 1 1200 <br />2 1201 1400 <br />3 1401 2000 <br />4 2001 3000 <br />5 3001 9999</pre><pre>5 rows selected.</pre></blockquote>Another way to maintain this information is to store only the upper bound and derive the lower bound from the the previous upper bound plus one.<br /><br />First, we create our new table containing just the grade and high salary values.<blockquote><pre>CREATE TABLE new_salgrade AS<br />SELECT grade, hisal<br />FROM salgrade;</pre><pre>CREATE TABLE succeeded.</pre><br/><pre>SELECT *<br />FROM new_salgrade;</pre><pre>GRADE HISAL <br />----------- -----------<br />1 1200 <br />2 1400 <br />3 2000 <br />4 3000 <br />5 9999</pre><pre>5 rows selected.</pre></blockquote>Using <code>LAG</code> we can construct a view which will produce the data in the same format as the original table.<blockquote><pre>CREATE VIEW salgrade_v AS<br />SELECT <br /> grade,<br /> LAG(hisal, 1, 0) OVER (ORDER BY hisal) + 1 AS losal,<br /> hisal<br />FROM new_salgrade;</pre><pre>CREATE VIEW succeeded.</pre><br/><pre>SELECT * <br />FROM salgrade_v<br />ORDER BY grade;</pre><pre>GRADE LOSAL HISAL <br />---------- ---------- ----------- <br />1 1 1200 <br />2 1201 1400 <br />3 1401 2000 <br />4 2001 3000 <br />5 3001 9999</pre><pre>5 rows selected.</pre></blockquote>One advantage with this technique is that the table is easier to maintain. If a salary grade is changed, then only the <code>HISAL</code> on one row needs to be updated. The <code>LOSAL</code> on the next row is adjusted automatically.<blockquote><pre>UPDATE new_salgrade<br />SET hisal = 2200<br />WHERE grade = 3;</pre><pre>1 rows updated</pre><br/><pre>COMMIT;</pre><pre>Commit complete.</pre><br/><pre>SELECT * <br />FROM salgrade_v;</pre><pre>GRADE LOSAL HISAL <br />----------- ---------- -------------<br />1 1 1200 <br />2 1201 1400 <br />3 1401 <strong>2200</strong> <br />4 <strong>2201</strong> 3000 <br />5 3001 9999</pre><pre>5 rows selected.</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">External Log File</h2>The next example processes metric data which has been collected and recorded in an external log file. <br /><br />The contents of the file look like this.<blockquote><pre>21-sep-07 13:51:12<br />type: metric<br />low: 33<br />avg: 121<br />high: 230</pre><pre>21-sep-07 19:23:55<br />type: metric<br />low: 66<br />avg: 75<br />high: 120</pre><pre>21-sep-07 23:40:17<br />type: metric<br />low: 13<br />avg: 98<br />high: 554</pre><pre>22-sep-07 01:39:22<br />type: metric<br />low: 121<br />avg: 375<br />high: 520</pre><pre>22-sep-07 10:57:15<br />type: metric<br />low: 43<br />avg: 99<br />high: 148</pre></blockquote>Suppose we are required to produce the following report with one line for each metric sample.<blockquote><pre>recorded_at duration low avg high<br />-------------------- ------------- ---- ---- ----<br />dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999<br />dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999<br />dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999</pre></blockquote>We will use an <a href="http://blog.lishman.com/2008/03/oracle-external-tables.html">external table </a>to access the contents of the file.<blockquote><pre>CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';</pre><pre>Directory created.</pre><br/><pre>CREATE TABLE event_log (<br /> rec VARCHAR2(200))<br /> ORGANIZATION EXTERNAL(<br /> DEFAULT DIRECTORY tmp_dir<br /> ACCESS PARAMETERS(<br /> records delimited by newline <br /> fields <br /> missing field VALUES ARE NULL<br /> reject rows with all null fields)<br /> LOCATION('event.log'));</pre><pre>Table created.</pre><br/><pre>SELECT *<br />FROM event_log<br />WHERE rownum <=10;</pre><pre>REC<br />-----------------------------<br />21-sep-07 13:51:12<br />type:<br />metric<br />low: 33<br />avg: 121<br />high: 230<br />21-sep-07 19:23:55<br />type:<br />metric<br />low: 66<br />avg: 75<br />high: 120</pre><pre>10 rows selected.</pre></blockquote>Now we can start to construct our SQL statement. <br /><br />First, we will remove any labels from the records using <code>REGEXP_REPLACE</code>. Labels are identified as a single word at the start of a line followed by a colon and a space.<blockquote><pre>SELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec<br />FROM event_log<br />WHERE rownum <= 10;</pre><pre>REC<br />-----------------------------<br />21-sep-07 13:51:12<br />metric<br />33<br />121<br />230<br />21-sep-07 19:23:55<br />metric<br />66<br />75<br />120</pre><pre>10 rows selected</pre></blockquote>Next, we use the <code>LEAD</code> function to to merge several records from the file into a single row for each sample.<blockquote><pre>SELECT <br /> TO_TIMESTAMP(recorded_at, 'dd-mon-yy hh24:mi:ss') recorded_at,<br /> TO_NUMBER(low) low, <br /> TO_NUMBER(avg) avg, <br /> TO_NUMBER(high) high<br />FROM (<br /> SELECT <br /> rec,<br /> LAG(rec, 1, null) OVER (ORDER BY rownum) recorded_at,<br /> LEAD(rec, 1, null) OVER (ORDER BY rownum) low,<br /> LEAD(rec, 2, null) OVER (ORDER BY rownum) avg,<br /> LEAD(rec, 3, null) OVER (ORDER BY rownum) high<br /> FROM (<br /> SELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec<br /> FROM event_log<br /> )<br />)<br />WHERE rec = 'metric';</pre><pre>recorded_at low avg high<br />------------------ ------ ------ ------<br />21-SEP-07 13.51.12 33 121 230 <br />21-SEP-07 19.23.55 66 75 120 <br />21-SEP-07 23.40.17 13 98 554 <br />22-SEP-07 01.39.22 121 375 520 <br />22-SEP-07 10.57.15 43 99 148</pre><pre>5 rows selected.</pre></blockquote>And finally, now we have all the required columns with the correct data types, we can use the <code>LAG</code> function to calculate the duration between the current sample and the previous one.<blockquote><pre>SELECT<br /> recorded_at,<br /> recorded_at - LAG(recorded_at, 1, null) <br /> OVER (ORDER BY recorded_at) duration,<br /> low,<br /> avg,<br /> high<br />FROM (<br /> SELECT <br /> TO_TIMESTAMP(recorded_at, 'dd-mon-yy hh24:mi:ss') recorded_at,<br /> TO_NUMBER(low) low, <br /> TO_NUMBER(avg) avg, <br /> TO_NUMBER(high) high<br /> FROM (<br /> SELECT <br /> rec,<br /> LAG(rec, 1, null) OVER (ORDER BY rownum) recorded_at,<br /> LEAD(rec, 1, null) OVER (ORDER BY rownum) low,<br /> LEAD(rec, 2, null) OVER (ORDER BY rownum) avg,<br /> LEAD(rec, 3, null) OVER (ORDER BY rownum) high<br /> FROM (<br /> SELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec<br /> FROM event_log<br /> )<br /> )<br /> WHERE rec = 'metric'<br />);</pre><pre>recorded_at duration low avg high<br />------------------- ------------ ------ ----- ------<br />21-SEP-07 13.51.12 33 121 230 <br />21-SEP-07 19.23.55 0 5:32:43.0 66 75 120 <br />21-SEP-07 23.40.17 0 4:16:22.0 13 98 554 <br />22-SEP-07 01.39.22 0 1:59:5.0 121 375 520 <br />22-SEP-07 10.57.15 0 9:17:53.0 43 99 148</pre><pre>5 rows selected.</pre></blockquote>So there we have it. In this example, we read an external file, stripped out the labels and and used <code>LEAD</code> to pivot several rows into columns. Then we used <code>LAG</code> to calculate the duration between adjacent samples.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-38469137524151239772008-04-05T13:30:00.011+01:002008-06-13T19:25:55.066+01:00Using SOUNDEX with a Function Based Index in OracleThe <code>SOUNDEX</code> function returns a phonetic representation of an English word. It can be used to compare words that are spelled differently, but sound alike.<br /><br />For example.<blockquote><pre>SELECT ename<br />FROM emp<br />WHERE SOUNDEX(ename) = SOUNDEX('janes');</pre><pre>ENAME <br />---------- <br />JONES <br />JAMES</pre><pre>2 rows selected</pre></blockquote>However, a regular index on the <code>ENAME</code> column will not be used by Oracle if we include the <code>SOUNDEX</code> function.<blockquote><pre>CREATE INDEX emp_ix1 ON emp(ename);</pre><pre>Index created.</pre><br/><pre>SELECT ename<br />FROM emp<br />WHERE SOUNDEX(ename) = SOUNDEX('janes');</pre><pre>Execution Plan<br />----------------------------------------------------------<br /> 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=6)<br /> 1 0 <strong>TABLE ACCESS (FULL) OF 'EMP' (TABLE)</strong> (Cost=4 Card=1 Bytes=6)</pre></blockquote>Instead, we must use a function based index.<blockquote><pre>CREATE INDEX emp_ix2 ON emp(SOUNDEX(ename));</pre><pre>Index created.</pre><br/><pre>SELECT ename<br />FROM emp<br />WHERE SOUNDEX(ename) = SOUNDEX('janes');</pre><pre>Execution Plan<br />----------------------------------------------------------<br /> 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=6)<br /> 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 Card=1 Bytes=6)<br /> 2 1 <strong>INDEX (RANGE SCAN) OF 'EMP_IX2' (INDEX)</strong> (Cost=1 Card=1)</pre></blockquote>This time, the function based index is used to retrieve the data.<br /><br />Incidentally, a Java equivalent is included as part of the Apache Commons project.<blockquote><pre>import org.apache.commons.codec.language.Soundex;</pre><pre>..</pre><pre>public boolean soundsLike(String firstWord, String secondWord) {<br /> Soundex s = new Soundex();<br /> return s.soundex(firstWord).equals(s.soundex(secondWord));<br />}</pre></blockquote>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-48902218526733192852008-04-05T13:16:00.004+01:002008-04-05T13:29:44.285+01:00RLWRAP<code>rlwrap</code> is a Linux command which provides line editing, persistent history and word completion for command line tools such as sqlplus and rman. <br /><br />Simply type <code>rlwrap</code> before your command as follows.<blockquote><pre>rlwrap sqlplus scott/tiger</pre></blockquote>Now you can use the up and down arrow keys to retrieve your history, even across invocations of the command. You can also edit the line using the left and right keys, deleting and inserting text as required.<br /><br />The -c option enables file completion which can be useful when running scripts. If you type @ and the start of a file name and press tab, <code>rlwrap</code> will complete the name of the file for you. <br /><br />You can also specify a file containing your own list of words to be auto-completed.<blockquote><pre>rlwrap -c -f mywordlist.txt sqlplus scott/tiger</pre></blockquote>This file could contain a list of tables and views in your schema, for example.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-91085643927452304292008-03-28T19:52:00.009Z2008-06-13T19:20:53.576+01:00Using ROWNUM in Oracle<code>ROWNUM</code> is an Oracle pseudo column which numbers the rows in a result set.<blockquote><pre>SELECT rownum, table_name<br />FROM user_tables;</pre><pre>ROWNUM TABLE_NAME <br />------------- -----------------<br />1 EMP<br />2 DEPT<br />3 BONUS<br />4 SALGRADE<br />5 DUMMY</pre><pre>5 rows selected</pre></blockquote>Here is a summary of how <code>ROWNUM</code> can be used.<h2 style="font-size:100%;color:#2D6E89">Limiting Rows</h2><code>ROWNUM</code> can be used to limit the number of rows returned by a query in a similar way to <code>LIMIT</code> in Postgres and MySql, <code>TOP</code> in SQL Server and <code>FETCH FIRST</code> in DB2.<blockquote><pre>SELECT rownum, table_name<br />FROM user_tables<br />WHERE rownum <=3;</pre><pre>ROWNUM TABLE_NAME <br />------------- -----------------<br />1 EMP <br />2 DEPT <br />3 BONUS</pre><pre>3 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">ROWNUM with DML</h2>The use of <code>ROWNUM</code> is not restricted to select statements. It can be used with DML statements that update the database too.<blockquote><pre>CREATE TABLE o AS<br />SELECT *<br />FROM all_objects<br />WHERE rownum <= 1000;</pre><pre>Table created</pre><br/><pre>UPDATE o<br />SET object_id = rownum,<br />created = created + INTERVAL '1' MINUTE * rownum <br />WHERE rownum <= 100;</pre><pre>100 rows updated</pre><br/><pre>DELETE FROM o<br />WHERE OWNER = 'SYS'<br />AND rownum = 1;</pre><pre>1 rows deleted</pre></blockquote><code>ROWNUM</code> is particularly useful, when used in conjunction with the <code>CONNECT BY LEVEL</code> clause, for creating arbitrary rows in the database. See the article on <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html">generating rows in Oracle </a>for more details.<h2 style="font-size:100%;color:#2D6E89">Offsetting Rows</h2>Rows can also be skipped at the beginning of a result set using <code>ROWNUM</code>.<blockquote><pre>SELECT rnum, table_name<br />FROM<br /> (SELECT rownum rnum, table_name<br /> FROM user_tables)<br />WHERE rnum > 2;</pre><pre>RNUM TABLE_NAME <br />-------- ---------------- <br />3 SALGRADE <br />4 DUMMY <br />5 DEPT</pre><pre>3 rows selected</pre></blockquote>You will notice that an inline view has been introduced to transform the <code>ROWNUM</code> pseudo column into a 'real' column before we do the comparison. <br /><br />It is tempting to write the above SQL as follows.<blockquote><pre>SELECT table_name<br />FROM user_tables<br />WHERE rownum > 2;</pre><pre>TABLE_NAME <br />------------------------------</pre><pre>0 rows selected</pre></blockquote>However, this query will always return zero rows, regardless of the number of rows in the table.<br /><br />To explain this behaviour, we need to understand how Oracle processes <code>ROWNUM</code>. When assigning <code>ROWNUM</code> to a row, Oracle starts at 1 and only only increments the value when a row is selected; that is, when all conditions in the WHERE clause are met. Since our condition requires that <code>ROWNUM</code> is greater than 2, no rows are selected and <code>ROWNUM</code> is never incremented beyond 1.<br /><br />The bottom line is that conditions such as the following will work as expected.<br /><br /><CODE>.. WHERE rownum = 1;<br /><br />.. WHERE rownum <= 10;</CODE><br /><br /><br />While queries with these conditions will always return zero rows.<br /><br /><CODE>.. WHERE rownum = 2;<br /><br />.. WHERE rownum > 10;</CODE><h2 style="font-size:100%;color:#2D6E89">Top-n Query</h2>Typically, a top-n query sorts data into the required sequence and then limits the output to a subset of rows.<br /><br />For example, suppose we wish to retrieve the top three earners from our employee table.<blockquote><pre>SELECT ename, sal<br />FROM (<br /> SELECT ename, sal<br /> FROM emp<br /> ORDER BY sal DESC)<br />WHERE rownum <=3;</pre><pre>ENAME SAL <br />---------- ---------<br />KING 5000 <br />SCOTT 3000 <br />FORD 3000</pre><pre>3 rows selected</pre></blockquote>The inline view (the inner select) sorts the rows and passes the result up to the outer select. The outer select then limits the output to three rows.<br /><br />It may seem more natural to use the following SQL.<blockquote><pre>SELECT ename, sal<br />FROM emp<br />WHERE rownum <=3<br />ORDER BY sal DESC;</pre><pre>ENAME SAL <br />---------- ---------------------- <br />ALLEN 1600 <br />WARD 1250 <br />SMITH 800</pre><pre>3 rows selected</pre></blockquote>However, this does not give us the result we want because Oracle assigns the <code>ROWNUM</code> values to the rows before it does the sort.<br /><br />In this example, Oracle will retrieve three rows from the table, any three rows, and sort only these three rows. We really need Oracle to sort all the rows and then return the first three. The inline view will ensure that this will happen.<h2 style="font-size:100%;color:#2D6E89">Sort Performance</h2>Limiting rows on a sorted result set using <code>ROWNUM</code> can also provide an added performance benefit. Rather than physically sorting all the rows to retrieve just the top few, Oracle maintains an array which contains just the highest or the lowest values (depending on whether we specified ASC or DESC in the ORDER BY clause). The size of the array will be the number of rows we wish to return. As rows are processed, only the highest (or lowest) values are retained in the array. All other rows are discarded.<h2 style="font-size:100%;color:#2D6E89">Pagination</h2>Next, we will see how <code>ROWNUM</code> is used to select a range of rows from within a result set. This is useful if we are to provide pagination on a web screen, for example.<br /><br />Suppose we are paging through the employee table in name order and we wish to display rows six to ten inclusive.<blockquote><pre>SELECT rnum, ename, job<br />FROM<br /> (SELECT /*+ FIRST_ROWS(10) */ rownum rnum, ename, job<br /> FROM<br /> (SELECT ename, job<br /> FROM emp<br /> ORDER BY ename)<br /> WHERE rownum <= 10<br /> )<br />WHERE rnum > 5;</pre><pre>RNUM ENAME JOB <br />-------- ---------- --------- <br />6 JAMES CLERK <br />7 JONES MANAGER <br />8 KING PRESIDENT <br />9 MARTIN SALESMAN <br />10 MILLER CLERK</pre><pre>5 rows selected</pre></blockquote>We use nested inline views to retrieve and sort the data and then apply the range check using <code>ROWNUM</code>. We have split the upper and lower bound check, which allows Oracle to use <code>COUNT(STOPKEY)</code> in the execution plan when checking for <code>ROWNUM <= 10</code>. This is a performance optimization which, along with the sorting optimization described earlier, will ensure that our query runs efficiently as the table grows. <br /><br />The <code>FIRST_ROWS(n)</code> hint also tells Oracle to optimize the query so that the first n rows are returned as quickly as possible.<h2 style="font-size:100%;color:#2D6E89">Summary</h2><code>ROWNUM</code> provides a mechanism for returning a subset or range of rows from a query. It can be misleading at first if not properly understood but, once mastered, is invaluable for limiting result set output for pagination and top-n style queries.<br /><br />For more information on <code>ROWNUM</code>, see <a href="http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html">Tom Kytes article </a>on OTN.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-83358925051462429322008-03-22T11:01:00.013Z2008-03-24T20:21:11.196ZHow to monitor Oracle index usageWe can use the following command to determine whether a particular index is being used by Oracle.<blockquote><pre>ALTER INDEX &LT;index_name&GT; MONITORING USAGE;</pre></blockquote>If the specified index is used before monitoring is disabled, then the USED column of the relevant V$OBJECT_USAGE row, is set to YES.<br /><br />Let's see this in action. <br /><br />First, we create a table using the <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html"><code>CONNECT BY LEVEL</code></a> clause to generate a thousand rows, with an index on the ID column.<blockquote><pre>CREATE TABLE index_usage AS<br />SELECT rownum id, 'xyz' || rownum name<br />FROM dual<br />CONNECT BY LEVEL <= 1000;</pre><pre>Table created.</pre><br/><pre>CREATE INDEX index_usage_ix1 ON INDEX_USAGE (id);</pre><pre>Index created.</pre></blockquote>Then we start monitoring the usage of the index and display the results from the V$OBJECT_USAGE view.<blockquote><pre>ALTER INDEX index_usage_ix1 MONITORING USAGE;</pre><pre>Index altered.</pre><br/><pre>SELECT monitoring, used, start_monitoring, end_monitoring<br />FROM v$object_usage<br />WHERE index_name = 'INDEX_USAGE_IX1';</pre><pre>MONITORING USED START_MONITORING END_MONITORING <br />---------- ---- ------------------- ------------------- <br /><strong>YES NO</strong> 03/22/2008 11:44:20</pre><pre>1 rows selected</pre></blockquote>Next, we access the table using the index, disable the monitoring and display the results again.<blockquote><pre>SELECT *<br />FROM index_usage<br />WHERE id = 123;</pre><pre>ID NAME <br />---------------------- -------------------<br />123 xyz123</pre><pre>1 rows selected</pre><pre>Execution Plan<br />----------------------------------------------------------<br /> 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=10)<br /> 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'INDEX_USAGE' (TABLE) (Cost=2 Card=1 Bytes=10)<br /> 2 1 INDEX (RANGE SCAN) OF 'INDEX_USAGE_IX1' (INDEX) (Cost=1 Card=1)</pre><br/><pre>ALTER INDEX index_usage_ix1 NOMONITORING USAGE;</pre><pre>Index altered.</pre></br><pre>SELECT monitoring, used,start_monitoring, end_monitoring<br />FROM v$object_usage<br />WHERE index_name = 'INDEX_USAGE_IX1';</pre><pre>MONITORING USED START_MONITORING END_MONITORING <br />---------- ---- ------------------- ------------------- <br /><strong>NO YES</strong> 03/22/2008 11:44:20 03/22/2008 11:47:35</pre><pre>1 rows selected</pre></blockquote>This shows us that the index was used during our monitoring period.<br /><br />This feature can help us to identify unused indexes which are candidates for removal. Removing unnecessary indexes can improve performance by reducing overhead during updates.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-28730840520100790802008-03-15T10:21:00.035Z2008-03-15T14:39:40.686ZJDBC Fetch Size<p>Fetch size is a performance hint which tells the JDBC driver how many rows should be fetched from the database when more rows are needed by the client. Increasing this value can improve overall query performance, especially with large result sets.<br /><br />The required fetch size can be set on a <code>Statement</code> or a <code>ResultSet</code> object using the <code>setFetchSize</code> method. If used with a <code>Statement</code> then all <code>ResultSets</code> returned by that <code>Statement</code> will have the same fetch size. The default is ten.<br /><br />Setting a JDBC fetch size that is too large or too small can degrade performance. Usually, a fetch size of one half or one quarter of the total expected result size is optimal.<h2 style="font-size:100%;color:#2D6E89">Example</h2><p>To see the effect of using different fetch sizes with Oracle, we first create a table containing a million rows using the <a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html"><code>CONNECT BY LEVEL</code></a> clause.<blockquote><pre>CREATE TABLE one_million_rows AS<br />SELECT rownum id, rpad('x', 20, 'x') || rownum a, sysdate b<br />FROM dual<br />CONNECT BY LEVEL <= 1000000;</pre><pre>Table created.</pre></blockquote><p>Next, we call the following method several times with different fetch sizes and record the elapsed time taken to retrieve the data. SQL tracing is also enabled, allowing us to create a TKPROF report for each run.<blockquote><pre>public void testFetchSize(int fetchSize) throws SQLException {</PRE><PRE> Connection con = ConnectionFactory.getNewConnection();<br /> Statement stmt = con.createStatement();</PRE><PRE> stmt.execute(<br /> "ALTER SESSION SET tracefile_identifier = " +<br /> "FETCH_SIZE_" + fetchSize);<br /> stmt.execute(<br /> "ALTER SESSION SET events " +<br /> "'10046 trace name context forever, level 12'");</PRE><PRE> ResultSet rs = stmt.executeQuery(<br /> "SELECT /* FETCH SIZE " + fetchSize + " */ * " +<br /> "FROM one_million_rows");<br /> rs.setFetchSize(fetchSize);</PRE><PRE> long start = System.currentTimeMillis();<br /> while(rs.next());<br /> System.out.printf("Fetch size %d took %4.2f seconds\n", <br /> fetchSize, <br /> (System.currentTimeMillis() - start) / 1000f);</PRE><PRE> stmt.execute(<br /> "ALTER SESSION SET events '10046 trace name context off'");<br /> <br />}</pre></blockquote><p>For this exercise we have used event 10046 to enable SQL tracing (rather than sql_trace) because we need to capture the wait events in addition to the execution statistics. We will see why later. <br /><br />Note: Starting with Oracle 10g release 2, the preferred method of enabling SQL tracing is with the <code>DBMS_SESSION</code> package. The <code>ALTER SESSION SET EVENTS</code> command shown above can now be enabled with the following procedure call.<blockquote><pre>DBMS_SESSION.SESSION_TRACE_ENABLE(waits=>TRUE, binds=>TRUE);</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Results</h2>Here is a summary of the results.<br/><br/><TABLE WIDTH=200 BORDER=1 BGCOLOR="#FFFFFF" CELLPADDING=3 CELLSPACING=1><COL width=100><COL width=100><TR ALIGN=CENTER><TD width=100>Fetch Size</TD ALIGN=CENTER><TD width=100>Elapsed Time (secs)</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>10</TD><TD width=100>93.78</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>50</TD><TD width=100>29.48</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>100</TD><TD width=100>16.76</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>200</TD><TD width=100>9.58</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>500</TD><TD width=100>6.22</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>1000</TD><TD width=100>4.86</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>5000</TD><TD width=100>3.91</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>10000</TD><TD width=100>3.79</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>50000</TD><TD width=100>3.93</TD></TR><TR VALIGN=MIDDLE ALIGN=CENTER><TD width=100>100000</TD><TD width=100>3.98</TD></TR></TABLE><br />Clearly, the fetch size has a considerable impact on the overall performance of our query. The difference between the fastest, which was under four seconds, and the slowest, which was over a minute and a half, is certainly not trivial.<br /><br />We can also see that performance starts to degrade slightly with a fetch size of somewhere between ten and fifty thousand. From these results, it would appear that ten thousand is a reasonable value for our fetch size.<h2 style="font-size:100%;color:#2D6E89">TKPROF</h2>We can also see the difference in performance with the TKPROF reports produced during the tests. Here, we compare a fetch size of ten to a fetch size of ten thousand.<blockquote><pre>SELECT /* <strong>FETCH SIZE 10</strong> */ * <br />FROM<br /> one_million_rows</pre><pre><br />call count cpu elapsed query rows<br />------- ------ -------- ---------- ---------- ----------<br />Parse 1 0.00 0.00 0 0<br />Execute 1 0.00 0.00 0 0<br />Fetch <strong>100001 7.40 6.77</strong> <strong>101220</strong> 1000000<br />------- ------ -------- ---------- ---------- ----------<br />total 100003 7.40 6.77 101220 1000000</pre><pre>Elapsed times include waiting on following events:<br /> Event waited on Times Max. Wait Total Waited<br /> -------------------------------- Waited ---------- ------------<br /> SQL*Net message to client 100001 0.00 0.37<br /> SQL*Net message from client 100001 0.05 <strong>65.21</strong><br />************************************************************************</pre></blockquote><br/><blockquote><pre>SELECT /* <strong>FETCH SIZE 10000</strong> */ *<br />FROM<br /> one_million_rows</pre><pre><br />call count cpu elapsed query rows<br />------- ------ -------- ---------- ---------- ----------<br />Parse 1 0.00 0.00 0 0<br />Execute 1 0.00 0.00 0 0<br />Fetch <strong>101 1.98 3.03</strong> <strong>11573</strong> 1000000<br />------- ------ -------- ---------- ----------- -----------<br />total 103 1.98 3.03 11573 1000000</pre><pre>Elapsed times include waiting on following events:<br /> Event waited on Times Max. Wait Total Waited<br /> -------------------------------- Waited ---------- ------------<br /> SQL*Net message to client 101 0.00 0.00<br /> SQL*Net message from client 101 0.11 <strong>0.77</strong><br /> SQL*Net more data to client 17740 0.07 1.35<br />************************************************************************</pre></blockquote><p>Note: The report has been condensed to fit on the page.<br /><br />Increasing the fetch size to ten thousand has more than halved the CPU and elapsed times and reduced consistent (logical) reads by almost ninety percent. However, the most significant gain is in the reduction of time spent waiting for a response from the client, from over a minute to less than a second.<a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html"></a><a href="http://blog.lishman.com/2008/02/how-to-generate-rows-in-oracle_2477.html"></a>Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-13450986547990378182008-03-09T12:53:00.013Z2008-03-15T10:41:59.660ZOracle External TablesAn external table enables you to access data in a flat file as if it were in a table in the database. <br /><br />External tables can be used to read the contents of a log file or spread sheet export, for example, or to import information into a data warehouse as part of an ETL operation.<br /><br /><h2 style="font-size:100%;color:#2D6E89">A Simple Example</h2>Suppose we have a file containing city populations in our /tmp directory and we wish to access this data from within our database.<blockquote><pre>Tokyo,Japan,33600000<br />Seoul,South Korea,23400000<br />Mexico City,Mexico,22400000<br />New York,USA,21900000<br />Bombay,India,21600000<br />Delhi,India,21500000<br />Sao Paulo,Brazil,20600000<br />Los Angeles,USA,18000000<br />Shanghai,China,17500000<br />Osaka,Japan,16700000</pre></blockquote><p>First, we create a DIRECTORY object which will reference the /tmp directory in our file system.<blockquote><pre>CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';</pre><pre>Directory created.</pre></blockquote><p>Note: This command format will work with Windows and Linux.<br /><br />Now, we create an external table by specifying the name of the file we wish to query and the directory where it can be located. We will rely on the default values of comma for a field separator and new line as the record delimiter, to describe our data.<blockquote><pre>CREATE TABLE pop_ext (<br /> city VARCHAR2(30),<br /> country VARCHAR2(30),<br /> population NUMBER)<br /> ORGANIZATION EXTERNAL<br /> (DEFAULT DIRECTORY tmp_dir <br /> LOCATION('pop.txt'));</pre><pre>Table created.</pre></blockquote><p>The file can now be queried as if it were a table in the database. <br /><blockquote><pre>SELECT * <br />FROM pop_ext;</pre><pre>CITY COUNTRY POPULATION <br />----------------- ---------------- ------------------<br />Tokyo Japan 33600000 <br />Seoul South Korea 23400000 <br />Mexico City Mexico 22400000 <br />New York USA 21900000 <br />Bombay India 21600000 <br />Delhi India 21500000 <br />Sao Paulo Brazil 20600000 <br />Los Angeles USA 18000000 <br />Shanghai China 17500000 <br />Osaka Japan 16700000</pre><pre>10 rows selected</pre></blockquote><p>External tables are read-only. An attempt to update the table will result in the following error message, complete with some useful advice on the appropriate action to take.<blockquote><pre>UPDATE pop_ext <br />SET population = 0;</pre><pre>30657.0000 - "operation not supported on external organized table"<br />*Cause: User attempted on operation on an external table which is<br /> not supported.<br />*Action: Don't do that!</pre></blockquote><p>We could, of course, create a 'real' table containing the city populations if we wish to maintain the data.<blockquote><pre>CREATE TABLE pop AS<br />SELECT * FROM pop_ext;</pre><pre>Table created.</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">A More Detailed Example</h2>In this example we have a file which contains bank holidays in the UK for 2007 to 2009.<blockquote><pre>"Holiday" 2007 2008 2009<br />"New Year" 01/01/07 01/01/08 01/01/09<br />"Good Friday" 06/04/07 21/03/08 10/04/09<br />"Easter Monday" 09/04/07 24/03/08 13/04/09<br />"Early May" 07/05/07 05/05/08 04/05/09<br />"(First Monday in May)"<br />"Spring Bank" 28/05/07 26/05/08 25/05/09<br />"(Last Monday in May)"<br />"Summer" 27/08/07 25/08/08 31/08/09<br />"(Last Monday in August)"<br />"Christmas Day" 25/12/07 25/12/08 25/12/09<br />"Boxing Day" 26/12/07 26/12/08 28/12/09</pre></blockquote><p>This time, we will include a few more options in the table definition to describe the format of the data and to exclude the comment lines (the text surrounded by brackets).<blockquote><pre>CREATE OR REPLACE DIRECTORY log_dir AS '/logs';</pre><pre>Directory created.</pre><pre><br />CREATE TABLE bank_hol_ext(<br /> hol_desc VARCHAR2(50),<br /> hol_2007 DATE,<br /> hol_2008 DATE,<br /> hol_2009 DATE<br /> )<br /> ORGANIZATION EXTERNAL<br /> (TYPE oracle_loader <br /> DEFAULT DIRECTORY tmp_dir <br /> ACCESS PARAMETERS(<br /> records delimited BY newline <br /> load when (hol_2007 != BLANKS)<br /> badfile log_dir:'bank_holidays.bad'<br /> logfile log_dir:'bank_holidays.log'<br /> discardfile log_dir:'bank_holidays.dsc'<br /> skip 1<br /> fields terminated by '\t' <br /> optionally enclosed by '"' <br /> missing field values are null<br /> (<br /> hol_desc,<br /> hol_2007 DATE "dd/mm/yy",<br /> hol_2008 DATE "dd/mm/yy",<br /> hol_2009 DATE "dd/mm/yy"<br /> )<br /> ) <br />LOCATION('hols.txt')) <br />REJECT LIMIT UNLIMITED;</pre><pre>Table created.</pre><pre><br />SELECT * <br />FROM bank_hol_ext;</pre><pre><br />HOL_DESC HOL_2007 HOL_2008 HOL_2009<br />----------------------------- -------------- -------------- ------------<br />New Year's Day 01-jan-2007 01-jan-2008 01-jan-2009<br />Good Friday 06-apr-2007 21-mar-2008 10-apr-2009<br />Easter Monday 09-apr-2007 24-mar-2008 13-apr-2009<br />Early May Bank Holiday 07-may-2007 05-may-2008 04-may-2009<br />Spring Bank Holiday 28-may-2007 26-may-2008 25-may-2009<br />Summer Bank Holiday 27-aug-2007 25-aug-2008 31-aug-2009<br />Christmas Day 25-dec-2007 25-dec-2008 25-dec-2009<br />Boxing Day 26-dec-2007 26-dec-2008 28-dec-2009</pre><pre>8 rows selected</pre></blockquote><p>Here, we have specified that the first row (the title) should be skipped, that fields may be surrounded by quotes and that a tab is used as the field delimiter. We also tell Oracle the format of the date fields and that we don't wish to impose a reject limit for bad records. <br /><br />The logfile, badfile and discardfile clauses specify the location and file names for these file types. The logfile contains messages generated by the external tables utility while it is accessing data in the data file. The badfile contains the rows that could not be loaded due to errors (for example, a date was not valid for the format model) and the discardfile contains the rows excluded by the condition in the LOAD WHEN clause.<br /><br />The default behaviour is to create these files in the same directory as the data file. However, if we only have read access to this directory, we can use these clauses to create the files in another location. In our example, we have chosen to write the files to the /logs directory.<br /><br />If we wish, we can create another external table to view the contents of these files. Here, we define a table to query the log file, which contains a single column with a fixed width of four thousand bytes.<blockquote><pre>CREATE TABLE bank_hol_rej_ext (<br /> rec VARCHAR2(4000))<br /> ORGANIZATION EXTERNAL<br /> (DEFAULT DIRECTORY log_dir <br /> ACCESS PARAMETERS(<br /> records delimited by newline <br /> fields <br /> missing field VALUES ARE NULL<br /> (<br /> rec position(1:4000)<br /> )<br /> )<br /> LOCATION('bank_holidays.log'));</pre><pre>Table created.</pre><pre><br />SELECT *<br />FROM bank_hol_rej_ext<br />WHERE rownum <=10 <br />AND rec IS NOT NULL;</pre><pre>REC<br />---------------------------------------------------------------<br /> LOG file opened at 03/04/08 21:23:35<br />Field Definitions for table BANK_HOL_EXT<br /> Record format DELIMITED BY NEWLINE<br /> Data in file has same endianness as the platform<br /> Rows with all null fields are accepted<br /> Load when (HOL_2007 != BLANKS)<br /> Fields in Data Source:<br /> HOL_DESC CHAR (255)<br /> Terminated by " "<br /> Enclosed by """ and """</pre><pre>10 rows selected</pre></blockquote><p>We can change the table definition to reference a different file using the ALTER TABLE command.<blockquote><pre>ALTER TABLE bank_hol_rej_ext LOCATION ('bank_holidays.dsc');</pre><pre>Table altered.</pre></blockquote>This allows us to query the discard file with the same external table.<blockquote><pre>SELECT * <br />FROM bank_hol_rej_ext;</pre><pre>REC<br />----------------------------------<br />"(First Monday in May)"<br />"(Last Monday in May)"<br />"(Last Monday in August)"</pre><pre>3 rows selected.</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Data Pump</h2>Oracle 10g introduced the option to create an external table using data constructed from within the database.<blockquote><pre>CREATE TABLE emp_ext<br /> ORGANIZATION EXTERNAL<br /> (<br /> TYPE ORACLE_DATAPUMP<br /> DEFAULT DIRECTORY tmp_dir<br /> LOCATION ('emp.dp')<br /> )<br /> AS<br /> SELECT empno, ename, job, sal, comm, dname<br /> FROM emp JOIN dept USING (deptno);</pre><pre>Table created.</pre></blockquote><p>This SQL will create a flat file in the /tmp directory containing the results of the SELECT statement. Unfortunately, the file is a proprietary binary format which can only be read by Data Pump. It would be useful to have the option to produce a pure CSV format which could easily be read by other tools and utilities. However, the file is readable across platforms, which allows us to move it between Windows and Linux and create another external table on the target machine, to read the data.<blockquote><pre>CREATE TABLE emp_ext (<br /> empno NUMBER(4,0), <br /> ename VARCHAR2(10), <br /> job VARCHAR2(9), <br /> sal NUMBER(7,2), <br /> comm NUMBER(7,2), <br /> dname VARCHAR2(14)<br /> )<br /> ORGANIZATION EXTERNAL<br /> (<br /> TYPE ORACLE_DATAPUMP<br /> DEFAULT DIRECTORY tmp_dir<br /> LOCATION ('emp.dp')<br /> );</pre><pre>Table created.</pre></blockquote><p>This can be useful for transferring arbitrary data from one database to another where a database link is not available.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.comtag:blogger.com,1999:blog-2737532577336836817.post-28410815156547785232008-03-01T02:00:00.004Z2008-03-09T21:01:25.806ZHow to generate rows in Oracle<br/>Here is a a simple and efficient way to generate arbitrary rows in Oracle.<br /><blockquote><pre>SELECT &lt;columns&gt;<br />FROM dual <br />CONNECT BY LEVEL <= n;</pre></blockquote>where <code>n</code> is the number of rows you wish to produce.<br /><br />For example.<br /><blockquote><pre>SELECT rownum <br />FROM dual <br />CONNECT BY LEVEL <= 5;</pre><pre> ROWNUM<br />----------<br /> 1<br /> 2<br /> 3<br /> 4<br /> 5</pre><pre>5 rows selected</pre></blockquote><h2 style="font-size:100%;color:#2D6E89">Test Data</h2><p>One use for this technique is to quickly generate the contents of a table. The following example uses a CTAS statement to create a table and populate it with test data.<br /><blockquote><pre>CREATE TABLE test_data(<br /> id PRIMARY KEY, <br /> group_id NOT NULL,<br /> created_at NOT NULL,<br /> text NOT NULL) AS<br />SELECT rownum,<br /> MOD(rownum, 5),<br /> TO_DATE('1-jan-07', 'dd-mon-yy') + INTERVAL '1' MINUTE * rownum,<br /> CAST ('xyz' || rownum AS VARCHAR2(50))<br />FROM dual <br />CONNECT BY LEVEL <= 10000;</pre><pre>Table Created.</pre><br/><pre>SELECT * <br />FROM test_data <br />WHERE id <= 10;</pre><pre> ID GROUP_ID CREATED_AT TEXT<br />---------- ---------- ------------------- -----------<br /> 1 1 01-jan-07 00:01:00 xyz1<br /> 2 2 01-jan-07 00:02:00 xyz2<br /> 3 3 01-jan-07 00:03:00 xyz3<br /> 4 4 01-jan-07 00:04:00 xyz4<br /> 5 0 01-jan-07 00:05:00 xyz5<br /> 6 1 01-jan-07 00:06:00 xyz6<br /> 7 2 01-jan-07 00:07:00 xyz7<br /> 8 3 01-jan-07 00:08:00 xyz8<br /> 9 4 01-jan-07 00:09:00 xyz9<br /> 10 0 01-jan-07 00:10:00 xyz10</pre><pre>10 rows selected.</pre></blockquote><p>We now have ten thousand rows of test data with a sequential number for the primary key, a group item which contains five distinct values, a date field which increments by one minute for each row and a text field containing unique values.<br /><h2 style="font-size:100%;color:#2D6E89">Sparse Data</h2><p>Another way we can utilise row generation is when dealing with sparse data.<br /><br />Suppose we wish to use the data from the previous example to show the total number of rows by day.<br /><blockquote><pre>SELECT TRUNC(created_at, 'dd') created_on, count(*)<br />FROM test_data<br />GROUP BY TRUNC(created_at, 'dd');</pre><pre>CREATED_ON COUNT(*)<br />---------- ----------<br />01-jan-07 1439<br />02-jan-07 1440<br />03-jan-07 1440<br />04-jan-07 1440<br />05-jan-07 1440<br />06-jan-07 1440<br />07-jan-07 1361</pre><pre>7 rows selected.</pre></blockquote><p>However, let us assume that we must display a row for every day within a specified date range, even if the row count is zero. This is often the case if the data is used to produce a graph, for example.<br /><br />It just so happens that our query produced a row per day between the 1st and 7th of January, but suppose the required date range is the 1st to the 10th of January and our table has a couple of days with no data. We will delete the rows for the 3rd and 5th of January to simulate this.<br /><blockquote><pre>DELETE FROM test_data <br />WHERE TRUNC(created_at, 'dd') IN (TO_DATE('3-jan-2007', 'dd-mon-yyyy'),<br /> TO_DATE('5-jan-2007', 'dd-mon-yyyy'));</pre><pre>2880 rows deleted.</pre><br/><pre>SELECT TRUNC(created_at, 'dd') created_on, count(*)<br />FROM test_data<br />GROUP BY TRUNC(created_at, 'dd');</pre><pre>CREATED_ON COUNT(*)<br />---------- ---------<br />01-jan-07 1439<br />02-jan-07 1440<br />04-jan-07 1440<br />06-jan-07 1440<br />07-jan-07 1361</pre><pre>5 rows selected.</pre></blockquote><p>Because we are dealing with sparse data, row counts for only five days are returned by our query. However, the requirement is to produce a row for each of the ten days in our range. We will use the CONNECT BY LEVEL clause, inline views and an outer join to provide the solution.<br /><br />First, we construct a query to return a row for each day in a specified range.<br /><blockquote><pre>SELECT TO_DATE('1-jan-2007', 'dd-mon-yyyy') + (rownum - 1) created_on<br />FROM dual<br />CONNECT BY LEVEL <= TO_DATE('10-jan-2007', 'dd-mon-yyyy') -<br /> TO_DATE('1-jan-2007', 'dd-mon-yyyy') + 1;</pre><pre>CREATED_ON<br />----------<br />01-jan-07<br />02-jan-07<br />03-jan-07<br />04-jan-07<br />05-jan-07<br />06-jan-07<br />07-jan-07<br />08-jan-07<br />09-jan-07<br />10-jan-07</pre><pre>10 rows selected.</pre></blockquote>This query is then joined with an aggregate query on the test data using an outer join.<br /><blockquote><pre>SELECT created_on,<br /> NVL(qty, 0) qty<br />FROM<br /> (SELECT TO_DATE('1-jan-2007', 'dd-mon-yyyy') + (rownum - 1) created_on<br /> FROM dual<br /> CONNECT BY LEVEL <= TO_DATE('10-jan-2007', 'dd-mon-yyyy') -<br /> TO_DATE('1-jan-2007', 'dd-mon-yyyy') + 1)<br />LEFT OUTER JOIN<br /> (SELECT TRUNC(created_at, 'dd') created_on,<br /> COUNT(*) qty<br /> FROM test_data<br /> GROUP BY TRUNC(created_at, 'dd'))<br />USING(created_on)<br />ORDER BY created_on;</pre><pre>CREATED_ON QTY<br />---------- ----------<br />01-jan-07 1439<br />02-jan-07 1440<br />03-jan-07 0<br />04-jan-07 1440<br />05-jan-07 0<br />06-jan-07 1440<br />07-jan-07 1361<br />08-jan-07 0<br />09-jan-07 0<br />10-jan-07 0</pre><pre>10 rows selected.</pre></blockquote><br />Now we are guaranteed to return a row for every day in our range, even for the days with no data.Mark Lishmanhttp://www.blogger.com/profile/04691975099200266330noreply@blogger.com