Whenever you execute a SQL statement, a component of the database known as the optimizer must decide how best to access the data operated on by that statement. To figure out the optimal execution path for a statement, the optimizers consider the following:
- The syntax you’ve specified for the statement
- Any conditions that the data must satisfy (the WHERE clauses)
- The database tables your statement will need to access
- All possible indexes that can be used in retrieving data from the table
- The Oracle RDBMS version
- The current optimizer mode
- SQL statement hints
- All available object statistics (generated via the ANALYZE command)
- The physical table location (distributed SQL)
- INIT.ORA settings (parallel query, async I/O, etc.)
UNDERSTANDING THE COST-BASED OPTIMIZER
The information required by the cost-based optimizer is available once a table has been analyzed via the ANALYZE command, or via the DBMS_STATS facility. Plz note The ANALYZE command and the DBMS_STATS functions collect statistics about tables, clusters, and indexes, and store those statistics in the data dictionary.
The RDBMS kernel defaults to using the cost-based optimizer under a number of situations, including the following:
- OPTIMIZER_MODE = CHOOSE has been specified in the INIT.ORA file, and statistics exist for at least one table involved in the statement
- An ALTER SESSION SET OPTIMIZER_MODE = CHOOSE command has been executed, and statistics exist for at least one table involved in the statement
- An ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS (or ALL_ROWS) command has been executed, and statistics exist for at least one table involved in the statement
- A statement uses the FIRST_ROWS or ALL_ROWS hint (e.g., SELECT /*+ FIRST_ROWS */. . .)
ANALYZE command
The way that you analyze your tables can have a dramatic effect on your SQL performance. If your DBA forgets to analyze tables or indexes after a table re-build, the impact on performance can be devastating. If your DBA analyzes each weekend, a new threshold may be reached and Oracle may change its execution plan. The new plan will more often than not be an improvement, but will occasionally be worse.
If you do want to analyze frequently, use DBMS_STATS.EXPORT_SCHEMA_STATS to back up the existing statistics prior to re-analyzing. This gives you the ability to revert back to the previous statistics if things screw up. When you analyze, you can have Oracle look at all rows in a table (ANALYZE COMPUTE) or at a sampling of rows (ANALYZE ESTIMATE). Typically, When a table is analyzed using ANALYZE, all associated indexes are analyzed as well. If an index is subsequently dropped and recreated, it must be re-analyzed. Be aware that the procedures DBMS_STATS.GATHER_SCHEMA_STATS and GATHER_TABLE_STATS analyze only tables by default, not their indexes. When using those procedures, you must specify the CASCADE=>TRUE option for indexes to be analyzed as well.
Following are some sample ANALYZE statements:
ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;
If you analyze a table by mistake, you can delete the statistics. For example:
ANALYZE TABLE EMP DELETE STATISTICS;
Analyzing can take an excessive amount of time if you use the COMPUTE option on large objects. We find that on almost every occasion, ANALYZE ESTIMATE 5 PERCENT on a large table forces the optimizer make the same decision as ANALYZE COMPUTE.
Inner workings of the cost-based optimizer
The cost-based optimizer is flexible and can adapt to its environment. This adaptation is possible only once the necessary underlying object statistics have been refreshed (re-analyzed). What is constant is the method by which the cost-based optimizer calculates each possible execution plan and evaluates its cost (efficiency).
The cost-based optimizer’s functionality can be (loosely) broken into the following steps:
1. Parse the SQL (check syntax, object privileges, etc.).
2. Generate a list of all potential execution plans.
3. Calculate (estimate) the cost of each execution plan using all available object statistics.
4. Select the execution plan with the lowest cost.
The cost-based optimizer will be used only if at least one table within a SQL statement has statistics (table statistics for unanalyzed tables are estimated). If no statistics are available for any table involved in the SQL, the RDBMS will resort to the rule-based optimizer, unless the cost-based optimizer is forced via statement-level HINTS or by an optimizer goal of ALL_ROWS or FIRST_ROWS.
To understand how the cost-based optimizer works and, ultimately, how to exploit it, we need to understand how it thinks.
Primary key and/or UNIQUE index equality —A UNIQUE index’s selectivity is recognized as 100%. No other indexed access method is more precise. For this reason, a unique index is always used when available.
Non-UNIQUE index equality —For non-UNIQUE indexes, index selectivity is calculated. The cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option of the ANALYZE. That option will make the cost-based optimizer aware of how the data in the indexed columns is skewed.
Range evaluation —For index range execution plans, selectivity is evaluated. This evaluation is based on a column’s most recent high-value and low-value statistics. Again, the cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option when analyzing the table.
Range evaluation over bind variables—For index range execution plans, selectivity is guessed. Prior to Oracle9i, because bind variable values are not available at parse time (values are passed to the cursor after the execution plan has been decided), the optimizer cannot make decisions based on bind variable values. The optimizer assumes a rule of thumb of 25% selectivity for unbounded bind variable ranges (e.g., WHERE dept_no = :b1) and 50% selectivity for bounded ranges (WHERE dept_no > :b1 AND dept_no < :b2). Beginning with Oracle9i, the cost-based optimizer obtains bind variable values prior to determining an execution plan.
System resource usage —By default, the cost-based optimizer assumes that you are the only person accessing the database. Oracle9i gives you the ability to store information about system resource usage, and can make much better informed decisions based on workload (read up on the DBMS_STATS.GATHER_SYSTEM_STATS package).
Current statistics are important —The cost-based optimizer can make poor execution plan choices when a table has been analyzed but its indexes have not been, or when indexes have been analyzed but not the tables.
No comments:
Post a Comment