Tuesday, October 7, 2014

What is an execution plan?

What is an execution plan?



If you have to tackle performance problems in an Oracle database one of the most important skills you need to acquire is the ability to interpret execution plans, and in this series of articles I’ll be telling you how to do just that.
We’ll start with a couple of articles that look at some of the ways we can access execution plans, checking which pieces of information can be found in each of the different sources for execution plans and reviewing which method is best suited in which circumstances.
Once we’ve done that we’ll move on to the basics of interpreting simple execution plans, and introduce the one rule that can be used to interpret most execution plans – provided the rule is used with a little care. At the same time we’ll highlight the point that the plan we get may be very different from the plan we expect thanks to the optimizer’s ability to transform our SQL.
Inevitably there will be exceptions to the rule, so after a couple of articles on the simpler plans we’ll move on to more complex plans, including updates, deletes, subquery factoring, and scalar subqueries in the select list. From there we can move on to distributed queries, the impact of partitioned objects on plans, and finally have a few words about parallel execution.
We won’t be able to cover all the operations and options that might appear in execution plans – but we should be able to cover enough material to handle most of the plans you’re likely to see on a typical Oracle system.

What is an execution plan?

When we write an SQL statement we may have some idea of what we think Oracle will do to generate the result. An execution plan is the mechanism Oracle uses to show how our SQL can be turned into a series of execution steps to produce the required result. Whether the plan is a prediction made before the query is executed, or is demonstrably the actual plan used at run-time, the approach we take to interpret the plan is the same. Depending when and how we acquired the plan, though, the level of detail (and even the degree of confidence that we’re looking at the right plan) varies. Consider, for example, this query:
select
 t1.v1, t2.v1
from
 t1, t2
where
 t1.n2 =       :b1
and t2.id =       t1.id
and t2.n2 between :b2 and :b3
;
It’s a simple two-table join; there’s a single column join condition, and there are two “single-table” predicates eliminating data. When we examine this query we want to be able to answer the following questions – which aren’t necessarily independent of each other
  • Which table will Oracle access first?
  • How will Oracle access that table, by index or tablescan?
  • How much data will it find?
  • Which table will Oracle access next, and how? (The “which” is easy in this case)
  • What join mechanism will Oracle use to join the two tables?
  • How much data will be generated through the effects of the join predicates?
  • How much data will be discarded by predicates that have to be applied after the join?
  • Is there any significant difference between Oracle’s predictions and what actually happens (or happened)?
  • Can we quickly identify why any such differences appeared, and whether they matter?
  • Does Oracle’s strategy match our expectations?
  • Can we see why Oracle didn’t pick the strategy we thought was a good one?
In this case we might assume that the Oracle should pick a tiny number of rows from t1, taking advantage of an index on the n2 column to find them; then do a nested loop join into t2 using the primary key index that we have on t2 to find out whether or not there is a row in t2 that matches each row found in t1; finally discarding most of the resulting rows because they fail the range test in the last line of our SQL.
So here are a couple of execution plans for the query, introducing a couple of the issues we have to consider before we even start working on any interpretation. These plans came from an instance of 11.2.0.4.

Using a “predictive” method:

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    32 |   832 |    46   (3)| 00:00:01 |
|*  1 |  FILTER                       |       |       |       |            |          |
|*  2 |   HASH JOIN                   |       |    32 |   832 |    46   (3)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL          | T1    |    32 |   416 |    24   (5)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |   500 |  6500 |    22   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | T2_N2 |    45 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER(:B2)<=TO_NUMBER(:B3))
2 - access("T2"."ID"="T1"."ID")
3 - filter("T1"."N2"=TO_NUMBER(:B1))
5 - access("T2"."N2">=TO_NUMBER(:B2) AND "T2"."N2"<=TO_NUMBER(:B3))

Using a “reactive” method

----------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |       |       |    33 (100)|          |
|*  1 |  FILTER                        |       |       |       |            |          |
|   2 |   NESTED LOOPS                 |       |    17 |   442 |    33   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                |       |    17 |   442 |    33   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |    17 |   221 |    16   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_N2 |    17 |       |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN          | T2_PK |     1 |       |     0   (0)|          |
|*  7 |    TABLE ACCESS BY INDEX ROWID | T2    |     1 |    13 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:B3>=:B2)
5 - access("T1"."N2"=:B1)
6 - access("T2"."ID"="T1"."ID")
7 - filter(("T2"."N2">=:B2 AND "T2"."N2"<=:B3))
The plans are obviously different – even though I produced them from the same session one right after the other. The main differences show up in two places; first in the “body” of the plan, where one plan has six lines and the other has eight; then in the Predicate Information, where one plan shows explicit coercion (e.g. to_number(:B3)) of all our bind variables and the other shows no coercion at all. If we’re going to depend on execution plans to help us produce efficient systems we need to know why such contradictions can appear, and learn how to identify how far we can trust the execution plans that Oracle gives us. So let’s look at some of the commoner options for acquiring execution plans, and the limitations of each option.

Acquiring Execution plans

Although there are various graphical tools that will produce execution plans at the touch of a button I’m going to stick with methods for using command-line SQL to produce plans, mainly taking advantage of Oracle’s dbms_xplan package. Whatever tools you use, the information you can get is fairly similar – but if you want to share plans with members of the various fora, list servers, and groups on the Internet you get the best public response if you stick with a well-known format, and dbms_xplan is the de facto standard for Oracle.

Explain plan

From an SQL*Plus session:
explain plan for {your select statement}
select * from table(dbms_xplan.display);
This is the simplest bare-bones method for getting a “prediction” of the plan that Oracle would use if you were to execute your statement; and it’s how I got the first of my two plans above. There are several problems with this strategy – two of them showed up, with different levels of visibility, in my plan.
First, if your query contains bind variables (as mine did) the “explain plan” feature has no idea of the data types of those bind variables, it assumes they are of type character – that’s why I got all those to_number() coercions in the Predicate Information – and the resulting coercions can make a huge difference to the shape of the plan since they may make it impossible for the optimizer to consider some of the indexes that would otherwise be available.
Secondly, Oracle has used “bind-variable peeking” for many years to check for actual values of incoming bind variables when it first optimizes a statement. But “explain plan” doesn’t even try; it has no idea of actual values, so it uses a few basic rules to estimate the selectivity of predicates involving bind variables. Some of the rules are sensible some are just guesses – typically using 1% or 5% as the selectivity. Poor estimates of selectivity lead to poor estimates of cardinality (rows) and bad plans.
Side note: You might notice in the first plan an example of a strange contradiction that guesses can product: in lines 4 and 5. The index range scan predicts 45 rowids will be found, but the table access predicts 500 rows will be return – clearly not possible if you’re only going to have 45 rowids to find those rows. There is a guess relating to range-based predicates on indexes that uses a minimum selectivity of 0.45%, while the equivalent but contradictory guess for tables is 5%.
There are a few more details about “explain plan” that you might use occasionally. The call has a few extra options; the manuals show it as:
explain plan
set statement_id = ‘{string}’
into {schema}.{table}@{db_link}
for {statement};
By default the statement_id is null, and the target table is called plan_table (which, in modern versions of Oracle is a public synonym for the global temporary table sys.plan_table$). In line with the table and statement_id, the call to dbms_xplan has a couple of parameters that make it possible for you to explain several statements and then report only the ones you want, the function declaration is:
dbms_xplan.display({plan_table},{statement_id},{format options},{filter option})
By leaving the first two parameters null (or explicitly supplying NULL) you will be selecting the most recent statement you have explained into PLAN_TABLE. As far as format options are concerned, there are many more pieces of information written to the plan table which you can choose to display – and we’ll be looking at some of those later on in the series. The filter option allows you to limit the rows you return from the plan_table – but I’ve never needed to use it on a production system.

Autotrace

There is a special variant of “explain plan” built into SQL*Plus; this is the autotrace option which you can enable with the “set” command:
set autotrace on
set autotrace traceonly
set autotrace traceonly explain
set autotrace traceonly statistics
set autotrace off
When autotrace is enabled SQL*Plus can report the execution plan and execution stats of any statement you execute. You can limit the SQL*Plus output to just the plan, just the stats, both plan and stats, and you can choose to suppress the normal output from the statement (with the traceonly option). For example, had I set autotrace to “traceonly statistics” and then run my original statement I would have seen only the following output:
1 row selected.

Statistics
----------------------------------------------------------
   1  recursive calls
   0  db block gets
   36  consistent gets
   0  physical reads
   0  redo size
   471  bytes sent via SQL*Net to client 
   415  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
   1  rows processed
This can be convenient if you want to get an idea of how much work a query will have to do without actually displaying the result set, or saving it to disk, on the client machine – but it’s not something I’ve needed to do often.
There is a little trap with autotrace – it’s not telling you the actual execution plan, it’s simply doing an “explain plan” and calling “dbms_xplan.display” in the background; moreover, if you “set autotrace traceonly explain” and issue a select statement then, since you’ve said you don’t want to see the actual query output, SQL*Plus doesn’t even run the statement. However, after you’ve been checking plans for many select statements, it’s easy to forget that if you execute an insert, update, delete or merge statement the statement WILL run – fortunately reporting the number of rows inserted, updated, deleted or merge so that you get a reminder to rollback.

dbms_xplan.display_cursor()

The only other option I want to mention in this article is the function that allows you to pull actual execution plans from memory after a query has run. Its definition is:
dbms_xplan.display_cursor({sql_id},{child_number},{format options})
In its simplest form (running from SQL*Plus with no parameters) it returns the execution plan for the last SQL statement you’ve executed. For various reasons you may find that when you do this you see a report about being unable to fetch the plan – this can indicate an odd timing effect with cursor invalidation, but the commonest reason is that you haven’t “set serveroutput off”, and the missing plan is for the call to dbms_output that SQL*Plus has injected after the statement you’ve just run, in which case you will see the following:
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  b3s1x9zqrvzvc, child number 0

BEGIN DBMS_OUTPUT.ENABLE(1000000); END;

NOTE: cannot fetch plan for SQL_ID: b3s1x9zqrvzvc, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

8 rows selected.
I used a call to dbms_xplan.display_cursor() to get the second plan of the two plans above. It’s what Oracle actually did for the specific values for the bind variables I used, with the current object statistics and optimizer environment for my session. It is still, however, reporting the optimizer estimates of how many rows each step of the plan will return – not the actual numbers of rows found at runtime – we’ll come to that topic in the next article.
There’s plenty more to say about dbms_xplan.display_cursor() and the ways in which you can use it; but I’ll stop at this point with just one warning. Although it shows you the actual execution plan generated (usually) with the actual bind values you supplied when you ran the query, this doesn’t guarantee that the plan you see is the plan that last appeared on the production system, or the plan that will appear in the future on the production system.
There are many reasons why you can be fooled by execution plans if you’re not looking at exactly what the end-user did at the moment they did it, on the live system; the reasons include things like:
  • Timing and choice of actual bind values
  • Optimizer environment and object statistics
  • Name resolution
A call to dbms_xplan.display_cursor() is generally less likely to mislead than a call to explain plan followed by a call to dbms_xplan.display() but it still requires some intelligent thought if you want to be sure that you aren’t looking at the wrong execution plan.

Conclusion

In this article we’ve seen how easy it is to get reports of execution plans, but noted that plans come in two “flavours” – predictive, and actual. We’ve also noted that the presence of bind variables in queries means that the “predictive” method is likely to give you a misleading idea of what the final execution plan will be.
There’s a general, and reasonably accurate, belief that running the query and checking the actual execution plan afterwards is more likely to give you the plan that would appear at run-time on the production system; however this is still dependent on your working environment being sufficiently similar to the end-users’ run-time environment.
We noted that the “volume” information (rows, bytes) from the execution plan is still predictive even when we report the actual execution plan – but in the next session we’ll see how we can get the actual volume figures as well, and how this helps us to recognize why the optimizer may have chosen a plan that doesn’t match our expectation.

No comments:

Post a Comment