Database Tuning
There are two basic forms of tuning:
o Speed: a shorter response time for individual tasks
o Scalability: sustaining a higher load at a comparable response time or throughput
During this course, methods to identify and resolve bottlenecks will be
discussed. The result of tuning should be visible to users, either as a
decrease in the time it takes to perform a task, or as an increase in
the number of concurrent sessions.
Tuning is performed because either a problem already exists or the DBA
wishes to prevent problems from occurring.
Some examples of items to be monitored are critical table growth,
changes in the statements users execute, and I/O distribution across
devices.
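For instance, one way to watch critical table growth is to sample segment sizes periodically from the data dictionary. This is only a sketch: DBA_SEGMENTS is a standard dictionary view, but the 100 MB threshold here is purely illustrative.

    -- Largest tables by allocated space (threshold is illustrative)
    SELECT owner, segment_name, ROUND(bytes/1024/1024) AS size_mb
      FROM dba_segments
     WHERE segment_type = 'TABLE'
       AND bytes > 100 * 1024 * 1024
     ORDER BY bytes DESC;

Capturing this output on a schedule and comparing runs shows which tables are growing fastest.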
I will explain where waits and bottlenecks exist, and how to resolve these problems.
Tuning can be divided into different phases:
o Application design and programming
o Database configuration
o Adding a new application
o Troubleshooting and tuning
Proactive Tuning Considerations during Development
o Tune the design.
o Tune the application.
o Tune memory.
o Tune I/O.
o Tune contention.
o Tune the operating system.
Tuning Steps During Production
o Locate the bottleneck by using tools.
o Determine the reason for the bottleneck.
o Resolve the cause.
o Check that the bottleneck has been resolved.
The tuning methodology for production systems works by resolving problems when they occur:
1. Locate a bottleneck by using tools such as STATSPACK, UTLBSTAT and UTLESTAT, or Oracle Enterprise Manager (see the sketch after this list).
2. The bottleneck usually manifests itself as a wait event. Determine the reason for the wait event.
3. Resolve the cause of the wait. This could mean changing the size of a component of the System Global Area (SGA).
4. Check that the change has produced a positive effect on the system by running the application again, and then using the tools from step 1.
5. Repeat the process if your goals have not yet been achieved.
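As a minimal sketch of step 1, assuming STATSPACK has already been installed (spcreate.sql run as the PERFSTAT user), you can bracket a workload with two snapshots and generate a report between them:

    -- As PERFSTAT: take a snapshot before the workload
    EXECUTE statspack.snap;

    -- ... run the workload being measured ...

    -- Take a second snapshot after the workload
    EXECUTE statspack.snap;

    -- Generate a report; spreport.sql prompts for the begin and end snapshot IDs
    @?/rdbms/admin/spreport

The report lists the top wait events and the most expensive SQL between the two snapshots, which feeds directly into step 2.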
Diagnostic and Tuning Tools
Alert.log file:
You can check the time between log switches in the alert.log file. The alert log file consists of a chronological log of messages and errors.
The following information is logged in the alert log file:
o Internal errors (ORA-600) and block corruption errors (ORA-1578 or ORA-1498)
o Operations that affect database structures and parameters, and statements such as CREATE DATABASE, STARTUP, SHUTDOWN, ARCHIVE LOG, and RECOVER
o The values of all nondefault initialization parameters at the time the instance starts
o The location of the alert.log file is given by the parameter BACKGROUND_DUMP_DEST
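For example, a quick way to locate the directory from SQL*Plus (BACKGROUND_DUMP_DEST is the standard parameter through Oracle 10g; from 11g onward the Automatic Diagnostic Repository location is governed by DIAGNOSTIC_DEST instead):

    -- Show where the alert log is written
    SELECT value
      FROM v$parameter
     WHERE name = 'background_dump_dest';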
Background trace files
These files are created by the background processes. In general, these files contain diagnostic information, not information regarding performance tuning. By using events, Oracle Support can write information to these files regarding performance.
User trace files
User trace files can be generated by server processes at the user's or DBA's request.
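As an illustration, the simplest way to request a trace of your own session is shown below; the resulting trace file (written to the USER_DUMP_DEST directory) can then be formatted with the tkprof utility.

    -- Turn SQL tracing on for the current session
    ALTER SESSION SET sql_trace = TRUE;

    -- ... execute the statements to be traced ...

    -- Turn tracing off again
    ALTER SESSION SET sql_trace = FALSE;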
Emergency Performance Techniques
This chapter contains the following sections:
Introduction to Emergency Performance Techniques
Steps in the Emergency Performance Method
Introduction to Emergency Performance Techniques
In an emergency situation, a component of the system has changed, transforming it from a reliable, predictable system into one that is unpredictable and unable to satisfy user requests.
In this case, the role of the performance engineer is to rapidly determine what has changed and take appropriate actions to resume normal service as quickly as possible. In many cases, it is necessary to take immediate action, and a rigorous performance improvement project is unrealistic.
After addressing the immediate performance problem, the performance engineer must collect sufficient debugging information either to get better clarity on the performance problem or to at least ensure that it does not happen again.
However, shortcuts are taken in various stages because of the time-critical nature of the problem. Keeping detailed notes and records of facts found as the debugging process progresses is essential for later analysis and justification of any remedial actions. This is analogous to a doctor keeping good patient notes for future reference.
Steps in the Emergency Performance Method
The Emergency Performance Method is as follows:
1. Survey the performance problem and collect the symptoms of the performance problem. This process should include the following:
o User feedback on how the system is underperforming. Is the problem throughput or response time?
o Ask the question, "What has changed since we last had good performance?" This answer can give clues to the problem; however, getting unbiased answers in an escalated situation can be difficult.
2. Sanity-check the hardware utilization of all components of the application system. Check where the highest CPU utilization is, and check the disk, memory usage, and network performance on all the system components. This quick process identifies which tier is causing the problem. If the problem is in
the application, then shift analysis to application debugging. Otherwise, move on to database server analysis.
3. Determine if the database server is constrained on CPU or if it is spending time waiting on wait events. If the database server is CPU-constrained, then investigate the following:
o Sessions that are consuming large amounts of CPU at the operating system level
o Sessions or statements that perform many buffer gets at the database level (check V$SESSTAT, V$SQL)
o Execution plan changes causing sub-optimal SQL execution (these can be difficult to locate)
o Incorrect settings of initialization parameters
o Algorithmic issues as a result of code changes or upgrades of any components
If the database sessions are waiting on events, then follow the wait events listed in V$SESSION_WAIT to determine what is causing serialization. In cases of massive contention for the library cache, it might not be possible to log on or submit SQL to the database. In this case, use historical data to determine why there is suddenly contention on this latch. If most waits are for I/O, then sample the SQL being run by the sessions that are performing all of the I/Os. (Queries illustrating these checks follow this list.)
4. Apply emergency action to stabilize the system. This could involve actions that take parts of the application off-line or restrict the workload that can be applied to the system. It could also involve a system restart or the termination of jobs in progress. These naturally have service-level implications. (An illustrative session-termination sketch also follows this list.)
5. Validate that the system is stable. Having made changes and restrictions to the system, validate that the system is now stable, and collect a reference set of statistics for the database.
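As a sketch of the checks in step 3, the following queries run against the standard V$ views named above; column lists vary slightly by Oracle release, and the buffer-gets threshold is an arbitrary illustration:

    -- CPU-constrained case: statements performing the most buffer gets
    SELECT sql_text, buffer_gets, executions
      FROM v$sql
     WHERE buffer_gets > 100000
     ORDER BY buffer_gets DESC;

    -- Wait-bound case: what each session is currently waiting on
    SELECT sid, event, seconds_in_wait
      FROM v$session_wait
     WHERE event NOT LIKE 'SQL*Net%'
     ORDER BY seconds_in_wait DESC;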
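For step 4, one possible emergency action is terminating a runaway session. The sid and serial# values below are placeholders to be taken from the V$SESSION query:

    -- Find candidate sessions
    SELECT sid, serial#, username, status
      FROM v$session
     WHERE username IS NOT NULL;

    -- Terminate one session; '23,5' is an illustrative '<sid>,<serial#>' pair
    ALTER SYSTEM KILL SESSION '23,5';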