Oracle Performance Engineering Guide: AWR, ASH and SQL Monitor in 19c and 23ai
Master the essential tools for diagnosing and resolving real-world performance issues
Production Environment Context
Oracle Database 19.18.21 with Oracle Grid Infrastructure 19.18 | 3-Node RAC Cluster | 8.5 TB OLTP Database | 3,200+ concurrent sessions | Peak TPS: 2,100 | 24/7 mission-critical
It's 3 AM on a Tuesday. The monitoring dashboard lights up red. Response times have jumped from 200ms to 5+ seconds. Users are reporting timeouts on critical batch jobs. Your manager's Slack message is already waiting: "Database issue?" You log into the database, check CPU utilization (45%), memory (78% used), disk I/O latency (120ms). Everything looks elevated but not catastrophically bad. Where do you even start investigating?
This is where AWR (Automatic Workload Repository), ASH (Active Session History), and SQL Monitor become your diagnostic lifeline. Over the past 15+ years managing large-scale Oracle databases in production, I have debugged thousands of performance incidents—from runaway SQL queries consuming 800GB of I/O in 10 minutes, to massive lock contention blocking 400+ sessions, to redo log I/O stalls freezing the entire database. These three tools have never failed me. They transform performance troubleshooting from educated guessing into data-driven root cause analysis.
In this guide, you will learn the exact methodology I use in production: how to leverage AWR, ASH, and SQL Monitor to identify root causes in under 15 minutes, pinpoint the exact problematic SQL statements, analyze their execution plans, and implement fixes. Whether you're managing a 3-node RAC cluster running Oracle 19c or a cloud-native 23ai environment, the diagnostic principles remain constant.
Let's dig into real production scenarios and techniques.
- Understanding Oracle Performance Diagnostics: The AWR Foundation
- Active Session History (ASH): Real-Time Visibility into Database Activity
- SQL Monitor: Drill Down into Individual SQL Statement Execution
- Reading the Top 5 Timed Events: Where to Focus Your Efforts
- Practical Workflow: From Problem Detection to Root Cause Analysis
- Advanced Tuning Strategies: Wait Events, Lock Analysis, and I/O Optimization
- Common Pitfalls and Anti-Patterns in Performance Tuning
- Automation and Best Practices for Ongoing Performance Management
- FAQ: Real Questions from Production DBAs
- Related Reading from Real Production Systems
1. Understanding Oracle Performance Diagnostics: The AWR Foundation
The Automatic Workload Repository (AWR) is the backbone of Oracle performance diagnostics. Every hour, Oracle automatically captures a snapshot of database activity—CPU usage, I/O operations, SQL execution patterns, wait events, and memory consumption. These snapshots are stored in the SYSAUX tablespace for up to 8 days by default. In production, you should increase this to at least 30 days.
What Gets Captured in AWR Snapshots?
- System Metrics: CPU utilization, physical I/O, wait event statistics, library cache activity
- SQL Performance Data: Top SQL by CPU, disk reads, and elapsed time
- Wait Events: Database locks, latches, I/O waits, log file synchronization
- Memory Allocation: SGA usage, buffer cache hit ratios, shared pool statistics
- Redo and Archive Logs: Log generation rates, checkpoint activity
The default snapshot interval is 60 minutes. For high-transaction databases or during troubleshooting windows, you might want to tighten this to 15 or 30 minutes.
Check AWR Snapshot Settings
-- Check current AWR settings
SQL> SELECT dbid, snap_interval, retention, topnsql
FROM dba_hist_wr_control;
-- Output example (SNAP_INTERVAL and RETENTION are INTERVAL DAY TO SECOND values):
-- DBID       SNAP_INTERVAL      RETENTION          TOPNSQL
-- 1234567890 +00000 01:00:00.0  +00008 00:00:00.0  DEFAULT
Modify Snapshot Interval
-- Change snapshot interval to 15 minutes, retention to 30 days
SQL> BEGIN
DBMS_WORKLOAD_REPOSITORY.modify_snapshot_settings(
retention => 43200,
interval => 15
);
END;
/
-- Verify changes applied
SQL> SELECT dbid, snap_interval, retention
FROM dba_hist_wr_control;
-- Note: shorter intervals plus longer retention multiply the number of
-- snapshots kept, and SYSAUX growth scales with workload volume
-- Monitor v$sysaux_occupants (occupant SM/AWR) to track AWR space usage
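The retention/interval arithmetic is worth sanity-checking before you change it. This is a back-of-the-envelope sketch (illustrative only; the real SYSAUX footprint depends on workload, TOPNSQL, and per-snapshot data volume):

```python
# Illustrative AWR snapshot arithmetic; the function name is this
# example's, not an Oracle API.

def snapshot_count(retention_minutes: int, interval_minutes: int) -> int:
    """Number of snapshots kept in SYSAUX for a given retention/interval."""
    return retention_minutes // interval_minutes

# The settings used above: 30 days retention, 15-minute snapshots.
print(snapshot_count(retention_minutes=43200, interval_minutes=15))   # 2880

# Versus the defaults: 8 days retention, 60-minute snapshots.
print(snapshot_count(retention_minutes=8 * 24 * 60, interval_minutes=60))  # 192
```

Fifteen times more snapshots retained means roughly fifteen times the AWR footprint, which is why the SYSAUX check above matters.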
2. Active Session History (ASH): Real-Time Visibility into Database Activity
If AWR is your historical lens, ASH (Active Session History) is your real-time microscope. ASH captures sessions that are actively consuming database time—CPU, I/O waits, locks—approximately every second. By default, the last 1 hour of ASH data is kept in memory (in V$ACTIVE_SESSION_HISTORY). Older data is written to disk and merged into AWR snapshots.
Critical Detail: ASH samples every active session once per second into a circular in-memory buffer (exposed as V$ACTIVE_SESSION_HISTORY). Roughly one out of every ten in-memory samples is persisted to disk in DBA_HIST_ACTIVE_SESS_HISTORY when the buffer is flushed into AWR. Because sampling happens once per second, a session that is active for only 50 milliseconds may never coincide with a sample and will not appear in ASH at all. This is why SQL Monitor (which captures actual execution statistics) is essential for short-lived queries.
When you have a performance problem happening right now, ASH is your first diagnostic tool. AWR report generation can take 5-10 minutes on large databases. ASH gives you answers in seconds and does not require waiting for the next snapshot.
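A rough model makes the sampling math concrete. This is an illustrative sketch of once-per-second sampling with 1-in-10 persistence, not Oracle's exact algorithm:

```python
# Rough ASH visibility model (illustrative): a session active for
# d seconds lands in about d in-memory samples, and about d/10
# persisted samples, since only one in ten samples reaches
# DBA_HIST_ACTIVE_SESS_HISTORY.

def expected_samples(active_seconds: float, persisted: bool = False) -> float:
    interval = 10.0 if persisted else 1.0  # seconds between retained samples
    return active_seconds / interval

print(expected_samples(0.05))                 # 0.05 -> a 50 ms query is usually invisible
print(expected_samples(45))                   # 45.0 -> a 45 s query leaves ~45 samples
print(expected_samples(45, persisted=True))   # 4.5  -> only ~4-5 rows survive to disk
```

The takeaway: a query must run for seconds to be reliably visible in memory, and for tens of seconds to show up meaningfully in the on-disk history.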
When to Use ASH Reports
| Scenario | Time Window | Why ASH |
|---|---|---|
| Ongoing, real-time slowdown | Last 5-60 minutes | Immediate visibility; no delay |
| Batch job hangs suddenly | Last 10 minutes | Pinpoint exact hang condition |
| Session-level blocking locks | Last 5-15 minutes | See blocking session ID and wait chain |
| Spike in disk I/O | Last 2-5 minutes | Isolate which SQL is reading |
-- Connect as SYSDBA
sqlplus / as sysdba
-- Generate ASH report for last 30 minutes (HTML format)
SQL> @$ORACLE_HOME/rdbms/admin/ashrpt.sql
-- When prompted:
-- Enter report type: HTML
-- Enter begin time: -30 (for last 30 minutes, relative)
-- Press Enter for end time (defaults to now)
The ASH report will provide:
- Top Wait Events: What sessions are waiting for (CPU, I/O, locks)
- Top SQL: Which SQL statements are consuming the most time
- Top Sessions: Which user or application is causing the load
- Activity Timeline: When the problem started and peaked
3. SQL Monitor: Drill Down into Individual SQL Statement Execution
SQL Monitor is your surgical instrument. While AWR and ASH tell you what is slow, SQL Monitor shows you why—down to the execution plan step, the number of rows processed, buffer gets, physical reads, and wait events per operation.
Starting in Oracle 19c, any SQL statement that consumes more than 5 seconds of combined CPU and I/O time, or that uses parallel execution, is automatically monitored; reports remain available for as long as they fit in the SQL Monitor memory buffer. In Oracle 23ai, the monitoring threshold is configurable, and the database can automatically generate tuning recommendations for problematic statements.
Key Advantage: SQL Monitor captures actual execution statistics, not estimated values. If a query runs for 45 seconds but the execution plan shows "Estimated Rows=100, Actual Rows=10,000,000," that is immediate proof of a cardinality estimation problem—your first action should be gathering statistics, not rewriting SQL.
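The estimated-versus-actual check can be scripted once you have pulled the plan rows out of a report. A minimal sketch (the helper name and the order-of-magnitude threshold are mine, not an Oracle API):

```python
# Flag plan steps whose actual row count differs from the optimizer's
# estimate by more than an order of magnitude: the classic signature
# of stale statistics.

def misestimate_factor(estimated: int, actual: int) -> float:
    """How far off the optimizer was, as a ratio >= 1."""
    est, act = max(estimated, 1), max(actual, 1)
    return max(est, act) / min(est, act)

steps = [
    ("INDEX RANGE SCAN", 100, 10_000_000),   # badly misestimated
    ("TABLE ACCESS FULL", 5_000, 4_800),     # close enough
]
for op, est, act in steps:
    if misestimate_factor(est, act) > 10:
        print(f"{op}: estimate {est} vs actual {act} -> gather statistics first")
```

Anything above roughly 10x is worth a statistics refresh before you touch the SQL itself.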
Key Advantages of SQL Monitor
- Per-Operation Breakdown: See CPU, I/O, and rows for each step in the execution plan
- Actual vs. Estimated Rows: Detect cardinality mismatches—a root cause of many bad plans
- Parallel Execution Insights: For parallel query statements, see load balancing and skew
- Buffer Gets and Reads: See logical and physical I/O at the operation level
- Real-Time Monitoring: Monitor a running query while it is still executing
-- Find SQL_ID of the statement (from AWR or ASH)
-- Example: Let us say SQL_ID is '8h4kpnf0zqr2a'
SET LONG 20000 LONGCHUNKSIZE 20000 PAGESIZE 0
SET LINESIZE 1000 FEEDBACK OFF VERIFY OFF TRIMSPOOL ON
SET HEAD OFF ECHO OFF
-- Generate report to screen
SQL> SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
sql_id => '8h4kpnf0zqr2a',
type => 'HTML',
report_level => 'ALL'
) FROM DUAL;
-- Or save to file
SQL> SPOOL /tmp/sql_monitor_report.html
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
sql_id => '8h4kpnf0zqr2a',
type => 'HTML'
) FROM DUAL;
SQL> SPOOL OFF
For deeper offline analysis, the SQL Test Case Builder (DBMS_SQLDIAG.EXPORT_SQL_TESTCASE) goes further still: it captures optimizer statistics, schema metadata, and configuration parameters alongside the execution plan.
4. Reading the Top 5 Timed Events: Where to Focus Your Efforts
This is the critical insight: most performance issues fall into a handful of categories. AWR highlights them as the top timed events (older reports label this section "Top 5 Timed Events"; recent releases call it "Top 10 Foreground Events by Total Wait Time"), the wait events consuming the most database time.
The Most Common Top 5 Events (and What They Mean)
| Wait Event | What It Means | Root Cause | Diagnostic Query |
|---|---|---|---|
| CPU time | Actual CPU consumption (non-idle) | Inefficient SQL, missing or bad indexes, poor execution plans, high recursive SQL | SELECT * FROM v$sqlarea WHERE executions > 1000 ORDER BY cpu_time DESC; |
| db file sequential read | Single-block I/O (typically index reads) | Too many index seeks, unselective indexes, high clustering factor | SELECT sql_id, disk_reads FROM v$sqlarea ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY; |
| db file scattered read | Multi-block I/O (full table or index scans) | Full table scans, missing partition pruning, undersized indexes | SELECT sql_id, object_name FROM v$sql_plan WHERE operation = 'TABLE ACCESS' AND options = 'FULL'; |
| log file sync | Waiting for redo log write to disk (commit and LGWR synchronization) | Undersized redo logs, slow disk I/O, high commit frequency, inadequate redo log groups | SELECT group#, thread#, bytes/1024/1024 MB FROM v$log; |
| enq: TX - row lock contention | Row-level locking (transaction enqueue) | Concurrent DML on same rows, missing indexes on foreign key columns, ITL exhaustion | SELECT blocking_session, sid FROM v$session WHERE blocking_session IS NOT NULL; |
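When scripting a first pass over events parsed out of an AWR report, I encode the table above as a simple lookup. The categories are this guide's shorthand, not an official Oracle classification; the helper is a minimal sketch:

```python
# First-pass triage map for top timed events (this guide's shorthand).

TRIAGE = {
    "CPU time": "Inefficient SQL / missing indexes -> check v$sqlarea by cpu_time",
    "db file sequential read": "Single-block index I/O -> check top SQL by disk reads",
    "db file scattered read": "Full scans -> look for missing indexes or partition pruning",
    "log file sync": "Commit/redo bottleneck -> check redo log size and disk latency",
    "enq: TX - row lock contention": "Row locks -> find blocking sessions in v$session",
}

def triage(event: str) -> str:
    return TRIAGE.get(event, "Not in top-5 playbook: check the full AWR wait section")

print(triage("log file sync"))
```

Crude, but it forces the first question every time: which bucket is the database actually waiting in?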
5. Practical Workflow: From Problem Detection to Root Cause Analysis
The 4-Step Performance Troubleshooting Framework
Here is the systematic approach I use in every production incident:
Step 1: Confirm the Problem and Set the Time Window
- When did the problem start? (exact time from user or monitoring tool)
- Is it still happening now?
- What changed? (new job, code release, data growth)
- Identify AWR snapshots for the affected period
-- Real production scenario: Problem reported at 03:15 AM
-- Find snapshots bracketing the incident window
SQL> SELECT snap_id,
            TO_CHAR(begin_interval_time, 'DD-MON-YYYY HH24:MI') begin_time,
            TO_CHAR(end_interval_time, 'DD-MON-YYYY HH24:MI') end_time,
            ROUND((CAST(end_interval_time AS DATE)
                 - CAST(begin_interval_time AS DATE)) * 24 * 60, 2) duration_min
       FROM dba_hist_snapshot
      WHERE begin_interval_time >= TRUNC(SYSDATE) - 1
        AND begin_interval_time <= SYSDATE
      ORDER BY snap_id DESC;
-- Note: the interval columns are TIMESTAMPs; cast to DATE before
-- doing date arithmetic, or the subtraction yields an INTERVAL
-- Output example (3-node RAC):
-- SNAP_ID BEGIN_TIME        END_TIME          DURATION_MIN
--    3156 10-MAR-2026 03:00 10-MAR-2026 04:00        60.02
--    3155 10-MAR-2026 02:00 10-MAR-2026 03:00        59.98
--    3154 10-MAR-2026 01:00 10-MAR-2026 02:00        60.01
Step 2: Generate AWR Report and Review Top 5 Timed Events
-- Connect as SYSDBA to any RAC instance
sqlplus / as sysdba
-- For single-instance database:
SQL> @$ORACLE_HOME/rdbms/admin/awrrpt.sql
-- When prompted by awrrpt.sql:
-- Report type: Enter 'html' (default is 'text')
-- Begin snap ID: 3154 (start of problem window)
-- End snap ID: 3156 (end of problem window)
-- For RAC databases, generate cluster-wide report:
SQL> @$ORACLE_HOME/rdbms/admin/awrgrpt.sql
-- Note: AWR report generation reads large DBA_HIST tables and can be
-- resource-intensive on big repositories; on a database this size
-- expect it to take several minutes and use noticeable temp space.
-- Prefer running it outside peak hours when you have the choice.
What to Look For in the AWR Report:
- "Top Timed Events" (near the top of the report) - Where your database spent most time
- "Load Profile" - Transactions per second, logical reads per second, physical reads per second
- "Foreground Wait Events" - Full wait event breakdown
- "Top SQL by Elapsed Time" - Which statements consumed most time
- "Instance Activity Stats" - Buffer cache hit ratio, parse ratio, redo allocation
- "Cache Size Advisory" - Whether increasing SGA would help
Step 3: Identify the Culprit SQL
AWR reports include a "Top SQL" section. Look for:
- Top SQL by Elapsed Time: Most total time spent
- Top SQL by CPU Time: Most CPU consumed (CPU problem)
- Top SQL by Physical Reads: Most disk I/O (I/O problem)
Grab the SQL_ID from the report. Then generate a SQL Monitor report for that statement.
Step 4: Drill Into SQL Monitor for the Detailed Execution Plan
SQL Monitor will show you:
- Which operation in the plan is slow
- Actual rows versus estimated rows (cardinality mismatch?)
- Buffer gets and reads per operation
- For parallel statements: work distribution and skew
If you see "estimated rows = 1,000 but actual rows = 1,000,000," that is a cardinality problem: the optimizer made a bad decision based on stale or missing statistics. Solution: refresh them with DBMS_STATS.GATHER_TABLE_STATS (ANALYZE ... COMPUTE STATISTICS is deprecated for optimizer statistics).
6. Advanced Tuning Strategies: Wait Events, Lock Analysis, and I/O Optimization
Diagnosing Specific Wait Events
The Top 5 Timed Events tells you what is slow. Use V$ views to understand why.
For "db file sequential read" (Index I/O Problem)
-- Find the largest tables (NUM_ROWS reflects the last statistics gather)
SQL> SELECT owner, table_name, num_rows
FROM dba_tables
WHERE owner NOT IN ('SYS', 'SYSTEM')
ORDER BY num_rows DESC NULLS LAST;
-- Check existing indexes on the table
SQL> SELECT index_name, column_name, column_position
FROM dba_ind_columns
WHERE table_name = 'YOUR_TABLE'
ORDER BY index_name, column_position;
-- Look for missing indexes in slow SQL
-- If your WHERE clause is on columns A and B:
-- CREATE INDEX idx_name ON table_name(A, B);
-- Verify the index is being used and compare actual vs estimated rows
-- (actual row counts require the /*+ GATHER_PLAN_STATISTICS */ hint
--  or STATISTICS_LEVEL=ALL at execution time)
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR
('sql_id', 0, 'ALLSTATS LAST'));
For "enq: TX - row lock contention" (Locking Problem)
Row lock contention in RAC is particularly complex due to cross-instance deadlock detection. Distinguish between:
- Row-level locks: Two sessions updating the same row (wait event: "enq: TX")
- ITL exhaustion: Too few Interested Transaction List slots in the block (INITRANS defaults to 1 for tables and 2 for indexes; the list can grow only while free space in the block allows), surfacing as "enq: TX - allocate ITL entry"
- Global cache contention (RAC): Cross-instance block access (wait events prefixed "gc", such as "gc buffer busy acquire" and "gc cr request")
-- Step 1: Find blocking sessions
SQL> SELECT blocking_session, blocking_inst_id, sid, serial#,
username, sql_id, event, seconds_in_wait
FROM v$session
WHERE blocking_session IS NOT NULL
ORDER BY blocking_session;
-- Step 2: Check for ITL exhaustion (common cause)
SQL> SELECT table_name, pct_free, ini_trans, max_trans
FROM dba_tables
WHERE table_name = 'ORDERS';
-- Step 3: If ini_trans is low, increase it
-- (ALTER TABLE ... MOVE rewrites the blocks so the new value applies;
--  MOVE leaves the table's indexes UNUSABLE, so rebuild them after)
-- ALTER TABLE orders MOVE PCTFREE 20 INITRANS 10;
-- ALTER INDEX orders_pk REBUILD;
For "log file sync" (Redo and Commit Problem)
-- Check redo log sizes (sizes are in v$log; v$logfile has only the paths)
SQL> SELECT l.group#, lf.member, l.bytes/1024/1024 AS size_mb
FROM v$log l JOIN v$logfile lf ON lf.group# = l.group#
ORDER BY l.group#;
-- If logs are small (less than 500MB), add larger groups
SQL> ALTER DATABASE ADD LOGFILE GROUP 4
('/path/to/redo04.log') SIZE 1000M;
-- Monitor redo generation rate
SQL> SELECT inst_id, name, value
FROM gv$sysstat
WHERE name IN ('redo size',
'redo log space requests');
7. Common Pitfalls and Anti-Patterns in Performance Tuning
Anti-Pattern 1: Tuning by Hunches, Not Data
Anti-Pattern 2: Not Checking Cardinality Before Rewriting SQL
Anti-Pattern 3: Creating Indexes Without Testing
Anti-Pattern 4: Ignoring Baseline Performance
Anti-Pattern 5: Ignoring Interconnect Latency (RAC Databases)
-- Check average 'gc cr request' wait latency
SQL> SELECT event, total_waits, time_waited_micro
FROM v$system_event
WHERE event = 'gc cr request';
If time_waited_micro divided by total_waits exceeds 5,000 microseconds (5 milliseconds) per request, investigate the private interconnect.
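The per-wait arithmetic is easy to get wrong under pressure, so here it is spelled out (a sketch; the 5 ms rule of thumb is this guide's, and thresholds vary by platform and interconnect):

```python
# Average wait per event occurrence, from v$system_event-style numbers:
# time_waited_micro (microseconds) / total_waits, converted to ms.

def avg_wait_ms(time_waited_micro: int, total_waits: int) -> float:
    if total_waits == 0:
        return 0.0
    return time_waited_micro / total_waits / 1000.0

# Example numbers from a hypothetical v$system_event row:
latency = avg_wait_ms(time_waited_micro=42_000_000, total_waits=6_000)
print(round(latency, 1))  # 7.0 -> above the 5 ms rule of thumb
```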
8. Automation and Best Practices for Ongoing Performance Management
Set Up Automated Baseline Comparisons
-- Create a baseline for a known-good period
-- (example: 8 AM - 10 AM off-peak)
SQL> BEGIN
DBMS_WORKLOAD_REPOSITORY.create_baseline(
start_snap_id => 12340, -- Snapshot at 8:00 AM
end_snap_id => 12341, -- Snapshot at 9:00 AM
baseline_name => 'BASELINE_OFFPEAK'
);
END;
/
-- Later, when something goes slow, compare the problem window against
-- the baseline with an AWR compare-periods (diff) report
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql
-- Or generate it programmatically:
SQL> SELECT output FROM TABLE(
       DBMS_WORKLOAD_REPOSITORY.awr_diff_report_html(
         :dbid, 1, 12340, 12341,   -- baseline window snapshots
         :dbid, 1, 12349, 12350    -- problem window snapshots
       ));
Monitor AWR Trends Regularly
- Set up a cron job to generate AWR reports weekly
- Watch for trending: Is CPU increasing? Disk I/O? Memory pressure?
- Act early. Do not wait for a crisis.
Document Your Findings
Keep a log of performance issues: time, root cause, fix applied, result. Over time, you will see patterns. If row lock contention has been your root cause three times this year, that is a signal to redesign that batch process.
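Even a trivially structured log makes those patterns countable. A minimal sketch (the field names and sample incidents are this example's, not a standard):

```python
# Minimal incident log: record root cause per incident, then count
# recurring causes to spot redesign candidates.

from collections import Counter
from dataclasses import dataclass

@dataclass
class Incident:
    when: str
    root_cause: str
    fix: str

log = [
    Incident("2026-01-12 03:15", "enq: TX - row lock contention", "reordered batch commits"),
    Incident("2026-03-10 03:15", "log file sync", "resized redo logs to 1 GB"),
    Incident("2026-05-02 02:40", "enq: TX - row lock contention", "indexed FK column"),
]

recurring = Counter(i.root_cause for i in log).most_common(1)
print(recurring)  # [('enq: TX - row lock contention', 2)] -> redesign candidate
```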
9. FAQ: Real Questions from Production DBAs
Q: The optimizer keeps misestimating cardinality on correlated WHERE-clause columns. What can I do?
A: Create extended (column group) statistics so the optimizer sees the correlation:
SQL> SELECT DBMS_STATS.CREATE_EXTENDED_STATS(NULL, 'YOUR_TABLE', '(col1,col2)') FROM DUAL;
Q: How do I check whether automatic statistics gathering is scheduled?
A: SQL> SELECT * FROM dba_scheduler_jobs WHERE job_name LIKE '%GATHER_STATS%';
Q: Gathering statistics on a very large table takes too long. Can I sample?
A: SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCHEMA', tabname=>'LARGE_TABLE', estimate_percent=>10);
Licensing Compliance and Free Alternatives
Important Note: AWR and ASH require the Oracle Diagnostics Pack; SQL Monitor and SQL Tuning Advisor additionally require the Oracle Tuning Pack. Both packs are licensed separately from the base Oracle Database.
If You Have the Diagnostics and Tuning Packs Licensed:
- AWR, ASH, SQL Monitor, SQL Tuning Advisor - all available
- No additional configuration required
- Fully supported on Oracle 19c and 23ai
If You Do Not Have Diagnostics Pack:
- Statspack (Free): Legacy tool, similar to AWR but without licensing restrictions. Install with @$ORACLE_HOME/rdbms/admin/spcreate.sql
- Dynamic Performance Views (Free): Included with the base database. Use V$SYSSTAT, V$SESSION, V$SQLAREA, V$SESSION_WAIT for real-time diagnostics
- Execution Plans (Free): Use DBMS_XPLAN.DISPLAY_CURSOR() or EXPLAIN PLAN FOR to analyze bad plans
- Oracle 23ai Automatic Optimizer Stats (Caution): May require a separate license; verify before enabling