⏱️ Estimated Reading Time: 12–14 minutes
📊 Master AWR, ASH, and SQL Monitor - The Oracle DBA's Performance Troubleshooting Trinity
Production was slow. Users were complaining. The application team blamed the database. I had 30 minutes to find the root cause before the CTO joined the war room call.
I opened an AWR report. Buffer busy waits: 87% of total wait time. But which sessions? Which objects? AWR doesn't tell you. That's when I switched to ASH. Three sessions hammering the same table block. SQL Monitor showed me the exact query execution plan with real-time wait events. Problem identified: missing index on a foreign key. Index created. Performance restored. Crisis averted.
AWR, ASH, and SQL Monitor are Oracle's three essential performance diagnostic tools. AWR gives you the big picture over time. ASH shows you what's happening right now at the session level. SQL Monitor reveals how individual queries execute in real-time. Together, they form the complete performance troubleshooting toolkit that every Oracle DBA needs to master.
This guide covers real production troubleshooting techniques using AWR, ASH, and SQL Monitor in Oracle 19c and 23ai. If you're tired of guessing at performance problems or spending hours digging through logs, these are the diagnostic queries and analysis patterns that actually work.
1. AWR Fundamentals: What It Measures and Why It Matters
Automatic Workload Repository (AWR) is Oracle's time-series performance data warehouse. Every hour (by default), Oracle captures a snapshot of database performance metrics.
What AWR Captures
| Metric Category | What It Measures | Why It Matters |
| --- | --- | --- |
| Wait Events | Where the database spent time waiting | Identifies bottlenecks (I/O, CPU, locks) |
| Top SQL | Most resource-intensive queries | Find queries to optimize first |
| System Statistics | DB-wide metrics (commits, reads, writes) | Understand workload characteristics |
| Memory Usage | SGA, PGA, buffer cache hit ratios | Memory tuning opportunities |
| I/O Statistics | Datafile reads/writes, latency | Storage performance issues |
AWR vs Statspack
AWR replaced Statspack in Oracle 10g. Key differences:
- ✅ AWR: Requires Diagnostics Pack license (paid), automatic snapshots, more detailed data
- ✅ Statspack: Free, manual snapshots, less detail but still useful
- ❌ Common mistake: Running AWR reports without proper licensing (Oracle audits this)
-- Connect as SYS or user with DBA privileges
@?/rdbms/admin/awrrpt.sql
-- Or generate programmatically
SELECT output FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML(
l_dbid => (SELECT dbid FROM v$database),
l_inst_num => 1,
l_bid => 12345, -- Begin snapshot ID
l_eid => 12350 -- End snapshot ID
));
-- Find recent snapshot IDs
SELECT
snap_id,
TO_CHAR(begin_interval_time, 'YYYY-MM-DD HH24:MI') AS snap_time,
TO_CHAR(end_interval_time, 'YYYY-MM-DD HH24:MI') AS end_time
FROM dba_hist_snapshot
WHERE begin_interval_time > SYSDATE - 2
ORDER BY snap_id DESC;
AWR Configuration Best Practices
-- Check current AWR settings
SELECT
snap_interval,
retention
FROM dba_hist_wr_control;
-- Modify AWR snapshot frequency and retention
BEGIN
DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
retention => 43200, -- 30 days (in minutes)
interval => 30 -- Snapshot every 30 minutes
);
END;
/
-- Create manual snapshot (useful during performance issues)
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
2. Reading AWR Reports: Focus on These Sections First
AWR reports are 50-100 pages long. Here's what actually matters when troubleshooting.
Section 1: Top 5 Timed Foreground Events
This is the most important section. It shows where your database spent the most time.
Example analysis:
| Event | Wait Time (sec) | % of Total | What It Means |
| --- | --- | --- | --- |
| db file sequential read | 45,230 | 62% | Single-block reads (index scans) |
| CPU time | 18,450 | 25% | Actual CPU processing |
| log file sync | 5,890 | 8% | Commits waiting for redo |
| db file scattered read | 2,120 | 3% | Multi-block reads (full scans) |
| enq: TX - row lock | 1,450 | 2% | Row-level lock contention |
Analysis: This system is I/O bound (62% in db file sequential read). Focus on:
- Which SQL statements cause the most single-block reads
- Inefficient indexes forcing row-by-row lookups (index range scan followed by table access by rowid)
- Storage latency issues
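To pin down which statements drive those single-block reads, ASH can break a wait event down by `sql_id`. A minimal sketch (assumes Diagnostics Pack licensing, like all `v$active_session_history` queries):

```sql
-- Top SQL_IDs sampled on 'db file sequential read' in the last hour
SELECT
    sql_id,
    COUNT(*) AS samples,                       -- each sample ~ 1 second of wait
    COUNT(DISTINCT session_id) AS sessions
FROM v$active_session_history
WHERE event = 'db file sequential read'
  AND sample_time > SYSDATE - INTERVAL '60' MINUTE
  AND sql_id IS NOT NULL
GROUP BY sql_id
ORDER BY samples DESC
FETCH FIRST 10 ROWS ONLY;
```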
Section 2: SQL Ordered by Elapsed Time
Shows the most expensive queries by total elapsed time. These are your tuning targets.
-- Top SQL by elapsed time (from AWR snapshots)
SELECT
sql_id,
plan_hash_value,
executions_delta AS executions,
ROUND(elapsed_time_delta/1000000, 2) AS elapsed_sec,
ROUND(cpu_time_delta/1000000, 2) AS cpu_sec,
ROUND(iowait_delta/1000000, 2) AS io_wait_sec,
buffer_gets_delta AS buffer_gets,
disk_reads_delta AS disk_reads
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN 12345 AND 12350
AND executions_delta > 0
ORDER BY elapsed_time_delta DESC
FETCH FIRST 20 ROWS ONLY;
Section 3: Load Profile
Gives you per-second and per-transaction metrics. Compare against baseline to spot anomalies.
Key metrics to watch:
- Logical Reads/sec: Buffer cache activity
- Physical Reads/sec: Disk I/O activity
- Executions/sec: Query throughput
- Transactions/sec: Business transaction rate
- Commits/sec: Transaction commit rate
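These per-second rates can be recomputed straight from the snapshot data when you want to compare two arbitrary windows. A single-instance sketch, reusing the example snapshot IDs 12345 and 12350 from earlier (the stat names shown are the standard `v$sysstat` names):

```sql
-- Per-second load profile rates between two AWR snapshots
SELECT
    e.stat_name,
    ROUND((e.value - b.value) /
          ((CAST(es.end_interval_time AS DATE) -
            CAST(bs.end_interval_time AS DATE)) * 86400), 2) AS per_second
FROM dba_hist_sysstat b
JOIN dba_hist_sysstat e   ON b.stat_name = e.stat_name
JOIN dba_hist_snapshot bs ON b.snap_id = bs.snap_id
JOIN dba_hist_snapshot es ON e.snap_id = es.snap_id
WHERE b.snap_id = 12345   -- begin snapshot
  AND e.snap_id = 12350   -- end snapshot
  AND e.stat_name IN ('session logical reads', 'physical reads',
                      'execute count', 'user commits')
ORDER BY e.stat_name;
```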
3. ASH Deep Dive: Real-Time Session Analysis
Active Session History (ASH) samples every active session once per second. Unlike AWR, which gives you hourly summaries, ASH shows you exactly what each session was doing at specific points in time.
When to Use ASH vs AWR
Use AWR when:
- Analyzing performance over hours or days
- Finding trends and patterns
- Comparing current performance to historical baseline
Use ASH when:
- Troubleshooting active performance issues (right now)
- Identifying which sessions cause specific wait events
- Finding contention on specific objects or SQL statements
-- Real-time view of active sessions
SELECT
session_id,
session_serial#,
user_id,
sql_id,
sql_plan_hash_value,
event,
wait_class,
time_waited/1000000 AS time_waited_sec, -- TIME_WAITED is in microseconds
blocking_session,
current_obj#,
program,
module
FROM v$active_session_history
WHERE sample_time > SYSDATE - INTERVAL '15' MINUTE
AND session_state = 'WAITING'
ORDER BY sample_time DESC;
-- What sessions spent most time waiting for (last hour)
SELECT
event,
wait_class,
COUNT(*) AS sample_count,
ROUND(COUNT(*) / SUM(COUNT(*)) OVER () * 100, 2) AS pct_of_samples,
COUNT(*) AS approx_seconds -- V$ ASH samples about once per second
FROM v$active_session_history
WHERE sample_time > SYSDATE - INTERVAL '60' MINUTE
AND event IS NOT NULL
GROUP BY event, wait_class
ORDER BY sample_count DESC
FETCH FIRST 20 ROWS ONLY;
ASH: Finding Blocking Sessions
-- Find sessions causing locks
SELECT
blocking_session,
blocking_session_serial#,
COUNT(DISTINCT session_id) AS blocked_sessions,
COUNT(*) AS total_blocking_samples,
LISTAGG(DISTINCT event, ', ') WITHIN GROUP (ORDER BY event) AS wait_events
FROM v$active_session_history
WHERE blocking_session IS NOT NULL
AND sample_time > SYSDATE - INTERVAL '30' MINUTE
GROUP BY blocking_session, blocking_session_serial#
ORDER BY total_blocking_samples DESC;
4. SQL Monitor: Real-Time Query Execution Visibility
SQL Monitor tracks long-running or resource-intensive queries in real-time. It shows you the execution plan with actual row counts, timing, and wait events for each plan operation.
What Makes SQL Monitor Different
| Feature | EXPLAIN PLAN | SQL Monitor |
| --- | --- | --- |
| Shows predicted plan | Yes | Yes |
| Shows actual execution | No | Yes |
| Actual row counts | No (estimated) | Yes (actual) |
| Wait events per operation | No | Yes |
| Real-time progress | No | Yes |
-- SQL Monitor activates automatically for statements that run longer than
-- 5 seconds or execute in parallel (requires the Tuning Pack and
-- STATISTICS_LEVEL = TYPICAL or ALL)
ALTER SESSION SET STATISTICS_LEVEL = ALL;
-- Or force monitoring of a specific query with a hint
SELECT /*+ MONITOR */
employee_id, first_name, last_name
FROM employees
WHERE department_id = 50;
-- List monitored SQL statements
SELECT
sql_id,
status,
username,
TO_CHAR(sql_exec_start, 'YYYY-MM-DD HH24:MI:SS') AS exec_start,
elapsed_time/1000000 AS elapsed_sec,
cpu_time/1000000 AS cpu_sec,
buffer_gets,
disk_reads
FROM v$sql_monitor
WHERE sql_exec_start > SYSDATE - 1
ORDER BY sql_exec_start DESC;
-- Generate SQL Monitor report (HTML format)
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
sql_id => '&sql_id', -- substitute the SQL_ID you want to report on
type => 'HTML',
report_level => 'ALL'
) AS report
FROM DUAL;
Reading SQL Monitor Reports
Key sections to analyze:
- Plan Statistics: Compare estimated vs actual rows (cardinality misestimates indicate stale statistics)
- Activity Breakdown: Which plan operations consumed most time
- Wait Events: What each operation waited for (I/O, CPU, locks)
- Parallel Execution: Whether work is evenly balanced across parallel execution servers
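The same estimated-vs-actual comparison is available in plain SQL from `v$sql_plan_statistics_all`, which is populated when STATISTICS_LEVEL = ALL or the statement uses the gather_plan_statistics hint. A sketch:

```sql
-- Estimated vs actual rows per plan line for one cursor
SELECT
    id,
    operation,
    options,
    object_name,
    cardinality      AS est_rows,    -- optimizer estimate per execution
    last_output_rows AS actual_rows  -- rows produced on the last execution
FROM v$sql_plan_statistics_all
WHERE sql_id = '&sql_id'   -- substitute the SQL_ID under investigation
ORDER BY id;
```

Large gaps between est_rows and actual_rows on the same line are the cardinality misestimates mentioned above, and usually point at stale or missing statistics.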
5. Troubleshooting Scenario #1: CPU Bottleneck
Symptom: Users report application slowness. AWR report shows 80% of database time is CPU.
Step 1: Confirm CPU Saturation
-- OS CPU usage from AWR
SELECT
stat_name,
value
FROM dba_hist_osstat
WHERE snap_id = (SELECT MAX(snap_id) FROM dba_hist_snapshot)
AND stat_name IN ('NUM_CPUS', 'LOAD', 'BUSY_TIME', 'IDLE_TIME');
-- Top SQL by CPU time
SELECT
sql_id,
plan_hash_value,
executions_delta AS executions,
ROUND(cpu_time_delta/1000000, 2) AS cpu_sec,
ROUND((cpu_time_delta/1000000)/executions_delta, 4) AS cpu_per_exec
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN 12345 AND 12350
AND executions_delta > 0
ORDER BY cpu_time_delta DESC
FETCH FIRST 20 ROWS ONLY;
Step 2: Identify CPU-Intensive SQL
Common causes of high CPU:
- ❌ Inefficient SQL: Full table scans on large tables
- ❌ Parse storms: Excessive hard parsing (no bind variables)
- ❌ Excessive function calls: PL/SQL functions in SQL WHERE clauses
- ❌ Too many connections: Each session consumes CPU
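Parse storms are easy to confirm: statements that differ only in literal values collapse to the same FORCE_MATCHING_SIGNATURE, so a high cursor count per signature points directly at missing bind variables. A sketch:

```sql
-- Literal-only duplicates: many cursors sharing one force-matching signature
SELECT
    force_matching_signature,
    COUNT(*)    AS similar_cursors,
    MIN(sql_id) AS sample_sql_id     -- one representative to inspect
FROM v$sql
WHERE force_matching_signature <> 0
GROUP BY force_matching_signature
HAVING COUNT(*) > 10
ORDER BY similar_cursors DESC
FETCH FIRST 20 ROWS ONLY;
```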
Step 3: Resolution
Quick wins:
- ✅ Add missing indexes to eliminate full table scans
- ✅ Convert literals to bind variables to reduce parsing
- ✅ Implement connection pooling to reduce session overhead
- ✅ Rewrite SQL to avoid expensive functions in WHERE clauses
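The bind-variable fix is mechanical. The literal form below hard-parses a new cursor for every distinct value; the bind form shares a single cursor across executions:

```sql
-- Hard-parsed once per distinct literal value:
SELECT employee_id FROM employees WHERE department_id = 50;

-- Shared cursor, soft-parsed on repeat executions:
SELECT employee_id FROM employees WHERE department_id = :dept_id;
```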
6. Troubleshooting Scenario #2: I/O Wait Analysis
Symptom: AWR report shows 70% wait time in "db file sequential read" and "db file scattered read".
Understanding I/O Wait Events
| Wait Event | What It Means | Common Causes |
| --- | --- | --- |
| db file sequential read | Single-block read (index scan) | Missing indexes, inefficient indexes |
| db file scattered read | Multi-block read (full scan) | Missing indexes, small tables (full scan OK) |
| direct path read | Bypassing buffer cache | Large parallel queries, LOBs |
| read by other session | Waiting for block being read | Hot blocks, buffer busy waits |
-- Top SQL by physical reads
SELECT
sql_id,
plan_hash_value,
executions_delta AS executions,
disk_reads_delta AS physical_reads,
ROUND(disk_reads_delta/NULLIF(executions_delta,0), 2) AS reads_per_exec,
ROUND((disk_reads_delta * 8192)/1024/1024, 2) AS data_read_mb -- assumes 8 KB block size
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN 12345 AND 12350
AND executions_delta > 0
ORDER BY disk_reads_delta DESC
FETCH FIRST 20 ROWS ONLY;
Identifying Which Objects Cause I/O
-- Which tables/indexes cause most I/O
SELECT
owner,
object_name,
object_type,
SUM(physical_reads_delta) AS total_reads,
ROUND(SUM(physical_reads_delta * 8192)/1024/1024, 2) AS data_read_mb -- assumes 8 KB block size
FROM dba_hist_seg_stat s
JOIN dba_hist_seg_stat_obj o
ON s.obj# = o.obj# AND s.dataobj# = o.dataobj# AND s.dbid = o.dbid
WHERE s.snap_id BETWEEN 12345 AND 12350
GROUP BY owner, object_name, object_type
ORDER BY total_reads DESC
FETCH FIRST 20 ROWS ONLY;
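If per-execution reads look reasonable, check whether the storage itself is slow. Average single-block read latency above roughly 5-10 ms usually points at the I/O subsystem rather than the SQL. A quick check from the instance-wide event statistics (cumulative since startup):

```sql
-- Average read latency per wait event since instance startup
SELECT
    event,
    total_waits,
    ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM v$system_event
WHERE event IN ('db file sequential read', 'db file scattered read',
                'direct path read');
```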
7. Troubleshooting Scenario #3: Lock Contention
Symptom: Users report "hanging" transactions. AWR shows high "enq: TX - row lock contention" wait time.
-- Currently blocked sessions
SELECT
blocking_session,
sid AS blocked_sid,
serial#,
username,
sql_id,
event,
wait_time,
seconds_in_wait,
state
FROM v$session
WHERE blocking_session IS NOT NULL
ORDER BY seconds_in_wait DESC;
-- What the blocking session is doing
SELECT
s.sid,
s.serial#,
s.username,
s.status,
s.sql_id,
sq.sql_text
FROM v$session s
LEFT JOIN v$sql sq ON s.sql_id = sq.sql_id
WHERE s.sid IN (
SELECT DISTINCT blocking_session
FROM v$session
WHERE blocking_session IS NOT NULL
);
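When the blocker turns out to be an idle session holding an uncommitted transaction, the last resort is to terminate it. Be warned: this rolls back the blocker's open transaction, so confirm with the application owner first. A sketch, with placeholder SID and SERIAL# values to be taken from the queries above:

```sql
-- Last resort: terminate the blocking session (rolls back its transaction)
ALTER SYSTEM KILL SESSION '123,45678' IMMEDIATE;  -- '<sid>,<serial#>' from the blocking query
```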
Historical Lock Analysis with ASH
-- Which SQL statements caused locks (last 2 hours)
SELECT
sql_id,
blocking_session,
COUNT(*) AS lock_samples,
COUNT(*) * 10 AS approx_wait_seconds, -- DBA_HIST keeps 1 ASH sample in 10 (10-second granularity)
MIN(sample_time) AS first_seen,
MAX(sample_time) AS last_seen
FROM dba_hist_active_sess_history
WHERE sample_time > SYSDATE - INTERVAL '2' HOUR
AND event LIKE '%enq: TX%'
AND blocking_session IS NOT NULL
GROUP BY sql_id, blocking_session
ORDER BY lock_samples DESC;
8. Essential Diagnostic Queries Every DBA Needs
These are the queries I run most frequently when troubleshooting performance issues.
-- Comprehensive health check
SELECT
'Active Sessions' AS metric,
COUNT(*) AS value
FROM v$session WHERE status = 'ACTIVE'
UNION ALL
SELECT
'Buffer Cache Hit Ratio',
ROUND((1 - (phy.value / (cur.value + con.value))) * 100, 2)
FROM v$sysstat cur, v$sysstat con, v$sysstat phy
WHERE cur.name = 'db block gets'
AND con.name = 'consistent gets'
AND phy.name = 'physical reads'
UNION ALL
SELECT
'Library Cache Hit Ratio',
ROUND((SUM(pins) - SUM(reloads)) / SUM(pins) * 100, 2)
FROM v$librarycache
UNION ALL
SELECT
'PGA Memory Used (MB)',
ROUND(SUM(value)/1024/1024, 2)
FROM v$sesstat s, v$statname n
WHERE s.statistic# = n.statistic#
AND n.name = 'session pga memory';
-- What's running slow right now
SELECT
sql_id,
child_number,
executions,
ROUND(elapsed_time/1000000, 2) AS elapsed_sec,
ROUND(cpu_time/1000000, 2) AS cpu_sec,
buffer_gets,
disk_reads,
ROUND(elapsed_time/NULLIF(executions,0)/1000000, 4) AS sec_per_exec
FROM v$sql
WHERE executions > 0
AND elapsed_time > 1000000 -- More than 1 second total
ORDER BY elapsed_time DESC
FETCH FIRST 10 ROWS ONLY;
9. FAQ
Do I need a Diagnostics Pack license to use AWR?
Yes. AWR requires the Oracle Diagnostics Pack, a separately licensed Enterprise Edition option. If you don't have the license, Oracle can audit and charge retroactively for every CPU core where AWR was used. Alternative: use Statspack (free), which provides similar functionality with less detail. To check what your instance allows, query the CONTROL_MANAGEMENT_PACK_ACCESS parameter: SELECT value FROM v$parameter WHERE name = 'control_management_pack_access'. On Standard Edition the Diagnostics Pack is not available at all, so neither is AWR; use Statspack instead.
What's the difference between ASH and AWR?
AWR is a time-series data warehouse of hourly snapshots showing aggregate performance metrics. ASH is a sampling tool that records what every active session is doing once per second. Think of it this way: AWR answers "What happened over the last hour?" while ASH answers "What is this specific session doing right now?" Use AWR for historical analysis and trend identification; use ASH for active troubleshooting and session-level diagnostics. ASH data is rolled up into AWR (one in-memory sample out of ten is persisted to DBA_HIST_ACTIVE_SESS_HISTORY), so AWR contains ASH history at coarser granularity.
How long should I keep AWR snapshots?
Default is 8 days (11,520 minutes). For production databases, I recommend 30-45 days (43,200-64,800 minutes). This gives you enough history for month-over-month comparisons and investigating issues that occurred weeks ago. Storage cost is minimal—AWR snapshots typically use 50-200MB per day. For critical systems, consider 60-90 days. Beyond 90 days, diminishing returns—you're more likely referencing different workload patterns. Use DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS to adjust retention.
Should I mention AWR/ASH experience on my DBA resume?
Absolutely. Write: "Expert-level proficiency in Oracle performance diagnostics using AWR reports, ASH queries, and SQL Monitor for production troubleshooting. Reduced query execution time by X% through systematic analysis of wait events, execution plans, and system statistics." These tools are fundamental to Oracle DBA work—not mentioning them is like a developer not mentioning they know Git. Include specific examples: "Identified and resolved buffer busy wait contention using ASH historical data, reducing peak load response time from 8 seconds to 1.2 seconds."
About the Author
Chetan Yadav
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL, and Cloud DBA with 15+ years of hands-on experience managing production databases across on-premises, hybrid, and cloud environments. He specializes in high availability architecture, performance tuning, disaster recovery, and database migrations.
Throughout his career, Chetan has resolved hundreds of production performance issues using AWR, ASH, and SQL Monitor. He has optimized mission-critical Oracle databases serving millions of transactions daily in finance, healthcare, and e-commerce sectors.
Chetan is passionate about mentoring early-career DBAs and sharing real-world production lessons that aren't found in documentation. His writing focuses on practical decision-making, career growth, and systematic troubleshooting methodologies that work under pressure.
This blog focuses on real-world DBA problems, career growth, and practical learning — not theoretical documentation or vendor marketing.