Oracle High CPU Usage: Causes and Fix in 19c
It is 3:12 AM. PagerDuty fires. The alert reads: Oracle host CPU at 94% for 8 minutes. You log into the server. The database process is consuming nearly every available core. The application is timing out. Users are getting errors. And nobody knows which query started it or when.
This is the oracle high cpu usage fix 19c scenario that every production DBA eventually faces. The frustrating part is not the incident itself — it is that in most cases the fix takes under ten minutes once you know exactly where to look. The problem is that without a structured diagnostic sequence, engineers burn 45 minutes checking the wrong things first.
This guide covers the three root causes behind the majority of Oracle 19c CPU spikes: runaway SQL and full table scans, parallel query overload, and latch contention in the shared pool. For each cause: exact diagnostic SQL, correct and wrong result interpretation, the fix, its trade-off, and the confirmation step.
Cause 1: Runaway SQL and Oracle CPU Spike from Bad Execution Plans
The most common cause of a sudden Oracle CPU spike in 19c is a single SQL statement consuming a disproportionate share of CPU — typically because it is performing a full table scan on a large table, using a suboptimal join method, or has had its execution plan invalidated by a statistics refresh or cursor invalidation event. In Oracle 19c, Adaptive Query Optimization can occasionally select an execution plan that was correct for initial statistics but behaves catastrophically under production data distribution, and the optimizer will not self-correct until the cursor ages out of the shared pool.
The threshold where a runaway SQL becomes a host-level incident rather than a slow query is when it consumes more CPU than the OS can schedule on its available cores, creating a queue of waiting processes. On a 16-core host, a single parallel query with DOP 32 can drive CPU to 100% in under a minute.
Step 1: Find the top CPU-consuming SQL right now.
-- Run this first during the incident SELECT sql_id, ROUND(cpu_time/1000000, 2) AS cpu_seconds, executions, ROUND(cpu_time/NULLIF(executions,0)/1000000,4) AS cpu_per_exec, SUBSTR(sql_text, 1, 80) sql_preview FROM v$sql WHERE last_active_time > SYSDATE - 1/24 ORDER BY cpu_time DESC FETCH FIRST 10 ROWS ONLY;
Correct result: CPU seconds spread across many SQL IDs, no single statement above 20% of total. Wrong result: One SQL ID showing 80%+ of cumulative CPU seconds. That is your runaway statement.
Step 2: Check what is currently running on CPU using ASH.
-- Shows sessions currently burning CPU right now SELECT ash.sql_id, ash.session_id, ash.session_serial#, ash.module, COUNT(*) AS ash_samples FROM v$active_session_history ash WHERE ash.sample_time > SYSDATE - 5/1440 AND ash.session_state = 'ON CPU' GROUP BY ash.sql_id, ash.session_id, ash.session_serial#, ash.module ORDER BY ash_samples DESC;
Correct result: No SQL ID appearing in more than 20% of samples over 5 minutes. Wrong result: One SQL ID dominating ASH samples consistently — this is the active runaway. Note the SESSION_ID and SESSION_SERIAL#.
Fix: Kill the session, then fix the plan.
-- Step A: Kill the runaway session immediately ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE; -- Step B: Lock the good plan using a SQL Profile -- First, get the sql_id and the plan hash of the good plan from AWR DBMS_SQLTUNE.IMPORT_SQL_PROFILE( sql_text => '<full SQL text>', profile => SQLPROF_ATTR('OPT_PARAM(''optimizer_features_enable'' ''19.1.0'')'), name => 'PROFILE_FIX_RUNAWAY_SQL', force_match => TRUE ); -- Step C: Flush only the bad cursor from shared pool EXEC SYS.DBMS_SHARED_POOL.PURGE('<address>,<hash_value>', 'C');
Cause 2: Parallel Query CPU Overload from Uncontrolled DOP in Oracle 19c
Parallel query is Oracle's most powerful performance feature and one of its most dangerous misuse patterns. In Oracle 19c with PARALLEL_DEGREE_POLICY = AUTO, the optimizer can autonomously decide to execute a full table scan or aggregation in parallel, potentially spawning PX slave processes well beyond what the host hardware can sustain. When the degree of parallelism exceeds the number of available CPU cores, the OS scheduler begins context-switching between PX slaves — and context switching itself consumes CPU, creating a feedback loop where adding more parallel slaves makes the query slower while driving CPU higher.
The specific threshold that changes behavior: on a host with N CPU cores, any query spawning more than N/2 PX slaves will begin competing with itself for CPU scheduler time. At N PX slaves, the overhead typically exceeds the parallelism benefit for most OLTP-adjacent workloads. This behavior is version-specific: 19c with AUTO policy is more aggressive than 12c in its autonomous DOP selection, and the PARALLEL_MIN_TIME_THRESHOLD parameter (default 10 seconds) may be set too low in environments that were migrated from earlier versions.
-- Step 1: Count active PX slaves per coordinator session SELECT qcsid, qcserial#, COUNT(*) AS px_slave_count, server_group FROM v$px_session GROUP BY qcsid, qcserial#, server_group ORDER BY px_slave_count DESC; -- Step 2: Check current parallel parameters SHOW PARAMETER parallel_max_servers; SHOW PARAMETER parallel_degree_policy; SHOW PARAMETER cpu_count;
Correct result: px_slave_count per coordinator below cpu_count / 2. Wrong result: Any coordinator showing px_slave_count above cpu_count — this coordinator is driving a CPU overload event. If parallel_degree_policy is AUTO and cpu_count is 32, a px_slave_count of 64 means the optimizer autonomously doubled available parallelism.
-- Immediate fix: limit DOP system-wide ALTER SYSTEM SET parallel_degree_policy = MANUAL; ALTER SYSTEM SET parallel_max_servers = 16; -- = cpu_count / 2 -- Raise the auto-parallel threshold to prevent future AUTO overselection ALTER SYSTEM SET parallel_min_time_threshold = 30; -- seconds -- Limit DOP for a specific table permanently ALTER TABLE schema.large_table PARALLEL 4; -- Confirm: verify no coordinator now exceeds cpu_count/2 slaves SELECT qcsid, COUNT(*) slave_count FROM v$px_session GROUP BY qcsid ORDER BY slave_count DESC;
Cause 3: Latch Contention and Library Cache CPU Spin in Oracle 19c
Latch contention is the CPU spike cause that most engineers do not consider first, because it does not involve a visible slow query. When multiple sessions compete for the same latch — most commonly the library cache latch or the shared pool latch — Oracle sessions spin in a CPU-burning busy-wait loop before escalating to a sleep. This spin-wait is intentional: it is faster than a context switch for short contention windows. But when contention is sustained, hundreds of sessions spinning simultaneously drive CPU utilization to the ceiling while appearing idle from an application perspective.
In Oracle 19c, the most common trigger for library cache latch contention is hard parsing at high frequency. Hard parsing occurs when a SQL statement cannot be matched to an existing cursor in the shared pool, typically because literals are used instead of bind variables, because the shared pool is undersized for the workload, or because a cursor invalidation event (statistics refresh, DDL on an underlying object) forced mass re-parsing. The shared pool latch protects the memory management structures of the shared pool itself and contends under heavy allocation and deallocation activity.
-- Step 1: Find latches with high sleep rates SELECT name, gets, misses, sleeps, spin_gets, ROUND(sleeps/NULLIF(gets,0)*100, 4) miss_pct FROM v$latch WHERE sleeps > 1000 ORDER BY sleeps DESC FETCH FIRST 10 ROWS ONLY; -- Step 2: Check hard parse rate SELECT name, value FROM v$sysstat WHERE name IN ('parse count (hard)', 'parse count (total)', 'execute count') ORDER BY name;
Correct result: miss_pct below 0.5% for library cache latch. Hard parse count below 5% of execute count. Wrong result: Library cache latch miss_pct above 1%, combined with hard parse count above 20% of execute count. This confirms literal SQL driving hard parses and latch contention.
-- Find SQL with high version counts (literal explosion) SELECT sql_id, version_count, SUBSTR(sql_text, 1, 100) sql_preview FROM v$sqlarea WHERE version_count > 20 ORDER BY version_count DESC FETCH FIRST 10 ROWS ONLY; -- Fix: resize shared pool online ALTER SYSTEM SET shared_pool_size = 2G SCOPE = BOTH; -- Enable cursor sharing for literal SQL (temporary relief) ALTER SYSTEM SET cursor_sharing = FORCE; -- Confirm: flush shared pool then check latch sleeps trend ALTER SYSTEM FLUSH SHARED_POOL; -- Re-run v$latch query above and compare sleeps delta
CURSOR_SHARING = FORCE is a temporary relief measure, not a permanent fix. It causes Oracle to replace literals with system-generated bind variables, which can produce suboptimal plans for queries where the optimizer needs literal values for histogram-based selectivity. The correct permanent fix is to modify the application to use bind variables. Use FORCE only during the incident window, then audit and fix the SQL.
What Most DBAs Miss When Diagnosing Oracle High CPU Usage
The diagnostic step that consistently gets skipped is correlating the CPU spike timestamp with the AWR snapshot boundary. When Oracle CPU usage spikes in production, the instinct is to run live queries against V$ views. But if the spike already peaked and is subsiding, V$ views show the current state — not the peak state. AWR captures a snapshot every 60 minutes by default. The peak CPU event may have started and ended within a single AWR interval, leaving no clean before/after comparison.
The correct approach for any Oracle CPU spike that lasts more than 15 minutes is to immediately take a manual AWR snapshot at the peak:
-- Take snapshot immediately at peak CPU (preserves state) EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT(); -- After incident, generate AWR report between the two snapshots -- Replace snap_id values with actual snapshot IDs SELECT * FROM TABLE( DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT( l_dbid => &dbid, l_inst_num => 1, l_bid => &begin_snap_id, l_eid => &end_snap_id ) );
The second thing most engineers miss is checking V$OSSTAT for OS-level CPU breakdown. A high Oracle process CPU does not always mean Oracle is the problem — another OS process could be competing. V$OSSTAT shows the split between user CPU, system CPU, and idle time from the database's perspective, and it takes 30 seconds to check.
SELECT stat_name, value, ROUND(value / 1e6, 2) cpu_seconds FROM v$osstat WHERE stat_name IN ( 'NUM_CPUS', 'NUM_CPU_CORES', 'BUSY_TIME', 'IDLE_TIME', 'USER_TIME', 'SYS_TIME' ) ORDER BY stat_name;
Prevention: Monitoring Setup for Oracle CPU Spikes in 19c
Enable automatic AWR baselines for peak periods
A static AWR baseline captures the normal operating range for CPU and other metrics. Any deviation from baseline triggers a meaningful alert rather than a raw threshold alert that fires even during legitimate batch windows.
Run: EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id => &normal_snap, end_snap_id => &normal_snap_end, baseline_name => 'NORMAL_PEAK');
Set a hard limit on PARALLEL_MAX_SERVERS
The default value is derived from cpu_count and is often too high for OLTP databases that occasionally run ad-hoc analytical queries. Set PARALLEL_MAX_SERVERS to cpu_count at startup and document the rationale. The most commonly skipped step: verifying this parameter survived a database restart after patching. Check it monthly.
Monitor hard parse rate daily via a scheduled query
Add a daily check of hard parse count as a percentage of execute count. Any day where hard parses exceed 10% of executes warrants investigation before it becomes a latch contention incident. Alert threshold: hard parse rate above 15% of execute count. Schedule this check to run at 08:00 daily and email results to the DBA team.
Frequently Asked Questions
V$SQL ordered by CPU_TIME descending for cumulative CPU consumers, or use V$ACTIVE_SESSION_HISTORY filtering on SESSION_STATE='ON CPU' for real-time identification. ASH is the faster diagnostic during an active incident because it shows what is running right now, while V$SQL shows what has consumed the most CPU since cursor load. Both queries are in the diagnostic section above with exact column lists and correct-vs-wrong result interpretation.PARALLEL_DEGREE_POLICY=AUTO can spawn 128 PX slaves for a single query. The OS scheduler cannot run 128 threads on 64 cores simultaneously, and the context-switching overhead itself burns CPU. A well-sized host gives Oracle more cores to misuse with AUTO parallelism. The fix is setting an explicit ceiling with PARALLEL_MAX_SERVERS and switching from AUTO to LIMITED or MANUAL degree policy for OLTP workloads.CURSOR_SHARING=FORCE as temporary relief, and fixing the application to use bind variables.ALTER SYSTEM KILL SESSION ... IMMEDIATE. Apply a SQL Profile to lock a good execution plan using DBMS_SQLTUNE.IMPORT_SQL_PROFILE. Limit parallel degree with ALTER SYSTEM SET PARALLEL_DEGREE_POLICY=MANUAL. Resize the shared pool with ALTER SYSTEM SET SHARED_POOL_SIZE=Xg SCOPE=BOTH. Each change takes effect within seconds. Confirm with the verification queries provided in each section above.Want the Complete Oracle Performance Toolkit?
Drop a comment on this post with the keyword below and I will send you the AWR Analysis Walkthrough PDF — includes the full diagnostic SQL pack, execution plan interpretation guide, and the monitoring dashboard queries I use in production.
Comment: AWR
No comments:
Post a Comment