Senior Oracle & Cloud DBA Real-World Databases • Cloud • Reliability • Careers LevelUp Careers Initiative
Monday, June 8, 2026
Oracle High CPU Usage: Causes and Fix in 19c
Monday, June 1, 2026
How to Read Oracle AWR Report in 19c: DBA Guide
How to Read an Oracle AWR Report in 19c
A practical reading order from real production incidents, not a feature tour.
An AWR report is a snapshot of where your instance spent its time. Reading it in the right order is half the battle.
02:14. The on-call page hit: checkout API p95 had jumped from 180 ms to 4.2 seconds. No errors. No node eviction. No failover. Just a database that had quietly gone slow under a normal load. The first artifact I pulled was a one-hour AWR report, and within four minutes it pointed straight at the cause.
If you have ever stared at a 30-page AWR report and not known where to look first, this guide is for you. Knowing how to read an Oracle AWR report in 19c is not about understanding every section. It is about reading a handful of sections in the right order so you can go from "the database is slow" to "this SQL on this object is the problem" in minutes. That is exactly what I did at 02:14, and it is the workflow I will walk you through here.
AWR (Automatic Workload Repository) takes regular snapshots of performance statistics and stores them in the SYSAUX tablespace. A report compares two snapshots and shows you the delta: what the instance did, where it waited, and which statements drove the load. The trick is to stop reading top to bottom and start reading by importance.
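To make the "delta between two snapshots" idea concrete, here is a minimal Python sketch. It assumes each snapshot is simply a dictionary of cumulative time per event, in seconds. The event names and numbers are illustrative rather than taken from a real report, and `snapshot_delta` and `top_events` are hypothetical helpers, not anything Oracle ships.

```python
def snapshot_delta(begin, end):
    """Return time spent per event between two cumulative snapshots,
    sorted from the largest delta to the smallest."""
    deltas = {}
    for event, end_value in end.items():
        # Events missing from the begin snapshot are treated as starting at 0.
        deltas[event] = end_value - begin.get(event, 0.0)
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def top_events(begin, end, limit=3):
    """Return (event, seconds, percent_of_total) for the biggest deltas."""
    ranked = snapshot_delta(begin, end)
    total = sum(seconds for _, seconds in ranked)
    return [
        (event, seconds, round(100.0 * seconds / total, 1) if total else 0.0)
        for event, seconds in ranked[:limit]
    ]


# Illustrative cumulative values (seconds since instance startup).
begin_snap = {"DB CPU": 12000.0, "db file sequential read": 8000.0, "log file sync": 3000.0}
end_snap = {"DB CPU": 13200.0, "db file sequential read": 10400.0, "log file sync": 3400.0}

for event, seconds, pct in top_events(begin_snap, end_snap):
    print(f"{event}: {seconds:.0f} s ({pct}%)")
```

With these made-up numbers, the event with the largest delta comes out on top. That is the same "read by importance" habit described above: rank by where the time went during the interval, not by the order in which sections appear.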
This guide is for junior and mid-level DBAs who can generate an AWR report but freeze when it comes to interpreting it, and for senior engineers who want a tighter triage checklist for incidents. The examples use Oracle 19c, but the reading order applies to 12c and 18c as well.
Monday, May 25, 2026
Standby Redo Logs Not Applying Oracle Data Guard Fix
Standby Redo Logs Not Applying in Oracle Data Guard: Complete Fix Guide
MRP process troubleshooting, SRL configuration, apply lag resolution, and parallel apply tuning from 15 years of production Oracle environments.
It was 2:41 AM. PagerDuty fired. The on-call message read: "Data Guard apply lag 38 minutes and climbing." I SSHed into the standby. ORA-16766 stared back at me from the alert log. A quick check of V$MANAGED_STANDBY confirmed it: MRP0 was gone. No standby redo logs were being applied. The business had a 4-hour RPO commitment. We had maybe 90 minutes before the DBA team had a very uncomfortable conversation with the CTO.
Standby redo logs not applying in Oracle Data Guard is one of the highest-stress incidents a production DBA faces. It is also one of the most fixable, provided you know the exact diagnostic tree. In this post I walk through every root cause I have encountered across 15 years of Oracle production work, the precise SQL to diagnose each one, and the fix you run to get apply moving again.
Monday, May 11, 2026
MRP Process Not Running in Data Guard: Fix in Oracle 19c
MRP Process Not Running in Data Guard? Fix It Step-by-Step (Oracle 19c)
Oracle Database: 19.18.0.0.0 Enterprise Edition • Primary: 2-Node RAC, 4.8 TB OLTP, 2,800 TPS
Standby: Physical Standby with Active Data Guard enabled
Protection Mode: Maximum Availability (SYNC/AFFIRM) • Broker: Data Guard Broker enabled
The monitoring alert arrived at 2:48 AM: "Standby apply lag crossing 90 minutes." I connected to DGMGRL immediately. SHOW CONFIGURATION confirmed it: the MRP process was not running on the standby. Every transaction committed on the primary for the past 90 minutes was sitting unprocessed in Standby Redo Logs, and the gap was growing by the second.
In my 15 years managing Oracle production environments, a stopped MRP process is one of the most common Data Guard incidents I have resolved. It is not complicated once you know which of the five root causes you are dealing with. The problem is that each cause has a completely different fix, and applying the wrong one wastes critical time.
This guide gives you the exact decision path, the diagnostic commands to identify your specific cause, and the precise fix for each scenario. In most cases the MRP process not running in Data Guard is resolved in under 5 minutes.
Monday, May 4, 2026
ORA-16766 Error in Oracle Data Guard: Causes and Fix (19c Guide)
ORA-16766 Error in Oracle Data Guard: Causes and Fix (Oracle 19c Guide)
Full Error: ORA-16766: Redo Apply is stopped
This error appears in DGMGRL SHOW CONFIGURATION output against the standby database. It means the Managed Recovery Process (MRP) on the standby has stopped and redo is no longer being applied. The standby is diverging from the primary with every passing second.
It was 3:22 AM. The monitoring alert fired: "Data Guard configuration warning — ORA-16766 on standby." Apply lag had jumped from zero to 47 minutes in under an hour. The standby database was alive, connected, receiving redo, but not applying any of it.
ORA-16766 is one of the most common Oracle Data Guard errors in production Oracle 19c environments. It always means the same thing: the MRP process on the standby has stopped. But the reasons it stops, and the correct fix for each reason, are completely different.
This guide covers every root cause of ORA-16766 in Oracle 19c, the exact DGMGRL and SQL commands to diagnose it, and the step-by-step fix commands for each scenario. Most ORA-16766 errors are resolved in under 5 minutes once you know which cause you are dealing with.
Monday, April 27, 2026
How to Fix Data Guard Lag in Oracle 19c (6 Real Production Fixes with SQL)
How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide
Oracle Database: 19.18.0.0.0 Enterprise Edition • Standby Type: Physical Standby (Active Data Guard) • Protection Mode: Maximum Availability (SYNC/AFFIRM)
Primary: 2-Node RAC, 4.8 TB OLTP • Network: Dedicated 1 GbE WAN, RTT 1.8 ms • Peak Load: 2,800 TPS
In Oracle 19c environments, Data Guard lag is one of the most common production issues DBAs face. It is also one of the most stressful alerts a DBA receives. The standby is falling behind the primary. Every second of lag is a second of potential data loss if the primary fails right now. The pressure to fix it quickly is real.
The problem is that "Data Guard lag" is not one problem. It is five different problems that all show the same symptom. Applying the wrong fix wastes time and can make things worse. This guide gives you the exact decision path, the exact diagnostic queries, and the exact fix commands for each root cause, in the order you should check them.
Follow the steps in order. Each step either identifies your problem and gives you the fix, or clears that cause and moves you to the next. Most Data Guard lag issues are resolved within Steps 1 to 3.
Monday, April 6, 2026
Why Data Guard Lag Happens in Production: Sync, I/O and Network Deep Dive
Oracle Database: 19.18.0.0.0 Enterprise Edition • Primary: 2-Node RAC, 4.8 TB OLTP • Standby: Physical Standby (Active Data Guard)
Protection Mode: Maximum Availability (SYNC/AFFIRM) • Network: Dedicated 1 GbE WAN, 120 km, RTT 1.8 ms
Peak Load: 2,800 TPS, 180 MB/sec redo generation • Application: Core banking transaction processing
The monitoring alert fires at 11:43 PM: "Data Guard apply lag exceeds 900 seconds." Transport lag is 180 seconds. Apply lag is 900 seconds. The standby is 15 minutes behind the primary. If the primary fails right now, 15 minutes of financial transactions are at risk.
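The two figures in that alert can be separated with a little arithmetic. The Python sketch below assumes the lag values arrive as interval strings in the form `+DD HH:MI:SS`, which is how the `VALUE` column of `V$DATAGUARD_STATS` typically reports 'transport lag' and 'apply lag'. Treat that format, and the helper names `lag_to_seconds` and `split_lag`, as assumptions for illustration. The difference between the two values is only a rough estimate of redo that has reached the standby but has not yet been applied.

```python
import re

# Assumed format of the lag values: "+DD HH:MI:SS", e.g. "+00 00:15:00".
_LAG_PATTERN = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_to_seconds(value):
    """Convert a '+DD HH:MI:SS' interval string to whole seconds."""
    match = _LAG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised lag value: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def split_lag(transport_value, apply_value):
    """Break total lag into the transport part and the apply backlog."""
    transport_s = lag_to_seconds(transport_value)
    apply_s = lag_to_seconds(apply_value)
    # Rough estimate of redo received by the standby but not yet applied.
    received_not_applied_s = max(apply_s - transport_s, 0)
    return {
        "transport_s": transport_s,
        "apply_s": apply_s,
        "received_not_applied_s": received_not_applied_s,
    }


# The scenario from the alert: 180 s transport lag, 900 s apply lag.
print(split_lag("+00 00:03:00", "+00 00:15:00"))
```

For the numbers in the alert this prints `{'transport_s': 180, 'apply_s': 900, 'received_not_applied_s': 720}`. Read loosely, about 12 of the 15 minutes is redo already on the standby and waiting to be applied, and about 3 minutes is redo still in transit, which is why the two kinds of lag need separate investigation.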
This scenario happens in production Data Guard environments more often than most teams admit. The problem looks the same from the outside every time, but the root cause is completely different each time. Transport lag and apply lag each have different causes, different diagnostic queries, and different fixes. Treating them as the same problem wastes hours of investigation.
This guide covers all six real production causes of Data Guard lag, the exact SQL to identify each one, and the specific fix for each. No guesswork. Precise diagnosis first, then precise resolution.
Monday, March 30, 2026
Why Data Guard Lag Happens in Production: Sync, I/O and Network Deep Dive
Oracle Database: 19.18.0.0.0 Enterprise Edition • Primary: 2-Node RAC, 4.8 TB OLTP • Standby: Physical Standby (Active Data Guard)
Protection Mode: Maximum Availability (SYNC/AFFIRM) • Network: Dedicated 1 GbE WAN (120 km distance, RTT 1.8 ms)
Peak Load: 2,800 TPS, 180 MB/sec redo generation • Application: Core banking transaction processing
The alert arrives at 11:43 PM: "Data Guard apply lag exceeds 900 seconds." The DBA on call opens the monitoring dashboard. Transport lag is 180 seconds. Apply lag is 900 seconds. The standby is 15 minutes behind the primary. If the primary fails right now, 15 minutes of financial transactions could be at risk.
This scenario plays out in production Data Guard environments more often than most teams admit. Lag is not a single problem; it is six different problems that look identical from the outside. Transport lag and apply lag have completely different root causes, diagnostic queries, and fixes. Treating them as the same problem wastes hours of investigation time.
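One transport-side check needs no database access at all: compare peak redo generation with what the link can carry. The Python sketch below takes the environment figures listed above at face value (1 GbE WAN, 180 MB/sec peak redo) and uses the raw line rate of 125 MB/s per Gbit/s. It ignores protocol overhead, other traffic on the link, and any redo transport compression, so treat it as a first-pass sanity check under those assumptions and not as a diagnosis. The function names are hypothetical.

```python
def link_capacity_mb_per_s(link_gbit_per_s, usable_fraction=1.0):
    """Raw link capacity in decimal MB/s.

    1 Gbit/s = 1000 Mbit/s = 125 MB/s (8 bits per byte).
    usable_fraction is an assumed knob for overhead or shared traffic.
    """
    return link_gbit_per_s * 1000.0 / 8.0 * usable_fraction


def redo_fits_link(redo_mb_per_s, link_gbit_per_s, usable_fraction=1.0):
    """Return (fits, capacity_mb_per_s) for a given peak redo rate."""
    capacity = link_capacity_mb_per_s(link_gbit_per_s, usable_fraction)
    return redo_mb_per_s <= capacity, capacity


# Figures from the environment description: 1 GbE link, 180 MB/sec peak redo.
fits, capacity = redo_fits_link(redo_mb_per_s=180.0, link_gbit_per_s=1.0)
print(f"capacity={capacity:.0f} MB/s, peak redo=180 MB/s, fits={fits}")
```

On these assumptions the check prints `capacity=125 MB/s, peak redo=180 MB/s, fits=False`. If the stated figures are accurate and uncompressed, sustained peak redo would exceed the raw link rate, so transport lag during peak load would be expected and not a surprise.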
This guide covers every real cause of Data Guard lag I have diagnosed in production, the exact SQL to prove which one you are dealing with, and the specific fix for each. No guesswork. No generic advice about "check your network." Precise diagnosis first, then precise resolution.