Tuesday, April 21, 2026

How to Fix Data Guard Lag in Oracle 19c (Step-by-Step Troubleshooting Guide)

How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide

How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide

5 Proven Fixes for Transport Lag and Apply Lag with Exact SQL Commands
 March 2026
 Chetan Yadav, Senior Oracle & Cloud DBA
⏱️ 12 - 14 min read
⏱️ Estimated Reading Time: 12 - 14 minutes
Missing SRLs, Parallel Apply, Network Compression, RMAN Conflict, Protection Mode Mismatch
Oracle Data Guard lag fix decision flowchart with quick diagnosis panel and fix reference table for Oracle 19c
⚙️ Environment Referenced

Oracle Database: 19.18.0.0.0 Enterprise Edition  •  Standby Type: Physical Standby (Active Data Guard)  •  Protection Mode: Maximum Availability (SYNC/AFFIRM)
Primary: 2-Node RAC, 4.8 TB OLTP  •  Network: Dedicated 1 GbE WAN, RTT 1.8 ms  •  Peak Load: 2,800 TPS

Data Guard lag is one of the most stressful production alerts a DBA receives. The standby is falling behind the primary. Every second of lag is a second of potential data loss if the primary fails right now. The pressure to fix it quickly is real.

The problem is that "Data Guard lag" is not one problem. It is five different problems that all show the same symptom. Applying the wrong fix wastes time and can make things worse. This guide gives you the exact decision path, the exact diagnostic queries, and the exact fix commands for each root cause, in the order you should check them.

Follow the steps in order. Each step either identifies your problem and gives you the fix, or clears that cause and moves you to the next. Most Data Guard lag issues are resolved within Steps 1 to 3.

1. Before You Start: Run These 4 Queries First

Do not start applying fixes before running diagnostics. Two minutes of diagnosis saves two hours of incorrect troubleshooting.

SQL, 4 Mandatory Diagnostic Queries (Run Before Anything Else)
-- QUERY 1: Identify lag type (run on PRIMARY) -- This tells you which direction to go SELECT name, value, time_computed FROM v$dataguard_stats WHERE name IN ( 'transport lag', 'apply lag', 'apply finish time' ) ORDER BY name; -- QUERY 2: Check Standby Redo Logs (run on STANDBY) -- No rows = missing SRLs = most common cause of apply lag SELECT group#, thread#, bytes/1024/1024 AS size_mb, archived, status FROM v$standby_log ORDER BY thread#, group#; -- QUERY 3: Check MRP parallel workers (run on STANDBY) -- Only MRP0 with no PR* processes = parallel apply disabled SELECT process, status, thread#, sequence# FROM v$managed_standby WHERE process LIKE 'MRP%' OR process LIKE 'PR%' ORDER BY process; -- QUERY 4: Check for archive gap (run on PRIMARY) -- Any rows = URGENT, fix this before everything else SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap ORDER BY thread#;
Read Query 4 Result First:

If Query 4 returns any rows, you have a sequence gap. This means the standby is missing archived redo logs and cannot apply continuously. Fix the gap by manually copying the missing archives from the primary before doing anything else. A sequence gap takes priority over every other fix.

2. Step 1: Check Transport Lag vs Apply Lag

1

Identify Which Type of Lag You Have

From Query 1 output:

  • Transport lag high Redo is not reaching the standby fast enough. Go to Step 4 (network) or check primary I/O.
  • Apply lag high, transport lag low Redo is arriving fine but MRP cannot apply it. Go to Step 2 (SRLs) then Step 3 (parallel apply).
  • Both high Check SRLs first (Step 2), then network (Step 4).
Query 1 ResultWhat It MeansGo To
transport lag: +00 05:00:00, apply lag: +00 00:10:00Network or primary I/O problemStep 4
transport lag: +00 00:00:00, apply lag: +02 00:00:00MRP apply bottleneckStep 2 then Step 3
transport lag: +00 01:00:00, apply lag: +01 00:00:00Multiple issuesStep 2, then Step 3, then Step 4
Both show +00 00:00:00Lag cleared or intermittentStep 9 (verify and monitor)

3. Step 2: Fix Missing Standby Redo Logs

Check This First Missing Standby Redo Logs (SRLs) is the single most common cause of unexpectedly high apply lag in new or recently rebuilt Data Guard environments. Without SRLs, real-time apply cannot function. The standby only applies redo when archive logs are shipped, typically once per hour, creating a massive lag gap.

2

Diagnose and Fix Missing SRLs

SQL, Check and Create Standby Redo Logs
-- Step 2a: Check primary online redo log size (run on PRIMARY) -- SRLs must match this exact size SELECT group#, members, bytes/1024/1024 AS size_mb, status FROM v$log ORDER BY group#; -- Step 2b: Check SRLs on standby (run on STANDBY) -- If no rows returned = SRLs are missing SELECT group#, thread#, bytes/1024/1024 AS size_mb, archived, status FROM v$standby_log ORDER BY thread#, group#; -- Step 2c: Create SRLs if missing (run on STANDBY) -- Rules: same size as online redo logs -- Need (online_log_groups_per_thread + 1) SRL groups per thread -- Example: Primary has 3 redo groups per thread, 2 threads (RAC) -- Need 4 SRL groups for thread 1, 4 for thread 2 -- For ASM storage (recommended): ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 11 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 12 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 13 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 14 ('+DATA') SIZE 500M; -- For single-instance primary (only thread 1 needed): ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 11 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 12 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 13 ('+DATA') SIZE 500M; ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 14 ('+DATA') SIZE 500M; -- Step 2d: Verify SRLs created and MRP switches to real-time apply SELECT group#, thread#, bytes/1024/1024 AS size_mb, status FROM v$standby_log ORDER BY thread#, group#; -- Check MRP switched to WAIT_FOR_LOG (real-time) not WAIT_FOR_GAP SELECT process, status, sequence# FROM v$managed_standby WHERE process = 'MRP0';
Expected Result After Creating SRLs:

Within 2,5 minutes of creating SRLs, MRP status should change from WAIT_FOR_GAP to WAIT_FOR_LOG. Apply lag should begin dropping immediately and typically reaches under 60 seconds within 10,15 minutes if the only issue was missing SRLs. In one production environment, apply lag dropped from 8 hours to 42 seconds within 12 minutes of SRL creation.

4. Step 3: Enable Parallel Apply

If SRLs are present and correctly sized but apply lag is still growing, the MRP process is the bottleneck. By default, MRP is single-threaded. Enabling parallel apply allows MRP to use multiple CPU cores to apply redo simultaneously, increasing apply throughput by 3,8x on most workloads.

3

Enable Parallel Apply on the Standby

SQL, Enable Parallel Apply and Verify
-- Step 3a: Check current apply mode (run on STANDBY) -- If only MRP0 shown and no PR* processes = single-threaded SELECT process, status, thread#, sequence# FROM v$managed_standby WHERE process LIKE 'MRP%' OR process LIKE 'PR%' ORDER BY process; -- Step 3b: Stop current MRP (run on STANDBY) ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; -- Verify MRP stopped SELECT process, status FROM v$managed_standby WHERE process LIKE 'MRP%'; -- Step 3c: Restart with parallel apply -- Set PARALLEL to match standby CPU count (start with CPU_count/2) ALTER DATABASE RECOVER MANAGED STANDBY DATABASE PARALLEL 8 USING CURRENT LOGFILE DISCONNECT FROM SESSION; -- Step 3d: Verify parallel workers started -- Should see MRP0 + PR00, PR01, PR02... workers SELECT process, status, thread#, sequence# FROM v$managed_standby WHERE process LIKE 'MRP%' OR process LIKE 'PR%' ORDER BY process; -- Step 3e: Monitor apply lag after enabling parallel apply SELECT name, value, time_computed FROM v$dataguard_stats WHERE name IN ('transport lag', 'apply lag') ORDER BY name;
How Many Parallel Apply Workers to Set?

Start with half your standby CPU count. For an 8-CPU standby, start with PARALLEL 4. Monitor apply lag and CPU usage. If apply lag is still growing and CPU has headroom, increase to PARALLEL 8. Do not set parallel degree higher than CPU count. If the standby runs Active Data Guard (read workload), leave at least 25% CPU for read sessions.

5. Step 4: Fix Network Latency and Bandwidth

If transport lag is high and apply lag is low, the problem is between the primary and the standby. The redo is being generated but not arriving fast enough.

4

Reduce Redo Transport Bandwidth Usage

SQL, Enable Redo Log Compression and Fix Transport Config
-- Step 4a: Check current transport configuration (run on PRIMARY) SELECT dest_id, dest_name, status, affirm, compression, net_timeout, async_blocks, error FROM v$archive_dest WHERE target = 'STANDBY' AND status = 'VALID'; -- Step 4b: Enable compression on redo transport (run on PRIMARY) -- Reduces bandwidth by 30-70% at cost of CPU on both sides ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=standby_db SYNC AFFIRM COMPRESSION=ENABLE NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db' SCOPE=BOTH SID='*'; -- Step 4c: Verify compression is now enabled SELECT dest_id, compression, status, error FROM v$archive_dest WHERE target = 'STANDBY' AND status = 'VALID'; -- Step 4d: Check current network write latency SELECT dest_id, srl_network_writes, ROUND(srl_network_write_time / NULLIF(srl_network_writes,0), 3) AS avg_network_write_ms FROM v$standby_log_stats ORDER BY dest_id; -- OS level network test (run from primary server shell) -- ping -c 50 standby_host_ip -- iperf3 -c standby_host_ip -t 30 -P 4

6. Step 5: Resolve RMAN vs MRP I/O Conflict

A very common but often overlooked cause of apply lag spikes: RMAN backup running on the standby at the same time as peak MRP apply activity. Both compete for the same disk I/O. RMAN is aggressive by default and can starve MRP of the I/O it needs.

5

Fix RMAN and MRP I/O Conflict

SQL, Detect and Fix RMAN vs MRP I/O Conflict
-- Step 5a: Check if RMAN is running during lag spike (run on STANDBY) SELECT session_key, input_type, status, TO_CHAR(start_time, 'YYYY-MM-DD HH24:MI') AS start_time, TO_CHAR(end_time, 'YYYY-MM-DD HH24:MI') AS end_time, input_bytes_display, output_bytes_display FROM v$rman_backup_job_details WHERE start_time > SYSDATE - 2 ORDER BY start_time DESC; -- Step 5b: Check datafile I/O latency on standby -- High write time during backup = RMAN starving MRP SELECT d.name, ROUND(f.writetim / NULLIF(f.phywrts,0), 2) AS avg_write_ms, f.phyrds, f.phywrts FROM v$filestat f JOIN v$datafile d ON f.file# = d.file# WHERE f.phywrts > 1000 ORDER BY avg_write_ms DESC FETCH FIRST 10 ROWS ONLY; -- Step 5c: Throttle RMAN to protect MRP I/O (run on STANDBY) -- Connect RMAN and set channel throttle: -- RMAN> RUN { -- ALLOCATE CHANNEL c1 DEVICE TYPE DISK -- RATE 100M; -- limit to 100MB/sec -- BACKUP DATABASE; -- } -- Step 5d: Use DBMS_SCHEDULER to move backup to off-peak window -- Check current backup schedule SELECT job_name, enabled, next_run_date, last_start_date FROM dba_scheduler_jobs WHERE job_name LIKE '%RMAN%' OR job_name LIKE '%BACKUP%' ORDER BY next_run_date;

7. Step 6: Fix Protection Mode Mismatch

A silent lag builder. The protection mode shown in v$database has two columns: protection_mode (what you configured) and protection_level (what is actually in effect). If the standby was temporarily unreachable, the primary may have silently fallen back to a lower level. This can cause steady, unexplained lag accumulation.

SQL, Check and Restore Protection Mode
-- Check effective protection mode (run on PRIMARY) -- If these two values differ, primary fell back silently SELECT name, db_unique_name, protection_mode, -- what you configured protection_level -- what is actually active FROM v$database; -- Check archive destination status for errors SELECT dest_id, status, error, gap_status, db_unique_name FROM v$archive_dest_status WHERE target = 'STANDBY' AND status != 'INACTIVE'; -- Restore protection mode via DGMGRL (after standby reconnects) -- dgmgrl / -- DGMGRL> SHOW CONFIGURATION; -- DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY; -- DGMGRL> SHOW CONFIGURATION; -- Verify protection_level now matches protection_mode SELECT protection_mode, protection_level FROM v$database;

8. Fix for Active Data Guard (ADG) Apply Lag

If your standby runs Active Data Guard (open read-only while applying), read queries from reporting users compete with MRP for I/O. During heavy reporting workloads, MRP can be starved of I/O and apply lag grows even when all other configuration is correct.

SQL, Protect MRP from ADG Read Workload Using I/O Resource Manager
-- Check open mode on standby SELECT open_mode, database_role FROM v$database; -- Monitor MRP vs read session I/O competition SELECT s.program, s.status, COUNT(*) AS sessions FROM v$session s WHERE s.type = 'USER' OR s.program LIKE '%MRP%' GROUP BY s.program, s.status ORDER BY sessions DESC FETCH FIRST 10 ROWS ONLY; -- Apply lag while ADG read load active SELECT name, value FROM v$dataguard_stats WHERE name IN ('transport lag', 'apply lag'); -- Create I/O Resource Manager plan to protect MRP (run on STANDBY) BEGIN DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA(); DBMS_RESOURCE_MANAGER.CREATE_PLAN( plan => 'STANDBY_IO_PLAN', comment => 'Protect MRP apply from ADG read workload' ); DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE( plan => 'STANDBY_IO_PLAN', group_or_subplan => '_ORACLE_BACKGROUND_GROUP_', comment => 'MRP gets priority I/O', mgmt_p1 => 80 ); DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA(); END; / ALTER SYSTEM SET resource_manager_plan = 'STANDBY_IO_PLAN' SCOPE = BOTH SID = '*';

9. After the Fix: Verify and Monitor

After applying any fix, do not assume the problem is resolved. Monitor lag for at least 30 minutes under production load before declaring the incident closed.

SQL, Post-Fix Verification Queries
-- Run this every 5 minutes after the fix (run on PRIMARY) SELECT name, value, time_computed FROM v$dataguard_stats WHERE name IN ( 'transport lag', 'apply lag', 'apply finish time' ) ORDER BY name; -- Confirm no sequence gaps (run on PRIMARY) SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap; -- Confirm MRP running in real-time mode (run on STANDBY) -- Status must be WAIT_FOR_LOG not WAIT_FOR_GAP SELECT process, status, sequence#, delay_mins FROM v$managed_standby WHERE process LIKE 'MRP%' OR process LIKE 'PR%' ORDER BY process; -- Full DGMGRL health check -- dgmgrl / -- DGMGRL> SHOW CONFIGURATION; -- DGMGRL> VALIDATE DATABASE VERBOSE standby_db;

10. Prevent Lag From Returning: Configuration Checklist

SRL and Transport Configuration

  • Standby Redo Logs created with same size as online redo logs
  • SRL count = online redo log groups per thread + 1 per thread
  • MRP status shows WAIT_FOR_LOG (real-time apply active)
  • COMPRESSION=ENABLE set in LOG_ARCHIVE_DEST if WAN bandwidth is a concern
  • NET_TIMEOUT set appropriately for WAN latency (30,60 seconds)

Apply Performance

  • Parallel apply enabled with degree matching standby CPU count
  • RMAN backup scheduled outside peak MRP apply window
  • Standby hardware sized equal to or greater than primary
  • Dedicated I/O path for standby datafiles (not shared with backup)
  • I/O Resource Manager plan active if ADG read workload is present

Monitoring

  • Automated alert when transport lag exceeds 30 seconds
  • Automated alert when apply lag exceeds 2 minutes
  • Archive gap check runs every 15 minutes
  • DGMGRL VALIDATE DATABASE runs weekly
  • Failover tested under production load every quarter

FAQ

Can I create SRLs without stopping the standby database?
Yes. You can add Standby Redo Log groups to a mounted or open standby database without any downtime. The ALTER DATABASE ADD STANDBY LOGFILE command runs online. MRP does not need to be stopped. After creating SRLs, MRP automatically detects them and switches to real-time apply mode. You should verify the switch by checking that MRP status changes from WAIT_FOR_GAP to WAIT_FOR_LOG within a few minutes.
Will enabling parallel apply cause any data consistency risk?
No. Oracle's parallel apply mechanism uses the same redo-based recovery infrastructure as single-threaded MRP. Oracle guarantees transactional consistency regardless of the parallel degree. Redo records within the same transaction are always applied in the correct order. The only consideration is that enabling parallel apply requires briefly cancelling and restarting MRP, which causes a few seconds of apply pause. During that pause, apply lag may increase slightly before recovering.
How do I fix a sequence gap in Data Guard?
A sequence gap means the standby is missing one or more archived redo log files and cannot apply continuously. First run SELECT * FROM v$archive_gap on the primary to identify the missing sequence numbers. Then manually copy the corresponding archive log files from the primary to the standby using RMAN or OS copy, register them on the standby with ALTER DATABASE REGISTER LOGFILE, and verify MRP resumes applying. If the gap is very old and the archive logs no longer exist on the primary, you may need to rebuild the standby using RMAN active duplication.
Transport lag is zero but apply lag keeps growing slowly every day. Why?
Slow, steady apply lag growth with zero transport lag almost always points to one of three causes: (1) parallel apply is not enabled and MRP is falling behind a high-volume OLTP workload, (2) RMAN backup is running daily and competing with MRP during peak hours, or (3) Active Data Guard read workload is starving MRP I/O. Run the MRP parallel worker check (Query 3) and the RMAN schedule check (Step 5) to identify which one is active in your environment.

About the Author

Chetan Yadav

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL, and Cloud DBA with 15+ years of hands-on experience managing production databases across on-premises, hybrid, and cloud environments.

Chetan has resolved Data Guard lag incidents across financial and healthcare production environments, including the 8-hour apply lag from missing SRLs and the firewall-induced transport spike described in related guides. His writing focuses on precise diagnosis and exact resolution commands, not generic advice.

This blog focuses on real-world DBA problems, career growth, and practical learning, not theoretical documentation or vendor marketing.

No comments:

Post a Comment