Chetan Yadav: How to Fix Data Guard Lag in Oracle 19c (Step-by-Step Troubleshooting Guide)

How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide

5 Proven Fixes for Transport Lag and Apply Lag with Exact SQL Commands

 March 2026

 Chetan Yadav, Senior Oracle & Cloud DBA

⏱️ 12 - 14 min read

⏱️ Estimated Reading Time: 12 - 14 minutes

Missing SRLs, Parallel Apply, Network Compression, RMAN Conflict, Protection Mode Mismatch

Oracle Data Guard lag fix decision flowchart with quick diagnosis panel and fix reference table for Oracle 19c

⚙️ Environment Referenced

Oracle Database: 19.18.0.0.0 Enterprise Edition • Standby Type: Physical Standby (Active Data Guard) • Protection Mode: Maximum Availability (SYNC/AFFIRM)
Primary: 2-Node RAC, 4.8 TB OLTP • Network: Dedicated 1 GbE WAN, RTT 1.8 ms • Peak Load: 2,800 TPS

Data Guard lag is one of the most stressful production alerts a DBA receives. The standby is falling behind the primary. Every second of lag is a second of potential data loss if the primary fails right now. The pressure to fix it quickly is real.

The problem is that "Data Guard lag" is not one problem. It is five different problems that all show the same symptom. Applying the wrong fix wastes time and can make things worse. This guide gives you the exact decision path, the exact diagnostic queries, and the exact fix commands for each root cause, in the order you should check them.

Follow the steps in order. Each step either identifies your problem and gives you the fix, or clears that cause and moves you to the next. Most Data Guard lag issues are resolved within Steps 1 to 3.

 Table of Contents

Before You Start: Run These 4 Queries First
Step 1: Check for Transport Lag vs Apply Lag
Step 2: Fix Missing Standby Redo Logs (Most Common Root Cause)
Step 3: Enable Parallel Apply to Fix MRP Bottleneck
Step 4: Fix Network Latency and Bandwidth
Step 5: Resolve RMAN vs MRP I/O Conflict
Step 6: Fix Protection Mode Mismatch
Fix for Active Data Guard (ADG) Apply Lag
After the Fix: Verify and Monitor
Prevent Lag From Returning: Configuration Checklist
FAQ
Related Reading

1. Before You Start: Run These 4 Queries First

Do not start applying fixes before running diagnostics. Two minutes of diagnosis saves two hours of incorrect troubleshooting.

SQL, 4 Mandatory Diagnostic Queries (Run Before Anything Else)

-- QUERY 1: Identify lag type (run on PRIMARY)
-- This tells you which direction to go
SELECT name,
       value,
       time_computed
FROM   v$dataguard_stats
WHERE  name IN (
    'transport lag',
    'apply lag',
    'apply finish time'
)
ORDER  BY name;

-- QUERY 2: Check Standby Redo Logs (run on STANDBY)
-- No rows = missing SRLs = most common cause of apply lag
SELECT group#,
       thread#,
       bytes/1024/1024  AS size_mb,
       archived,
       status
FROM   v$standby_log
ORDER  BY thread#, group#;

-- QUERY 3: Check MRP parallel workers (run on STANDBY)
-- Only MRP0 with no PR* processes = parallel apply disabled
SELECT process,
       status,
       thread#,
       sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%'
OR     process LIKE 'PR%'
ORDER  BY process;

-- QUERY 4: Check for archive gap (run on PRIMARY)
-- Any rows = URGENT, fix this before everything else
SELECT thread#,
       low_sequence#,
       high_sequence#
FROM   v$archive_gap
ORDER  BY thread#;

Read Query 4 Result First:

If Query 4 returns any rows, you have a sequence gap. This means the standby is missing archived redo logs and cannot apply continuously. Fix the gap by manually copying the missing archives from the primary before doing anything else. A sequence gap takes priority over every other fix.

2. Step 1: Check Transport Lag vs Apply Lag

Identify Which Type of Lag You Have

From Query 1 output:

Transport lag high Redo is not reaching the standby fast enough. Go to Step 4 (network) or check primary I/O.
Apply lag high, transport lag low Redo is arriving fine but MRP cannot apply it. Go to Step 2 (SRLs) then Step 3 (parallel apply).
Both high Check SRLs first (Step 2), then network (Step 4).

Query 1 Result	What It Means	Go To
transport lag: +00 05:00:00, apply lag: +00 00:10:00	Network or primary I/O problem	Step 4
transport lag: +00 00:00:00, apply lag: +02 00:00:00	MRP apply bottleneck	Step 2 then Step 3
transport lag: +00 01:00:00, apply lag: +01 00:00:00	Multiple issues	Step 2, then Step 3, then Step 4
Both show +00 00:00:00	Lag cleared or intermittent	Step 9 (verify and monitor)

3. Step 2: Fix Missing Standby Redo Logs

Check This First Missing Standby Redo Logs (SRLs) is the single most common cause of unexpectedly high apply lag in new or recently rebuilt Data Guard environments. Without SRLs, real-time apply cannot function. The standby only applies redo when archive logs are shipped, typically once per hour, creating a massive lag gap.

Diagnose and Fix Missing SRLs

SQL, Check and Create Standby Redo Logs

-- Step 2a: Check primary online redo log size (run on PRIMARY)
-- SRLs must match this exact size
SELECT group#,
       members,
       bytes/1024/1024  AS size_mb,
       status
FROM   v$log
ORDER  BY group#;

-- Step 2b: Check SRLs on standby (run on STANDBY)
-- If no rows returned = SRLs are missing
SELECT group#,
       thread#,
       bytes/1024/1024  AS size_mb,
       archived,
       status
FROM   v$standby_log
ORDER  BY thread#, group#;

-- Step 2c: Create SRLs if missing (run on STANDBY)
-- Rules: same size as online redo logs
-- Need (online_log_groups_per_thread + 1) SRL groups per thread
-- Example: Primary has 3 redo groups per thread, 2 threads (RAC)
-- Need 4 SRL groups for thread 1, 4 for thread 2

-- For ASM storage (recommended):
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 11 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 12 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 13 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 14 ('+DATA') SIZE 500M;

-- For single-instance primary (only thread 1 needed):
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 11 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 12 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 13 ('+DATA') SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
    GROUP 14 ('+DATA') SIZE 500M;

-- Step 2d: Verify SRLs created and MRP switches to real-time apply
SELECT group#,
       thread#,
       bytes/1024/1024  AS size_mb,
       status
FROM   v$standby_log
ORDER  BY thread#, group#;

-- Check MRP switched to WAIT_FOR_LOG (real-time) not WAIT_FOR_GAP
SELECT process,
       status,
       sequence#
FROM   v$managed_standby
WHERE  process = 'MRP0';

Expected Result After Creating SRLs:

Within 2,5 minutes of creating SRLs, MRP status should change from WAIT_FOR_GAP to WAIT_FOR_LOG. Apply lag should begin dropping immediately and typically reaches under 60 seconds within 10,15 minutes if the only issue was missing SRLs. In one production environment, apply lag dropped from 8 hours to 42 seconds within 12 minutes of SRL creation.

4. Step 3: Enable Parallel Apply

If SRLs are present and correctly sized but apply lag is still growing, the MRP process is the bottleneck. By default, MRP is single-threaded. Enabling parallel apply allows MRP to use multiple CPU cores to apply redo simultaneously, increasing apply throughput by 3,8x on most workloads.

Enable Parallel Apply on the Standby

SQL, Enable Parallel Apply and Verify

-- Step 3a: Check current apply mode (run on STANDBY)
-- If only MRP0 shown and no PR* processes = single-threaded
SELECT process,
       status,
       thread#,
       sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%'
OR     process LIKE 'PR%'
ORDER  BY process;

-- Step 3b: Stop current MRP (run on STANDBY)
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Verify MRP stopped
SELECT process, status
FROM   v$managed_standby
WHERE  process LIKE 'MRP%';

-- Step 3c: Restart with parallel apply
-- Set PARALLEL to match standby CPU count (start with CPU_count/2)
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
    PARALLEL 8
    USING CURRENT LOGFILE
    DISCONNECT FROM SESSION;

-- Step 3d: Verify parallel workers started
-- Should see MRP0 + PR00, PR01, PR02... workers
SELECT process,
       status,
       thread#,
       sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%'
OR     process LIKE 'PR%'
ORDER  BY process;

-- Step 3e: Monitor apply lag after enabling parallel apply
SELECT name,
       value,
       time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag')
ORDER  BY name;

How Many Parallel Apply Workers to Set?

Start with half your standby CPU count. For an 8-CPU standby, start with PARALLEL 4. Monitor apply lag and CPU usage. If apply lag is still growing and CPU has headroom, increase to PARALLEL 8. Do not set parallel degree higher than CPU count. If the standby runs Active Data Guard (read workload), leave at least 25% CPU for read sessions.

5. Step 4: Fix Network Latency and Bandwidth

If transport lag is high and apply lag is low, the problem is between the primary and the standby. The redo is being generated but not arriving fast enough.

Reduce Redo Transport Bandwidth Usage

SQL, Enable Redo Log Compression and Fix Transport Config

-- Step 4a: Check current transport configuration (run on PRIMARY)
SELECT dest_id,
       dest_name,
       status,
       affirm,
       compression,
       net_timeout,
       async_blocks,
       error
FROM   v$archive_dest
WHERE  target  = 'STANDBY'
AND    status  = 'VALID';

-- Step 4b: Enable compression on redo transport (run on PRIMARY)
-- Reduces bandwidth by 30-70% at cost of CPU on both sides
ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standby_db
      SYNC
      AFFIRM
      COMPRESSION=ENABLE
      NET_TIMEOUT=30
      VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
      DB_UNIQUE_NAME=standby_db'
SCOPE=BOTH SID='*';

-- Step 4c: Verify compression is now enabled
SELECT dest_id,
       compression,
       status,
       error
FROM   v$archive_dest
WHERE  target = 'STANDBY'
AND    status = 'VALID';

-- Step 4d: Check current network write latency
SELECT dest_id,
       srl_network_writes,
       ROUND(srl_network_write_time /
             NULLIF(srl_network_writes,0), 3)  AS avg_network_write_ms
FROM   v$standby_log_stats
ORDER  BY dest_id;

-- OS level network test (run from primary server shell)
-- ping -c 50 standby_host_ip
-- iperf3 -c standby_host_ip -t 30 -P 4

6. Step 5: Resolve RMAN vs MRP I/O Conflict

A very common but often overlooked cause of apply lag spikes: RMAN backup running on the standby at the same time as peak MRP apply activity. Both compete for the same disk I/O. RMAN is aggressive by default and can starve MRP of the I/O it needs.

Fix RMAN and MRP I/O Conflict

SQL, Detect and Fix RMAN vs MRP I/O Conflict

-- Step 5a: Check if RMAN is running during lag spike (run on STANDBY)
SELECT session_key,
       input_type,
       status,
       TO_CHAR(start_time, 'YYYY-MM-DD HH24:MI')  AS start_time,
       TO_CHAR(end_time,   'YYYY-MM-DD HH24:MI')  AS end_time,
       input_bytes_display,
       output_bytes_display
FROM   v$rman_backup_job_details
WHERE  start_time > SYSDATE - 2
ORDER  BY start_time DESC;

-- Step 5b: Check datafile I/O latency on standby
-- High write time during backup = RMAN starving MRP
SELECT d.name,
       ROUND(f.writetim / NULLIF(f.phywrts,0), 2)  AS avg_write_ms,
       f.phyrds,
       f.phywrts
FROM   v$filestat  f
JOIN   v$datafile  d ON f.file# = d.file#
WHERE  f.phywrts   > 1000
ORDER  BY avg_write_ms DESC
FETCH FIRST 10 ROWS ONLY;

-- Step 5c: Throttle RMAN to protect MRP I/O (run on STANDBY)
-- Connect RMAN and set channel throttle:
-- RMAN> RUN {
--   ALLOCATE CHANNEL c1 DEVICE TYPE DISK
--       RATE 100M;   -- limit to 100MB/sec
--   BACKUP DATABASE;
-- }

-- Step 5d: Use DBMS_SCHEDULER to move backup to off-peak window
-- Check current backup schedule
SELECT job_name,
       enabled,
       next_run_date,
       last_start_date
FROM   dba_scheduler_jobs
WHERE  job_name LIKE '%RMAN%'
OR     job_name LIKE '%BACKUP%'
ORDER  BY next_run_date;

7. Step 6: Fix Protection Mode Mismatch

A silent lag builder. The protection mode shown in v$database has two columns: protection_mode (what you configured) and protection_level (what is actually in effect). If the standby was temporarily unreachable, the primary may have silently fallen back to a lower level. This can cause steady, unexplained lag accumulation.

SQL, Check and Restore Protection Mode

-- Check effective protection mode (run on PRIMARY)
-- If these two values differ, primary fell back silently
SELECT name,
       db_unique_name,
       protection_mode,    -- what you configured
       protection_level    -- what is actually active
FROM   v$database;

-- Check archive destination status for errors
SELECT dest_id,
       status,
       error,
       gap_status,
       db_unique_name
FROM   v$archive_dest_status
WHERE  target    = 'STANDBY'
AND    status   != 'INACTIVE';

-- Restore protection mode via DGMGRL (after standby reconnects)
-- dgmgrl /
-- DGMGRL> SHOW CONFIGURATION;
-- DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
-- DGMGRL> SHOW CONFIGURATION;

-- Verify protection_level now matches protection_mode
SELECT protection_mode,
       protection_level
FROM   v$database;

8. Fix for Active Data Guard (ADG) Apply Lag

If your standby runs Active Data Guard (open read-only while applying), read queries from reporting users compete with MRP for I/O. During heavy reporting workloads, MRP can be starved of I/O and apply lag grows even when all other configuration is correct.

SQL, Protect MRP from ADG Read Workload Using I/O Resource Manager

-- Check open mode on standby
SELECT open_mode,
       database_role
FROM   v$database;

-- Monitor MRP vs read session I/O competition
SELECT s.program,
       s.status,
       COUNT(*)  AS sessions
FROM   v$session s
WHERE  s.type = 'USER'
OR     s.program LIKE '%MRP%'
GROUP  BY s.program, s.status
ORDER  BY sessions DESC
FETCH FIRST 10 ROWS ONLY;

-- Apply lag while ADG read load active
SELECT name, value
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');

-- Create I/O Resource Manager plan to protect MRP (run on STANDBY)
BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.CREATE_PLAN(
        plan    => 'STANDBY_IO_PLAN',
        comment => 'Protect MRP apply from ADG read workload'
    );
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan             => 'STANDBY_IO_PLAN',
        group_or_subplan => '_ORACLE_BACKGROUND_GROUP_',
        comment          => 'MRP gets priority I/O',
        mgmt_p1          => 80
    );
    DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

ALTER SYSTEM SET resource_manager_plan = 'STANDBY_IO_PLAN'
SCOPE = BOTH SID = '*';

9. After the Fix: Verify and Monitor

After applying any fix, do not assume the problem is resolved. Monitor lag for at least 30 minutes under production load before declaring the incident closed.

SQL, Post-Fix Verification Queries

-- Run this every 5 minutes after the fix (run on PRIMARY)
SELECT name,
       value,
       time_computed
FROM   v$dataguard_stats
WHERE  name IN (
    'transport lag',
    'apply lag',
    'apply finish time'
)
ORDER  BY name;

-- Confirm no sequence gaps (run on PRIMARY)
SELECT thread#,
       low_sequence#,
       high_sequence#
FROM   v$archive_gap;

-- Confirm MRP running in real-time mode (run on STANDBY)
-- Status must be WAIT_FOR_LOG not WAIT_FOR_GAP
SELECT process,
       status,
       sequence#,
       delay_mins
FROM   v$managed_standby
WHERE  process LIKE 'MRP%'
OR     process LIKE 'PR%'
ORDER  BY process;

-- Full DGMGRL health check
-- dgmgrl /
-- DGMGRL> SHOW CONFIGURATION;
-- DGMGRL> VALIDATE DATABASE VERBOSE standby_db;

10. Prevent Lag From Returning: Configuration Checklist

SRL and Transport Configuration

Standby Redo Logs created with same size as online redo logs
SRL count = online redo log groups per thread + 1 per thread
MRP status shows WAIT_FOR_LOG (real-time apply active)
COMPRESSION=ENABLE set in LOG_ARCHIVE_DEST if WAN bandwidth is a concern
NET_TIMEOUT set appropriately for WAN latency (30,60 seconds)

Apply Performance

Parallel apply enabled with degree matching standby CPU count
RMAN backup scheduled outside peak MRP apply window
Standby hardware sized equal to or greater than primary
Dedicated I/O path for standby datafiles (not shared with backup)
I/O Resource Manager plan active if ADG read workload is present

Monitoring

Automated alert when transport lag exceeds 30 seconds
Automated alert when apply lag exceeds 2 minutes
Archive gap check runs every 15 minutes
DGMGRL VALIDATE DATABASE runs weekly
Failover tested under production load every quarter

FAQ

Can I create SRLs without stopping the standby database?

Yes. You can add Standby Redo Log groups to a mounted or open standby database without any downtime. The ALTER DATABASE ADD STANDBY LOGFILE command runs online. MRP does not need to be stopped. After creating SRLs, MRP automatically detects them and switches to real-time apply mode. You should verify the switch by checking that MRP status changes from WAIT_FOR_GAP to WAIT_FOR_LOG within a few minutes.

Will enabling parallel apply cause any data consistency risk?

No. Oracle's parallel apply mechanism uses the same redo-based recovery infrastructure as single-threaded MRP. Oracle guarantees transactional consistency regardless of the parallel degree. Redo records within the same transaction are always applied in the correct order. The only consideration is that enabling parallel apply requires briefly cancelling and restarting MRP, which causes a few seconds of apply pause. During that pause, apply lag may increase slightly before recovering.

How do I fix a sequence gap in Data Guard?

A sequence gap means the standby is missing one or more archived redo log files and cannot apply continuously. First run SELECT * FROM v$archive_gap on the primary to identify the missing sequence numbers. Then manually copy the corresponding archive log files from the primary to the standby using RMAN or OS copy, register them on the standby with ALTER DATABASE REGISTER LOGFILE, and verify MRP resumes applying. If the gap is very old and the archive logs no longer exist on the primary, you may need to rebuild the standby using RMAN active duplication.

Transport lag is zero but apply lag keeps growing slowly every day. Why?

Slow, steady apply lag growth with zero transport lag almost always points to one of three causes: (1) parallel apply is not enabled and MRP is falling behind a high-volume OLTP workload, (2) RMAN backup is running daily and competing with MRP during peak hours, or (3) Active Data Guard read workload is starving MRP I/O. Run the MRP parallel worker check (Query 3) and the RMAN schedule check (Step 5) to identify which one is active in your environment.

About the Author

Chetan Yadav

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL, and Cloud DBA with 15+ years of hands-on experience managing production databases across on-premises, hybrid, and cloud environments.

Chetan has resolved Data Guard lag incidents across financial and healthcare production environments, including the 8-hour apply lag from missing SRLs and the firewall-induced transport spike described in related guides. His writing focuses on precise diagnosis and exact resolution commands, not generic advice.

This blog focuses on real-world DBA problems, career growth, and practical learning, not theoretical documentation or vendor marketing.

Chetan Yadav

Pages

Tuesday, April 21, 2026

How to Fix Data Guard Lag in Oracle 19c (Step-by-Step Troubleshooting Guide)

How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide

1. Before You Start: Run These 4 Queries First

2. Step 1: Check Transport Lag vs Apply Lag

Identify Which Type of Lag You Have

3. Step 2: Fix Missing Standby Redo Logs

Diagnose and Fix Missing SRLs

4. Step 3: Enable Parallel Apply

Enable Parallel Apply on the Standby

5. Step 4: Fix Network Latency and Bandwidth

Reduce Redo Transport Bandwidth Usage

6. Step 5: Resolve RMAN vs MRP I/O Conflict

Fix RMAN and MRP I/O Conflict

7. Step 6: Fix Protection Mode Mismatch

8. Fix for Active Data Guard (ADG) Apply Lag

9. After the Fix: Verify and Monitor

10. Prevent Lag From Returning: Configuration Checklist

SRL and Transport Configuration

Apply Performance

Monitoring

FAQ

About the Author

No comments:

Post a Comment

Pages

Tuesday, April 21, 2026

How to Fix Data Guard Lag in Oracle 19c (Step-by-Step Troubleshooting Guide)

How to Fix Data Guard Lag in Oracle 19c: Step-by-Step Troubleshooting Guide

1. Before You Start: Run These 4 Queries First

2. Step 1: Check Transport Lag vs Apply Lag

Identify Which Type of Lag You Have

3. Step 2: Fix Missing Standby Redo Logs

Diagnose and Fix Missing SRLs

4. Step 3: Enable Parallel Apply

Enable Parallel Apply on the Standby

5. Step 4: Fix Network Latency and Bandwidth

Reduce Redo Transport Bandwidth Usage

6. Step 5: Resolve RMAN vs MRP I/O Conflict

Fix RMAN and MRP I/O Conflict

7. Step 6: Fix Protection Mode Mismatch

8. Fix for Active Data Guard (ADG) Apply Lag

9. After the Fix: Verify and Monitor

10. Prevent Lag From Returning: Configuration Checklist

SRL and Transport Configuration

Apply Performance

Monitoring

FAQ

Related Reading from Real Production Systems

About the Author

No comments:

Post a Comment