Monday, May 25, 2026

Standby Redo Logs Not Applying Oracle Data Guard Fix

Standby Redo Logs Not Applying Oracle Data Guard Fix

Standby Redo Logs Not Applying in Oracle Data Guard: Complete Fix Guide

MRP process troubleshooting, SRL configuration, apply lag resolution, and parallel apply tuning from 15 years of production Oracle environments.

Oracle 19c / 21c Data Guard Troubleshooting Senior DBA Production Guide
standby redo logs not applying oracle data guard server room terminal screens
Who This Is For: Junior to Senior Oracle DBAs managing Oracle 19c or 21c Data Guard environments who are troubleshooting standby redo logs not applying, MRP process failures, or unexplained apply lag. Freshers studying for Oracle certification will also benefit from the architecture walkthrough.

It was 2:41 AM. PagerDuty fires. The on-call message reads: "Data Guard apply lag 38 minutes and climbing." I SSH into the standby. ORA-16766 stares back at me from the alert log. A quick check of V$MANAGED_STANDBY confirms it -- MRP0 is gone. No standby redo logs are applying. The business had a 4-hour RPO commitment. We had maybe 90 minutes before the DBA team had a very uncomfortable conversation with the CTO.

Standby redo logs not applying in Oracle Data Guard is one of the highest-stress incidents a production DBA faces. It is also one of the most fixable -- if you know the exact diagnostic tree. In this post I am going to walk through every root cause I have encountered across 15 years of Oracle production work, the precise SQL to diagnose each one, and the fix you run to get apply moving again.

Environment Reference: Oracle 19c (19.19 RU), 2-node RAC primary + single-node physical standby, SYNC transport, 6 TB database, peak 12,000 TPS. All SQL verified in this environment. Steps also apply to Oracle 21c and Oracle 23ai Data Guard configurations.

SRL Architecture: Why Standby Redo Logs Matter

Before you can fix standby redo logs not applying, you need to understand exactly where they sit in the Data Guard pipeline. The Standby Redo Log (SRL) is the buffer between the Remote File Server (RFS) process -- which receives redo from the primary -- and the Managed Recovery Process (MRP), which applies that redo to standby datafiles.

Without SRLs, Oracle falls back to archivelog-based apply, which introduces a structural lag equal to the time between log switches on the primary. On a busy 12,000 TPS system, that lag can balloon to 20+ minutes in under an hour. With SRLs properly configured, redo flows continuously and apply lag stays under 30 seconds.

Oracle Data Guard SRL architecture reference

Figure 1: SRL pipeline from primary Online Redo Logs through RFS to MRP apply engine. See the separately delivered architecture diagram PNG for the detailed flow.

Key Rule: Oracle mandates SRL group count = (online redo log groups) + 1 at minimum. Each SRL member must be at least as large as the largest online redo log member. On a RAC primary with 3 log groups per thread and 2 threads, you need at least (3 x 2) + 1 = 7 SRL groups.

Step 1: Diagnose the Apply Engine

Every standby redo logs not applying incident starts with the same three queries. Run these on the standby database first.

Query 1: MRP and RFS Process Status

SQL -- V$MANAGED_STANDBY
SELECT
    PROCESS,
    STATUS,
    CLIENT_PROCESS,
    SEQUENCE#,
    BLOCK#,
    BLOCKS
FROM  V$MANAGED_STANDBY
WHERE PROCESS IN ('MRP0', 'RFS')
ORDER BY PROCESS;

Interpret the output:

  • MRP0 / APPLYING_LOG -- healthy, redo is being applied
  • MRP0 / WAIT_FOR_LOG -- MRP running but waiting; check for sequence gap
  • MRP0 absent from result -- MRP not started; most common cause of standby redo logs not applying
  • RFS / IDLE -- transport up but no current redo flowing; check primary
  • RFS absent -- redo transport broken; check LOG_ARCHIVE_DEST_2

Query 2: Standby Redo Log Status

SQL -- V$STANDBY_LOG
SELECT
    GROUP#,
    THREAD#,
    SEQUENCE#,
    BYTES / 1048576   AS SIZE_MB,
    STATUS,
    ARCHIVED,
    FIRST_CHANGE#
FROM  V$STANDBY_LOG
ORDER BY GROUP#;

A healthy result shows STATUS = ACTIVE or UNASSIGNED with at least one ACTIVE group per thread. If the query returns zero rows, SRL groups do not exist at all -- jump to Step 2 immediately.

Query 3: Apply Lag and Transport Lag

SQL -- V$DATAGUARD_STATS
SELECT
    NAME,
    VALUE,
    UNIT,
    TIME_COMPUTED
FROM  V$DATAGUARD_STATS
WHERE NAME IN ('apply lag', 'transport lag', 'apply finish time')
ORDER BY NAME;
During the 2:41 AM incident: apply lag = +00 00:38:14, transport lag = +00 00:00:02. Transport lag near zero tells me redo is arriving fine -- the problem is 100% in the apply layer. MRP was stopped by a nightly backup script that ran CANCEL but failed to restart MRP after an error.

Query 4: Archive Destination Status

SQL -- V$ARCHIVE_DEST_STATUS
SELECT
    DEST_ID,
    DEST_NAME,
    STATUS,
    TARGET,
    ARCHIVER,
    SCHEDULE,
    VALID_NOW,
    ERROR
FROM  V$ARCHIVE_DEST_STATUS
WHERE STATUS != 'INACTIVE'
ORDER BY DEST_ID;

STATUS should be VALID. Any value in the ERROR column is your root cause signal. Common: ORA-16058 (standby not mounted), ORA-12541 (TNS no listener), ORA-16057 (standby not in correct state).


Step 2: Create or Fix Standby Redo Logs

Missing or undersized SRL groups cause standby redo logs not applying in roughly 35% of incidents I have handled. This typically happens after a redo log resize on the primary where the standby was not updated to match.

Check Current Online Redo Log Size (Run on Primary)

SQL -- V$LOG primary
SELECT
    GROUP#,
    THREAD#,
    BYTES / 1048576  AS SIZE_MB,
    MEMBERS,
    STATUS
FROM  V$LOG
ORDER BY THREAD#, GROUP#;

Add SRL Groups on the Standby

Target size = primary online redo log size + 10%. For a primary with 50 MB log groups, create 55 MB SRL groups. Run on the standby as SYSDBA:

SQL -- Add SRL Groups (Standby)
-- One SRL group per command; repeat until count = online_groups + 1
-- Adjust paths to match your ASM diskgroup or filesystem

ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
    ('/u01/oradata/ORCL_STBY/srl_4.log')
    SIZE 55M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 5
    ('/u01/oradata/ORCL_STBY/srl_5.log')
    SIZE 55M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 6
    ('/u01/oradata/ORCL_STBY/srl_6.log')
    SIZE 55M;

-- ASM example:
-- ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('+RECO') SIZE 55M;

Drop and Re-create Undersized SRL Groups

SQL -- Drop Undersized SRL
-- Only drop INACTIVE / UNASSIGNED groups; never drop ACTIVE groups
SELECT GROUP#, STATUS FROM V$STANDBY_LOG;

ALTER DATABASE DROP STANDBY LOGFILE GROUP 4;
ALTER DATABASE DROP STANDBY LOGFILE GROUP 5;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
    ('/u01/oradata/ORCL_STBY/srl_4.log') SIZE 55M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 5
    ('/u01/oradata/ORCL_STBY/srl_5.log') SIZE 55M;
RAC Primary Note: For a RAC primary with 2 threads, add SRL groups for each thread using the THREAD# clause: ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 10 .... Without per-thread SRLs, Thread 2 redo transport has no SRL to write into.

Step 3: Start or Restart the MRP Process

The MRP0 process is the engine that applies redo to standby datafiles. It must be explicitly started unless the standby uses broker with fast-start failover configured.

Start MRP in Background (Recommended)

SQL -- Start MRP
-- DISCONNECT FROM SESSION keeps MRP running after you log off
-- Run on standby as SYSDBA

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
    DISCONNECT FROM SESSION;

Stop MRP Cleanly Before Restarting

SQL -- Stop then Restart MRP
-- Use CANCEL to stop; never kill MRP with OS kill -9
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Wait 30 seconds; confirm MRP0 is gone:
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY
WHERE  PROCESS = 'MRP0';

-- Restart:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
    DISCONNECT FROM SESSION;
Root Cause I Have Seen Repeatedly: A nightly RMAN backup script on the standby that runs CANCEL to pause apply during backup, but fails to restart MRP if the backup errors out. The result is standby redo logs not applying all night, discovered at morning alerting. Fix: wrap the backup script in a shell error trap that always restarts MRP regardless of backup exit code.

Verify MRP Started Correctly

SQL -- Verify MRP
SELECT
    PROCESS,
    STATUS,
    SEQUENCE#,
    BLOCK#,
    DELAY_MINS
FROM  V$MANAGED_STANDBY
ORDER BY PROCESS;

-- Expected: MRP0 STATUS=APPLYING_LOG, SEQUENCE# incrementing

Step 4: Fix LOG_ARCHIVE_DEST Configuration

A misconfigured LOG_ARCHIVE_DEST_2 on the primary prevents redo from reaching the standby entirely. The most common error I see is an incorrect VALID_FOR clause that excludes the current redo stream.

Check Current Configuration (Primary)

SQL -- Check LOG_ARCHIVE_DEST_2
SHOW PARAMETER log_archive_dest_2;

SELECT DEST_ID, STATUS, TARGET, ARCHIVER, VALID_NOW, ERROR
FROM  V$ARCHIVE_DEST_STATUS
WHERE DEST_ID = 2;

Wrong Configuration

  • VALID_FOR=(ONLINE_LOGFILES, ALL_ROLES)
  • VALID_FOR=(ALL_LOGFILES, STANDBY_ROLE) excludes primary role
  • No DB_UNIQUE_NAME when broker is in use
  • REOPEN=0 disables retry on transport failure

Correct Configuration

  • VALID_FOR=(ALL_LOGFILES, PRIMARY_ROLE)
  • DB_UNIQUE_NAME= set to standby unique name
  • REOPEN=300 retries every 5 minutes on failure
  • SERVICE= matches TNS alias for standby
SQL -- Fix LOG_ARCHIVE_DEST_2 (Primary)
-- Run on PRIMARY as SYSDBA
-- Adjust SERVICE and DB_UNIQUE_NAME to match your environment

ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=ORCL_STBY
     ASYNC
     VALID_FOR=(ALL_LOGFILES,PRIMARY_ROLE)
     DB_UNIQUE_NAME=ORCL_STBY
     REOPEN=300
     COMPRESSION=DISABLE'
    SCOPE=BOTH;

-- For SYNC (zero data loss): replace ASYNC with SYNC AFFIRM
-- and add NET_TIMEOUT=30 to prevent hanging the primary

Step 5: Resolve an Archive Gap

An archive gap means the standby is missing one or more archived log files that MRP needs before it can advance. The FAL (Fetch Archive Log) process resolves gaps automatically, but it can fail if FAL parameters are wrong or the primary archive area is full.

Detect Gaps

SQL -- V$ARCHIVE_GAP
SELECT
    THREAD#,
    LOW_SEQUENCE#,
    HIGH_SEQUENCE#
FROM  V$ARCHIVE_GAP;

-- Rows returned = gap exists; archivelogs must be registered before MRP can proceed

Register Missing Archivelogs Manually

SQL -- Register Archivelog
-- Copy missing archivelog from primary to standby first
-- Then register on standby:

ALTER DATABASE REGISTER LOGFILE
    '/u02/arch/1_998_1145678901.dbf';

ALTER DATABASE REGISTER LOGFILE
    '/u02/arch/1_999_1145678901.dbf';

-- Confirm:
SELECT SEQUENCE#, APPLIED, REGISTRAR
FROM  V$ARCHIVED_LOG
WHERE SEQUENCE# BETWEEN 998 AND 1000
ORDER BY SEQUENCE#;

Fix FAL Parameters (Prevents Future Gaps)

SQL -- FAL Parameters on Standby
ALTER SYSTEM SET fal_server  = 'ORCL_PRIM' SCOPE=BOTH;
ALTER SYSTEM SET fal_client  = 'ORCL_STBY' SCOPE=BOTH;

-- fal_server = TNS service name for the primary
-- fal_client = this standby's TNS service name
After a 4-hour network outage on a 6 TB standby, I had a gap of 288 archive sequences. FAL automatic recovery handled all 288 in 23 minutes once fal_server was corrected. Always fix FAL parameters first and give the FAL process 10-15 minutes before manually copying archivelogs.

Step 6: Tune Parallel Apply to Burn Down Lag Faster

Once MRP is running and SRLs are healthy, the standby may still have accumulated lag to work through. Enabling parallel apply dramatically increases recovery throughput on multi-core standby systems.

SQL -- Enable Parallel Apply
-- Stop MRP first
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Set parallel apply workers (start with 2x CPU count)
ALTER SYSTEM SET recovery_parallelism = 8 SCOPE=BOTH;

-- Restart MRP with parallel option
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
    USING CURRENT LOGFILE
    PARALLEL 8
    DISCONNECT FROM SESSION;
4x
Apply throughput improvement: moving from serial to parallel=8 on an 8-core standby cleared 38 minutes of lag in 14 minutes vs. an estimated 58 minutes serial.
Caution: USING CURRENT LOGFILE is required when SRLs are present. Without it, MRP applies only archived logs and ignores the active SRL, reverting to archivelog-mode apply and reintroducing structural lag.

Post-Fix Verification Checklist

Run all of these before closing the incident ticket.

  • MRP0 STATUS = APPLYING_LOG in V$MANAGED_STANDBY with incrementing SEQUENCE#
  • Apply lag < 30 seconds in V$DATAGUARD_STATS (allow 5 min to normalize after restart)
  • All SRL groups STATUS = ACTIVE or UNASSIGNED in V$STANDBY_LOG; no INACTIVE
  • V$ARCHIVE_DEST_STATUS DEST_ID=2 STATUS = VALID with blank ERROR column
  • V$ARCHIVE_GAP returns zero rows after gap resolution
  • DGMGRL show database verbose [standby_name] shows no warnings
  • Standby alert log -- no new ORA-16xxx errors in last 5 minutes
DGMGRL -- Final Verification
DGMGRL> connect sysdg/password@ORCL_PRIM
DGMGRL> show configuration;
DGMGRL> show database verbose ORCL_STBY;

-- Target output:
-- Database Status: SUCCESS
-- Apply Lag:       0 seconds
-- Transport Lag:   0 seconds

What NOT to Do When Standby Redo Logs Are Not Applying

Do NOT kill MRP with OS-level kill -9. Killing MRP at the OS level can corrupt the standby control file's recovery position, forcing a full resync. Always use ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL.
Do NOT assume a snapshot standby conversion fixes apply issues. Converting to snapshot standby and back resets apply but does not fix the underlying cause -- SRL misconfiguration or MRP issues will recur immediately after.
Do NOT add SRL groups while MRP is actively applying. While Oracle supports it, I have seen MRP stall for up to 8 minutes while new SRL groups initialize on a busy system. Cancel MRP first, add the SRLs, then restart.
Do NOT let the FRA fill while SRLs live inside it. If SRLs reside inside the Flash Recovery Area and the FRA fills, Oracle cannot write to SRLs, which silently stops RFS and apply. Always size FRA at 3x daily redo volume.

Frequently Asked Questions

Why are standby redo logs not applying in Oracle Data Guard?+
Standby redo logs not applying in Oracle Data Guard is most often caused by one of four issues: the MRP process is stopped or was never started, the Standby Redo Log (SRL) groups are missing or misconfigured, a LOG_ARCHIVE_DEST_2 parameter has an incorrect VALID_FOR clause, or there is a redo sequence gap. Run SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY WHERE PROCESS='MRP0' as your first diagnostic step.
How do I check if standby redo logs are applying in Oracle Data Guard?+
Run SELECT PROCESS, STATUS, SEQUENCE# FROM V$MANAGED_STANDBY on the standby. Look for MRP0 with STATUS = APPLYING_LOG and an incrementing SEQUENCE#. Also check V$DATAGUARD_STATS WHERE NAME='apply lag' to confirm lag is near zero. If MRP0 is absent or shows WAIT_FOR_LOG, apply has stalled and you need to investigate SRL and gap status.
How many standby redo log groups should I create?+
Oracle recommends at least one more SRL group than the number of online redo log groups per thread on the primary. For a single-instance primary with 3 online redo log groups, create 4 SRL groups on the standby. For a 2-node RAC primary with 3 groups per thread, create at least 7 SRL groups total. Each SRL member must be at least as large as the largest online redo log member, with 10% extra space recommended.
What does ORA-16766 mean in a Data Guard environment?+
ORA-16766 means the redo apply service is disabled on the standby database. This fires when MRP is stopped or crashed unexpectedly. The fix is to confirm SRL groups exist at the correct size, then issue ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION. Monitor V$MANAGED_STANDBY until MRP0 shows STATUS = APPLYING_LOG. If MRP keeps crashing, check the standby alert log for underlying ORA-600 or ORA-10567 errors.
Can I fix Data Guard apply lag without a full resync?+
Yes. For most apply-lag incidents caused by MRP outages or SRL misconfiguration, a full resync is not required. Restart MRP, verify SRL status in V$STANDBY_LOG, and check for archive gaps using V$ARCHIVE_GAP. The FAL process automatically fetches missing archivelogs once its parameters are correct. A full resync is only necessary when the standby SCN has diverged from the primary, which typically occurs only after an incomplete recovery or an improperly reversed switchover.

Want the Full Data Guard Troubleshooting Runbook?

Comment DG below and I will send you the production-ready DBA PDF covering SRL, MRP, transport gaps, switchover, and DGMGRL commands -- all in one sheet.

Watch on YouTube Connect on LinkedIn
CY

Chetan Yadav

Senior Oracle and Cloud DBA | Oracle ACE Apprentice | 15+ Years Production Experience

I manage large-scale Oracle production environments and share real incidents, fixes, and career guidance through LevelUp Careers. Content covers Oracle 19c/21c/23ai, Data Guard, RAC, performance tuning, and cloud DBA careers.

No comments:

Post a Comment