Standby Redo Logs Not Applying in Oracle Data Guard: Complete Fix Guide
MRP process troubleshooting, SRL configuration, apply lag resolution, and parallel apply tuning from 15 years of production Oracle environments.
It was 2:41 AM. PagerDuty fires. The on-call message reads: "Data Guard apply lag 38 minutes and climbing." I SSH into the standby. ORA-16766 stares back at me from the alert log. A quick check of V$MANAGED_STANDBY confirms it -- MRP0 is gone. No standby redo logs are applying. The business had a 4-hour RPO commitment. We had maybe 90 minutes before the DBA team had a very uncomfortable conversation with the CTO.
Standby redo logs not applying in Oracle Data Guard is one of the highest-stress incidents a production DBA faces. It is also one of the most fixable -- if you know the exact diagnostic tree. In this post I am going to walk through every root cause I have encountered across 15 years of Oracle production work, the precise SQL to diagnose each one, and the fix you run to get apply moving again.
SRL Architecture: Why Standby Redo Logs Matter
Before you can fix standby redo logs not applying, you need to understand exactly where they sit in the Data Guard pipeline. The Standby Redo Log (SRL) is the buffer between the Remote File Server (RFS) process -- which receives redo from the primary -- and the Managed Recovery Process (MRP), which applies that redo to standby datafiles.
Without SRLs, Oracle falls back to archivelog-based apply, which introduces a structural lag equal to the time between log switches on the primary. On a busy 12,000 TPS system, that lag can balloon to 20+ minutes in under an hour. With SRLs properly configured, redo flows continuously and apply lag stays under 30 seconds.
Figure 1: SRL pipeline from primary Online Redo Logs through RFS to MRP apply engine. See the separately delivered architecture diagram PNG for the detailed flow.
(3 x 2) + 1 = 7 SRL groups.
Step 1: Diagnose the Apply Engine
Every standby redo logs not applying incident starts with the same three queries. Run these on the standby database first.
Query 1: MRP and RFS Process Status
SELECT PROCESS, STATUS, CLIENT_PROCESS, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY WHERE PROCESS IN ('MRP0', 'RFS') ORDER BY PROCESS;
Interpret the output:
MRP0 / APPLYING_LOG-- healthy, redo is being appliedMRP0 / WAIT_FOR_LOG-- MRP running but waiting; check for sequence gap- MRP0 absent from result -- MRP not started; most common cause of standby redo logs not applying
RFS / IDLE-- transport up but no current redo flowing; check primary- RFS absent -- redo transport broken; check
LOG_ARCHIVE_DEST_2
Query 2: Standby Redo Log Status
SELECT GROUP#, THREAD#, SEQUENCE#, BYTES / 1048576 AS SIZE_MB, STATUS, ARCHIVED, FIRST_CHANGE# FROM V$STANDBY_LOG ORDER BY GROUP#;
A healthy result shows STATUS = ACTIVE or UNASSIGNED with at least one ACTIVE group per thread. If the query returns zero rows, SRL groups do not exist at all -- jump to Step 2 immediately.
Query 3: Apply Lag and Transport Lag
SELECT NAME, VALUE, UNIT, TIME_COMPUTED FROM V$DATAGUARD_STATS WHERE NAME IN ('apply lag', 'transport lag', 'apply finish time') ORDER BY NAME;
apply lag = +00 00:38:14, transport lag = +00 00:00:02. Transport lag near zero tells me redo is arriving fine -- the problem is 100% in the apply layer. MRP was stopped by a nightly backup script that ran CANCEL but failed to restart MRP after an error.
Query 4: Archive Destination Status
SELECT DEST_ID, DEST_NAME, STATUS, TARGET, ARCHIVER, SCHEDULE, VALID_NOW, ERROR FROM V$ARCHIVE_DEST_STATUS WHERE STATUS != 'INACTIVE' ORDER BY DEST_ID;
STATUS should be VALID. Any value in the ERROR column is your root cause signal. Common: ORA-16058 (standby not mounted), ORA-12541 (TNS no listener), ORA-16057 (standby not in correct state).
Step 2: Create or Fix Standby Redo Logs
Missing or undersized SRL groups cause standby redo logs not applying in roughly 35% of incidents I have handled. This typically happens after a redo log resize on the primary where the standby was not updated to match.
Check Current Online Redo Log Size (Run on Primary)
SELECT GROUP#, THREAD#, BYTES / 1048576 AS SIZE_MB, MEMBERS, STATUS FROM V$LOG ORDER BY THREAD#, GROUP#;
Add SRL Groups on the Standby
Target size = primary online redo log size + 10%. For a primary with 50 MB log groups, create 55 MB SRL groups. Run on the standby as SYSDBA:
-- One SRL group per command; repeat until count = online_groups + 1 -- Adjust paths to match your ASM diskgroup or filesystem ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/u01/oradata/ORCL_STBY/srl_4.log') SIZE 55M; ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 ('/u01/oradata/ORCL_STBY/srl_5.log') SIZE 55M; ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 ('/u01/oradata/ORCL_STBY/srl_6.log') SIZE 55M; -- ASM example: -- ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('+RECO') SIZE 55M;
Drop and Re-create Undersized SRL Groups
-- Only drop INACTIVE / UNASSIGNED groups; never drop ACTIVE groups SELECT GROUP#, STATUS FROM V$STANDBY_LOG; ALTER DATABASE DROP STANDBY LOGFILE GROUP 4; ALTER DATABASE DROP STANDBY LOGFILE GROUP 5; ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/u01/oradata/ORCL_STBY/srl_4.log') SIZE 55M; ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 ('/u01/oradata/ORCL_STBY/srl_5.log') SIZE 55M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 10 .... Without per-thread SRLs, Thread 2 redo transport has no SRL to write into.
Step 3: Start or Restart the MRP Process
The MRP0 process is the engine that applies redo to standby datafiles. It must be explicitly started unless the standby uses broker with fast-start failover configured.
Start MRP in Background (Recommended)
-- DISCONNECT FROM SESSION keeps MRP running after you log off -- Run on standby as SYSDBA ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Stop MRP Cleanly Before Restarting
-- Use CANCEL to stop; never kill MRP with OS kill -9 ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; -- Wait 30 seconds; confirm MRP0 is gone: SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY WHERE PROCESS = 'MRP0'; -- Restart: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Verify MRP Started Correctly
SELECT PROCESS, STATUS, SEQUENCE#, BLOCK#, DELAY_MINS FROM V$MANAGED_STANDBY ORDER BY PROCESS; -- Expected: MRP0 STATUS=APPLYING_LOG, SEQUENCE# incrementing
Step 4: Fix LOG_ARCHIVE_DEST Configuration
A misconfigured LOG_ARCHIVE_DEST_2 on the primary prevents redo from reaching the standby entirely. The most common error I see is an incorrect VALID_FOR clause that excludes the current redo stream.
Check Current Configuration (Primary)
SHOW PARAMETER log_archive_dest_2; SELECT DEST_ID, STATUS, TARGET, ARCHIVER, VALID_NOW, ERROR FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;
Wrong Configuration
VALID_FOR=(ONLINE_LOGFILES, ALL_ROLES)VALID_FOR=(ALL_LOGFILES, STANDBY_ROLE)excludes primary role- No
DB_UNIQUE_NAMEwhen broker is in use REOPEN=0disables retry on transport failure
Correct Configuration
VALID_FOR=(ALL_LOGFILES, PRIMARY_ROLE)DB_UNIQUE_NAME=set to standby unique nameREOPEN=300retries every 5 minutes on failureSERVICE=matches TNS alias for standby
-- Run on PRIMARY as SYSDBA -- Adjust SERVICE and DB_UNIQUE_NAME to match your environment ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=ORCL_STBY ASYNC VALID_FOR=(ALL_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=ORCL_STBY REOPEN=300 COMPRESSION=DISABLE' SCOPE=BOTH; -- For SYNC (zero data loss): replace ASYNC with SYNC AFFIRM -- and add NET_TIMEOUT=30 to prevent hanging the primary
Step 5: Resolve an Archive Gap
An archive gap means the standby is missing one or more archived log files that MRP needs before it can advance. The FAL (Fetch Archive Log) process resolves gaps automatically, but it can fail if FAL parameters are wrong or the primary archive area is full.
Detect Gaps
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP; -- Rows returned = gap exists; archivelogs must be registered before MRP can proceed
Register Missing Archivelogs Manually
-- Copy missing archivelog from primary to standby first -- Then register on standby: ALTER DATABASE REGISTER LOGFILE '/u02/arch/1_998_1145678901.dbf'; ALTER DATABASE REGISTER LOGFILE '/u02/arch/1_999_1145678901.dbf'; -- Confirm: SELECT SEQUENCE#, APPLIED, REGISTRAR FROM V$ARCHIVED_LOG WHERE SEQUENCE# BETWEEN 998 AND 1000 ORDER BY SEQUENCE#;
Fix FAL Parameters (Prevents Future Gaps)
ALTER SYSTEM SET fal_server = 'ORCL_PRIM' SCOPE=BOTH; ALTER SYSTEM SET fal_client = 'ORCL_STBY' SCOPE=BOTH; -- fal_server = TNS service name for the primary -- fal_client = this standby's TNS service name
Step 6: Tune Parallel Apply to Burn Down Lag Faster
Once MRP is running and SRLs are healthy, the standby may still have accumulated lag to work through. Enabling parallel apply dramatically increases recovery throughput on multi-core standby systems.
-- Stop MRP first ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; -- Set parallel apply workers (start with 2x CPU count) ALTER SYSTEM SET recovery_parallelism = 8 SCOPE=BOTH; -- Restart MRP with parallel option ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE PARALLEL 8 DISCONNECT FROM SESSION;
USING CURRENT LOGFILE is required when SRLs are present. Without it, MRP applies only archived logs and ignores the active SRL, reverting to archivelog-mode apply and reintroducing structural lag.
Post-Fix Verification Checklist
Run all of these before closing the incident ticket.
- MRP0 STATUS = APPLYING_LOG in V$MANAGED_STANDBY with incrementing SEQUENCE#
- Apply lag < 30 seconds in V$DATAGUARD_STATS (allow 5 min to normalize after restart)
- All SRL groups STATUS = ACTIVE or UNASSIGNED in V$STANDBY_LOG; no INACTIVE
- V$ARCHIVE_DEST_STATUS DEST_ID=2 STATUS = VALID with blank ERROR column
- V$ARCHIVE_GAP returns zero rows after gap resolution
- DGMGRL show database verbose [standby_name] shows no warnings
- Standby alert log -- no new ORA-16xxx errors in last 5 minutes
DGMGRL> connect sysdg/password@ORCL_PRIM DGMGRL> show configuration; DGMGRL> show database verbose ORCL_STBY; -- Target output: -- Database Status: SUCCESS -- Apply Lag: 0 seconds -- Transport Lag: 0 seconds
What NOT to Do When Standby Redo Logs Are Not Applying
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL.
Frequently Asked Questions
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY WHERE PROCESS='MRP0' as your first diagnostic step.SELECT PROCESS, STATUS, SEQUENCE# FROM V$MANAGED_STANDBY on the standby. Look for MRP0 with STATUS = APPLYING_LOG and an incrementing SEQUENCE#. Also check V$DATAGUARD_STATS WHERE NAME='apply lag' to confirm lag is near zero. If MRP0 is absent or shows WAIT_FOR_LOG, apply has stalled and you need to investigate SRL and gap status.ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION. Monitor V$MANAGED_STANDBY until MRP0 shows STATUS = APPLYING_LOG. If MRP keeps crashing, check the standby alert log for underlying ORA-600 or ORA-10567 errors.Want the Full Data Guard Troubleshooting Runbook?
Comment DG below and I will send you the production-ready DBA PDF covering SRL, MRP, transport gaps, switchover, and DGMGRL commands -- all in one sheet.
Watch on YouTube Connect on LinkedIn
No comments:
Post a Comment