Monday, March 30, 2026

Why Data Guard Lag Happens in Production: Sync, I/O and Network Deep Dive


Six Root Causes of Transport and Apply Lag, With Diagnostic SQL to Prove Each One
Chetan Yadav, Senior Oracle & Cloud DBA
⏱️ Estimated Reading Time: 14–16 minutes
Transport Lag • Apply Lag • SYNC vs ASYNC • Network RTT • Standby I/O • MRP Apply Bottleneck
[Figure: Oracle Data Guard lag root-cause architecture map showing six production causes across the primary, network, and standby layers]
⚙️ Production Environment Referenced in This Article

Oracle Database: 19.18.0.0.0 Enterprise Edition  •  Primary: 2-Node RAC, 4.8 TB OLTP  •  Standby: Physical Standby (Active Data Guard)
Protection Mode: Maximum Availability (SYNC/AFFIRM)  •  Network: Dedicated 1 GbE WAN (120 km distance, RTT 1.8 ms)
Peak Load: 2,800 TPS, 180 MB/sec redo generation  •  Application: Core banking transaction processing

The alert arrives at 11:43 PM: "Data Guard apply lag exceeds 900 seconds." The DBA on call opens the monitoring dashboard. Transport lag is 180 seconds. Apply lag is 900 seconds. The standby is 15 minutes behind the primary. If the primary fails right now, 15 minutes of financial transactions could be at risk.
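To put that alert in context: at this system's stated 2,800 TPS, a 15-minute apply lag represents roughly 2.5 million committed transactions the standby has not yet applied. A back-of-the-envelope sketch (illustrative Python, not part of any Oracle tooling):

```python
def transactions_at_risk(apply_lag_seconds: float, tps: float) -> int:
    """Rough RPO exposure: committed transactions not yet applied on
    the standby. Assumes a constant transaction rate."""
    return int(apply_lag_seconds * tps)

# Figures from the scenario above: 900 s apply lag at 2,800 TPS
print(transactions_at_risk(900, 2800))  # 2520000
```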

This scenario plays out in production Data Guard environments more often than most teams admit. Lag is not a single problem; it is six different problems that look identical from the outside. Transport lag and apply lag have different root causes, different diagnostic queries, and different fixes. Treating them the same wastes hours of investigation time.

This guide covers every real cause of Data Guard lag I have diagnosed in production, the exact SQL to prove which one you are dealing with, and the specific fix for each. No guesswork. No generic advice about "check your network." Precise diagnosis first, then precise resolution.

1. Transport Lag vs Apply Lag: Understanding the Difference

Before diagnosing lag, you must understand which type you have. They look the same on a monitoring dashboard but live in completely different parts of the Data Guard pipeline.

| Lag Type | Definition | Lives In | Primary Cause |
|---|---|---|---|
| Transport lag | Redo generated on primary but not yet received by standby | Network pipe between primary and standby | Network, primary I/O, redo volume |
| Apply lag | Redo received by standby but not yet applied to datafiles | Standby MRP apply process | Standby I/O, CPU, misconfiguration |
Critical Distinction:

If transport lag is high but apply lag is low, your network or primary is the problem. The standby is applying redo as fast as it receives it; it just isn't receiving it fast enough.

If transport lag is low but apply lag is high, your standby is the problem. Redo is arriving quickly but MRP cannot apply it fast enough.

If both are high, you have multiple problems or a severe redo volume spike that is overwhelming the entire pipeline.
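That decision table can be expressed as a tiny classifier, handy inside a monitoring script. This is an illustrative Python sketch; the 60-second threshold is an arbitrary example, not an Oracle default:

```python
def classify_lag(transport_lag_s: float, apply_lag_s: float,
                 threshold_s: float = 60.0) -> str:
    """Map a (transport lag, apply lag) pair to the problem domain
    described above. threshold_s is an illustrative cutoff."""
    transport_high = transport_lag_s > threshold_s
    apply_high = apply_lag_s > threshold_s
    if transport_high and apply_high:
        return "multiple problems or severe redo volume spike"
    if transport_high:
        return "network or primary problem"
    if apply_high:
        return "standby apply problem"
    return "healthy"

# The opening scenario: transport lag 180 s, apply lag 900 s
print(classify_lag(180, 900))  # multiple problems or severe redo volume spike
```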

2. The Diagnostic Starting Point: Five Queries to Run First

Run these five queries immediately when lag is reported. They tell you which cause category you are dealing with before you go deeper.

SQL: Five Immediate Diagnostic Queries for Data Guard Lag
-- Query 1: Transport lag vs apply lag (run on the STANDBY,
-- where v$dataguard_stats is populated)
SELECT name, value, unit, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag',
                'apply finish time', 'estimated startup time')
ORDER  BY name;

-- Query 2: Cumulative redo generated per instance (run on PRIMARY)
-- 'redo size' is cumulative since startup: sample it twice and
-- divide the delta by the interval to get MB/sec
SELECT inst_id,
       ROUND(value / 1024 / 1024, 2) AS redo_mb_total
FROM   gv$sysstat
WHERE  name = 'redo size'
ORDER  BY inst_id;

-- Query 3: Archive destination status and error (run on PRIMARY)
SELECT dest_id, dest_name, status, type, database_mode,
       recovery_mode, destination, error, gap_status
FROM   v$archive_dest_status
WHERE  status != 'INACTIVE'
ORDER  BY dest_id;

-- Query 4: MRP apply process status (run on STANDBY)
SELECT process, status, thread#, sequence#, block#, blocks, delay_mins
FROM   v$managed_standby
ORDER  BY process;

-- Query 5: Latest redo received vs applied (run on STANDBY)
SELECT thread#, sequence#, applied
FROM   v$archived_log
ORDER  BY thread#, sequence# DESC
FETCH FIRST 10 ROWS ONLY;

3. Cause 1: Primary Storage I/O Pressure

LGWR cannot write redo to the local online redo logs fast enough, so every redo write is slower than normal. In SYNC mode this directly delays the network send to the standby. In ASYNC mode it slows redo generation, which eventually causes the NSA buffer to fill.

How to Confirm It

SQL: Diagnose Primary Storage I/O as Lag Root Cause
-- Check LGWR wait events (run on PRIMARY)
-- High log file sync or log file parallel write = LGWR I/O problem
-- average_wait is in centiseconds; multiply by 10 for milliseconds
SELECT event, total_waits, time_waited,
       ROUND(average_wait * 10, 3) AS avg_wait_ms,
       wait_class
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write',
                 'log buffer space')
ORDER  BY time_waited DESC;

-- Check datafile I/O latency (high write latency = DBWR pressure on LGWR)
-- readtim/writetim are in centiseconds; multiply by 10 for milliseconds
SELECT d.name,
       f.phyrds, f.phywrts,
       ROUND(10 * f.readtim  / NULLIF(f.phyrds, 0), 2) AS avg_read_ms,
       ROUND(10 * f.writetim / NULLIF(f.phywrts, 0), 2) AS avg_write_ms
FROM   v$filestat f
JOIN   v$datafile d ON f.file# = d.file#
WHERE  f.phywrts > 1000
ORDER  BY avg_write_ms DESC
FETCH FIRST 10 ROWS ONLY;

-- Check ASM rebalance activity (can throttle I/O)
SELECT group_number, operation, state, power, actual, sofar, est_minutes
FROM   v$asm_operation;

Fix: Move redo log files onto dedicated fast storage (NVMe or a dedicated ASM disk group). Increase redo log size to reduce switch frequency. Schedule ASM rebalances outside peak hours. On Exadata, verify Smart Flash Logging is absorbing redo log writes.

4. Cause 2: Redo Volume Spike

The primary is generating far more redo than normal: a batch job, bulk load, or mass update suddenly pushes 10–50x the normal redo rate. The network pipe cannot keep up and transport lag grows rapidly. This is the most common cause of sudden lag spikes in production OLTP systems with nightly batch processing.

SQL: Identify Redo Volume Spike and the Sessions Generating It
-- Check cumulative redo generated per instance (run on PRIMARY)
-- Sample twice over a known interval to turn this into MB/sec
SELECT inst_id,
       ROUND(value / 1024 / 1024, 2) AS redo_mb_total
FROM   gv$sysstat
WHERE  name = 'redo size'
ORDER  BY inst_id;

-- Find the sessions generating the most redo (run on PRIMARY)
-- v$sql has no redo column, so attribute redo per session instead
SELECT ss.sid, ss.serial#, ss.username, ss.sql_id, ss.program,
       ROUND(st.value / 1024 / 1024, 2) AS redo_mb
FROM   v$sesstat  st
JOIN   v$statname n  ON st.statistic# = n.statistic#
JOIN   v$session  ss ON st.sid = ss.sid
WHERE  n.name = 'redo size'
AND    st.value > 0
ORDER  BY st.value DESC
FETCH FIRST 20 ROWS ONLY;

-- Check archive log generation rate (run on PRIMARY)
-- Sudden spike in log count = redo volume surge
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS logs_generated,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS total_mb
FROM   v$archived_log
WHERE  first_time > SYSDATE - 1
AND    standby_dest = 'NO'
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY hour DESC;

Fix: Use NOLOGGING or APPEND hints for bulk data loads where recoverability allows. Schedule heavy batch jobs during off-peak hours. Enable redo transport compression on the archive destination (COMPRESSION=ENABLE in LOG_ARCHIVE_DEST_n). For ASYNC mode, size LOG_BUFFER generously so the async send processes can read redo from memory rather than from the online redo logs.

5. Cause 3: Network Latency and Bandwidth

This is the most commonly blamed cause, and the most commonly misdiagnosed. Network problems cause transport lag. They do not cause apply lag. If apply lag is high and transport lag is low, stop looking at the network.

Network Metrics That Matter for Data Guard

| Metric | Good Value | Problem Threshold | Impact |
|---|---|---|---|
| Round-trip latency (RTT) | < 2 ms | > 5 ms for SYNC | Every commit waits RTT in SYNC mode |
| Bandwidth utilisation | < 60% | > 80% sustained | Redo bursts cause queuing delay |
| Packet loss | 0% | > 0.01% | TCP retransmit multiplies latency |
| Jitter (latency variance) | < 1 ms | > 5 ms | Unpredictable LGWR stalls in SYNC |
SQL: Diagnose Network as Data Guard Lag Cause
-- SYNC redo transport response-time distribution (run on PRIMARY)
-- Buckets are in seconds; a long tail means network or standby I/O stalls
SELECT dest_id, duration, frequency
FROM   v$redo_dest_resp_histogram
WHERE  frequency > 0
ORDER  BY dest_id, duration;

-- Check NSA (async) or NSS (sync) send process (run on PRIMARY)
SELECT process, status, client_process, sequence#, block#, blocks, delay_mins
FROM   v$managed_standby
WHERE  process LIKE 'NS%'
ORDER  BY process;

-- Check archive destination error status (run on PRIMARY)
SELECT dest_id, status, error, gap_status, db_unique_name
FROM   v$archive_dest_status
WHERE  type = 'PHYSICAL'
AND    status != 'INACTIVE';

-- OS-level network check (run from primary server shell)
-- ping -c 100 standby_host | tail -5
-- traceroute -n standby_host
-- iperf3 -c standby_host -t 30 -P 4   (bandwidth test)

Fix: Enable redo transport compression by adding COMPRESSION=ENABLE to LOG_ARCHIVE_DEST_2. Apply QoS tagging to redo transport traffic on your WAN appliance. Increase NET_TIMEOUT if packet loss causes premature timeouts. Switch from SYNC to ASYNC if RTT exceeds 5 ms and strict zero data loss is not mandatory. Use a dedicated WAN link for redo transport; never share it with application traffic.
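Before blaming latency, check whether the link can physically carry the redo rate at all. The environment above pushes 180 MB/sec peak redo over a 1 GbE link, which tops out around 112 MB/sec of effective payload. A quick sketch (illustrative Python; the 90% efficiency and 2:1 compression figures are assumptions, not measurements):

```python
def link_headroom(redo_mb_per_s: float, link_gbps: float,
                  efficiency: float = 0.9,
                  compression_ratio: float = 1.0) -> float:
    """Fraction of effective link capacity consumed by redo transport.
    Values above 1.0 mean the link cannot keep up and lag must grow."""
    capacity_mb_s = link_gbps * 1000 / 8 * efficiency  # decimal MB/s
    return (redo_mb_per_s / compression_ratio) / capacity_mb_s

# Environment above: 180 MB/s peak redo over a dedicated 1 GbE link
print(round(link_headroom(180, 1.0), 2))                         # 1.6
print(round(link_headroom(180, 1.0, compression_ratio=2.0), 2))  # 0.8
```

At peak, the uncompressed redo stream needs 1.6x the link. Only compression (or a bigger pipe) brings it under capacity, which is why the compression fix above matters so much in this environment.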

Incident: Firewall Stateful Inspection Causing Transport Lag

A Data Guard environment was configured correctly with 1.8 ms RTT on the dedicated WAN link. Transport lag would spike to 120+ seconds randomly, then clear, then spike again. Network team confirmed zero packet loss. Root cause: a next-gen firewall performing deep packet inspection on the redo transport TCP stream. Each 10-second idle gap caused the firewall to drop and re-establish the TCP state table entry, causing a 15–30 second reconnect delay on the NSS process.

Fix: Added a firewall bypass rule for the specific TCP port used by Data Guard redo transport (LOG_ARCHIVE_DEST SERVICE port). Lag spikes disappeared immediately.

6. Cause 4: Misconfiguration

Configuration errors are silent lag causes. Everything looks operationally normal: no errors in the alert log, no network problems, a reasonable redo rate. Yet lag slowly accumulates because something is configured wrong. These are the hardest causes to find, because they require knowing what a correct configuration looks like.

SQL: Verify Data Guard Configuration for Common Misconfigurations
-- Check 1: Standby Redo Logs exist and are the correct size
-- SRLs must match the online redo log size,
-- with at least (online_log_groups + 1) SRLs per thread
SELECT group#, thread#, sequence#,
       bytes / 1024 / 1024 AS size_mb,
       archived, status
FROM   v$standby_log
ORDER  BY thread#, group#;

-- Compare SRL size vs online redo log size (run on PRIMARY)
-- These must match
SELECT 'online' AS log_type, group#, bytes / 1024 / 1024 AS size_mb
FROM   v$log
UNION ALL
SELECT 'standby', group#, bytes / 1024 / 1024
FROM   v$standby_log
ORDER  BY log_type, group#;

-- Check 2: AFFIRM setting matches the protection mode intention
SELECT dest_id, dest_name, affirm, net_timeout, delay_mins,
       compression, status
FROM   v$archive_dest
WHERE  target = 'STANDBY'
AND    status = 'VALID';

-- Check 3: Real-time apply is running (not archive-based apply)
-- Status should show APPLYING_LOG / WAIT_FOR_LOG, not WAIT_FOR_GAP
SELECT process, status, sequence#
FROM   v$managed_standby
WHERE  process = 'MRP0';

-- Check 4: Protection mode vs protection level match
-- If they differ, the primary has silently fallen back
SELECT protection_mode,   -- configured
       protection_level   -- actual effective
FROM   v$database;
Incident: Missing Standby Redo Logs, an 8-Hour Apply Gap

A newly built Data Guard environment appeared healthy. Alert log clean. No errors. Archive destination showed SUCCESS. But during failover testing, the standby was 8 hours behind the primary despite a low-latency WAN link.

Root cause: Standby Redo Logs had never been created on the standby. Without SRLs, real-time apply cannot function. The MRP process was waiting for archived logs to be shipped rather than receiving redo in real time, so the standby only caught up at each log switch, roughly once an hour under that workload.

Fix: Created SRL groups matching primary online redo log size and count + 1. Real-time apply started immediately. Apply lag dropped from 8 hours to under 60 seconds within minutes.
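The SRL sizing rule from Check 1 is simple arithmetic, sketched here for clarity (illustrative Python, not an Oracle utility):

```python
def required_srl_count(online_groups_per_thread: int, threads: int) -> int:
    """Minimum standby redo log groups: (online groups + 1) per thread."""
    return (online_groups_per_thread + 1) * threads

# A 2-node RAC primary (2 redo threads) with 4 online log groups per thread
print(required_srl_count(4, 2))  # 10
```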

7. Cause 5: Standby Apply Bottleneck

Redo is arriving at the standby on time (low transport lag) but MRP cannot apply it fast enough, so apply lag grows. This is purely a standby-side problem; do not touch the primary or the network until you have confirmed it.

SQL: Diagnose and Fix MRP Apply Bottleneck
-- Check MRP apply position and RFS receive position (run on STANDBY)
SELECT process, status, thread#, sequence#, block#, blocks
FROM   v$managed_standby
WHERE  process LIKE 'MRP%'
OR     process = 'RFS'
ORDER  BY process;

-- Check what the standby is waiting for (run on STANDBY)
-- average_wait is in centiseconds; multiply by 10 for milliseconds
SELECT event, total_waits,
       ROUND(average_wait * 10, 3) AS avg_wait_ms,
       wait_class
FROM   v$system_event
WHERE  wait_class NOT IN ('Idle', 'Network')
ORDER  BY time_waited DESC
FETCH FIRST 15 ROWS ONLY;

-- Check parallel apply worker count (run on STANDBY)
-- If only MRP0 is shown, parallel media recovery is not active
SELECT process, status, thread#
FROM   v$managed_standby
WHERE  process LIKE 'PR%'   -- parallel recovery slaves
OR     process LIKE 'MRP%'
ORDER  BY process;

-- Restart apply with explicit parallelism (run on STANDBY; brief MRP restart)
-- Note: from 12.2 onward MRP parallelism is largely automatic, and
-- USING CURRENT LOGFILE is the default behaviour when SRLs exist
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  PARALLEL 8
  USING CURRENT LOGFILE
  DISCONNECT FROM SESSION;

-- Verify parallel workers started
SELECT process, status
FROM   v$managed_standby
WHERE  process LIKE 'PR%' OR process LIKE 'MRP%'
ORDER  BY process;

Fix: Enable parallel apply with a degree matching the standby CPU count. Give the standby dedicated I/O for datafile writes, separate from the RMAN backup disks. If Active Data Guard is enabled, read queries compete with MRP for I/O; use I/O Resource Manager to prioritise MRP. If apply waits show undo-related events, check the standby UNDO tablespace for contention.

8. Cause 6: Standby Resource Starvation

The standby server does not have enough CPU, memory, or I/O bandwidth to keep up with apply. This is frequently caused by under-specifying the standby relative to the primary, or by running competing workloads (RMAN backups, ADG reads, OS processes) on the same standby host.

SQL: Diagnose Standby Resource Starvation
-- Check CPU and I/O pressure on standby (run on STANDBY)
SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('NUM_CPUS', 'LOAD', 'BUSY_TIME',
                     'IDLE_TIME', 'PHYSICAL_MEMORY_BYTES');

-- Check if RMAN backup is running during peak apply (run on STANDBY)
SELECT session_key, input_type, status,
       TO_CHAR(start_time, 'YYYY-MM-DD HH24:MI') AS start_time,
       input_bytes_display, output_bytes_display
FROM   v$rman_backup_job_details
WHERE  start_time > SYSDATE - 1
ORDER  BY start_time DESC;

-- Check I/O contention on standby datafiles (run on STANDBY)
-- readtim/writetim are in centiseconds; multiply by 10 for milliseconds
SELECT d.name,
       ROUND(10 * f.readtim  / NULLIF(f.phyrds, 0), 2) AS avg_read_ms,
       ROUND(10 * f.writetim / NULLIF(f.phywrts, 0), 2) AS avg_write_ms,
       f.phyrds, f.phywrts
FROM   v$filestat f
JOIN   v$datafile d ON f.file# = d.file#
WHERE  (f.phyrds + f.phywrts) > 1000
ORDER  BY avg_write_ms DESC
FETCH FIRST 10 ROWS ONLY;

-- Use Resource Manager to protect background apply work (run on STANDBY)
-- A plan must also include a directive for OTHER_GROUPS or it will not validate
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'STANDBY_MGMT_PLAN',
    comment => 'Protect MRP apply from ADG read workload');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'STANDBY_MGMT_PLAN',
    group_or_subplan => '_ORACLE_BACKGROUND_GROUP_',
    comment          => 'Background (MRP) priority',
    mgmt_p1          => 80);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'STANDBY_MGMT_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else',
    mgmt_p2          => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

9. SYNC vs ASYNC: Which Mode Causes Which Lag Pattern

| Scenario | SYNC Mode Behaviour | ASYNC Mode Behaviour |
|---|---|---|
| Network spike (100 ms RTT) | Every commit stalls 100 ms; performance crisis on primary | Transport lag grows silently; primary unaffected |
| Standby reboots | Primary stalls (Max Protection) or falls back to async (Max Availability) | Transport lag grows until standby returns; no primary impact |
| Redo volume burst | Network saturates; commits queue behind LGWR | NSA buffer fills; lag grows but primary performance maintained |
| Steady state (good network) | Transport lag near zero; apply lag depends on MRP | Small transport lag always present (1–5 sec typical) |
| Intermittent packet loss | LGWR stalls on each retransmit; very visible on primary | Lag spikes during loss window; primary unaffected |
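The SYNC column of this table follows from one fact: in SYNC/AFFIRM mode a commit cannot complete until the standby acknowledges the redo, so every commit pays at least one network round trip plus the standby redo-log write. A simplified model (illustrative Python; real commit time also includes the local LGWR write):

```python
def sync_commit_overhead_ms(rtt_ms: float, standby_io_ms: float = 0.0) -> float:
    """Minimum extra commit latency in SYNC/AFFIRM mode: one network
    round trip plus the standby redo-log write. Simplified model."""
    return rtt_ms + standby_io_ms

# This environment's 1.8 ms RTT vs a 100 ms network spike
print(round(sync_commit_overhead_ms(1.8, 0.5), 1))    # 2.3
print(round(sync_commit_overhead_ms(100.0, 0.5), 1))  # 100.5
```

A 100 ms RTT spike therefore turns every commit into a 100 ms stall on the primary, which is exactly the "performance crisis" row above.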

10. Lag Under Active Data Guard (Read-Only Standby)

Active Data Guard (ADG) introduces an additional lag risk: read queries on the standby compete with MRP for I/O bandwidth. In a busy ADG reporting workload, MRP can be starved of the I/O it needs to apply redo quickly.

SQL: Monitor ADG Read vs MRP Apply I/O Competition
-- Check open mode on standby (ADG = READ ONLY WITH APPLY)
SELECT open_mode, database_role, db_unique_name
FROM   v$database;

-- Count read sessions vs MRP on the standby
SELECT s.type, s.status, COUNT(*) AS session_count
FROM   v$session s
WHERE  s.type = 'USER'
OR     s.program LIKE '%MRP%'
GROUP  BY s.type, s.status;

-- Check I/O done by the MRP session (run on STANDBY)
SELECT n.name, s.value
FROM   v$sesstat  s
JOIN   v$statname n  ON s.statistic# = n.statistic#
JOIN   v$session  ss ON s.sid = ss.sid
WHERE  ss.program LIKE '%MRP%'
AND    n.name IN ('physical reads', 'physical writes', 'redo size');

-- Temporarily pause ADG reads to let MRP catch up (emergency use)
-- This closes the standby to read sessions until apply lag clears
-- ALTER DATABASE CLOSE;
-- ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
--   USING CURRENT LOGFILE DISCONNECT;
-- Then re-open: ALTER DATABASE OPEN READ ONLY;

11. Building a Lag Monitoring Alert

Do not wait for a pager alert at 11 PM. Build proactive lag monitoring that catches problems before they become incidents.

SQL: Data Guard Lag Monitoring Query for Alert System
-- Comprehensive lag monitoring query (run on the STANDBY every 5 minutes,
-- where v$dataguard_stats is populated)
-- Alerts when transport lag > 30s or apply lag > 120s
SELECT source_db_unique_name,
       name  AS metric,
       value AS lag_value,
       CASE
         WHEN name = 'transport lag'
          AND TO_DSINTERVAL(value) > INTERVAL '0 00:00:30' DAY TO SECOND
           THEN 'ALERT: Transport lag exceeds 30 seconds'
         WHEN name = 'apply lag'
          AND TO_DSINTERVAL(value) > INTERVAL '0 00:02:00' DAY TO SECOND
           THEN 'ALERT: Apply lag exceeds 2 minutes'
         ELSE 'OK'
       END AS alert_status,
       time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag')
ORDER  BY name;

-- Archive gap detection (run on the STANDBY)
-- Any rows returned = sequence gap = serious problem
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap
ORDER  BY thread#;

-- Full Data Guard health check (run via DGMGRL)
-- dgmgrl / "show configuration"
-- dgmgrl / "validate database verbose standby_db"
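An external alerter needs to convert the lag strings (e.g. '+00 00:15:00') into seconds before thresholding. A hedged Python sketch of that parsing step, assuming the '+DD HH:MI:SS' format returned by v$dataguard_stats:

```python
import re

def dg_lag_seconds(value: str) -> int:
    """Parse a v$dataguard_stats lag value such as '+00 00:15:00'
    ('+DD HH:MI:SS') into whole seconds."""
    m = re.fullmatch(r"\+?(\d+) (\d{2}):(\d{2}):(\d{2})", value.strip())
    if not m:
        raise ValueError(f"unrecognised lag format: {value!r}")
    days, hours, minutes, seconds = map(int, m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(dg_lag_seconds('+00 00:15:00'))  # 900
```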

12. FAQ

My transport lag shows +00 00:00:00 but apply lag is 2 hours. What is happening?
Transport lag of zero means redo is arriving at the standby immediately; the network and primary are fine. Apply lag of 2 hours means MRP is 2 hours behind in applying what it has already received. This is a pure standby-side problem: check the MRP parallel degree, standby I/O performance, RMAN backup scheduling, and whether the ADG read workload is starving MRP. Do not waste time on the network or the primary for this pattern.
Should I increase redo log size to reduce transport lag?
Increasing redo log size reduces log switch frequency, which smooths redo transport by reducing the burst pattern of frequent switches. It does not reduce the total volume of redo generated, so if the problem is sustained bandwidth saturation, larger logs will not help. Larger logs also mean a longer checkpoint interval, which can increase crash recovery time. The sweet spot for most production systems is 500 MB to 2 GB per redo log group, switching every 10–30 minutes under normal load.
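The switch interval is just log size divided by redo rate, which makes that sizing trade-off easy to check (illustrative Python):

```python
def switch_interval_minutes(log_size_mb: float, redo_mb_per_s: float) -> float:
    """Approximate minutes between log switches: log size / redo rate."""
    return log_size_mb / redo_mb_per_s / 60

# 2 GB logs: fine at a 2 MB/s average, far too frequent at a 180 MB/s burst
print(round(switch_interval_minutes(2048, 2), 1))    # 17.1
print(round(switch_interval_minutes(2048, 180), 1))  # 0.2
```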
Can Data Guard lag cause data corruption?
Lag itself does not cause corruption , it causes data loss risk if the primary fails while the standby is behind. Corruption is a different failure mode (block corruption on primary or standby) which Data Guard can actually detect and prevent using the DB_BLOCK_CHECKING and DB_LOST_WRITE_PROTECT parameters. Enable DB_LOST_WRITE_PROTECT=TYPICAL on both primary and standby to detect lost writes that could cause divergence between the two databases.
How do I reduce lag without changing from SYNC to ASYNC mode?
Several options preserve SYNC transport: (1) Enable redo transport compression (COMPRESSION=ENABLE) to cut bandwidth by 50–70% at the cost of CPU. (2) Upgrade network bandwidth: going from 1 GbE to 10 GbE typically resolves SYNC lag caused by saturation. (3) Increase the redo log group count so LGWR can switch to a new group while NSS sends the previous one. (4) Ensure the standby uses real-time apply (SRLs configured) so MRP stays close to the arrival point. (5) Schedule redo-heavy batch jobs off-peak to avoid bursting the SYNC transport pipe.

About the Author

Chetan Yadav

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL, and Cloud DBA with 15+ years of hands-on experience managing production databases across on-premises, hybrid, and cloud environments. He specialises in high availability architecture, performance tuning, disaster recovery, and database migrations.

Chetan has diagnosed and resolved Data Guard lag incidents across financial, healthcare, and e-commerce production environments, including the 8-hour apply gap from missing Standby Redo Logs and the firewall stateful-inspection transport lag described in this article. His writing focuses on precise diagnosis and precise resolution, not generic advice.

This blog focuses on real-world DBA problems, career growth, and practical learning, not theoretical documentation or vendor marketing.
