Showing posts with label Oracle DBA. Show all posts
Showing posts with label Oracle DBA. Show all posts

Thursday, November 27, 2025

Stop Guessing: The "One-Shot" Script to Check Oracle Data Guard Health (Lags, Gaps & MRP)

Oracle Data Guard Health Check Script Dashboard Primary Standby



If you are a DBA, you know the panic of a "Quiet Standby." The alerts are silent. The phone isn't ringing. But deep down, you wonder: Is my Disaster Recovery (DR) site actually in sync, or has it been stuck on Sequence #10452 since last Tuesday?

Too many monitoring tools (like OEM or Zabbix) only trigger an alert when the lag hits a threshold (e.g., "Lag > 30 Mins"). By then, it’s often too late. You don't just want to know if there is a lag; you need to know where the lag is.

Is it the Network (Transport Lag)? Or is it the Disk/CPU (Apply Lag)?

Below is the exact script I use in my daily health checks. It consolidates 4 different dynamic performance views (v$dataguard_statsv$managed_standbyv$archive_gapv$database) into one single "Truth" report.

The Script (dg_health_check.sql)

Save this as dg_health_check.sql and run it on your Standby Database.

SQL
SET LINESIZE 200 PAGESIZE 1000 CHECK OFF FEEDBACK OFF ECHO OFF VERIFY OFF
COL name FORMAT a30
COL value FORMAT a20
COL unit FORMAT a30
COL time_computed FORMAT a25
COL process FORMAT a10
COL status FORMAT a15
COL sequence# FORMAT 99999999
COL block# FORMAT 999999
COL error_message FORMAT a50

PROMPT ========================================================
PROMPT  ORACLE DATA GUARD HEALTH CHECK (Run on Standby)
PROMPT ========================================================

PROMPT
PROMPT 1. DATABASE ROLE & PROTECTION MODE
PROMPT ----------------------------------------
SELECT name, db_unique_name, database_role, open_mode, protection_mode 
FROM v$database;

PROMPT
PROMPT 2. REAL-TIME LAG STATISTICS (The Source of Truth)
PROMPT ----------------------------------------
-- Transport Lag = Delay in receiving data (Network Issue)
-- Apply Lag     = Delay in writing data (IO/CPU Issue)
SELECT name, value, unit, time_computed 
FROM v$dataguard_stats 
WHERE name IN ('transport lag', 'apply lag', 'estimated startup time');

PROMPT
PROMPT 3. MRP (MANAGED RECOVERY PROCESS) STATUS
PROMPT ----------------------------------------
-- IF NO ROWS SELECTED: Your recovery is STOPPED.
-- Look for 'APPLYING_LOG' or 'WAIT_FOR_LOG'
SELECT process, status, thread#, sequence#, block# 
FROM v$managed_standby 
WHERE process LIKE 'MRP%';

PROMPT
PROMPT 4. GAP DETECTION
PROMPT ----------------------------------------
-- If rows appear here, you have a missing archive log that FAL_SERVER could not fetch.
SELECT * FROM v$archive_gap;

PROMPT
PROMPT 5. RECENT ERRORS (Last 10 Events)
PROMPT ----------------------------------------
SELECT TO_CHAR(timestamp, 'DD-MON-RR HH24:MI:SS') as err_time, message 
FROM v$dataguard_status 
WHERE severity IN ('Error','Fatal') 
AND timestamp > sysdate-1
ORDER BY timestamp DESC
FETCH FIRST 10 ROWS ONLY;

PROMPT ========================================================
PROMPT  END OF REPORT
PROMPT ========================================================

Oracle Data Guard Transport Lag vs Apply Lag Architecture Diagram


How to Analyze the Output (Like a Senior DBA)

Scenario A: High Transport Lag

  • What you see: Transport Lag is high (e.g., +00 01:20:00), but Apply Lag is low.

  • What it means: Your Primary database is generating Redo faster than your network can ship it.

  • The Fix: Check your network bandwidth. If you are using Oracle 19c or 23ai, consider enabling Redo Compression in your Data Guard broker configuration (EditDatabase Set Property RedoCompression='ENABLE').

Scenario B: High Apply Lag

  • What you see: Transport Lag is near 0, but Apply Lag is climbing (e.g., +00 00:45:00).

  • What it means: The data is there (on the standby server), but the database can't write it to disk fast enough. This often happens during batch loads or index rebuilds on the Primary.

  • The Fix: Check I/O stats on the Standby. Ensure you are using Real-Time Apply so the MRP (Managed Recovery Process) reads directly from Standby Redo Logs (SRLs) rather than waiting for archive logs to be finalized.

Scenario C: MRP Status is "WAIT_FOR_GAP"

  • What you see: In Section 3, the status is WAIT_FOR_GAP.

  • What it means: A severe gap has occurred. The Standby is missing a specific sequence number and cannot proceed until you manually register that file.

  • The Fix: Run the query in Section 4 (v$archive_gap) to identify the missing sequence, restore it from backup, and register it.

Why this works in 2026

Old school scripts relied on v$archived_log, which only tells you history. In modern Oracle Cloud (OCI) and Hybrid environments, v$dataguard_stats is the only view that accurately calculates the time difference between the Primary commit and the Standby visibility.

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL & Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure, and on-premise systems. His core expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks, and real-world troubleshooting.

He also trains DBAs globally through deep-dive technical content, hands-on sessions, and automation workflows using n8n, AI tools, and modern monitoring stacks. His mission is to help DBAs solve real production problems and grow into high-paying remote roles worldwide.

Chetan regularly publishes expert content across Oracle, PostgreSQL, MySQL, and Cloud DBA technologies—including performance tuning guides, DR architectures, monitoring tools, scripts, and real incident-based case studies.


Explore More Technical Work

LinkedIn (Professional Profile & Articles)
https://www.linkedin.com/in/chetanyadavvds/

YouTube – Oracle Foundations Playlist
https://www.youtube.com/playlist?list=PL5TN6ECUWGROHQGXep_5hff-2ageWTp4b

Telegram – LevelUp_Careers DBA Tips
https://t.me/LevelUp_Careers

Instagram – Oracle/Cloud Learning Reels
https://www.instagram.com/levelup_careers/

Facebook Page – OracleDBAInfo
https://www.facebook.com/OracleDBAInfo

These platforms feature guides, scripts, diagrams, troubleshooting workflows, and real-world DBA case studies designed for database professionals worldwide.

Friday, November 7, 2025

How I Use ChatGPT and Automation to Save 3 Hours a Day as a Database Administrator (Real Workflow Example)

How I Use ChatGPT and Automation to Save 3 Hours a Day as a DBA

DBA working on database performance dashboards with ChatGPT AI assistant



The New Reality of Database Administration

Database environments today are more dynamic than ever. A DBA manages hybrid and multi-cloud systems across Oracle, PostgreSQL, Aurora MySQL, and other platforms.
While architecture complexity keeps growing, the number of hours in a day does not. Much of a DBA’s time still goes into manual analysis, log checks, and repetitive reporting.

To reclaim that time, I built a workflow using ChatGPT for analysis and n8n for automation. Together they now handle much of the repetitive monitoring and documentation work that used to slow me down.


Step 1: Using ChatGPT as an Analytical Assistant 

ChatGPT analyzing SQL execution plan with database performance metrics and query optimization insights on screen




I use ChatGPT as an intelligent interpreter for the technical data I already collect.

SQL and AWR Analysis
Prompt example:

Analyze this SQL execution plan. Identify expensive operations, missing indexes, and filter or join inefficiencies.

ChatGPT highlights cost-heavy steps, missing statistics, and joins that need review. I then validate insights using DBMS_XPLAN.DISPLAY_CURSOR before making any changes.

Incident Summaries and RCA Drafts
Prompt example:

Summarize the top waits and likely root causes from this AWR report in concise technical language for a status email.

This produces a clean summary that I can send to teams without spending time on formatting or rewriting.

Documentation and SOPs
Prompt example:

Write a step-by-step guide for restoring an Oracle 19c database from RMAN backup using target and auxiliary channels.

The generated draft is clear and consistent, saving time on documentation while maintaining accuracy.


Step 2: Automating Monitoring and Alerts with n8n


n8n automation workflow showing ChatGPT integration with CloudWatch, Google Sheets, and Teams for database monitoring alerts



After simplifying documentation, I focused on automating data flow and notifications. Using n8n, I built workflows that:

When IO latency crosses a set threshold, the summary reads:

IO wait time on the primary database instance exceeded 60 percent. Possible cause: concurrent updates or storage contention. Review session activity and storage throughput.

Each alert is logged automatically in Google Sheets for trend analysis, so I no longer need to export or merge reports manually.


Step 3: The Measured Impact

 

Team dashboard displaying real-time database performance alerts with IO latency, CPU utilization, and query wait time summaries

After a few weeks, the results were visible:

  • Around 3 hours of manual effort are saved daily.

  • Faster communication through structured alerts.

  • Fewer repetitive RCA summaries.

  • More focus on architecture, tuning, and mentoring.

This combination of ChatGPT and n8n now runs quietly in the background, reducing operational overhead and improving accuracy.


Key Takeaways

Automation does not replace DBAs; it amplifies their impact.
ChatGPT brings analytical speed and structured communication.
n8n enables event-driven automation that scales without complexity.

If you’re managing complex environments, start with one task — maybe your daily health check or backup report — and automate it. Small steps quickly add up to big efficiency gains.


Final Thought

The next phase of database administration belongs to professionals who merge technical expertise with intelligent automation.
Instead of reacting to alerts, we should design systems that interpret themselves.

Start small, validate your results, and let automation do the routine work so you can focus on engineering.


Where I Share More

If you want to explore DBA automation, Oracle training, or real-world case studies, follow my work here:

🎥 YouTube: LevelUp_Careers Oracle Foundation Playlist
💬 Telegram: @LevelUp_Careers
📸 Instagram: @levelup_careers
🧠 LinkedIn Newsletter: LevelUp DBA Digest

Follow any of these for practical DBA learning and automation insights.


 

#OracleDBA #DatabaseAutomation #ChatGPT #CloudDBA #n8n #AIOps #PerformanceTuning #DatabaseMonitoring #AutomationEngineering #TechLeadership

Where I Share More

If you are interested in DBA automation, Oracle training, or real-world case studies, you can explore more of my content below:

🎥 YouTube: LevelUp_Careers Oracle Foundation Playlist
💬 Telegram: @LevelUp_Careers
📸 Instagram: @levelup_careers
🧠 LinkedIn Newsletter: LevelUp DBA Digest

Follow any of these to keep learning and stay updated on practical DBA automation workflows.