Showing posts with label Oracle Database. Show all posts

Thursday, February 12, 2026

Multi-Cloud Database Architecture: Oracle, AWS, Azure, GCP Reference Guide (2026)

Multi-Cloud Database Architecture: Oracle, AWS, Azure, GCP Reference Guide

Production-Proven Reference Architectures for Enterprise DBAs
📅 February 04, 2026
👤 Chetan Yadav - Senior Oracle & Cloud DBA
⏱️ Estimated Reading Time: 20–22 minutes
☁️ Multi-Cloud Database Architecture - Oracle, AWS RDS, Azure SQL, GCP AlloyDB

Six months ago, our CTO announced we were moving to a "multi-cloud strategy." The board wanted vendor independence. Engineering wanted best-of-breed services. Finance wanted competitive pricing.

I was the DBA who had to make it actually work.

Cloud computing infrastructure with interconnected networks representing multi-cloud database architecture across Oracle, AWS, Azure, and GCP platforms

Designing multi-cloud database architecture isn't about running the same database everywhere. It's about knowing which workload belongs on which platform, how to synchronize data across clouds, and when vendor lock-in is actually acceptable.

This guide covers reference architectures from production systems running across Oracle on-premises, AWS RDS, Azure SQL Managed Instance, and GCP AlloyDB. If you're a DBA planning or managing multi-cloud databases, these are the patterns that work.

Thursday, February 5, 2026

Azure Managed Instance Migration Checklist: Oracle DBA's Step-by-Step Guide 2026

Azure Managed Instance Migration Checklist for Oracle DBAs: Complete Guide

Your Complete Pre-Migration, Migration, and Post-Migration Validation Guide
📅 February 02, 2026
👤 Chetan Yadav - Senior Oracle & Cloud DBA
⏱️ Estimated Reading Time: 16–18 minutes
☁️ Azure Managed Instance Migration - Enterprise-Grade Oracle to Cloud Migration

Two weeks before our planned Azure migration cutover, the project manager asked me, "We've tested the migration three times in dev. What could possibly go wrong in production?"

I showed him my checklist. Sixty-three items. He thought I was being overcautious. Until we hit production and discovered our application connection strings were hardcoded with Oracle-specific syntax that Azure SQL Managed Instance didn't support. The migration succeeded, but the application couldn't connect.

Azure cloud migration infrastructure showing database migration from Oracle to Azure Managed Instance with enterprise cloud computing

That's when he understood: migration isn't just moving data. It's validating every assumption about how your application interacts with the database.

This checklist represents lessons from five Azure Managed Instance migrations, including two that required emergency rollbacks. If you're an Oracle DBA planning an Azure migration, this is the checklist I wish I had on my first project.

Monday, February 2, 2026

AWS RDS Parameter Groups: The Oracle DBA's Complete Guide to Cloud Database Tuning


AWS RDS Parameter Groups: The Oracle DBA's Complete Guide to Cloud Database Tuning

What Changes, What Doesn't, and What Actually Matters in Production RDS Environments
📅 January 22, 2024
👤 Chetan Yadav - Oracle ACE Apprentice
AWS RDS cloud database infrastructure showing parameter configuration and Oracle database management in cloud computing environment
⏱️ Estimated Reading Time: 14–15 minutes
☁️ AWS RDS Parameter Groups - Cloud Database Tuning for Oracle DBAs

Six months into my first AWS RDS migration project, I confidently told the infrastructure team, "Just give me the same SGA size we had on-premises." They did. The database started, connected, and seemed fine. Performance reports looked acceptable in testing.

Three days into production, query response times were 40% slower than on-premises. The application team was escalating. My manager was asking questions. I spent two days checking execution plans, reviewing AWR reports, and validating indexes. Everything looked identical to our on-premises environment.

The problem wasn't in my queries. It was in my assumptions.

I had treated AWS RDS like a traditional Oracle installation where I could tune anything. In RDS, AWS controls the operating system, manages memory allocation differently, and restricts access to initialization parameters that I had relied on for fifteen years. I wasn't working with Oracle anymore. I was working with Oracle-on-RDS, and those are fundamentally different operational models.

1. Why RDS Parameter Groups Matter for Oracle DBAs

If you've spent your career managing on-premises Oracle databases, AWS RDS represents a fundamental shift in how you work. The change isn't just about cloud infrastructure. It's about accepting that AWS owns parts of database administration that you previously controlled completely.

What Changes When You Move to RDS

On-premises, you had root access to the operating system, complete control over initialization parameters, and the ability to tune anything from kernel settings to memory allocation. In RDS, AWS manages the OS, restricts access to certain parameters, and makes architectural decisions about resource allocation that you cannot override.

This is not a limitation. It's a trade-off. AWS handles patching, backups, high availability, and infrastructure management. In exchange, you accept reduced control over low-level tuning. The question is whether you understand which tuning options remain available and which are now AWS's responsibility.

The Three Categories of RDS Parameters

Category 1: Parameters you can modify immediately. These include application-level settings like optimizer_mode, cursor_sharing, and session-level memory parameters. Changes take effect without reboot.

Category 2: Parameters requiring instance reboot. These control memory allocation, process limits, and connection settings. Examples include db_cache_size, shared_pool_size, and processes. AWS allows modification but requires downtime to apply.

Category 3: Parameters AWS controls entirely. These include memory_target, sga_target (in automatic memory management mode), and OS-level settings. AWS sets these based on instance class and you cannot override them.

Why Most DBAs Get RDS Parameters Wrong

The typical mistake is trying to replicate on-premises tuning in RDS without understanding what AWS has already optimized. Common errors:

  • Attempting to tune memory_target when AWS manages it automatically
  • Spending hours adjusting db_file_multiblock_read_count when AWS storage I/O makes it irrelevant
  • Tuning SGA components individually when automatic memory management is already optimal
  • Expecting filesystem-level access for datafile management

The correct approach: Learn which parameters matter in RDS, which are handled automatically, and which are architectural decisions you should make before launching the instance, not after.
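To make the three-category model concrete, here is a minimal Python sketch mapping the example parameters above to the action a DBA should expect. The category table and function are illustrative shorthand for this article's model, not an AWS API; a real audit would read IsModifiable and ApplyType from describe-db-parameters:

```python
DYNAMIC = "modifiable-immediate"   # Category 1: no reboot needed
STATIC = "modifiable-reboot"       # Category 2: reboot required
AWS_MANAGED = "aws-controlled"     # Category 3: cannot override

# Example parameters from the categories above
CATEGORY = {
    "optimizer_mode": DYNAMIC,
    "cursor_sharing": DYNAMIC,
    "db_cache_size": STATIC,
    "shared_pool_size": STATIC,
    "processes": STATIC,
    "memory_target": AWS_MANAGED,
    "sga_target": AWS_MANAGED,
}

def change_plan(param: str) -> str:
    """Return the action a DBA should expect for a given parameter."""
    cat = CATEGORY.get(param)
    if cat == DYNAMIC:
        return "apply immediately"
    if cat == STATIC:
        return "stage change, schedule reboot"
    if cat == AWS_MANAGED:
        return "cannot modify - resize instance class instead"
    return "unknown - check IsModifiable via AWS CLI"

print(change_plan("processes"))      # stage change, schedule reboot
print(change_plan("memory_target"))  # cannot modify - resize instance class instead
```

The useful habit is asking "which category?" before every change request, because the answer determines whether you need a maintenance window at all.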

2. Essential Commands: Viewing and Modifying Parameter Groups

RDS parameter management uses AWS CLI and SQL queries. Unlike on-premises where you could edit SPFILE directly, RDS requires you to work through AWS APIs or console.

Command 1: List All Parameter Groups in Your AWS Account

AWS CLI - List Parameter Groups
aws rds describe-db-parameter-groups \
  --region us-east-1 \
  --output table

What this shows you: All custom and default parameter groups available in your region. Default parameter groups are managed by AWS (you cannot modify them). Custom parameter groups are yours to modify.

Command 2: View All Parameters in a Specific Group

AWS CLI - View Parameter Group Details
aws rds describe-db-parameters \
  --db-parameter-group-name prod-oracle-params \
  --query 'Parameters[?IsModifiable==`true`].[ParameterName,ParameterValue,ApplyType]' \
  --output table

Key filter: IsModifiable==true shows only parameters you can actually change. No point reviewing parameters AWS controls.

Command 3: Modify a Parameter (Dynamic - No Reboot)

AWS CLI - Modify Dynamic Parameter
aws rds modify-db-parameter-group \
  --db-parameter-group-name prod-oracle-params \
  --parameters "ParameterName=optimizer_index_cost_adj,ParameterValue=50,ApplyMethod=immediate"

ApplyMethod options:

  • immediate - Parameter changes apply to existing sessions (dynamic parameters only)
  • pending-reboot - Parameter changes require instance restart to take effect

Command 4: Verify Current Parameters from Inside the Database

SQL - Check Running Parameters
SELECT name, value, isdefault, ismodified
FROM   v$parameter
WHERE  name IN (
         'optimizer_mode',
         'optimizer_index_cost_adj',
         'db_cache_size',
         'shared_pool_size',
         'processes'
       )
ORDER BY name;

Why this matters: AWS CLI shows parameter group configuration. This SQL shows what's actually running in the database. They should match, but during changes or after reboot failures, they might not.

Production reality: Always verify parameters inside the database after making changes via AWS CLI. I've seen situations where AWS API confirmed the change, but the database was still running old values because a reboot failed silently. Trust, but verify.
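That verification step is easy to script once you have both sides as simple name-to-value maps (parameter-group values exported via the AWS CLI, running values queried from v$parameter). A minimal sketch with illustrative values; the function name and sample data are mine, not an AWS API:

```python
def find_unapplied(group_values: dict, running_values: dict) -> list:
    """Return parameters whose parameter-group value differs from
    what v$parameter reports as actually running."""
    return sorted(
        name for name, expected in group_values.items()
        if running_values.get(name) != expected
    )

# Illustrative: the group says 500 processes, the database still runs 300.
group = {"processes": "500", "optimizer_mode": "ALL_ROWS"}
running = {"processes": "300", "optimizer_mode": "ALL_ROWS"}

print(find_unapplied(group, running))  # ['processes']
```

Any name this returns is a parameter the AWS API accepted but the database has not yet applied, which usually means a reboot is pending.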

3. Parameter Analysis: What the Output Really Means

Understanding command output is not about reading column names. It's about knowing what decisions the data enables and what actions to take next.

Scenario 1: Identifying Parameter Drift Between Environments

You manage three RDS instances: dev, staging, and production. You suspect parameter drift because staging performance doesn't match production.

Parameter                | Production | Staging  | Drift?
optimizer_mode           | ALL_ROWS   | ALL_ROWS | No
optimizer_index_cost_adj | 50         | 100      | Yes
cursor_sharing           | EXACT      | FORCE    | Yes
processes                | 500        | 300      | Yes

What this tells you: Three parameters differ between environments. Two are optimizer-related (likely affecting execution plans), and one is connection-related (might cause connection failures under load).

Correct DBA action:

  • Verify if staging is intentionally configured differently or if this is accidental drift
  • Check if application testing accounts for these differences (test results might not represent production behavior)
  • Document the differences or align the environments - no in-between

What you do NOT do: Immediately sync all parameters to match production. Staging might be intentionally configured differently to test parameter changes before production rollout. Understand why the drift exists before eliminating it.

Scenario 2: Parameter Change Didn't Take Effect

You modified db_cache_size via AWS CLI. AWS API confirmed success. But querying v$parameter shows the old value still running.

Check                   | Expected  | Actual    | Status
Parameter Group Setting | 2GB       | 2GB       | Correct
v$parameter value       | 2GB       | 1GB       | Wrong
Instance Status         | available | available | Running
Pending Reboot          | No        | Yes       | ⚠ Action Needed

Root cause: Parameter requires reboot to take effect, but instance hasn't been rebooted yet. AWS API accepted the change but hasn't applied it.

Decision point:

  • If this is production during business hours → Schedule reboot during maintenance window
  • If this is non-production → Reboot immediately to complete the change
  • If multiple parameters are pending → Batch all changes into one reboot, don't reboot multiple times
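The decision rules above fit in a few lines of Python; the function and environment labels are illustrative, not part of any AWS SDK:

```python
def reboot_plan(pending_params: list, environment: str) -> str:
    """Decide how to apply pending-reboot parameter changes,
    following the rules above: production waits for a maintenance
    window, non-production reboots immediately, and all pending
    changes are batched into a single reboot."""
    if not pending_params:
        return "nothing pending"
    note = f"one reboot applies all {len(pending_params)} change(s)"
    if environment == "production":
        return f"schedule reboot in maintenance window ({note})"
    return f"reboot now ({note})"

print(reboot_plan(["db_cache_size", "processes"], "production"))
```

The point of the batching note is operational: two pending parameters should never cost you two reboots.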

4. Critical Parameters You Can (and Cannot) Control

This section is organized by impact, not alphabetically. These are the parameters that matter most in production RDS environments, grouped by what control you actually have.

Category 1: High-Impact Parameters You Can Tune

optimizer_mode

  • What it controls: Whether Oracle optimizer favors throughput (ALL_ROWS) or response time (FIRST_ROWS)
  • Production mistake: Leaving default ALL_ROWS when application has interactive queries that need first rows fast
  • Decision checklist:
    • Is this application batch-processing heavy? (Use ALL_ROWS)
    • Do users wait for query results interactively? (Use FIRST_ROWS)
    • Is there a mix of workload types? (Consider a session-level setting instead of instance-level)

cursor_sharing

  • What it controls: Whether Oracle reuses execution plans for similar SQL with different literal values
  • Production mistake: Setting to FORCE to "fix" application that doesn't use bind variables, without understanding the side effects
  • Don't use FORCE as permanent solution - it masks application problems and can cause suboptimal execution plans
  • Use EXACT (default) and work with developers to fix SQL with proper bind variables

processes

  • What it controls: Maximum number of concurrent Oracle processes (connections + background processes)
  • Production mistake: Setting too low, causing "maximum number of processes exceeded" during peak load
  • Formula: Set processes = (max concurrent connections × 1.2) + 50 background processes
  • Example: If application can open 400 connections, set processes to at least 530
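The sizing formula above, expressed as a small helper (the function name and defaults are illustrative):

```python
import math

def required_processes(max_app_connections: int,
                       headroom: float = 1.2,
                       background: int = 50) -> int:
    """processes = (max concurrent connections x headroom) + background processes."""
    return math.ceil(max_app_connections * headroom) + background

print(required_processes(400))  # 530
```

Round up, never down: running out of processes fails connections outright, while a slightly generous setting only costs a little memory.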

Category 2: Parameters That Require Reboot

db_cache_size and shared_pool_size

  • What they control: Buffer cache for data blocks and shared SQL area memory
  • Don't tune these manually when AWS automatic memory management is already enabled
  • Don't waste time micro-tuning individual SGA components - AWS handles this better automatically
  • Verify if memory_target or sga_target is set first before attempting manual tuning

Category 3: Parameters AWS Controls (Know Your Limits)

memory_target and memory_max_target

  • What AWS does: Sets these automatically based on RDS instance class (typically 75-85% of instance memory)
  • You cannot override AWS's memory allocation
  • Choose a different instance class with more memory if needed
  • Production reality: Accept that AWS manages total memory allocation. Focus on choosing the right instance size instead of trying to tune memory parameters you don't control

compatible

  • What AWS does: Sets to the Oracle version you selected when launching the instance
  • You cannot change compatibility mode to emulate older Oracle versions
  • Test application compatibility before migrating to newer Oracle versions in RDS
Cloud database parameter configuration dashboard showing AWS RDS settings and monitoring

5. Troubleshooting: When Parameters Aren't the Problem

The hardest troubleshooting skill is recognizing when parameters are NOT the root cause. Junior DBAs see slow performance and immediately start adjusting parameters. Senior DBAs investigate whether parameters matter at all.

Scenario 1: Performance is Slow but Parameters Look Correct

Symptom: Application response time is 2x slower in RDS than on-premises. All parameters match on-premises configuration. AWR shows similar execution plans.

Root cause investigation order:

  • Check storage I/O performance first: RDS storage types (gp2, gp3, io1, io2) have different IOPS and throughput limits
  • Verify instance class matches workload: A db.m5.large has 2 vCPUs and 8GB RAM - your on-premises server might have had 16 cores and 128GB
  • Check network latency: If application and database are in different availability zones, add 1-3ms network latency per query
  • Review AWS throttling: RDS has network and storage throughput limits per instance class

Correct DBA action: Start with CloudWatch metrics (CPU, IOPS, network throughput, storage latency) before touching parameters. If metrics show you're hitting AWS limits, parameters won't help. You need a different instance class or storage type.

Under pressure, DBAs often tune parameters because it feels like taking action. But tuning parameters when you're bottlenecked on IOPS or CPU is like rearranging deck chairs on the Titanic. Fix the resource constraint first, tune parameters second.
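Here is a sketch of that triage order: check AWS resource ceilings first, and only clear the way for parameter tuning when no limit is being hit. The metric names, limits, and 90% threshold are illustrative placeholders, not CloudWatch API calls:

```python
def triage(metrics: dict, limits: dict, threshold: float = 0.9) -> str:
    """Return the first resource running at >= threshold of its limit,
    or declare the instance clear for parameter tuning."""
    for resource in ("iops", "cpu_percent", "network_mbps"):
        if metrics.get(resource, 0) >= threshold * limits[resource]:
            return f"resource-bound: {resource} - resize instance/storage first"
    return "no AWS limit hit - parameter tuning may help"

limits = {"iops": 3000, "cpu_percent": 100, "network_mbps": 500}
print(triage({"iops": 2950, "cpu_percent": 40, "network_mbps": 100}, limits))
# resource-bound: iops - resize instance/storage first
```

In practice you would feed this from CloudWatch metrics and the published limits for your instance class and storage type.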

Scenario 2: Parameter Change Made Performance Worse

Symptom: You increased db_cache_size expecting better performance. Instead, queries are slower and you're seeing more physical reads than before.

Root cause: In RDS, AWS manages memory allocation between SGA and PGA. When you manually increased db_cache_size, you likely reduced PGA allocation, causing sorts and hash joins to spill to temp tablespace.

Correct DBA action:

  • Check v$pgastat for PGA memory pressure indicators
  • Review AWR for increased temp tablespace I/O
  • Roll back the parameter change and let AWS automatic memory management handle allocation
  • If you truly need more memory, scale up to a larger instance class

What you do NOT do: Try to compensate by tuning other memory parameters. You're likely fighting AWS's automatic management.

6. Automation: Safe Parameter Management in Production

Parameter management in RDS is more automation-friendly than on-premises because changes go through AWS APIs (auditable, version-controlled). But automation still requires discipline.

Safe to Automate

  • Parameter drift detection: Daily comparison of parameter groups across environments
  • Parameter documentation: Automatic export of parameter group settings to Git for version control
  • Alert on pending reboots: Monitoring when parameter changes are pending but haven't been applied
  • Parameter validation: Pre-deployment checks that verify parameters meet standards

Dangerous to Automate

  • Automatic parameter changes based on metrics: Auto-tuning can cause cascading failures
  • Automatic instance reboots: Reboots should be scheduled, not automatic
  • Parameter changes without review: All changes should require human approval in production

Example: Parameter Drift Detection Script

Python Script - Detect Parameter Drift
#!/usr/bin/env python3
import boto3
import json

def compare_parameter_groups(pg1_name, pg2_name):
    """Compare two RDS parameter groups and identify differences."""
    rds = boto3.client('rds', region_name='us-east-1')

    # Get parameters from both groups
    # (large groups paginate; use get_paginator('describe_db_parameters') in production)
    pg1 = rds.describe_db_parameters(DBParameterGroupName=pg1_name)
    pg2 = rds.describe_db_parameters(DBParameterGroupName=pg2_name)

    # Build dictionaries for comparison (modifiable parameters only)
    pg1_params = {p['ParameterName']: p.get('ParameterValue')
                  for p in pg1['Parameters'] if p.get('IsModifiable')}
    pg2_params = {p['ParameterName']: p.get('ParameterValue')
                  for p in pg2['Parameters'] if p.get('IsModifiable')}

    # Find differences
    differences = []
    for param_name in pg1_params:
        if pg1_params[param_name] != pg2_params.get(param_name):
            differences.append({
                'parameter': param_name,
                pg1_name: pg1_params[param_name],
                pg2_name: pg2_params.get(param_name, 'NOT SET'),
            })
    return differences

if __name__ == "__main__":
    diffs = compare_parameter_groups('prod-oracle-params', 'staging-oracle-params')
    if diffs:
        print(f"WARNING: Found {len(diffs)} parameter differences:")
        print(json.dumps(diffs, indent=2))
    else:
        print("No parameter drift detected")

7. Interview Preparation: Common RDS Parameter Questions

When interviewing for cloud DBA roles, expect questions that test your understanding of managed service trade-offs, not just Oracle parameter knowledge.

Question 1: Why memory_target Cannot Be Modified

Interviewer asks: "Your team is migrating from on-premises Oracle to RDS. The lead developer asks why they can't set memory_target to match their current on-premises configuration. How do you explain this?"

Strong answer: "In RDS, AWS manages memory allocation as part of the managed service model. The benefit is that AWS automatically sizes memory based on instance class and handles memory management optimization. The trade-off is reduced control over individual memory parameters. If the team requires precise memory_target control, we have two options: either choose a different RDS instance class that provides the needed memory allocation, or consider self-managed Oracle on EC2 where we have full control but also full operational responsibility. I would recommend starting with RDS using AWS's memory management and only moving to EC2 if we identify specific workloads that genuinely require manual memory tuning that RDS cannot accommodate."

Question 2: Parameter Change Not Taking Effect

Interviewer asks: "After changing a parameter group setting, you verify the AWS console shows the new value, but querying v$parameter in the database shows the old value. Walk me through your troubleshooting process."

Strong answer: "First, I'd check if the parameter requires reboot to take effect by running 'aws rds describe-db-instances' and looking for pending-reboot status. If reboot is pending, the parameter change is staged but not applied yet. Second, I'd verify I'm querying the correct RDS instance. Third, I'd check the instance's parameter group association to ensure it's actually using the parameter group I modified. Fourth, if everything checks out but the value is still wrong, I'd review CloudWatch logs for errors during the last attempted parameter application."

8. Career Impact: Why This Knowledge Matters

Understanding RDS parameter management is not just about technical skills. It's about career positioning in a cloud-first industry.

Salary Impact

DBAs with cloud database expertise (including RDS parameter management) command 15-25% higher salaries than on-premises-only DBAs in most markets. This knowledge signals that you understand managed service trade-offs and can work effectively in cloud environments.

Interview Success

For any role involving cloud databases, expect questions about:

  • Managed service trade-offs (control vs convenience)
  • When to use RDS vs self-managed EC2 Oracle
  • Parameter limitations and workarounds
  • CloudWatch monitoring without OS access

Career Progression

DBAs who only know on-premises Oracle hit a career ceiling around mid-level DBA. Understanding cloud databases (including RDS parameters) opens paths to:

  • Senior DBA roles requiring cloud architecture knowledge
  • Database architect positions designing hybrid environments
  • DevOps/SRE roles managing infrastructure as code
  • Consulting roles helping companies migrate to cloud
Career growth chart showing cloud database skills impact on database administrator career progression

9. FAQ

Is AWS RDS parameter group knowledge tested in Oracle DBA interviews?
Yes, increasingly. For any role involving cloud databases, expect questions about managed service trade-offs, parameter limitations, and when to use RDS vs self-managed databases. Even if the specific job uses PostgreSQL or MySQL RDS, understanding parameter management concepts demonstrates cloud competency. Employers want DBAs who think architecturally about managed services, not just DBAs who memorize parameter names.
Can I mention RDS parameter group experience on my resume if I've only used it in AWS free tier?
Yes, but be honest. Write "Hands-on experience with AWS RDS Oracle parameter groups in lab environment" rather than implying production experience. During interviews, emphasize what you learned about managed service constraints and how you'd apply that understanding to production systems. Employers value learning initiative. What matters is that you understand the operational differences between RDS and on-premises Oracle, not that you've managed petabyte-scale RDS databases.
Should I learn RDS parameter groups if my company uses self-managed Oracle on EC2?
Absolutely. The industry trend is toward managed services. Even if your current company uses EC2, your next job might use RDS or other managed databases (Azure SQL Database, Google Cloud SQL). Understanding RDS parameter constraints helps you evaluate when managed services are appropriate vs when self-managed makes sense. This architectural thinking is valuable regardless of which specific database platform you support.
What's the biggest mistake on-premises Oracle DBAs make when moving to RDS?
Trying to replicate on-premises tuning in RDS without understanding what AWS already optimizes. On-premises DBAs spend hours tuning individual SGA components, adjusting OS kernel parameters, and implementing custom scripts for tasks AWS handles automatically. The correct approach is to learn which parameters AWS manages, accept that managed services involve trade-offs, and focus your tuning efforts on application-level parameters you still control. DBAs who fight AWS restrictions struggle. DBAs who embrace the managed service model thrive.

10. Related Reading from Real Production Systems

About the Author

Chetan Yadav

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL, and Cloud DBA with 14+ years of hands-on experience managing production databases across on-premises, hybrid, and cloud environments. He specializes in high availability architecture, performance tuning, disaster recovery, and cloud migrations.

Throughout his career, Chetan has supported mission-critical databases for enterprises in finance, healthcare, and e-commerce sectors. He has led multiple successful cloud migration projects, helping teams transition from traditional on-premises Oracle installations to AWS RDS and self-managed cloud environments.

Chetan is passionate about mentoring early-career DBAs and sharing real-world production lessons that aren't found in documentation. His writing focuses on practical decision-making, career growth, and the operational realities of database administration in modern cloud-first organizations.

This blog focuses on real-world DBA problems, career growth, and practical learning — not theoretical documentation or vendor marketing.

Thursday, January 8, 2026

Oracle Listener Health Check: Preventing Silent Production Outages

This guide explains how Oracle Listener failures silently impact production systems.

⏱️ Estimated Reading Time: 14 minutes

Oracle Listener Health Check

It’s 2 AM. Your phone lights up with alerts. Applications are down, dashboards are red, and every connection attempt fails with TNS-12541: TNS:no listener. The database is up — but the business is still dead.

In real production environments, a failed Oracle Listener can block thousands of users, cause SLA breaches, and trigger revenue loss within minutes. We’ve seen P99 login latency jump from milliseconds to total outages.

This guide shows how to implement a production-grade Oracle Listener health check using scripts, monitoring logic, and automation — before the listener becomes your single point of failure.

Monday, January 5, 2026

Oracle 9i on Solaris: A Production Case Study from the Pre-Cloud Era

⏱️ Estimated Reading Time: 12-13 minutes

Oracle 9i on Solaris Case Study (A Deep Dive Retro)

In the early 2000s, database outages were not masked by load balancers, auto-scaling groups, or managed cloud services. If an Oracle database went down on Solaris, the business felt it immediately.

There was no RAC, no ASM, no cloud snapshots. Every failure meant manual diagnosis, filesystem checks, kernel tuning, and long recovery windows. Downtime translated directly into revenue loss and SLA breaches.

This case study revisits a real-world Oracle 9i on Solaris production environment, breaking down architecture decisions, tuning practices, failures, and the lessons modern DBAs can still apply today.

Saturday, November 29, 2025

Oracle RAC Cluster Health Audit: The 2026 Production DBA Guide

⏱️ Estimated Reading Time: 5–6 minutes


In a production Oracle Real Application Clusters (RAC) environment, stability is everything. A single corrupt OCR, a missing Voting Disk, or an unstable CRS stack can lead to node evictions (split-brain scenarios) and unplanned downtime.

This article provides a comprehensive Shell Script for RAC Cluster Health Audits. It covers CRS status, OCR integrity, Voting Disk validation, and resource stability checks—perfect for daily monitoring or pre-patching validation.


Oracle RAC Cluster Health Audit 2026 Guide for Production DBAs showing high availability database architecture and performance metrics


Table of Contents

  1. Why You Must Audit RAC Cluster Health Daily
  2. Production-Ready RAC Health Check Script (Shell)
  3. Script Output & Analysis Explained
  4. Critical Components: OCR, Voting Disk & CRS
  5. Troubleshooting Common RAC Issues
  6. How to Automate This Audit (Cron)
  7. Interview Questions: RAC Troubleshooting
  8. Final Summary
  9. FAQ
  10. About the Author

1. Why You Must Audit RAC Cluster Health Daily

Oracle RAC relies on a complex stack of clusterware services. Neglecting these checks leads to:

  • Node Evictions: Caused by heartbeat failures or voting disk I/O timeouts.
  • OCR Corruption: Resulting in the inability to start the clusterware stack.
  • Resource Regressions: Services or VIPs flapping between nodes.
  • Split-Brain Syndrome: Where nodes lose communication and fight for control.

Running a unified audit script ensures you catch "INTERMEDIATE" or "OFFLINE" states before they become outages.


2. Production-Ready RAC Health Check Script

This shell script checks the core pillars of RAC stability: CRS Stack, OCR, Voting Disks, and Resource Status.

Note: Execute this script as the grid (or root) user.

#!/bin/bash
# ====================================================
# Oracle RAC Cluster Health Audit Script
# Author: Chetan Yadav
# Usage: ./rac_health_check.sh
# ====================================================

# Set Grid Environment (Adjust ORACLE_HOME as needed)
export ORACLE_HOME=/u01/app/19.0.0/grid
export PATH=$ORACLE_HOME/bin:$PATH

echo "=================================================="
echo " ORACLE RAC CLUSTER HEALTH AUDIT - $(date) "
echo "=================================================="

# 1. Check High Availability Services (OHAS)
echo -e "\n[1] Checking CRS/OHAS Stack Status..."
crsctl check crs

# 2. Check Voting Disk Status (Quorum)
echo -e "\n[2] Checking Voting Disk Configuration..."
crsctl query css votedisk

# 3. Check OCR Integrity (Registry)
echo -e "\n[3] Checking Oracle Cluster Registry (OCR) Integrity..."
# Note: Requires root or grid privileges
ocrcheck

# 4. Check Cluster Resources (Highlighting Issues)
echo -e "\n[4] Scanning for OFFLINE or UNSTABLE Resources..."
crsctl stat res -t | grep -E "OFFLINE|INTERMEDIATE|UNKNOWN"

# 5. Check Cluster Interconnect (Private Network)
echo -e "\n[5] Checking Cluster Interconnects..."
oifcfg getif

echo -e "\n=================================================="
echo " AUDIT COMPLETE. CHECK LOGS FOR ANY ERRORS. "
echo "=================================================="

This script consolidates five manual commands into a single health report, saving valuable time during incidents or daily checks.
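If you want to post-process the resource scan (step 4) programmatically instead of eyeballing grep output, a small Python filter works; the sample text below is illustrative, not real crsctl output:

```python
UNHEALTHY = ("OFFLINE", "INTERMEDIATE", "UNKNOWN")

def unhealthy_lines(crsctl_output: str) -> list:
    """Return lines of `crsctl stat res -t` output mentioning a bad state."""
    return [line.strip() for line in crsctl_output.splitlines()
            if any(state in line for state in UNHEALTHY)]

# Illustrative sample output, not captured from a real cluster
sample = """\
ora.asm        ONLINE   ONLINE        node1
ora.listener   OFFLINE  OFFLINE       node2
ora.db.svc     ONLINE   INTERMEDIATE  node1
"""
for line in unhealthy_lines(sample):
    print(line)
```

Feeding this from a cron-captured log gives you a clean list of flapping resources to alert on.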


3. Script Output & Analysis Explained

Check Component  | What "Healthy" Looks Like
crsctl check crs | CSS, CRS, and EVM should all show "Online". If any are offline, the node is not part of the cluster.
Voting Disk      | Must show "successful discovery" and list valid disk paths (e.g., ASM disk groups).
ocrcheck         | Look for "Cluster registry integrity check succeeded". Ensure enough free space is available.
Resource Scan    | Any resource in "INTERMEDIATE" state implies it is struggling to start or stop. "OFFLINE" is only okay for idle instances.

4. Critical Components: OCR, Voting Disk & CRS

Understanding these acronyms is vital for any RAC DBA:

  • OCR (Oracle Cluster Registry): Stores configuration info (resources, nodes, instances). If this is corrupt, the cluster cannot start.
  • Voting Disk: The "heartbeat" file. Nodes write to this to prove they are alive. Loss of voting disk = immediate node eviction (reboot).
  • CRS (Cluster Ready Services): The main daemon managing high availability.

5. Troubleshooting Common RAC Issues

If the script reports errors, follow this workflow:

  1. CRS Fails to Start: Check $ORACLE_HOME/log/<hostname>/alert<hostname>.log. It is often a permission issue or network failure.
  2. Voting Disk Missing: Verify ASM disk group mounting status. Run kfod disks=all to check disk visibility at OS level.
  3. Intermittent Evictions: Check network latency on the private interconnect. High latency leads to "Missed Heartbeats".
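For the eviction case, the pattern to hunt for in the CSSD log is the missed-heartbeat countdown. A grep sketch over a hypothetical log excerpt (exact message text varies by version, so treat both the sample lines and the pattern as illustrative; point `log` at your real ocssd.log instead of the here-doc):

```shell
#!/bin/bash
# Sketch: count heartbeat-related warnings in a CSSD log excerpt.
# The log lines below are invented for illustration; message wording
# differs between Grid releases, so adapt the grep pattern.

log=$(cat <<'EOF'
2026-02-10 03:11:02 [CSSD] clssnmPollingThread: node rac2 missed checkin
2026-02-10 03:11:03 [CSSD] clssnmPollingThread: node rac2 missed checkin
2026-02-10 03:11:05 [CSSD] normal status update from node rac2
EOF
)

misses=$(echo "$log" | grep -c "missed checkin")
echo "Missed heartbeats in window: $misses"
```

A steadily climbing count here, with the database otherwise idle, points at interconnect latency or packet loss rather than load.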



Technical diagram of 2-Node Oracle RAC Cluster Architecture verifying Private Interconnect status, Voting Disk integrity, and OCR Registry consistency during a production health audit


6. How to Automate This Audit (Cron)

You can schedule this script to run daily at 7 AM before business hours. Add this line to the Grid user's crontab:

00 07 * * * /home/grid/scripts/rac_health_check.sh > /tmp/rac_health_$(date +\%F).log 2>&1
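Since that cron entry writes a new log every day, it pays to prune old ones so /tmp does not fill up. A sketch that deletes health-check logs older than seven days (GNU coreutils assumed; the demo builds a scratch directory so the find is observable, while in production LOGDIR would point at the cron output directory):

```shell
#!/bin/bash
# Sketch: prune rac_health_* logs older than 7 days.
# Demo setup: a scratch directory with one "old" and one "new" log so the
# find is observable. In production, set LOGDIR to the cron output dir
# (e.g. /tmp in the crontab line above). `touch -d` is GNU coreutils.
LOGDIR=$(mktemp -d)
touch -d '10 days ago' "$LOGDIR/rac_health_2026-02-01.log"
touch "$LOGDIR/rac_health_$(date +%F).log"

# -mtime +7 matches files last modified more than 7 days ago
find "$LOGDIR" -maxdepth 1 -name 'rac_health_*.log' -mtime +7 -print -delete

ls "$LOGDIR"
```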

7. Interview Questions: RAC Troubleshooting

Prepare for these common questions during senior DBA interviews:

  • Q: What is a split-brain scenario in RAC?
    A: When nodes lose private network communication and both try to write to the database. Voting disk prevents this by fencing off one node.
  • Q: How do you backup OCR?
    A: Oracle automatically backs up OCR every 4 hours. You can also manually backup using `ocrconfig -manualbackup`.
  • Q: What command checks the private interconnect IPs?
    A: `oifcfg getif`.

8. Final Summary

A healthy RAC cluster requires vigilant monitoring of the clusterware stack, not just the database instances. The script provided above is a fundamental tool for checking CRS, OCR, and Voting Disk health instantly.

Use this script as part of your Weekly Health Check routine (as suggested in the Nov 2025 schedule) to ensure 99.999% availability.


9. FAQ

Q1: Can I run this script as the 'oracle' user?
A: Most `crsctl` check commands work, but `ocrcheck` and deep diagnostics usually require `grid` or `root` privileges.

Q2: What should I do if OCR check fails?
A: Restore from the latest automatic backup using `ocrconfig -restore`. Do not restart the stack until resolved.

Q3: Does this cause performance impact?
A: No, these are lightweight metadata queries.


About the Author

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.

He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.

Explore More Technical Work

If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, and RAC content. I publish real production issues, scripts, and monitoring guides to help you level up your DBA career.



Friday, November 21, 2025

Oracle SQL Monitoring Script (DBA-Friendly, Real-World Monitoring Guide)

Estimated Reading Time: 6–7 minutes

Monitoring is the backbone of every production database. SQL-level monitoring helps DBAs detect slow queries, high load SQLs, blocked sessions, and performance issues before they become incidents.
This article provides a production-ready SQL monitoring script, explains each part of it, and shows how DBAs can integrate it into daily checks.


Oracle SQL Monitoring Script Banner – Real DBA Guide

 


 Table of Contents

1. What Does SQL Monitoring Mean for Oracle DBAs?

2. Production-Grade Oracle SQL Monitoring Script

3. What This Script Reveals

4. Add-on: Top SQL by Buffer Gets

5. Add-on: Top SQL by IO

6. Real-World DBA Use Case

7. How To Automate These Checks

8. Interview Questions Based on This Topic

9. Final Summary

10. FAQ


 



1. What Does SQL Monitoring Mean for Oracle DBAs? 

SQL monitoring helps identify:

  • High buffer gets SQL

  • High CPU-consuming SQL

  • SQL with long execution time

  • SQL causing row locks

  • SQL creating IO bottlenecks

  • SQL generating alerts or session waits

These insights allow proactive tuning and avoid outages.


2. Production-Grade Oracle SQL Monitoring Script 

Below is a clean, ready-to-run script used by senior DBAs in real production environments:

SET LINES 200
SET PAGES 200

-- Note: v$session has no cpu_time column; per-session CPU comes from
-- v$sess_time_model (stat 'DB CPU', reported in microseconds).
SELECT s.sid,
       s.serial#,
       s.username,
       s.status,
       p.spid AS os_pid,
       q.sql_id,
       q.sql_text,
       ROUND(t.value/1000000,2) AS cpu_sec,
       s.blocking_session,
       s.event,
       s.wait_class,
       s.last_call_et AS seconds_since_last_call
FROM   v$session s
       JOIN v$sess_time_model t
            ON t.sid = s.sid AND t.stat_name = 'DB CPU'
       LEFT JOIN v$sql q ON s.sql_id = q.sql_id
       LEFT JOIN v$process p ON s.paddr = p.addr
WHERE  s.username IS NOT NULL
ORDER BY t.value DESC
FETCH FIRST 20 ROWS ONLY;

3. What This Script Reveals 

Each column has a purpose:

  • SID / SERIAL#: Unique session identifiers
  • USERNAME: Which user is running the SQL
  • STATUS: ACTIVE / INACTIVE
  • OS_PID (spid): OS-level process identifier
  • SQL_ID: Query identifier
  • SQL_TEXT: Actual SQL text
  • CPU_SEC: CPU seconds consumed by the session
  • BLOCKING_SESSION: SID of the session holding the blocking lock, if any
  • EVENT: What the session is waiting on
  • WAIT_CLASS: IO, CPU, Network, etc.
  • LAST_CALL_ET: Seconds since the session's last call (elapsed run time for an active session)

This helps diagnose real production issues in minutes.
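When the query output is captured to a file, the BLOCKING_SESSION column is the first thing to scan. Below is an awk sketch over a hypothetical spool produced with `SET MARKUP CSV ON` (SQL*Plus 12.2+), using a reduced column list; the sample rows and column order are invented for illustration:

```shell
#!/bin/bash
# Sketch: flag blocked sessions from a CSV spool of a session query.
# Assumes `SET MARKUP CSV ON` (SQL*Plus 12.2+) and a reduced column list;
# the rows below are invented sample data, not real output.

blocked=$(awk -F',' 'NR > 1 && $7 != "" { print "Session " $1 " blocked by " $7 }' <<'EOF'
SID,SERIAL#,USERNAME,STATUS,SQL_ID,CPU_SEC,BLOCKING_SESSION
131,4021,APPUSER,ACTIVE,9xz1abc,14.2,77
205,1133,BATCH,ACTIVE,7q2wdef,3.1,
EOF
)

echo "$blocked"
```

Any line this prints is a blocking chain worth chasing with the EVENT and SQL_ID columns before users start calling.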

For official Oracle reference, visit the Oracle Database Documentation:

https://docs.oracle.com/en/database/



4. Add-on: Top SQL by Buffer Gets 

SELECT sql_id, buffer_gets, executions, ROUND(buffer_gets/DECODE(executions,0,1,executions)) AS gets_per_exec, sql_text FROM v$sql ORDER BY buffer_gets DESC FETCH FIRST 10 ROWS ONLY;

Use this to find high-cost SQL.


5. Add-on: Top SQL by IO 

SELECT sql_id,
       disk_reads,
       direct_writes,
       sql_text
FROM   v$sql
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;

Useful for IO-heavy systems and slow storage layers.


Oracle SQL Monitoring Architecture – User Session to v$ Views Flow Diagram


6. Real-World DBA Use Case

Example scenario:

  • Application suddenly slows down

  • Sessions start waiting on "db file sequential read"

  • Your script shows high CPU session + SQL_ID

  • SQL belongs to a reporting job running full table scans

  • You tune it (add index / rewrite SQL / fix stats)

  • Performance normalizes in seconds

This is exactly what impresses recruiters and decision-makers.


7. How To Automate These Checks 

DBAs usually automate this:

  • Add SQL scripts to a shell script

  • Run at intervals using cron or OEM

  • Forward output to email / Teams / Slack

  • Integrate with CloudWatch / SignalFx

  • Store SQL history for future tuning
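The forwarding step usually boils down to a threshold gate: only page someone when a metric crosses a limit. A minimal shell sketch of that gate (the threshold values and metric names are examples, not Oracle defaults; in production the echo would be replaced by a `mailx` call or a Slack webhook):

```shell
#!/bin/bash
# Sketch: threshold gate for automated SQL monitoring checks.
# In production, replace the echo with mailx or a webhook call.
# Threshold values below are examples only.

alert_if() {   # usage: alert_if <metric-name> <value> <threshold>
    local name=$1 value=$2 limit=$3
    if [ "$value" -gt "$limit" ]; then
        echo "ALERT: $name=$value exceeds $limit"
    fi
}

# Values would normally be parsed from sqlplus output; hard-coded here.
alert_if "top_session_cpu_sec" 95 80
alert_if "blocked_sessions" 0 5
```

Only the first call fires, which is exactly the behavior you want from a cron-driven check: silence when healthy, one line when not.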



8. Interview Questions Based on This Topic 

Expect these in interviews:

  1. How do you identify top CPU SQL?

  2. What is SQL_ID and why is it important?

  3. Difference between buffer gets and disk reads?

  4. How will you find blocking sessions?

  5. Which wait events show IO bottlenecks?


9. Final Summary

A good SQL monitoring script gives DBAs visibility into session health, CPU load, IO bottlenecks, and blocking in real time. This script is production-ready and can be used for performance tuning, health checks, or automated monitoring.

 FAQ – Oracle SQL Monitoring


Q1: What is SQL monitoring in Oracle?
SQL monitoring helps identify CPU-heavy SQLs, IO waits, blocking sessions, and real-time performance problems.


Q2: Which Oracle versions support this script?
The script works on 12c, 18c, 19c, and later. On 11g, replace FETCH FIRST ... ROWS ONLY with a ROWNUM predicate, since the row-limiting clause was introduced in 12c.


Q3: When should DBAs run this script?
During slow application response, high CPU alerts, IO bottlenecks, blocking chains, or performance degradation.


Q4: Can this script be automated?
Yes—using cron, OEM, Python, or n8n workflows for scheduled reporting.



Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL & Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure, and on-premise systems. His core expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks, and real-world troubleshooting.

He also trains DBAs globally through deep-dive technical content, hands-on sessions, and automation workflows using n8n, AI tools, and modern monitoring stacks. His mission is to help DBAs solve real production problems and grow into high-paying remote roles worldwide.

Chetan regularly publishes expert content across Oracle, PostgreSQL, MySQL, and Cloud DBA technologies—including performance tuning guides, DR architectures, monitoring tools, scripts, and real incident-based case studies.


Explore More Technical Work

LinkedIn (Professional Profile & Articles)
https://www.linkedin.com/in/chetanyadavvds/

YouTube – Oracle Foundations Playlist
https://www.youtube.com/playlist?list=PL5TN6ECUWGROHQGXep_5hff-2ageWTp4b

Telegram – LevelUp_Careers DBA Tips
https://t.me/LevelUp_Careers

Instagram – Oracle/Cloud Learning Reels
https://www.instagram.com/levelup_careers/

Facebook Page – OracleDBAInfo
https://www.facebook.com/OracleDBAInfo

These platforms feature guides, scripts, diagrams, troubleshooting workflows, and real-world DBA case studies designed for database professionals worldwide.