Showing posts with label Oracle Database. Show all posts
Showing posts with label Oracle Database. Show all posts

Monday, January 5, 2026

Oracle 9i on Solaris: A Production Case Study from the Pre-Cloud Era

Oracle 9i on Solaris Case Study – A Deep Dive Retro

⏱️ Estimated Reading Time: 12-13 minutes

Oracle 9i on Solaris Case Study (A Deep Dive Retro)

In the early 2000s, database outages were not masked by load balancers, auto-scaling groups, or managed cloud services. If an Oracle database went down on Solaris, the business felt it immediately.

There was no RAC, no ASM, no cloud snapshots. Every failure meant manual diagnosis, filesystem checks, kernel tuning, and long recovery windows. Downtime translated directly into revenue loss and SLA breaches.

This case study revisits a real-world Oracle 9i on Solaris production environment, breaking down architecture decisions, tuning practices, failures, and the lessons modern DBAs can still apply today.

Legacy enterprise data center environment showing Oracle 9i databases running on Solaris UNIX servers with physical storage, console administration, and pre cloud production architecture

Table of Contents

  1. Why You Must Monitor Legacy Oracle Systems Daily
  2. Production-Ready Oracle 9i Monitoring Script
  3. Script Output & Analysis Explained
  4. Critical Components: Oracle 9i and Solaris Concepts
  5. Troubleshooting Common Oracle 9i Issues
  6. How Monitoring Was Automated in the Pre-Cloud Era
  7. Interview Questions: Oracle 9i Troubleshooting
  8. Final Summary
  9. FAQ
  10. About the Author

1. Why You Must Monitor Legacy Oracle Systems Daily

  • Single Instance Failure: No RAC meant complete outage.
  • I/O Saturation: Disk waits pushed response times from 100ms to 5+ seconds.
  • Memory Pressure: Poor SGA tuning caused constant buffer cache misses.
  • Backup Failures: Cold backups increased downtime windows.

2. Production-Ready Oracle 9i Monitoring Script

Prerequisites:
  • Solaris OS access
  • Oracle 9i environment variables set
  • OS user with DBA privileges
📋 oracle9i_health_check.sh
#!/bin/ksh # Script: Oracle 9i Health Check # Author: Chetan Yadav # Usage: Run as oracle user export ORACLE_SID=PROD export ORACLE_HOME=/u01/app/oracle/product/9.2.0 export PATH=$ORACLE_HOME/bin:$PATH sqlplus -s / as sysdba < 10000 ORDER BY time_waited DESC; EOF

3. Script Output & Analysis Explained

Component Healthy State Red Flags
Database Status OPEN MOUNTED / NOT OPEN
db file sequential read < 5 ms avg > 20 ms
log file sync < 10 ms > 50 ms

4. Critical Components: Oracle 9i and Solaris Concepts

SGA (System Global Area)

SGA sizing directly impacted performance. Memory misallocation caused excessive physical reads.

UFS File System

Solaris UFS lacked striping flexibility. Poor layout amplified I/O waits.

Redo Log Architecture

Small redo logs caused frequent log switches and stalls.

5. Troubleshooting Common Oracle 9i Issues

Issue: Database Hung During Peak Hours

Symptom: Sessions stuck on I/O waits.

Root Cause: Single disk redo logs.

Resolution:

  1. Increase redo log size
  2. Spread redo logs across spindles
Technical workflow diagram showing Oracle 9i request flow from application through Solaris kernel, buffer cache, redo logging, and disk I O processing in legacy production systems

6. How Monitoring Was Automated in the Pre-Cloud Era

Method 1: Cron Jobs

📋 cron_oracle9i_monitor.txt
*/10 * * * * /scripts/oracle9i_health_check.sh >> /logs/db_health.log

Method 2: Custom Shell Alerts

Scripts emailed DBAs directly using sendmail.

Method 3: Manual AWR-like Analysis

Statspack was the only performance tool available.

7. Interview Questions: Oracle 9i Troubleshooting

Q: How did you tune Oracle 9i without AWR?

A: Using Statspack, OS metrics, and wait event analysis.

Q: How was high availability handled?

A: Cold standby databases and manual switchovers.

Q: Biggest risk in Oracle 9i?

A: Single points of failure.

Q: How were backups managed?

A: Cold backups and early RMAN usage.

Q: Lessons for modern DBAs?

A: Fundamentals never change.

8. Final Summary

Oracle 9i on Solaris demanded discipline and deep system understanding. Every tuning decision mattered.

Modern automation simplifies life, but the fundamentals learned here still apply.

Key Takeaways:
  • Understand I/O deeply
  • Monitor continuously
  • Design for failure
  • Respect legacy systems

9. FAQ

Is Oracle 9i still supported?

A: No, it is long out of support.

Why study legacy systems?

A: To understand fundamentals.

Can lessons apply to cloud?

A: Yes, principles remain valid.

Was performance tuning harder?

A: Yes, tools were limited.

Is Solaris still used?

A: Rarely, mostly legacy estates.

10. About the Author

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.

He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.

Connect & Learn More:
📊 LinkedIn Profile
🎥 YouTube Channel

Saturday, November 29, 2025

Oracle RAC Cluster Health Audit: The 2026 Production DBA Guide

⏱️ Estimated Reading Time: 5–6 minutes


In a production Oracle Real Application Clusters (RAC) environment, stability is everything. A single corrupt OCR, a missing Voting Disk, or an unstable CRS stack can lead to node evictions (split-brain scenarios) and unplanned downtime.

This article provides a comprehensive Shell Script for RAC Cluster Health Audits. It covers CRS status, OCR integrity, Voting Disk validation, and resource stability checks—perfect for daily monitoring or pre-patching validation.


Oracle RAC Cluster Health Audit 2026 Guide for Production DBAs showing high availability database architecture and performance metrics


Table of Contents

  1. Why You Must Audit RAC Cluster Health Daily
  2. Production-Ready RAC Health Check Script (Shell)
  3. Script Output & Analysis Explained
  4. Critical Components: OCR, Voting Disk & CRS
  5. Troubleshooting Common RAC Issues
  6. How to Automate This Audit (Cron)
  7. Interview Questions: RAC Troubleshooting
  8. Final Summary
  9. FAQ
  10. About the Author

1. Why You Must Audit RAC Cluster Health Daily

Oracle RAC relies on a complex stack of clusterware services. Neglecting these checks leads to:

  • Node Evictions: Caused by heartbeat failures or voting disk I/O timeouts.
  • OCR Corruption: Resulting in the inability to start the clusterware stack.
  • Resource Regressions: Services or VIPs flapping between nodes.
  • Split-Brain Syndrome: Where nodes lose communication and fight for control.

Running a unified audit script ensures you catch "INTERMEDIATE" or "OFFLINE" states before they become outages.


2. Production-Ready RAC Health Check Script

This shell script checks the core pillars of RAC stability: CRS Stack, OCR, Voting Disks, and Resource Status.

Note: Execute this script as the grid (or root) user.

#!/bin/bash # ==================================================== # Oracle RAC Cluster Health Audit Script # Author: Chetan Yadav # Usage: ./rac_health_check.sh # ==================================================== # Set Grid Environment (Adjust ORACLE_HOME as needed) export ORACLE_HOME=/u01/app/19.0.0/grid export PATH=$ORACLE_HOME/bin:$PATH echo "==================================================" echo " ORACLE RAC CLUSTER HEALTH AUDIT - $(date) " echo "==================================================" # 1. Check High Availability Services (OHAS) echo -e "\n[1] Checking CRS/OHAS Stack Status..." crsctl check crs # 2. Check Voting Disk Status (Quorum) echo -e "\n[2] Checking Voting Disk Configuration..." crsctl query css votedisk # 3. Check OCR Integrity (Registry) echo -e "\n[3] Checking Oracle Cluster Registry (OCR) Integrity..." # Note: Requires root or grid privileges ocrcheck # 4. Check Cluster Resources (Highlighting Issues) echo -e "\n[4] Scanning for OFFLINE or UNSTABLE Resources..." crsctl stat res -t | grep -E "OFFLINE|INTERMEDIATE|UNKNOWN" # 5. Check Cluster Interconnect (Private Network) echo -e "\n[5] Checking Cluster Interconnects..." oifcfg getif echo -e "\n==================================================" echo " AUDIT COMPLETE. CHECK LOGS FOR ANY ERRORS. " echo "=================================================="

This script consolidates five manual commands into a single health report, saving valuable time during incidents or daily checks.


3. Script Output & Analysis Explained

Check ComponentWhat "Healthy" Looks Like
crsctl check crsCSS, CRS, and EVM should all show "Online". If any are offline, the node is not part of the cluster.
Voting DiskMust show "successful discovery" and list valid disk paths (e.g., ASM disk groups).
ocrcheckLook for "Cluster registry integrity check succeeded". Ensure enough free space is available.
Resource ScanAny resource in "INTERMEDIATE" state implies it is struggling to start or stop. "OFFLINE" is only okay for idle instances.

4. Critical Components: OCR, Voting Disk & CRS

Understanding these acronyms is vital for any RAC DBA:

  • OCR (Oracle Cluster Registry): Stores configuration info (resources, nodes, instances). If this is corrupt, the cluster cannot start.
  • Voting Disk: The "heartbeat" file. Nodes write to this to prove they are alive. Loss of voting disk = immediate node eviction (reboot).
  • CRS (Cluster Ready Services): The main daemon managing high availability.

5. Troubleshooting Common RAC Issues

If the script reports errors, follow this workflow:

  1. CRS Fails to Start: Check $ORACLE_HOME/log/hostname/alerthostname.log. It is often a permission issue or network failure.
  2. Voting Disk Missing: Verify ASM disk group mounting status. Run kfod disks=all to check disk visibility at OS level.
  3. Intermittent Evictions: Check network latency on the private interconnect. High latency leads to "Missed Heartbeats".



Technical diagram of 2-Node Oracle RAC Cluster Architecture verifying Private Interconnect status, Voting Disk integrity, and OCR Registry consistency during a production health audit


6. How to Automate This Audit (Cron)

You can schedule this script to run daily at 7 AM before business hours. Add this line to the Grid user's crontab:

00 07 * * * /home/grid/scripts/rac_health_check.sh > /tmp/rac_health_$(date +\%F).log 2>&1

7. Interview Questions: RAC Troubleshooting

Prepare for these common questions during senior DBA interviews:

  • Q: What is a split-brain scenario in RAC?
    A: When nodes lose private network communication and both try to write to the database. Voting disk prevents this by fencing off one node.
  • Q: How do you backup OCR?
    A: Oracle automatically backs up OCR every 4 hours. You can also manually backup using `ocrconfig -manualbackup`.
  • Q: What command checks the private interconnect IPs?
    A: `oifcfg getif`.

8. Final Summary

A healthy RAC cluster requires vigilant monitoring of the clusterware stack, not just the database instances. The script provided above is a fundamental tool for checking CRS, OCR, and Voting Disk health instantly.

Use this script as part of your Weekly Health Check routine (as suggested in the Nov 2025 schedule) to ensure 99.999% availability.


9. FAQ

Q1: Can I run this script as the 'oracle' user?
A: Most `crsctl` check commands work, but `ocrcheck` and deep diagnostics usually require `grid` or `root` privileges.

Q2: What should I do if OCR check fails?
A: Restore from the latest automatic backup using `ocrconfig -restore`. Do not restart the stack until resolved.

Q3: Does this cause performance impact?
A: No, these are lightweight metadata queries.


About the Author

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.

He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.

Explore More Technical Work

Call to Action
If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, and RAC content. I publish real production issues, scripts, and monitoring guides to help you level up your DBA career.



Friday, November 21, 2025

Oracle SQL Monitoring Script (DBA-Friendly, Real-World Monitoring Guide)

Estimated Reading Time: 6–7 minutes

Monitoring is the backbone of every production database. SQL-level monitoring helps DBAs detect slow queries, high load SQLs, blocked sessions, and performance issues before they become incidents.
This article provides a production-ready SQL monitoring script, explains each part of it, and shows how DBAs can integrate it into daily checks.


Oracle SQL Monitoring Script Banner – Real DBA Guide

 


 Table of Contents  

1. What Does SQL Monitoring Mean for DBAs?  

2. Production-Ready SQL Monitoring Script  

3. Script Output Explained  

4. Performance Metrics You Must Watch  

5. Real-World DBA Scenario  

6. Automation Options  

7. FAQ  

8. Final Thoughts  


 



1. What Does SQL Monitoring Mean for Oracle DBAs? 

SQL monitoring helps identify:

  • High buffer gets SQL

  • High CPU-consuming SQL

  • SQL with long execution time

  • SQL causing row locks

  • SQL creating IO bottlenecks

  • SQL generating alerts or session waits

These insights allow proactive tuning and avoid outages.


2. Production-Grade Oracle SQL Monitoring Script 

Below is a clean, ready-to-run script used by senior DBAs in real production environments:

SET LINES 200 SET PAGES 200 SELECT s.sid, s.serial#, s.username, s.status, p.spid AS os_pid, q.sql_id, q.sql_text, ROUND(s.cpu_time/1000000,2) AS cpu_sec, s.blocking_session, s.event, s.wait_class, s.last_call_et AS seconds_since_last_call FROM v$session s LEFT JOIN v$sql q ON s.sql_id = q.sql_id LEFT JOIN v$process p ON s.paddr = p.addr WHERE s.username IS NOT NULL ORDER BY s.cpu_time DESC FETCH FIRST 20 ROWS ONLY;

3. What This Script Reveals 

Each column has a purpose:

ColumnMeaning
SID / SERIAL#Unique session ID
USERNAMEWhich user is running SQL
STATUSACTIVE / INACTIVE
OS_PID (spid)OS-level process identifier
SQL_IDQuery identifier
SQL_TEXTActual SQL query
CPU_SECCPU usage of query
BLOCKING_SESSIONDetect blocking
EVENTWhat the session is waiting on
WAIT_CLASSIO, CPU, Network, etc.
LAST_CALL_ETQuery running time

This helps diagnose real production issues in minutes.

For official Oracle reference, visit the Oracle Database Documentation:

https://docs.oracle.com/en/database/



4. Add-on: Top SQL by Buffer Gets 

SELECT sql_id, buffer_gets, executions, ROUND(buffer_gets/DECODE(executions,0,1,executions)) AS gets_per_exec, sql_text FROM v$sql ORDER BY buffer_gets DESC FETCH FIRST 10 ROWS ONLY;

Use this to find high-cost SQL.


5. Add-on: Top SQL by IO 

SELECT sql_id, disk_reads, direct_writes, sql_text FROM v$sql ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY;

Useful for IO-heavy systems and slow storage layers.


Oracle SQL Monitoring Architecture – User Session to v$ Views Flow Diagram


6. Real-World DBA Use Case

Example scenario:

  • Application suddenly slows down

  • Sessions start waiting on "db file sequential read"

  • Your script shows high CPU session + SQL_ID

  • SQL belongs to a reporting job running full table scans

  • You tune it (add index / rewrite SQL / fix stats)

  • Performance normalizes in seconds

This is exactly what impresses recruiters and decision-makers.


7. How To Automate These Checks 

DBAs usually automate this:

  • Add SQL scripts to a shell script

  • Run at intervals using cron or OEM

  • Forward output to email / Teams / Slack

  • Integrate with CloudWatch / SignalFx

  • Store SQL history for future tuning

I can prepare your automation version in bash or PowerShell if you want.


8. Interview Questions Based on This Topic 

Expect these in interviews:

  1. How do you identify top CPU SQL?

  2. What is SQL_ID and why is it important?

  3. Difference between buffer gets and disk reads?

  4. How will you find blocking sessions?

  5. Which wait events show IO bottlenecks?


9. Final Summary

A good SQL monitoring script gives DBAs visibility into session health, CPU load, IO bottlenecks, and blocking in real time. This script is production-ready and can be used for performance tuning, health checks, or automated monitoring.

 FAQ – Oracle SQL Monitoring


Q1: What is SQL monitoring in Oracle?
SQL monitoring helps identify CPU-heavy SQLs, IO waits, blocking sessions, and real-time performance problems.


Q2: Which Oracle versions support this script?
The script works on 11g, 12c, 18c, and 19c as long as v$session and v$sql views are available.


Q3: When should DBAs run this script?
During slow application response, high CPU alerts, IO bottlenecks, blocking chains, or performance degradation.


Q4: Can this script be automated?
Yes—using cron, OEM, Python, or n8n workflows for scheduled reporting.



Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL & Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure, and on-premise systems. His core expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks, and real-world troubleshooting.

He also trains DBAs globally through deep-dive technical content, hands-on sessions, and automation workflows using n8n, AI tools, and modern monitoring stacks. His mission is to help DBAs solve real production problems and grow into high-paying remote roles worldwide.

Chetan regularly publishes expert content across Oracle, PostgreSQL, MySQL, and Cloud DBA technologies—including performance tuning guides, DR architectures, monitoring tools, scripts, and real incident-based case studies.


Explore More Technical Work

LinkedIn (Professional Profile & Articles)
https://www.linkedin.com/in/chetanyadavvds/

YouTube – Oracle Foundations Playlist
https://www.youtube.com/playlist?list=PL5TN6ECUWGROHQGXep_5hff-2ageWTp4b

Telegram – LevelUp_Careers DBA Tips
https://t.me/LevelUp_Careers

Instagram – Oracle/Cloud Learning Reels
https://www.instagram.com/levelup_careers/

Facebook Page – OracleDBAInfo
https://www.facebook.com/OracleDBAInfo

These platforms feature guides, scripts, diagrams, troubleshooting workflows, and real-world DBA case studies designed for database professionals worldwide.