Oracle 9i on Solaris Case Study – A Deep Dive Retro
⏱️ Estimated Reading Time: 12-13 minutes
In the early 2000s, database outages were not masked by load balancers, auto-scaling groups, or managed cloud services. If an Oracle database went down on Solaris, the business felt it immediately. This environment had no RAC, no ASM, and no cloud snapshots. Every failure meant manual diagnosis, filesystem checks, kernel tuning, and long recovery windows. Downtime translated directly into revenue loss and SLA breaches.
This case study revisits a real-world Oracle 9i on Solaris production environment, breaking down architecture decisions, tuning practices, failures, and the lessons modern DBAs can still apply today.
Table of Contents
Why You Must Monitor Legacy Oracle Systems Daily
Production-Ready Oracle 9i Monitoring Script
Script Output & Analysis Explained
Critical Components: Oracle 9i and Solaris Concepts
Troubleshooting Common Oracle 9i Issues
How Monitoring Was Automated in the Pre-Cloud Era
Interview Questions: Oracle 9i Troubleshooting
Final Summary
FAQ
About the Author
1. Why You Must Monitor Legacy Oracle Systems Daily
Single Instance Failure: No RAC meant complete outage.
I/O Saturation: Disk waits pushed response times from 100ms to 5+ seconds.
2. Production-Ready Oracle 9i Monitoring Script
#!/bin/ksh
# Script: Oracle 9i Health Check
# Author: Chetan Yadav
# Usage: Run as the oracle user

export ORACLE_SID=PROD
export ORACLE_HOME=/u01/app/oracle/product/9.2.0
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus -s "/ as sysdba" <<EOF
SET PAGESIZE 100 LINESIZE 120
-- Instance status: anything other than OPEN needs attention
SELECT instance_name, status FROM v\$instance;
-- Wait events with significant accumulated wait time (time_waited in centiseconds)
SELECT event, total_waits, time_waited
FROM v\$system_event
WHERE time_waited > 10000
ORDER BY time_waited DESC;
EOF
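To put the script to work, one option (the file names and schedule here are illustrative assumptions) is to save it, make it executable, and schedule it from the oracle user's crontab:
# Save as /home/oracle/scripts/ora9i_health.ksh, then:
chmod +x /home/oracle/scripts/ora9i_health.ksh

# Example crontab entry: run every 30 minutes and keep the latest output
0,30 * * * * /home/oracle/scripts/ora9i_health.ksh > /home/oracle/logs/ora9i_health.out 2>&1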
3. Script Output & Analysis Explained
Component               | Healthy State | Red Flags
------------------------|---------------|--------------------
Database Status         | OPEN          | MOUNTED / NOT OPEN
db file sequential read | < 5 ms avg    | > 20 ms
log file sync           | < 10 ms       | > 50 ms
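To check these averages directly on 9i (time_waited in v$system_event is reported in centiseconds, hence the *10 conversion), a query along these lines works:
-- Average wait in milliseconds for the two key events
SELECT event,
       total_waits,
       ROUND(time_waited * 10 / NULLIF(total_waits, 0), 2) AS avg_wait_ms
FROM v$system_event
WHERE event IN ('db file sequential read', 'log file sync');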
4. Critical Components: Oracle 9i and Solaris Concepts
Statspack was the primary performance tool available; AWR and ASH did not arrive until Oracle 10g.
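For reference, a typical Statspack workflow on 9i, run from SQL*Plus as the PERFSTAT owner and assuming Statspack was installed with spcreate.sql, looked roughly like this:
-- Take a snapshot at the start and end of the workload window
EXEC statspack.snap;
-- ... let the workload run ...
EXEC statspack.snap;
-- Generate the report between two snapshot IDs (prompts for begin/end snap)
@?/rdbms/admin/spreport.sql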
7. Interview Questions: Oracle 9i Troubleshooting
Q: How did you tune Oracle 9i without AWR?
A: Using Statspack, OS metrics, and wait event analysis.
Q: How was high availability handled?
A: Cold standby databases and manual switchovers.
Q: Biggest risk in Oracle 9i?
A: Single points of failure.
Q: How were backups managed?
A: Cold backups and early RMAN usage.
Q: Lessons for modern DBAs?
A: Fundamentals never change.
8. Final Summary
Oracle 9i on Solaris demanded discipline and deep system understanding.
Every tuning decision mattered.
Modern automation simplifies life,
but the fundamentals learned here still apply.
Key Takeaways:
Understand I/O deeply
Monitor continuously
Design for failure
Respect legacy systems
9. FAQ
Is Oracle 9i still supported?
A: No, it is long out of support.
Why study legacy systems?
A: To understand fundamentals.
Can lessons apply to cloud?
A: Yes, principles remain valid.
Was performance tuning harder?
A: Yes, tools were limited.
Is Solaris still used?
A: Rarely, mostly legacy estates.
10. About the Author
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.
He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.
In a production Oracle Real Application Clusters (RAC) environment, stability is everything. A single corrupt OCR, a missing Voting Disk, or an unstable CRS stack can lead to node evictions (split-brain scenarios) and unplanned downtime.
This article provides a comprehensive Shell Script for RAC Cluster Health Audits. It covers CRS status, OCR integrity, Voting Disk validation, and resource stability checks—perfect for daily monitoring or pre-patching validation.
Table of Contents
Why You Must Audit RAC Cluster Health Daily
Production-Ready RAC Health Check Script (Shell)
Script Output & Analysis Explained
Critical Components: OCR, Voting Disk & CRS
Troubleshooting Common RAC Issues
How to Automate This Audit (Cron)
Interview Questions: RAC Troubleshooting
Final Summary
FAQ
About the Author
1. Why You Must Audit RAC Cluster Health Daily
Oracle RAC relies on a complex stack of clusterware services. Neglecting these checks leads to:
Node Evictions: Caused by heartbeat failures or voting disk I/O timeouts.
OCR Corruption: Resulting in the inability to start the clusterware stack.
Resource Regressions: Services or VIPs flapping between nodes.
Split-Brain Syndrome: Where nodes lose communication and fight for control.
Running a unified audit script ensures you catch "INTERMEDIATE" or "OFFLINE" states before they become outages.
2. Production-Ready RAC Health Check Script
This shell script checks the core pillars of RAC stability: CRS Stack, OCR, Voting Disks, and Resource Status.
Note: Execute this script as the grid (or root) user.
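As a minimal sketch of what such an audit can look like (the GRID_HOME path below is an assumption; adjust it to your environment), the four pillars can be checked with the standard clusterware tools:
#!/bin/ksh
# RAC Cluster Health Audit - minimal sketch using standard Grid Infrastructure tools
# GRID_HOME below is an example path; set it to your actual Grid home.
GRID_HOME=/u01/app/19.0.0/grid
export PATH=$GRID_HOME/bin:$PATH

echo "===== CRS Stack Status ====="
crsctl check crs                 # CRS, CSS and EVM daemon health

echo "===== Cluster Nodes ====="
olsnodes -n -s                   # node names, numbers and active/inactive state

echo "===== OCR Integrity ====="
ocrcheck                         # OCR version, space usage and integrity check

echo "===== Voting Disks ====="
crsctl query css votedisk        # voting disk locations and state

echo "===== Cluster Resources ====="
crsctl stat res -t               # look for OFFLINE or INTERMEDIATE resources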
7. Interview Questions: RAC Troubleshooting
Prepare for these common questions during senior DBA interviews:
Q: What is a split-brain scenario in RAC? A: When nodes lose private network communication and both try to write to the database. Voting disk prevents this by fencing off one node.
Q: How do you backup OCR? A: Oracle automatically backs up OCR every 4 hours. You can also manually backup using `ocrconfig -manualbackup`.
Q: What command checks the private interconnect IPs? A: `oifcfg getif`.
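For quick reference, the backup and interconnect commands mentioned in these answers are used as follows (run as the grid or root user):
# List the automatic (every 4 hours) and manual OCR backups
ocrconfig -showbackup
# Take an on-demand OCR backup before patching or maintenance
ocrconfig -manualbackup
# Show the interfaces registered for the public network and private interconnect
oifcfg getif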
8. Final Summary
A healthy RAC cluster requires vigilant monitoring of the clusterware stack, not just the database instances. The script provided above is a fundamental tool for checking CRS, OCR, and Voting Disk health instantly.
Use this script as part of your Weekly Health Check routine (as suggested in the Nov 2025 schedule) to ensure 99.999% availability.
9. FAQ
Q1: Can I run this script as the 'oracle' user? A: Most `crsctl` check commands work, but `ocrcheck` and deep diagnostics usually require `grid` or `root` privileges.
Q2: What should I do if OCR check fails? A: Restore from the latest automatic backup using `ocrconfig -restore`. Do not restart the stack until resolved.
Q3: Does this cause performance impact? A: No, these are lightweight metadata queries.
About the Author
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.
He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.
Call to Action If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, and RAC content. I publish real production issues, scripts, and monitoring guides to help you level up your DBA career.
Slow queries are one of the biggest reasons for performance degradation in MySQL and Aurora MySQL environments. High-latency SQL can create CPU spikes, I/O pressure, row lock waits, replication lag, and application-level timeouts.
This article provides a production-ready MySQL Slow Query Diagnostic Script, explains how to interpret the results, and shows how DBAs can use this script for proactive tuning and operational monitoring.
Table of Contents
1. What Slow Query Diagnostics Mean for MySQL DBAs
2. Production-Ready MySQL Slow Query Diagnostic Script
3. Script Output Explained
4. Additional Performance Metrics to Watch
5. Add-On Script: Top SQL by Buffer Gets
6. Add-On Script: Top SQL by Disk Reads
7. Real-World MySQL DBA Scenario
8. How to Automate These Checks
9. Interview Questions
10. Final Summary
11. FAQ
12. About the Author
13. Call to Action (CTA)
1. What Slow Query Diagnostics Mean for MySQL DBAs
Slow queries lead to:
- High CPU utilisation
- Increased IOPS and latency
- Row lock waits and deadlocks
- Replication lag in Aurora MySQL / RDS MySQL
- Query timeout issues at the application layer
- Poor customer experience under load
MySQL's Performance Schema provides deep visibility into SQL patterns, allowing DBAs to identify:
- High-latency queries
- Full table scans
- Missing index patterns
- SQL causing temporary tables
- SQL responsible for heavy disk reads
- SQL generating high row examinations
Slow query diagnostics are essential for maintaining consistent performance in production systems.
2. Production-Ready MySQL Slow Query Diagnostic Script
This script analyses execution time, latency, row scans and query patterns using Performance Schema:
/* MySQL Slow Query Diagnostic Script
Works on: MySQL 5.7, MySQL 8.0, Aurora MySQL
*/
SELECT
DIGEST_TEXT AS Query_Sample,
SCHEMA_NAME AS Database_Name,
COUNT_STAR AS Execution_Count,
ROUND(SUM_TIMER_WAIT/1000000000000, 4) AS Total_Time_Seconds,
ROUND((SUM_TIMER_WAIT/COUNT_STAR)/1000000000000, 6) AS Avg_Time_Per_Exec,
SUM_ROWS_EXAMINED AS Rows_Examined,
SUM_ROWS_SENT AS Rows_Sent,
FIRST_SEEN,
LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME NOT IN ('mysql','sys','performance_schema','information_schema')
ORDER BY Total_Time_Seconds DESC
LIMIT 20;
This is a field-tested script used in multiple production environments, including AWS RDS MySQL and Amazon Aurora MySQL.
3. Script Output Explained
Column                 | Meaning
-----------------------|----------------------------------------------------
Query_Sample           | Normalized version of the SQL for pattern analysis
Database_Name          | Schema on which the SQL is executed
Execution_Count        | How many times the SQL pattern ran
Total_Time_Seconds     | Total execution time consumed
Avg_Time_Per_Exec      | Average latency per execution
Rows_Examined          | Total rows scanned (detects full scans)
Rows_Sent              | Rows returned by the query
FIRST_SEEN / LAST_SEEN | Time window of activity
These values help DBAs identify the highest-impact SQL patterns immediately.
4. Additional Performance Metrics You Must Watch
During slow query investigations, always check:
- High Rows_Examined → missing index
- High Avg_Time_Per_Exec → expensive joins or sorting
- Large gap between Rows_Examined and Rows_Sent → inefficient filtering
- High Execution_Count → inefficient query called repeatedly
- Repeated occurrence between FIRST_SEEN and LAST_SEEN → ongoing issue
MySQL workload analysis becomes easy when these metrics are evaluated together; a sketch of the examined-to-sent ratio check follows below.
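As a sketch of that ratio check, it can be computed from the same summary table; the 100000-row floor is an arbitrary illustrative threshold:
SELECT
DIGEST_TEXT AS Query_Sample,
SUM_ROWS_EXAMINED AS Rows_Examined,
SUM_ROWS_SENT AS Rows_Sent,
ROUND(SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0), 1) AS Examined_To_Sent_Ratio
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_ROWS_EXAMINED > 100000
ORDER BY Examined_To_Sent_Ratio DESC
LIMIT 10;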
5. Add-On Script: Top SQL by Buffer Gets
Useful for identifying SQL with heavy logical reads. Performance Schema has no buffer-gets counter, so rows examined is used as the closest MySQL equivalent:
SELECT
DIGEST_TEXT AS Query_Sample,
COUNT_STAR AS Execution_Count,
SUM_ROWS_EXAMINED AS Rows_Examined,
ROUND(SUM_ROWS_EXAMINED/COUNT_STAR, 2) AS Rows_Examined_Per_Exec,
ROUND(SUM_TIMER_WAIT/1000000000000, 4) AS Total_Time_Seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;
6. Add-On Script: Top SQL by Disk Reads
Identifies I/O-intensive SQL patterns, in particular statements that spill temporary tables to disk:
SELECT
DIGEST_TEXT,
SUM_ROWS_EXAMINED,
SUM_ROWS_SENT,
SUM_CREATED_TMP_TABLES,
SUM_CREATED_TMP_DISK_TABLES
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC
LIMIT 10;
These help diagnose latency issues caused by slow storage or inefficient joins.
7. Real-World MySQL DBA Scenario
A typical incident scenario:
1. Application complaints about slow API response
2. CloudWatch shows high read latency
3. Slow query log or Performance Schema shows a SQL digest consuming high execution time
4. SQL performs a full table scan on a large table
5. Missing index identified on a WHERE clause or JOIN condition
6. Index added / query refactored
7. Latency drops, performance normalises
This is the real process DBAs follow for incident resolution; a purely illustrative index example for steps 4-6 is shown below.
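The following is purely illustrative; the table and column names (orders, customer_id, status) are hypothetical:
-- Confirm the access path: type=ALL in the EXPLAIN output means a full table scan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'PENDING';

-- Add a composite index covering the filter columns, then re-check the plan
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);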
8. How to Automate These Checks
DBAs typically automate slow query monitoring using:
- Linux cron + shell scripts
- Python automation with scheduling
- n8n workflows + MySQL nodes
- AWS CloudWatch + Lambda alerts for Aurora MySQL
- Grafana + Prometheus exporters
- Slack / Teams notifications for high-latency SQL
Automation ensures issues are detected before users experience downtime; a minimal cron-based sketch is shown below.
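As one possible starting point, save the diagnostic query to a file and schedule it with cron; the paths and option file below are assumptions, not a prescribed layout:
# Hourly slow-query digest report; credentials kept in a client option file
0 * * * * /usr/bin/mysql --defaults-extra-file=/home/dba/.my.cnf < /home/dba/scripts/slow_query_digest.sql > /home/dba/reports/slow_query_$(date +\%F_\%H).txt 2>&1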
9. Interview Questions – Slow Query Diagnostics
Be ready for:
- How do you find top slow queries in MySQL?
- What is the advantage of Performance Schema?
- Difference between Rows_Examined and Rows_Sent?
- What creates temporary disk tables?
- How do you detect missing indexes from slow queries?
- How do you reduce query execution time?
- How does the MySQL slow query log differ from Performance Schema?
Mentioning these scripts gives you a strong technical advantage.
10. Final Summary
Slow query diagnostics are essential for maintaining high performance in MySQL, Aurora MySQL, and RDS MySQL systems. The diagnostic script provided above offers deep visibility into SQL patterns, latency contributors and row scan behaviour.
This script can be used for daily health checks, tuning analysis, or fully automated monitoring workflows.
11. FAQ – MySQL Slow Query Diagnostics
Q1: What causes slow queries in MySQL?
Missing indexes, inefficient joins, large table scans, temporary table creation, outdated statistics, or poor schema design.
Q2: Does this script work in Aurora MySQL?
Yes, it works in Aurora MySQL 2.x/3.x because Performance Schema is supported.
Q3: Should I enable slow query logs as well?
Yes, slow query logs complement Performance Schema for long-running queries.
Q4: Can this script detect full table scans?
Yes: high Rows_Examined with low Rows_Sent is a clear indicator.
Q5: Does this script impact performance?
No, Performance Schema summary tables are lightweight.
About the Author
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.
He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows using n8n, AI tools and modern monitoring stacks. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.
Chetan regularly publishes expert content across Oracle, PostgreSQL, MySQL and Cloud DBA technologies—including performance tuning guides, DR architectures, monitoring tools, scripts and real incident-based case studies.
His platforms feature guides, scripts, diagrams, troubleshooting workflows and real-world DBA case studies designed for database professionals worldwide.
Call to Action
If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, PostgreSQL and Cloud DBA content. I publish real production issues, scripts, case studies and monitoring guides that help DBAs grow in their career.
How I Use ChatGPT and Automation to Save 3 Hours a Day as a DBA
The New Reality of Database Administration
Database environments today are more dynamic than ever. A DBA manages hybrid and multi-cloud systems across Oracle, PostgreSQL, Aurora MySQL, and other platforms.
While architecture complexity keeps growing, the number of hours in a day does not. Much of a DBA’s time still goes into manual analysis, log checks, and repetitive reporting.
To reclaim that time, I built a workflow using ChatGPT for analysis and n8n for automation. Together they now handle much of the repetitive monitoring and documentation work that used to slow me down.
Step 1: Using ChatGPT as an Analytical Assistant
I use ChatGPT as an intelligent interpreter for the technical data I already collect.
SQL and AWR Analysis
Prompt example:
Analyze this SQL execution plan. Identify expensive operations, missing indexes, and filter or join inefficiencies.
ChatGPT highlights cost-heavy steps, missing statistics, and joins that need review. I then validate insights using DBMS_XPLAN.DISPLAY_CURSOR before making any changes.
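A minimal example of that validation step; the sql_id is a placeholder, and 'ALLSTATS LAST' only shows actual row counts when the statement ran with the gather_plan_statistics hint or STATISTICS_LEVEL set to ALL:
-- Replace the sql_id with the one identified in V$SQL or the AWR report
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('abcd1234wxyz9', NULL, 'ALLSTATS LAST'));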
Incident Summaries and RCA Drafts
Prompt example:
Summarize the top waits and likely root causes from this AWR report in concise technical language for a status email.
This produces a clean summary that I can send to teams without spending time on formatting or rewriting.
When IO latency crosses a set threshold, the summary reads:
IO wait time on the primary database instance exceeded 60 percent. Possible cause: concurrent updates or storage contention. Review session activity and storage throughput.
Each alert is logged automatically in Google Sheets for trend analysis, so I no longer need to export or merge reports manually.
Step 3: The Measured Impact
After a few weeks, the results were visible:
Around 3 hours of manual effort are saved daily.
Faster communication through structured alerts.
Fewer repetitive RCA summaries.
More focus on architecture, tuning, and mentoring.
This combination of ChatGPT and n8n now runs quietly in the background, reducing operational overhead and improving accuracy.
Key Takeaways
Automation does not replace DBAs; it amplifies their impact.
ChatGPT brings analytical speed and structured communication.
n8n enables event-driven automation that scales without complexity.
If you’re managing complex environments, start with one task — maybe your daily health check or backup report — and automate it. Small steps quickly add up to big efficiency gains.
Final Thought
The next phase of database administration belongs to professionals who merge technical expertise with intelligent automation.
Instead of reacting to alerts, we should design systems that interpret themselves.
Start small, validate your results, and let automation do the routine work so you can focus on engineering.
Where I Share More
If you want to explore DBA automation, Oracle training, or real-world case studies, follow my work here: