Senior Oracle & Cloud DBA Real-World Databases • Cloud • Reliability • Careers LevelUp Careers Initiative
Monday, June 8, 2026
Oracle High CPU Usage: Causes and Fix in 19c
Monday, June 1, 2026
How to Read Oracle AWR Report in 19c: DBA Guide
How to Read an Oracle AWR Report in 19c
A practical reading order from real production incidents, not a feature tour.
An AWR report is a snapshot of where your instance spent its time. Reading it in the right order is half the battle.
02:14. The on-call page hit: checkout API p95 had jumped from 180 ms to 4.2 seconds. No errors. No node eviction. No failover. Just a database that had quietly gone slow under a normal load. The first artifact I pulled was a one-hour AWR report, and within four minutes it pointed straight at the cause.
If you have ever stared at a 30-page AWR report and not known where to look first, this guide is for you. Knowing how to read an Oracle AWR report in 19c is not about understanding every section. It is about reading a handful of sections in the right order so you can go from "the database is slow" to "this SQL on this object is the problem" in minutes. That is exactly what I did at 02:14, and it is the workflow I will walk you through here.
AWR (Automatic Workload Repository) takes regular snapshots of performance statistics and stores them in the SYSAUX tablespace. A report compares two snapshots and shows you the delta: what the instance did, where it waited, and which statements drove the load. The trick is to stop reading top to bottom and start reading by importance.
Junior and mid-level DBAs who can generate an AWR report but freeze when it comes to interpreting it, and senior engineers who want a tighter triage checklist for incidents. Examples use Oracle 19c, but the reading order applies to 12c and 18c as well.
Monday, March 2, 2026
Oracle Performance Engineering Guide: AWR, ASH and SQL Monitor in 19c and 23ai
Oracle Performance Engineering Guide: AWR, ASH and SQL Monitor in 19c and 23ai
Master the essential tools for diagnosing and resolving real-world performance issues
Production Environment Context
Oracle Database 19.18.21 with Oracle Grid Infrastructure 19.18 | 3-Node RAC Cluster | 8.5 TB OLTP Database | 3,200+ concurrent sessions | Peak TPS: 2,100 | 24/7 mission-critical
It's 3 AM on a Tuesday. The monitoring dashboard lights up red. Response times have jumped from 200ms to 5+ seconds. Users are reporting timeouts on critical batch jobs. Your manager's Slack message is already waiting: "Database issue?" You log into the database, check CPU utilization (45%), memory (78% used), disk I/O latency (120ms). Everything looks elevated but not catastrophically bad. Where do you even start investigating?
This is where AWR (Automatic Workload Repository), ASH (Active Session History), and SQL Monitor become your diagnostic lifeline. Over the past 15+ years managing large-scale Oracle databases in production, I have debugged thousands of performance incidents—from runaway SQL queries consuming 800GB of I/O in 10 minutes, to massive lock contention blocking 400+ sessions, to redo log I/O stalls freezing the entire database. These three tools have never failed me. They transform performance troubleshooting from educated guessing into data-driven root cause analysis.
In this guide, you will learn the exact methodology I use in production: how to leverage AWR, ASH, and SQL Monitor to identify root causes in under 15 minutes, pinpoint the exact problematic SQL statements, analyze their execution plans, and implement fixes. Whether you're managing a 3-node RAC cluster running Oracle 19c or a cloud-native 23ai environment, the diagnostic principles remain constant.
Let's dig into real production scenarios and techniques.
Monday, February 9, 2026
GCP AlloyDB Performance Tuning Guide: Oracle 19c Comparison for DBAs (2026)
GCP AlloyDB Performance Tuning Guide: Oracle 19c DBAs' Complete Migration Handbook
Three months after our team moved a critical Oracle 19c OLTP system to GCP AlloyDB, the CFO asked in the quarterly review, "Are we seeing the 4x performance improvement Google promised?"
I had the AWR reports from Oracle and the performance metrics from AlloyDB. Same queries, same data volume, same transaction load. The answer was more nuanced than a simple yes or no.
Some queries ran 6x faster in AlloyDB. Others ran slower. The difference wasn't the database—it was understanding which PostgreSQL-based optimizations AlloyDB uses and how they compare to Oracle's approach.
This guide is what I learned comparing Oracle 19c performance tuning to GCP AlloyDB in production. If you're an Oracle DBA evaluating AlloyDB, this covers the performance differences that actually matter.
Monday, January 5, 2026
Oracle 9i on Solaris: A Production Case Study from the Pre-Cloud Era
⏱️ Estimated Reading Time: 12-13 minutes
Oracle 9i on Solaris Case Study (A Deep Dive Retro)
In the early 2000s, database outages were not masked by load balancers, auto-scaling groups, or managed cloud services. If an Oracle database went down on Solaris, the business felt it immediately.
There was no RAC, no ASM, no cloud snapshots. Every failure meant manual diagnosis, filesystem checks, kernel tuning, and long recovery windows. Downtime translated directly into revenue loss and SLA breaches.
This case study revisits a real-world Oracle 9i on Solaris production environment, breaking down architecture decisions, tuning practices, failures, and the lessons modern DBAs can still apply today.
Saturday, November 29, 2025
Oracle RAC Cluster Health Audit: The 2026 Production DBA Guide
This article provides a comprehensive Shell Script for RAC Cluster Health Audits. It covers CRS status, OCR integrity, Voting Disk validation, and resource stability checks—perfect for daily monitoring or pre-patching validation.
Table of Contents
- Why You Must Audit RAC Cluster Health Daily
- Production-Ready RAC Health Check Script (Shell)
- Script Output & Analysis Explained
- Critical Components: OCR, Voting Disk & CRS
- Troubleshooting Common RAC Issues
- How to Automate This Audit (Cron)
- Interview Questions: RAC Troubleshooting
- Final Summary
- FAQ
- About the Author
1. Why You Must Audit RAC Cluster Health Daily
Oracle RAC relies on a complex stack of clusterware services. Neglecting these checks leads to:
- Node Evictions: Caused by heartbeat failures or voting disk I/O timeouts.
- OCR Corruption: Resulting in the inability to start the clusterware stack.
- Resource Regressions: Services or VIPs flapping between nodes.
- Split-Brain Syndrome: Where nodes lose communication and fight for control.
Running a unified audit script ensures you catch "INTERMEDIATE" or "OFFLINE" states before they become outages.
2. Production-Ready RAC Health Check Script
This shell script checks the core pillars of RAC stability: CRS Stack, OCR, Voting Disks, and Resource Status.
Note: Execute this script as the grid (or root) user.
#!/bin/bash # ==================================================== # Oracle RAC Cluster Health Audit Script # Author: Chetan Yadav # Usage: ./rac_health_check.sh # ==================================================== # Set Grid Environment (Adjust ORACLE_HOME as needed) export ORACLE_HOME=/u01/app/19.0.0/grid export PATH=$ORACLE_HOME/bin:$PATH echo "==================================================" echo " ORACLE RAC CLUSTER HEALTH AUDIT - $(date) " echo "==================================================" # 1. Check High Availability Services (OHAS) echo -e "\n[1] Checking CRS/OHAS Stack Status..." crsctl check crs # 2. Check Voting Disk Status (Quorum) echo -e "\n[2] Checking Voting Disk Configuration..." crsctl query css votedisk # 3. Check OCR Integrity (Registry) echo -e "\n[3] Checking Oracle Cluster Registry (OCR) Integrity..." # Note: Requires root or grid privileges ocrcheck # 4. Check Cluster Resources (Highlighting Issues) echo -e "\n[4] Scanning for OFFLINE or UNSTABLE Resources..." crsctl stat res -t | grep -E "OFFLINE|INTERMEDIATE|UNKNOWN" # 5. Check Cluster Interconnect (Private Network) echo -e "\n[5] Checking Cluster Interconnects..." oifcfg getif echo -e "\n==================================================" echo " AUDIT COMPLETE. CHECK LOGS FOR ANY ERRORS. " echo "=================================================="This script consolidates five manual commands into a single health report, saving valuable time during incidents or daily checks.
3. Script Output & Analysis Explained
| Check Component | What "Healthy" Looks Like |
|---|---|
| crsctl check crs | CSS, CRS, and EVM should all show "Online". If any are offline, the node is not part of the cluster. |
| Voting Disk | Must show "successful discovery" and list valid disk paths (e.g., ASM disk groups). |
| ocrcheck | Look for "Cluster registry integrity check succeeded". Ensure enough free space is available. |
| Resource Scan | Any resource in "INTERMEDIATE" state implies it is struggling to start or stop. "OFFLINE" is only okay for idle instances. |
4. Critical Components: OCR, Voting Disk & CRS
Understanding these acronyms is vital for any RAC DBA:
- OCR (Oracle Cluster Registry): Stores configuration info (resources, nodes, instances). If this is corrupt, the cluster cannot start.
- Voting Disk: The "heartbeat" file. Nodes write to this to prove they are alive. Loss of voting disk = immediate node eviction (reboot).
- CRS (Cluster Ready Services): The main daemon managing high availability.
5. Troubleshooting Common RAC Issues
If the script reports errors, follow this workflow:
- CRS Fails to Start: Check
$ORACLE_HOME/log/hostname/alerthostname.log. It is often a permission issue or network failure. - Voting Disk Missing: Verify ASM disk group mounting status. Run
kfod disks=allto check disk visibility at OS level. - Intermittent Evictions: Check network latency on the private interconnect. High latency leads to "Missed Heartbeats".
6. How to Automate This Audit (Cron)
You can schedule this script to run daily at 7 AM before business hours. Add this line to the Grid user's crontab:
00 07 * * * /home/grid/scripts/rac_health_check.sh > /tmp/rac_health_$(date +\%F).log 2>&17. Interview Questions: RAC Troubleshooting
Prepare for these common questions during senior DBA interviews:
- Q: What is a split-brain scenario in RAC?
A: When nodes lose private network communication and both try to write to the database. Voting disk prevents this by fencing off one node. - Q: How do you backup OCR?
A: Oracle automatically backs up OCR every 4 hours. You can also manually backup using `ocrconfig -manualbackup`. - Q: What command checks the private interconnect IPs?
A: `oifcfg getif`.
8. Final Summary
A healthy RAC cluster requires vigilant monitoring of the clusterware stack, not just the database instances. The script provided above is a fundamental tool for checking CRS, OCR, and Voting Disk health instantly.
Use this script as part of your Weekly Health Check routine (as suggested in the Nov 2025 schedule) to ensure 99.999% availability.
9. FAQ
Q1: Can I run this script as the 'oracle' user?
A: Most `crsctl` check commands work, but `ocrcheck` and deep diagnostics usually require `grid` or `root` privileges.
Q2: What should I do if OCR check fails?
A: Restore from the latest automatic backup using `ocrconfig -restore`. Do not restart the stack until resolved.
Q3: Does this cause performance impact?
A: No, these are lightweight metadata queries.
About the Author
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.
He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.
Explore More Technical Work
- LinkedIn: Profile & Articles
- YouTube: Oracle Foundations Playlist
- Telegram: LevelUp_Careers DBA Tips
Call to Action
If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, and RAC content. I publish real production issues, scripts, and monitoring guides to help you level up your DBA career.
Monday, November 24, 2025
MySQL Slow Query Diagnostic Script (2025): A Complete Production DBA Guide
Estimated
Reading Time: 6–7 minutes
Slow queries are one of the biggest reasons for
performance degradation in MySQL and Aurora MySQL environments. High latency
SQL can create CPU spikes, I/O pressure, row lock waits, replication lag, and
application-level timeouts.
This article provides a production-ready MySQL Slow Query
Diagnostic Script, explains how to interpret the results, and
shows how DBAs can use this script for proactive tuning and operational
monitoring.
Table
of Contents
1.
What Slow Query Diagnostics Mean for MySQL DBAs
2.
Production-Ready MySQL Slow Query Diagnostic Script
3.
Script Output Explained
4.
Additional Performance Metrics to Watch
5.
Add-On Scripts (Top by Buffer Gets, Disk Reads)
6.
Real-World MySQL DBA Scenario
7.
How to Automate These Checks
8.
Interview Questions
9.
Final Summary
10. FAQ
11. About
the Author
12. Call
to Action (CTA)
1.
What Slow Query Diagnostics Mean for MySQL DBAs
Slow queries lead to:
·
High CPU utilisation
·
Increased IOPS and latency
·
Row lock waits and deadlocks
·
Replication lag in Aurora MySQL / RDS MySQL
·
Query timeout issues at the application layer
·
Poor customer experience under load
MySQL’s Performance Schema provides deep
visibility into SQL patterns, allowing DBAs to identify:
·
High-latency queries
·
Full table scans
·
Missing index patterns
·
SQL causing temporary tables
·
SQL responsible for heavy disk reads
·
SQL generating high row examinations
Slow query diagnostics are essential for
maintaining consistent performance in production systems.
2.
Production-Ready MySQL Slow Query Diagnostic Script
This script analyses execution time, latency,
row scans and query patterns using Performance Schema:
/* MySQL Slow Query Diagnostic Script Works on: MySQL 5.7, MySQL 8.0, Aurora MySQL*/ SELECT DIGEST_TEXT AS Query_Sample, SCHEMA_NAME AS Database_Name, COUNT_STAR AS Execution_Count, ROUND(SUM_TIMER_WAIT/1000000000000, 4) AS Total_Time_Seconds, ROUND((SUM_TIMER_WAIT/COUNT_STAR)/1000000000000, 6) AS Avg_Time_Per_Exec, SUM_ROWS_EXAMINED AS Rows_Examined, SUM_ROWS_SENT AS Rows_Sent, FIRST_SEEN, LAST_SEENFROM performance_schema.events_statements_summary_by_digestWHERE SCHEMA_NAME NOT IN ('mysql','sys','performance_schema','information_schema')ORDER BY Total_Time_Seconds DESCLIMIT 20;
This is a field-tested script used in multiple
production environments including AWS RDS MySQL and Amazon Aurora MySQL.
3.
Script Output Explained
|
Column |
Meaning |
|
Query_Sample |
Normalized version of SQL for pattern analysis |
|
Database_Name |
Schema on which SQL is executed |
|
Execution_Count |
How many times the SQL pattern ran |
|
Total_Time_Seconds |
Total execution time consumed |
|
Avg_Time_Per_Exec |
Average latency per execution |
|
Rows_Examined |
Total rows scanned (detects full scans) |
|
Rows_Sent |
Rows returned by the query |
|
FIRST_SEEN / LAST_SEEN |
Time window of activity |
These values help DBAs identify the
highest-impact SQL patterns immediately.
4.
Additional Performance Metrics You Must Watch
During slow query investigations, always
check:
·
High Rows_Examined
→ Missing index
·
High Avg_Time_Per_Exec
→ Expensive joins or sorting
·
High Rows_Examined
vs Rows_Sent difference → Inefficient filtering
·
High Execution_Count
→ Inefficient query called repeatedly
·
Repeated occurrence between FIRST_SEEN and LAST_SEEN → Ongoing issue
MySQL workload analysis becomes easy when
these metrics are evaluated together.
5.
Add-On Script: Top SQL by Buffer Gets
Useful for identifying CPU-heavy SQL:
SELECT sql_id, buffer_gets, executions, ROUND(buffer_gets/EXECUTIONS, 2) AS gets_per_exec, sql_textFROM performance_schema.events_statements_summary_by_digestORDER BY buffer_gets DESCLIMIT 10;
Identifies IO-intensive SQL patterns:
SELECT DIGEST_TEXT, SUM_ROWS_EXAMINED, SUM_ROWS_SENT, SUM_CREATED_TMP_TABLES, SUM_CREATED_TMP_DISK_TABLESFROM performance_schema.events_statements_summary_by_digestORDER BY SUM_CREATED_TMP_DISK_TABLES DESCLIMIT 10;
These help diagnose latency issues caused by
slow storage or inefficient joins.
7.
Real-World MySQL DBA Scenario
A typical incident scenario:
1.
Application complaints about slow API response
2.
CloudWatch shows high read latency
3.
Slow query log or Performance Schema shows a SQL digest
consuming high execution time
4.
SQL performs a full table scan on a large table
5.
Missing index identified on a WHERE clause or JOIN
condition
6.
Index added / query refactored
7.
Latency drops, performance normalises
This is the real process DBAs follow for
incident resolution.
8.
How to Automate These Checks
DBAs typically automate slow query monitoring
using:
·
Linux cron + shell scripts
·
Python automation with scheduling
·
n8n workflows + MySQL nodes
·
AWS CloudWatch + Lambda alerts for Aurora MySQL
·
Grafana + Prometheus exporters
·
Slack / Teams notifications for high-latency SQL
Automation ensures issues are detected before
users experience downtime.
9.
Interview Questions – Slow Query Diagnostics
Be ready for:
·
How do you find top slow queries in MySQL?
·
What is the advantage of Performance Schema?
·
Difference between Rows_Examined and Rows_Sent?
·
What creates temporary disk tables?
·
How do you detect missing indexes from slow
queries?
·
How do you reduce query execution time?
·
How does MySQL slow query log differ from
Performance Schema?
Mentioning these scripts gives you a strong
technical advantage.
10.
Final Summary
Slow query diagnostics are essential for
maintaining high performance in MySQL, Aurora MySQL, and RDS MySQL systems. The
diagnostic script provided above offers deep visibility into SQL patterns,
latency contributors and row scan behaviour.
This script can be used for daily health
checks, tuning analysis, or fully automated monitoring workflows.
11.
FAQ – MySQL Slow Query Diagnostics
Q1:
What causes slow queries in MySQL?
Missing indexes, inefficient joins, large table scans, temporary table
creation, outdated statistics, or poor schema design.
Q2:
Does this script work in Aurora MySQL?
Yes, it works in Aurora MySQL 2.x/3.x because Performance Schema is supported.
Q3:
Should I enable slow query logs as well?
Yes, slow query logs complement Performance Schema for long-running queries.
Q4:
Can this script detect full table scans?
Yes—high Rows_Examined with low Rows_Sent is a clear indicator.
Q5:
Does this script impact performance?
No, Performance Schema summary tables are lightweight.
Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.
He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows using n8n, AI tools and modern monitoring stacks. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.
Chetan regularly publishes expert content across Oracle, PostgreSQL, MySQL and Cloud DBA technologies—including performance tuning guides, DR architectures, monitoring tools, scripts and real incident-based case studies.
Explore More Technical Work
LinkedIn (Professional Profile & Articles)
https://www.linkedin.com/in/chetanyadavvds/
YouTube – Oracle Foundations Playlist
https://www.youtube.com/playlist?list=PL5TN6ECUWGROHQGXep_5hff-2ageWTp4b
Telegram – LevelUp_Careers DBA Tips
https://t.me/LevelUp_Careers
Instagram – Oracle/Cloud Learning Reels
https://www.instagram.com/levelup_careers/
Facebook Page – OracleDBAInfo
https://www.facebook.com/OracleDBAInfo
These platforms feature guides, scripts, diagrams, troubleshooting workflows and real-world DBA case studies designed for database professionals worldwide.
Call to Action
If you found this helpful, follow my blog and LinkedIn for deep Oracle, MySQL, PostgreSQL and Cloud DBA content. I publish real production issues, scripts, case studies and monitoring guides that help DBAs grow in their career.
.png)
.jpg)

