
Thursday, December 25, 2025


Patroni Test Lab Setup Guide – PostgreSQL HA for Production DBAs

⏱️ Estimated Reading Time: 11-12 minutes


At 1:20 AM, your primary PostgreSQL node goes down. Applications freeze, connection pools are exhausted, and failover never happens. The problem is not PostgreSQL itself; it is the lack of a tested HA setup.

In production, PostgreSQL without a proven failover mechanism becomes a single point of failure. Downtime leads to transaction loss, SLA breaches, and emergency firefighting during peak hours.

This guide walks you through building a Patroni-based PostgreSQL HA test lab that behaves like production — allowing you to test leader election, failover, and recovery safely before going live.

Figure: PostgreSQL HA test lab architecture – Patroni cluster nodes, distributed configuration store, leader election, and client connections during simulated production failover.

Table of Contents

  1. Why You Must Monitor Patroni Clusters Daily
  2. Production-Ready Patroni Test Lab Setup
  3. Script Output & Analysis Explained
  4. Critical Components: Patroni Architecture Concepts
  5. Troubleshooting Common Patroni Issues
  6. How to Automate This Monitoring
  7. Interview Questions: Patroni Troubleshooting
  8. Final Summary
  9. FAQ
  10. About the Author

1. Why You Must Monitor Patroni Clusters Daily

  • Leader Election Failure: No primary available, writes blocked.
  • Replication Lag: Standby lag exceeds 5–10 seconds under load.
  • Split Brain Risk: Two primaries due to DCS inconsistency.
  • Application Impact: P99 latency spikes from 40ms to 5+ seconds.
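
A quick daily check covers all four of these risk areas at once. The sketch below is a minimal example, assuming the Patroni config lives at /etc/patroni.yml and that you can run psql as the postgres superuser on the current primary; adjust both for your lab.

📋 daily_patroni_check.sh (hypothetical helper)
#!/bin/bash
# Cluster overview: expect exactly one Leader and low lag on the replicas
patronictl -c /etc/patroni.yml list

# Replication lag in bytes as seen from the current primary
psql -U postgres -Atc "SELECT client_addr, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;"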

2. Production-Ready Patroni Test Lab Setup

Lab Requirements:
  • 3 Linux VMs (Primary + 2 Replicas)
  • PostgreSQL 14 or higher
  • etcd or Consul as DCS
  • Passwordless SSH between nodes
📋 patroni.yml
scope: pg-ha-lab
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd:
  host: 10.0.0.10:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgresql/data
  bin_dir: /usr/pgsql-14/bin
  authentication:
    replication:
      username: replicator
      password: repl_pass
    superuser:
      username: postgres
      password: pg_pass
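
With patroni.yml in place on every node (adjusting name and the connect_address values per node), the cluster can be started and verified. A minimal sketch, assuming Patroni is packaged as a systemd unit named patroni:

# Start Patroni on each node
sudo systemctl enable --now patroni

# Expect one Leader and two streaming Replicas
patronictl -c /etc/patroni.yml list

# JSON status of the local member via the REST API
curl -s http://10.0.0.1:8008/patroni | jq .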

3. Script Output & Analysis Explained

Check Component   | Healthy State   | Red Flags
Leader Status     | Single primary  | No leader / multiple leaders
Replication Lag   | < 1 second      | > 10 seconds
Failover Time     | < 10 seconds    | > 30 seconds
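
Failover time is the hardest of these three numbers to eyeball. A rough way to measure it in the lab is to poll Patroni's REST API until some node answers as primary again; the node list below is an assumption from the sample config, and this is a sketch rather than a benchmark.

📋 failover_timer.sh (hypothetical helper)
#!/bin/bash
# Run immediately after taking down the old primary.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
START=$(date +%s)
while true; do
  for n in $NODES; do
    # /primary returns HTTP 200 only on the node holding the leader lock
    if [ "$(curl -s -o /dev/null -w '%{http_code}' http://$n:8008/primary)" = "200" ]; then
      echo "New primary: $n after $(( $(date +%s) - START ))s"
      exit 0
    fi
  done
  sleep 1
done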

4. Critical Components: Patroni Architecture Concepts

Distributed Configuration Store (DCS)

DCS (etcd/Consul) stores cluster state. If DCS is unhealthy, leader election fails.
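
That makes the DCS the first thing to check whenever the cluster misbehaves. A minimal sketch against the etcd endpoint from the sample config (the /service/<scope> key prefix applies to Patroni's etcd v3 support; a v2-style etcd section stores its keys in the v2 keyspace instead):

# DCS reachability and member status
etcdctl --endpoints=http://10.0.0.10:2379 endpoint health
etcdctl --endpoints=http://10.0.0.10:2379 endpoint status --write-out=table

# Keys Patroni keeps for this cluster (etcd v3 API)
etcdctl --endpoints=http://10.0.0.10:2379 get --prefix /service/pg-ha-lab --keys-only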

Leader Election

Patroni ensures only one writable primary. Broken fencing leads to split-brain scenarios.
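
A quick way to confirm that exactly one node claims the leader lock is to hit the role-specific REST endpoints on every member; the loop below assumes the three lab node addresses used earlier.

# /primary answers 200 only on the node holding the leader lock,
# /replica answers 200 only on a healthy standby.
for n in 10.0.0.1 10.0.0.2 10.0.0.3; do
  p=$(curl -s -o /dev/null -w '%{http_code}' http://$n:8008/primary)
  r=$(curl -s -o /dev/null -w '%{http_code}' http://$n:8008/replica)
  echo "$n primary=$p replica=$r"
done
# Exactly one primary=200 means a single leader; zero or two is a red flag.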

Replication Slots

Replication slots prevent WAL loss for lagging standbys, but a slot that falls far behind forces the primary to retain WAL and can fill the disk.
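
To see how much WAL a lagging slot is forcing the primary to keep, a simple query against pg_replication_slots is usually enough; this sketch assumes PostgreSQL 10+ function names and superuser access on the primary.

# WAL retained per replication slot on the primary
psql -U postgres -Atc "SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;"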

5. Troubleshooting Common Patroni Issues

Issue: No Primary After Restart

Symptom: All nodes in replica mode.

Root Cause: DCS unreachable.

Resolution:

  1. Check etcd health: etcdctl endpoint health
  2. Restart Patroni service
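
Put together, the recovery sequence looks roughly like this; the systemd unit name and config path are assumptions from the lab setup above.

# 1. Confirm the DCS is reachable
etcdctl --endpoints=http://10.0.0.10:2379 endpoint health

# 2. Restart Patroni on each node once etcd is healthy again
sudo systemctl restart patroni

# 3. Watch the cluster until one node takes the leader lock
watch -n 2 "patronictl -c /etc/patroni.yml list"
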
Figure: Patroni failover workflow – leader election, health checks, failover decision, replica promotion, and client reconnection during a PostgreSQL HA event.

6. How to Automate This Monitoring

Method 1: Cron-Based Health Check

📋 patroni_health.sh
#!/bin/bash
curl -s http://localhost:8008/health | jq .
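
To run this from cron, an entry like the one below works; the five-minute interval, install path, and log file are arbitrary choices for the lab.

# Run the health check every 5 minutes and append the output to a log
*/5 * * * *  /usr/local/bin/patroni_health.sh >> /var/log/patroni_health.log 2>&1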

Method 2: Cloud Monitoring

Export Patroni metrics to Prometheus or CloudWatch.
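
Recent Patroni releases (2.1 and later) expose Prometheus-format metrics directly on the REST API, so an exporter or agent only needs to scrape one endpoint per node; confirm the endpoint exists in your version.

# Prometheus-format metrics from the local Patroni REST API
curl -s http://localhost:8008/metrics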

Method 3: Third-Party Tools

Use Grafana dashboards for replication and failover visibility.

7. Interview Questions: Patroni Troubleshooting

Q: How does Patroni prevent split brain?

A: Through a leader key with a TTL in the distributed configuration store: only the node holding the key may run as primary, and it must keep renewing the key to stay primary.

Q: What happens if DCS is down?

A: No new leader can be elected, and the current primary demotes itself to read-only because it can no longer renew the leader key, protecting the cluster against split brain.

Q: How do you test failover?

A: Stop PostgreSQL or the Patroni service on the primary and watch a replica get promoted; stopping the Patroni service is the more reliable trigger, since Patroni will often just restart a PostgreSQL instance that was merely stopped. See the sketch below.
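
In the lab you can exercise both an unplanned failover and a controlled switchover. A minimal sketch, assuming the config path used earlier; switchover prompts interactively for the target replica.

# Unplanned: take the primary out and watch a replica get promoted
sudo systemctl stop patroni          # run on the current primary
patronictl -c /etc/patroni.yml list  # run from any surviving node

# Planned: controlled switchover with no data loss
patronictl -c /etc/patroni.yml switchover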

Q: Can Patroni work with RDS?

A: No. Patroni requires OS-level PostgreSQL access.

Q: How do you monitor Patroni?

A: REST API, Prometheus exporters, and logs.

8. Final Summary

A Patroni test lab is mandatory before production rollout. It exposes real-world failure modes safely.

With proper monitoring and automation, Patroni delivers predictable PostgreSQL high availability.

Key Takeaways:
  • Always test failover
  • Monitor DCS health
  • Track replication lag
  • Automate health checks

9. FAQ

Does Patroni impact performance?

A: Minimal overhead, mostly control-plane traffic.

Is Patroni production-ready?

A: Yes, widely used at scale.

Can it run on Kubernetes?

A: Yes, with StatefulSets.

Common mistakes?

A: Weak fencing and no DCS monitoring.

Is it better than repmgr?

A: Patroni is more automation-focused.

10. About the Author

Chetan Yadav is a Senior Oracle, PostgreSQL, MySQL and Cloud DBA with 14+ years of experience supporting high-traffic production environments across AWS, Azure and on-premise systems. His expertise includes Oracle RAC, ASM, Data Guard, performance tuning, HA/DR design, monitoring frameworks and real-world troubleshooting.

He trains DBAs globally through deep-dive technical content, hands-on sessions and automation workflows. His mission is to help DBAs solve real production problems and advance into high-paying remote roles worldwide.

Connect & Learn More:
📊 LinkedIn Profile
🎥 YouTube Channel

