Multi-Cloud Database Architecture: Oracle, AWS, Azure, GCP Reference Guide
Six months ago, our CTO announced we were moving to a "multi-cloud strategy." The board wanted vendor independence. Engineering wanted best-of-breed services. Finance wanted competitive pricing.
I was the DBA who had to make it actually work.
Designing multi-cloud database architecture isn't about running the same database everywhere. It's about knowing which workload belongs on which platform, how to synchronize data across clouds, and when vendor lock-in is actually acceptable.
This guide covers reference architectures from production systems running across Oracle on-premises, AWS RDS, Azure SQL Managed Instance, and GCP AlloyDB. If you're a DBA planning or managing multi-cloud databases, these are the patterns that work.
- Why Multi-Cloud? Business Drivers and Technical Realities
- Cloud Provider Database Services Comparison Matrix
- Reference Architecture #1: Hybrid Oracle + Cloud Read Replicas
- Reference Architecture #2: Multi-Region Active-Active (AWS + Azure)
- Reference Architecture #3: Data Lake Federation Across All Clouds
- Data Synchronization Patterns: CDC, ETL, and Replication
- Network Architecture: VPN, Peering, and Private Connectivity
- Disaster Recovery: Cross-Cloud Backup and Failover
- FAQ
- Related Reading from Real Production Systems
1. Why Multi-Cloud? Business Drivers and Technical Realities
Let's be honest: multi-cloud is complex and expensive. You don't choose it for technical elegance. You choose it because business or regulatory requirements force your hand.
Valid Reasons for Multi-Cloud Database Architecture
| Business Driver | Why It Matters | Example Scenario |
|---|---|---|
| Vendor Independence | Avoid single-vendor lock-in | Board mandate after AWS price increase |
| Best-of-Breed Services | Use best service per workload | AWS for ML, Azure for .NET apps, GCP for analytics |
| M&A Integration | Acquired company on different cloud | Parent on AWS, acquisition on Azure |
| Regulatory Compliance | Data residency requirements | EU data must stay in Azure Europe regions |
| Disaster Recovery | True independence in DR | Production AWS, DR in GCP (different blast radius) |
Invalid Reasons (That Lead to Regret)
- ❌ "We want to negotiate better pricing" - Multi-cloud costs more than single-cloud discounts
- ❌ "We'll move workloads to cheapest provider monthly" - Migration costs exceed savings
- ❌ "We want developers to choose their favorite cloud" - Operational nightmare
- ❌ "Cloud-native means multi-cloud" - No, it means avoiding vendor-specific services
Reality check from production: Multi-cloud increases operational complexity by 3-4x. You need expertise in multiple cloud platforms, monitoring tools, billing systems, and security models. Budget accordingly.
2. Cloud Provider Database Services Comparison Matrix
Each cloud provider offers different database services. Here's what matters for Oracle DBAs planning multi-cloud architecture.
| Feature | AWS RDS/Aurora | Azure SQL/Managed Instance | GCP AlloyDB/Cloud SQL | Oracle Cloud (OCI) |
|---|---|---|---|---|
| PostgreSQL-compatible | RDS PostgreSQL, Aurora | Azure Database for PostgreSQL | AlloyDB, Cloud SQL | — |
| Oracle-compatible | RDS Oracle (limited) | — | — | Full Oracle (RAC, Exadata) |
| SQL Server | RDS SQL Server | Azure SQL, Managed Instance | Cloud SQL SQL Server | — |
| MySQL | RDS MySQL, Aurora MySQL | Azure Database for MySQL | Cloud SQL MySQL | MySQL HeatWave |
| Max Database Size | 64TB (Aurora), 32TB (RDS) | 16TB (Managed Instance) | Unlimited (AlloyDB) | Unlimited (Exadata) |
| Read Replicas | Up to 15 (Aurora) | Built-in (Business Critical) | Built-in (AlloyDB) | Active Data Guard |
| Backup Retention | 35 days max | 35 days max | 365 days | 60 days (automated) |
Which Cloud for Which Workload?
AWS (RDS/Aurora) - Best for:
- ✅ High-scale OLTP (Aurora's distributed architecture)
- ✅ Serverless databases (Aurora Serverless)
- ✅ Mature ecosystem with most third-party tool support
Azure (SQL Managed Instance) - Best for:
- ✅ SQL Server workloads (native integration)
- ✅ .NET applications (lowest latency to Azure App Service)
- ✅ Hybrid cloud (strong on-prem to Azure connectivity)
GCP (AlloyDB) - Best for:
- ✅ Analytical queries (columnar engine built-in)
- ✅ Machine learning integration (BigQuery, Vertex AI)
- ✅ Cost optimization (often 30-40% cheaper than AWS)
Oracle Cloud (OCI) - Best for:
- ✅ Oracle Database with RAC/Exadata features
- ✅ Bring-your-own-license (BYOL) scenarios
- ✅ Legacy Oracle applications requiring specific Oracle versions
3. Reference Architecture #1: Hybrid Oracle + Cloud Read Replicas
This is the most common multi-cloud starting point: Keep Oracle on-premises as source of truth, distribute read workloads to cloud providers.
📐 Architecture Diagram: Hybrid Oracle with Multi-Cloud Read Replicas
┌─────────────────────────────────────────────────────────────────┐
│ ON-PREMISES DATA CENTER │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Oracle 19c RAC (Primary) │ │
│ │ - OLTP transactions (writes) │ │
│ │ - GoldenGate Change Data Capture │ │
│ │ - 24/7 operations │ │
│ └───────────────────┬───────────────────────────────────────┘ │
└────────────────────┼────────────────────────────────────────────┘
│
┌──────────┼──────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ AWS │ │ AZURE │ │ GCP │
└─────────┘ └─────────┘ └─────────┘
│ │ │
┌─────▼────┐ ┌──▼──────┐ ┌─▼────────┐
│ RDS │ │ SQL MI │ │ AlloyDB │
│ PostgreSQL│ │ Read │ │ Read │
│ Read │ │ Replica │ │ Replica │
│ Replica │ └─────────┘ └──────────┘
└──────────┘
│ │ │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│ Lambda │ │ Azure │ │ Cloud │
│ Apps │ │ Func. │ │ Run │
└────────┘ └────────┘ └────────┘
USE CASES:
• AWS: Real-time analytics dashboards
• Azure: .NET reporting applications
• GCP: Machine learning model training
Implementation Using Oracle GoldenGate
Pros and Cons of This Architecture
Advantages:
- ✅ Oracle remains authoritative source (familiar to DBAs)
- ✅ Read workloads offloaded to cheaper cloud providers
- ✅ Each cloud optimized for its specific workload
- ✅ Gradual migration path (can phase Oracle out later)
Disadvantages:
- ❌ GoldenGate licensing costs (Oracle premium feature)
- ❌ Replication lag (typically 5-30 seconds)
- ❌ Schema changes require coordination across all replicas
- ❌ Still dependent on Oracle infrastructure availability
4. Reference Architecture #2: Multi-Region Active-Active (AWS + Azure)
For true high availability and disaster recovery, deploy active-active databases across different cloud providers. Each region serves production traffic.
📐 Architecture Diagram: Multi-Cloud Active-Active
┌────────────────────────────────────────────────────────────────┐
│ GLOBAL LOAD BALANCER │
│ (AWS Route 53 / Azure Traffic Manager) │
│ Geographic routing + health checks │
└─────────┬──────────────────────────────────┬──────────────────┘
│ │
┌─────▼─────┐ ┌─────▼─────┐
│ AWS │ │ AZURE │
│ us-east-1│◄────Bidirectional────►│ eastus │
└───────────┘ Replication └───────────┘
│ │
┌─────▼─────────┐ ┌────────▼──────────┐
│ Aurora │ │ SQL Managed │
│ PostgreSQL │ │ Instance │
│ - Primary │◄──Conflict──►│ - Primary │
│ - Read replicas│ Resolution │ - Read replicas │
└───────────────┘ └───────────────────┘
│ │
┌─────▼─────┐ ┌─────▼─────┐
│ App │ │ App │
│ Servers │ │ Servers │
│ (EC2) │ │ (VMs) │
└───────────┘ └───────────┘
TRAFFIC ROUTING:
• North America users → AWS us-east-1
• Europe users → Azure westeurope
• Failover: automatic (30-60 second RTO)
Bidirectional Replication Configuration
Critical Considerations for Active-Active
Conflict resolution is inevitable:
- Same row updated in both AWS and Azure simultaneously
- Strategies: Last-write-wins, first-write-wins, custom business logic
- Monitor conflicts daily (they indicate design problems)
Network costs are significant:
- Cross-cloud data transfer: $0.08-0.12 per GB
- Example: 10TB/month replication = $800-1200/month just in bandwidth
- Mitigation: Replicate only critical tables, not entire database
5. Reference Architecture #3: Data Lake Federation Across All Clouds
For analytical workloads, federation allows querying data across all clouds without moving it. Each cloud's data warehouse queries other clouds via connectors.
📐 Architecture Diagram: Federated Data Lake
┌────────────────────────────────────────────────────────────────┐
│ UNIFIED QUERY INTERFACE (Trino/Presto) │
│ Single SQL query across all clouds │
└────┬─────────────────────┬─────────────────────┬──────────────┘
│ │ │
┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ AWS │ │ AZURE │ │ GCP │
│ Redshift│ │ Synapse │ │ BigQuery │
└────┬────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
┌────▼────────┐ ┌─────▼─────────┐ ┌─────▼──────────┐
│ S3 Data Lake│ │ ADLS Gen2 │ │ Cloud Storage │
│ - Parquet │ │ - Delta Lake │ │ - Avro files │
│ - 200TB │ │ - 150TB │ │ - 180TB │
└─────────────┘ └───────────────┘ └────────────────┘
QUERY EXAMPLE:
SELECT
aws.customer_id,
azure.order_total,
gcp.ml_prediction_score
FROM aws.customers aws
JOIN azure.orders azure ON aws.id = azure.customer_id
JOIN gcp.predictions gcp ON aws.id = gcp.customer_id
WHERE aws.region = 'US';
Implementation with Trino (Formerly Presto)
Performance Considerations
Latency matters:
- Cross-cloud queries 3-10x slower than single-cloud
- Network latency AWS↔Azure: 10-30ms, AWS↔GCP: 15-40ms
- Best practice: Move small tables, federate large tables
6. Data Synchronization Patterns: CDC, ETL, and Replication
Moving data between clouds requires choosing the right synchronization pattern based on latency requirements, data volume, and consistency needs.
| Pattern | Latency | Use Case | Tools |
|---|---|---|---|
| Change Data Capture (CDC) | Near real-time (5-60s) | Transactional sync | GoldenGate, Debezium, AWS DMS |
| Batch ETL | Hours to daily | Analytics, reporting | Informatica, Talend, Airflow |
| Streaming | Seconds | Event-driven systems | Kafka, Kinesis, Pub/Sub |
| Database Replication | Near real-time (1-30s) | HA, DR, read scaling | Native replication, pglogical |
CDC Pattern with Debezium (Open Source)
7. Network Architecture: VPN, Peering, and Private Connectivity
Multi-cloud requires secure, low-latency network connectivity between clouds. Public internet is not acceptable for production database traffic.
Connectivity Options Comparison
| Option | Latency | Cost | Setup Complexity | Best For |
|---|---|---|---|---|
| Site-to-Site VPN | 15-50ms | Low ($50-200/mo) | Low | Dev/Test, low-volume |
| AWS Direct Connect | 5-15ms | High ($300-2000/mo) | High | Production, high-volume |
| Azure ExpressRoute | 5-15ms | High ($300-2000/mo) | High | Production, high-volume |
| GCP Cloud Interconnect | 5-15ms | High ($300-2000/mo) | High | Production, high-volume |
| Multi-Cloud Backbone (Equinix) | 2-10ms | Very High ($2000-10k/mo) | Very High | Enterprise multi-cloud |
AWS to Azure Private Connectivity
8. Disaster Recovery: Cross-Cloud Backup and Failover
The ultimate test of multi-cloud architecture: can you survive a complete regional outage in one cloud provider?
DR Strategy Matrix
| Strategy | RTO | RPO | Cost | Complexity |
|---|---|---|---|---|
| Backup/Restore (Cross-Cloud) | 4-8 hours | 24 hours | Low | Low |
| Warm Standby (Different Cloud) | 30-60 min | 5-15 min | Medium | Medium |
| Active-Active (Multi-Cloud) | 30-60 sec | 0-1 min | High | High |
| Pilot Light (Different Cloud) | 1-2 hours | 15-30 min | Low-Medium | Medium |
No comments:
Post a Comment