Multi-Cloud AI Strategy: Running ML Workloads Across AWS, Azure, and GCP

How to architect AI systems across AWS Bedrock, Azure OpenAI, and Vertex AI. Real egress costs, identity federation, and production patterns from enterprise deployments.

Most enterprises do not choose multi-cloud. Multi-cloud chooses them. You acquire a company running on Azure. Your core platform lives on AWS. Your data science team fell in love with BigQuery three years ago. Your CISO mandated that no single vendor controls all AI inference. Your CFO noticed that Azure OpenAI gives GPT-4o access with enterprise SLAs that AWS Bedrock cannot match. Nobody wakes up and says "I want to manage three identity systems, three networking stacks, and three billing consoles." They arrive at multi-cloud because business constraints demand it.

After building multi-cloud AI systems across energy companies, healthcare systems, and federal agencies, I can tell you the engineering challenge is real but manageable — if you understand the actual tradeoffs instead of vendor marketing.

AWS Bedrock vs Azure OpenAI vs Vertex AI: The Operational Reality

Enterprise teams do not pick clouds based on feature matrices. They pick based on three questions: Which model gives the best output quality for my specific use case? Which cloud has the compliance posture I need? Where does my data already live?

AWS Bedrock

Bedrock provides a unified API across multiple model providers — Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon Titan. You call InvokeModel and swap the model ID; aside from provider-specific request body formats, your application code barely changes when switching between providers.
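Here is a minimal sketch of that call pattern with boto3. The model ID, region, and prompt are illustrative, and Anthropic models on Bedrock expect the Messages-style request body shown here; other provider families use their own schemas.

```python
import json
import boto3

# Bedrock runtime client in whatever region hosts the model you have enabled
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Swapping providers is mostly a matter of changing this ID
# (each provider family expects its own request body schema)
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

response = bedrock.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize our Q3 incident report."}],
    }),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```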

Where it wins: Deep integration with the AWS ecosystem (S3 for data, SageMaker for custom training, Lambda for serverless triggers, CloudWatch for monitoring). Knowledge Bases connects S3 documents to RAG pipelines with managed OpenSearch Serverless vector storage. Guardrails provides content filtering, PII redaction, and topic blocking at the API layer. If your platform is AWS-native, Bedrock slots in cleanly.

Where it falls short: Model availability lags Azure by weeks to months for new OpenAI releases. Pricing carries a convenience tax — high-volume Claude inference is sometimes cheaper through Anthropic's direct API. Custom model training through Bedrock is limited compared to SageMaker.

Azure OpenAI Service

Azure OpenAI is Microsoft's managed service for OpenAI models — GPT-4o, GPT-4 Turbo, o1, o3, DALL-E 3, and Whisper. The Microsoft-OpenAI partnership means Azure gets new models before any other cloud.
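A minimal sketch using the official openai Python SDK's AzureOpenAI client; the endpoint, API version, and deployment name are placeholders for your own resource.

```python
from openai import AzureOpenAI

# Endpoint and deployment name come from your Azure OpenAI resource;
# the values below are illustrative.
client = AzureOpenAI(
    azure_endpoint="https://my-aoai-resource.openai.azure.com",
    api_version="2024-06-01",
    api_key="<from-key-vault-or-entra-token>",
)

completion = client.chat.completions.create(
    model="gpt-4o-prod",  # the Azure *deployment* name, not the raw model name
    messages=[{"role": "user", "content": "Classify this support ticket."}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```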

Where it wins: First access to frontier OpenAI models. Enterprise compliance within Azure's boundary — your data never leaves your tenant. Microsoft's BAA covers the service for HIPAA. FedRAMP High authorized regions exist. Provisioned Throughput Units (PTUs) provide reserved capacity with guaranteed latency, eliminating the noisy-neighbor problem for production workloads.

Where it falls short: Vendor lock-in to the OpenAI model family exclusively. Quota management is painful — new accounts start with low quotas, scaling requires support tickets and wait times. No access to Claude, Gemini, or other non-OpenAI models through this service.

Google Vertex AI

Vertex AI is Google's unified ML platform: Gemini models, Model Garden for third-party models, AutoML, custom training, and full MLOps lifecycle.

Where it wins: Gemini 2.5 Pro offers up to 1M token context windows at aggressive pricing that undercuts OpenAI. BigQuery ML integration lets you run inference directly in SQL — SELECT ML.PREDICT(MODEL vertex_ai.gemini_pro, ...) — with zero data movement and zero egress charges. TPU access for training offers better price-performance than NVIDIA GPUs for certain TensorFlow and JAX workloads.
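A rough sketch of that in-warehouse inference pattern through the google-cloud-bigquery client. The project, dataset, and remote-model names are assumptions, and this uses ML.GENERATE_TEXT, BigQuery's table-valued function for calling remote Gemini models.

```python
from google.cloud import bigquery

# Project, dataset, table, and model names below are illustrative
client = bigquery.Client(project="my-gcp-project")

# Inference runs inside BigQuery against a remote Gemini model:
# no rows leave GCP, so no egress charges accrue.
query = """
SELECT ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `analytics.gemini_remote_model`,
  (SELECT ticket_text AS prompt FROM `analytics.support_tickets` LIMIT 100),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output)
)
"""
for row in client.query(query).result():
    print(row.summary)
```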

Where it falls short: Fewer enterprise compliance certifications in fewer regions compared to Azure. Narrower FedRAMP coverage. Enterprise support has improved but still trails AWS and Azure in responsiveness for large accounts during production incidents.

Check out our Architecture Blueprints for multi-cloud AI reference architectures with Terraform modules covering all three providers.

Data Gravity and Egress Costs: The Tax Nobody Budgets For

Data gravity is the principle that applications migrate toward the data they consume. In multi-cloud AI, data gravity is the strongest force shaping your architecture — and the one most teams ignore until the first bill arrives.

Egress costs as of Q1 2026:

| Source | Destination | Cost per GB |
| --- | --- | --- |
| AWS to Internet | Any | $0.09 (first 10TB) |
| AWS to Azure (Internet) | Azure | $0.09 |
| AWS to Azure (ExpressRoute) | Azure | $0.02 (dedicated) |
| Azure to Internet | Any | $0.087 (first 5TB) |
| GCP to Internet | Any | $0.12 (first 1TB) |
| GCP to AWS (Interconnect) | AWS | $0.05 |

A concrete scenario: Your RAG system stores 500GB of documents in S3, runs vector search on AWS OpenSearch, but sends inference requests to Azure OpenAI for GPT-4o.

At 10M requests/month, each sending 8KB of context to Azure:

  • Inference egress from AWS: 80GB = ~$7.20/month (manageable)
  • Weekly corpus re-embedding via Azure: 500GB x 4 = 2TB = ~$180/month
  • Training data movement from BigQuery to SageMaker (one-time 10TB): ~$1,200 in GCP egress

At 100M requests/month, inference egress alone reaches $72/month, plus re-embedding costs push total data movement to ~$252/month before accounting for model artifact transfers or centralized logging.
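A quick back-of-envelope script that reproduces those figures from the egress rates in the table; the request size, corpus size, and rates are exactly the assumptions stated above.

```python
# Back-of-envelope egress estimate for the RAG scenario above
AWS_EGRESS_PER_GB = 0.09   # AWS to Azure over the internet
GCP_EGRESS_PER_GB = 0.12   # GCP to internet, first tier

requests_per_month = 10_000_000
context_kb_per_request = 8

inference_gb = requests_per_month * context_kb_per_request / 1_000_000   # ~80 GB
inference_cost = inference_gb * AWS_EGRESS_PER_GB                        # ~$7.20/month

reembedding_gb = 500 * 4                                                 # weekly 500GB corpus = 2TB/month
reembedding_cost = reembedding_gb * AWS_EGRESS_PER_GB                    # ~$180/month

one_time_training_move = 10_000 * GCP_EGRESS_PER_GB                      # 10TB out of BigQuery ~ $1,200

print(inference_cost, reembedding_cost, one_time_training_move)
```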

The architecture principles that follow from these numbers:

  1. Inference close to the model. Pre-process and compress on the source cloud, send minimal context to the inference cloud.
  2. Training close to the data. If training data lives in BigQuery, train on Vertex AI. Do not pay to move 10TB to SageMaker.
  3. Embeddings close to the vector store. Generate embeddings on the same cloud where your vector database lives.
  4. Cache aggressively at cloud boundaries. A Redis cache on AWS that catches repeated prompts to Azure OpenAI saves 15-30% in both egress and inference costs (see the sketch below).
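Here is a minimal sketch of that caching principle, assuming an ElastiCache Redis endpoint on the AWS side and an exact-match cache keyed on a hash of the prompt. The hostname and TTL are illustrative, and real hit rates depend on how repetitive your traffic is.

```python
import hashlib
import json
import redis

# ElastiCache Redis endpoint on the AWS side of the boundary (hostname is illustrative)
cache = redis.Redis(host="ai-prompt-cache.abc123.use1.cache.amazonaws.com", port=6379)

def cached_completion(prompt: str, call_azure_openai, ttl_seconds: int = 3600):
    """Serve repeated prompts locally; only cache misses cross the cloud boundary."""
    key = "aoai:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)             # cache hit: no egress, no Azure tokens billed
    response = call_azure_openai(prompt)   # the only cross-cloud hop
    cache.setex(key, ttl_seconds, json.dumps(response))
    return response
```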

The Identity Federation Problem Nobody Talks About

Your AWS services use IAM roles. Azure services use Entra ID with managed identities. GCP services use service accounts. When your AI pipeline spans all three, every cross-cloud call needs authenticated identity.

The nightmare scenario: Your AI inference router on EKS needs to read context from S3 (AWS IAM), call Azure OpenAI for GPT-4o (Entra ID service principal), log analytics to BigQuery (GCP service account), and store results back in S3. Without federation, you manage three sets of credentials, three token refresh cycles, three expiration policies, and three audit logs.

Workload Identity Federation solves this. Each cloud now supports trusting tokens from another cloud's identity provider:

  • GCP Workload Identity Federation with AWS: Create a GCP Workload Identity Pool, add an AWS provider that trusts AWS STS tokens. Your EKS workload exchanges its AWS identity for a GCP access token — no stored key files.
  • Azure Federated Identity with AWS: Add a federated credential to an Azure app registration that trusts the OIDC token your EKS cluster issues to the workload's service account. Your EKS workload obtains Entra ID tokens using its native AWS identity.

The key insight: workload identity federation eliminates long-lived credentials entirely. Your inference router never stores a GCP key file or an Azure client secret. Short-lived tokens only.
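A minimal sketch of the AWS-to-GCP direction in Python, assuming you have already generated a credential configuration file with gcloud iam workload-identity-pools create-cred-config; the file name, scopes, and usage are illustrative.

```python
from google.auth import load_credentials_from_file
from google.auth.transport.requests import Request

# The JSON file is the credential *configuration* written by
# `gcloud iam workload-identity-pools create-cred-config`; it holds no secrets,
# only instructions for exchanging this workload's AWS identity for a GCP token.
credentials, project_id = load_credentials_from_file(
    "aws-to-gcp-wif-config.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

# The STS token exchange happens on refresh; the resulting access token is short-lived.
credentials.refresh(Request())
print("token expires at:", credentials.expiry)  # pass credentials= to any google-cloud client
```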

For audit compliance across three clouds, use correlation IDs in every cross-cloud request, mapping AWS CloudTrail entries to Azure Monitor entries to GCP Audit Logs via a shared trace ID. Centralized observability through Datadog, New Relic, or Dynatrace handles multi-cloud correlation natively — often the most pragmatic choice.
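A minimal sketch of that propagation, assuming a custom x-correlation-id header that every service in the pipeline logs; the header name is a convention you choose, not something the clouds require.

```python
import logging
import uuid
import requests

log = logging.getLogger("inference-router")

def call_cross_cloud(url: str, payload: dict, correlation_id: str | None = None):
    """Propagate one ID across every cloud boundary so CloudTrail, Azure Monitor,
    and GCP Audit Logs entries for the same request can be joined later."""
    correlation_id = correlation_id or str(uuid.uuid4())
    response = requests.post(
        url,
        json=payload,
        headers={"x-correlation-id": correlation_id},  # convention shared by all services
        timeout=30,
    )
    log.info("cross-cloud call", extra={"correlation_id": correlation_id,
                                        "target_url": url,
                                        "status": response.status_code})
    return response
```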

Explore our Cloud Infrastructure Toolkits for Terraform modules implementing workload identity federation across AWS, Azure, and GCP.

Production Pattern: Primary on AWS, Azure OpenAI for GPT-4o

This is the most common multi-cloud AI pattern I encounter. The organization runs its core platform on AWS and adopted Azure OpenAI because it needed GPT-4o with enterprise SLAs and HIPAA BAA coverage.

Architecture:

  • API Gateway and context retrieval (S3 + OpenSearch) run on AWS where the data lives
  • Prompt assembly happens on AWS — only the final prompt (2-8KB) crosses to Azure
  • ExpressRoute + Private Link between AWS and Azure reduces cross-cloud latency from 20-50ms (internet transit) to 2-5ms (co-located metro)
  • Response caching on AWS ElastiCache Redis reduces Azure OpenAI calls by 15-30%
  • Automatic fallback to AWS Bedrock Claude if Azure OpenAI is unavailable or quota-limited (sketched below)
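A condensed sketch of the routing and fallback logic, reusing the illustrative Azure deployment and Bedrock model ID from earlier. A production router would also layer in the Redis cache, retries, and per-provider prompt formatting.

```python
import json
import boto3
from openai import AzureOpenAI, APIStatusError, APITimeoutError

azure = AzureOpenAI(azure_endpoint="https://my-aoai.openai.azure.com",
                    api_version="2024-06-01",
                    api_key="<from-key-vault>")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt: str) -> str:
    """Primary path: Azure OpenAI GPT-4o. Fallback: Bedrock Claude on quota or availability errors."""
    try:
        resp = azure.chat.completions.create(
            model="gpt-4o-prod",  # Azure deployment name, illustrative
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
            timeout=10,
        )
        return resp.choices[0].message.content
    except (APITimeoutError, APIStatusError):
        # 429s (quota exhaustion) and 5xx responses from Azure fall through to Bedrock
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        })
        resp = bedrock.invoke_model(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0", body=body)
        return json.loads(resp["body"].read())["content"][0]["text"]
```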

Cost profile at 10M requests/month:

| Component | Monthly Cost |
| --- | --- |
| Azure OpenAI GPT-4o inference | $28,000-$42,000 |
| AWS infrastructure (EKS, OpenSearch, S3) | $8,500 |
| ExpressRoute circuit (1 Gbps) | $1,750 |
| Cross-cloud egress | $250 |
| Redis cache (r7g.xlarge) | $520 |
| Total | $39,020-$53,020 |

The Redis cache saving 20% of Azure OpenAI calls pays for itself 15x over. The ExpressRoute circuit saving 20-45ms of latency per request matters for user-facing applications where p99 response time is an SLA metric.

Avoiding the Complexity Spiral

Multi-cloud AI works when you follow three rules:

1. Minimize cross-cloud calls in the hot path. Every cross-cloud API call adds latency, cost, and a failure mode. Assemble prompts locally, cache responses locally, and only cross cloud boundaries for the inference call itself.

2. Use one cloud as the control plane. Pick one cloud to own orchestration, monitoring, and CI/CD. Run your Kubernetes control plane, your Grafana dashboards, and your deployment pipelines on a single cloud. Distribute workloads across clouds, but do not distribute operational tooling.

3. Abstract provider interfaces in your application code. Your inference router should speak a common interface. When you add a new provider or swap models, only the router configuration changes — application code is untouched.
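A structural sketch of that interface in Python. The provider classes are skeletons meant to wrap the SDK calls shown earlier; the names and registry are illustrative.

```python
from typing import Protocol

class InferenceProvider(Protocol):
    """The one narrow interface the rest of the application is allowed to see."""
    def complete(self, prompt: str, max_tokens: int = 512) -> str: ...

class AzureGPT4oProvider:
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError("wrap the Azure OpenAI call shown earlier")

class BedrockClaudeProvider:
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError("wrap the Bedrock invoke_model call shown earlier")

# The router resolves a provider from configuration; swapping models or adding a
# provider is a config change, and application code never imports a cloud SDK.
PROVIDERS: dict[str, InferenceProvider] = {
    "azure-gpt4o": AzureGPT4oProvider(),
    "bedrock-claude": BedrockClaudeProvider(),
}

def route(provider_name: str) -> InferenceProvider:
    return PROVIDERS[provider_name]
```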

The teams that fail at multi-cloud are the ones who let it become an excuse for architectural sprawl. The teams that succeed are the ones who treat each cloud as a specialized compute target accessed through a clean, narrow interface.

Learn multi-cloud architecture patterns with our free Cloud Infrastructure course covering AWS, Azure, GCP, and cross-cloud networking.

Frequently Asked Questions

When should an enterprise adopt a multi-cloud AI strategy versus staying single-cloud?

Stay single-cloud if your data, team expertise, and compliance requirements align with one provider. Go multi-cloud when business constraints force it: an acquisition brings a second cloud, a specific model (GPT-4o on Azure, Claude on Bedrock) is only available on one provider, or regulatory requirements mandate vendor diversification. Do not adopt multi-cloud for theoretical "avoiding lock-in" — the operational overhead of running three identity systems, three networking stacks, and three billing consoles is substantial and only justified by concrete business requirements.

How much does cross-cloud data movement actually cost for AI workloads?

At moderate scale (10M inference requests/month with a 500GB document corpus), cross-cloud data movement costs $200-$500/month. At high scale (100M+ requests, frequent re-embedding, model artifact transfers), costs reach $2,000-$5,000/month. The real cost is not just egress fees — it is the engineering time to build reliable cross-cloud data pipelines, handle network failures, and maintain consistency. Architect your system so the heaviest data processing happens on the same cloud as the data source.

Can we use workload identity federation instead of storing credentials for cross-cloud access?

Yes, and you should. All three major clouds support workload identity federation as of 2026. AWS workloads can obtain Azure and GCP tokens using their native IAM identity without storing any long-lived secrets. The setup requires creating trust relationships (GCP Workload Identity Pools, Azure federated credentials) but eliminates the credential rotation, secret storage, and key management overhead entirely. For regulated environments under SOC 2 or FedRAMP, this approach also satisfies the principle of least privilege and eliminates a major audit finding category.


Ready to architect multi-cloud AI systems? Browse 320 premium cloud architecture blueprints or start with our 17 free courses covering AWS, Azure, GCP, and multi-cloud strategies.

Kehinde Ogunlowo

Senior Multi-Cloud DevSecOps Architect & AI Engineer

AWS, Azure, GCP Certified | Secret Clearance | FedRAMP, CMMC, HIPAA

Enterprise experience at Cigna Healthcare, Lockheed Martin, NantHealth, BP Refinery, and Patterson UTI.

Start Your Cloud Career Today

Access 17 free courses covering AWS, Azure, GCP, DevOps, AI/ML, and cloud security — built by a practicing Senior Cloud Architect with enterprise experience.

Get Free Cloud Career Resources
