


Citadel Cloud Management
Multi-Tenant SaaS Architecture Blueprint
Architecture Blueprints
Created by Kenny Ogunlowo
Product Description
The Problem This Blueprint Solves
Your SaaS application serves 340 tenants from a shared database where tenant isolation depends on WHERE clauses in application queries. A bug in one API endpoint last month leaked data between tenants, requiring breach notification to 12 enterprise customers. Your largest customer wants dedicated infrastructure, three customers require SOC 2 compliance evidence for data isolation, and your current architecture cannot provide either without a complete rewrite.
This blueprint is the multi-tenant SaaS architecture I built for a B2B analytics platform serving 800 tenants (including 6 Fortune 500 companies). It has had zero cross-tenant data leakage incidents across 3 years of operation and can offer pooled, bridge, or silo (dedicated) infrastructure per tenant.
What You Get
- Architecture diagrams — Tenant isolation models (silo, pool, bridge), data partitioning strategies, tenant routing flow, onboarding automation pipeline, and billing metering architecture (Draw.io)
- Terraform modules — Tenant provisioning pipeline (shared and dedicated), RDS with Row-Level Security policies, API Gateway with tenant-aware routing, Cognito user pools per tenant, and metering with CloudWatch custom metrics
- Tenant isolation framework — Row-Level Security policies for PostgreSQL, IAM session policies with tenant context, S3 prefix-based isolation with bucket policies, and DynamoDB partition key design for tenant scoping
- Onboarding automation — Step Functions workflow that provisions tenant resources, configures isolation policies, seeds initial data, and sends welcome notifications
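As a rough illustration of the S3 prefix and DynamoDB partition key scoping listed above, here is a minimal Python sketch. The `tenants/<tenant_id>/` prefix layout, the `TENANT#` key format, the `pk`/`sk` attribute names, and the tenant ID pattern are all assumptions made for this example, not the blueprint's actual conventions.

```python
import re

# Hypothetical tenant ID format: lowercase letters, digits, and hyphens.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def _check_tenant_id(tenant_id: str) -> str:
    """Reject IDs containing characters such as '/' or '#' that would blur key boundaries."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def s3_object_key(tenant_id: str, relative_path: str) -> str:
    """Build an S3 key under the tenant's prefix: tenants/<tenant_id>/<path>."""
    tenant_id = _check_tenant_id(tenant_id)
    parts = [p for p in relative_path.split("/") if p]
    # S3 keys are opaque strings, but dot segments are rejected to keep keys unambiguous.
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"invalid path: {relative_path!r}")
    return f"tenants/{tenant_id}/" + "/".join(parts)


def dynamodb_key(tenant_id: str, entity: str, entity_id: str) -> dict:
    """Build a composite key whose partition key always starts with the tenant."""
    tenant_id = _check_tenant_id(tenant_id)
    return {
        "pk": f"TENANT#{tenant_id}",
        "sk": f"{entity.upper()}#{entity_id}",
    }
```

With keys built this way, an IAM session policy can plausibly restrict a tenant's credentials to the `tenants/<tenant_id>/*` prefix in S3 and, through the `dynamodb:LeadingKeys` condition key, to that tenant's partition key values in DynamoDB.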
Key Architecture Decisions
- Row-Level Security over application-level WHERE clauses — Application-level filtering depends on every developer remembering to add the tenant filter to every query. PostgreSQL Row-Level Security enforces tenant isolation at the database engine level — even a raw SQL query without a WHERE clause returns only the current tenant's data. The policy is attached to the table, not the query.
- Tiered isolation model over one-size-fits-all — Small tenants share infrastructure (pool model) for cost efficiency. Mid-tier tenants get dedicated database schemas (bridge model) for query isolation. Enterprise tenants get dedicated infrastructure (silo model) for compliance requirements. One architecture supports all three tiers through configuration, not code changes.
- Tenant context in JWT claims, not client-supplied request headers: A client can set a custom tenant header to any value. A tenant ID embedded in the JWT during authentication is covered by the token's signature, so any modification invalidates the token. The API Gateway Lambda authorizer verifies the token, extracts the tenant context, and passes it to backend services, which use it for RLS policy evaluation.
- Metering at the API Gateway level: Billing accuracy requires counting every API call, storage byte, and compute unit per tenant. For API calls, API Gateway access logs that include the tenant ID provide an append-only, per-request record of usage, and CloudWatch metric filters aggregate it per tenant for billing integration without instrumenting application code. Storage and compute usage do not appear in access logs and have to be metered from their own sources.
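To make the tenant-context decision above more concrete, the following Python sketch shows how a Lambda authorizer might turn verified JWT claims into an authorizer response that carries the tenant ID in its `context`. It assumes token signature and expiry have already been checked elsewhere, and that the claim is named `tenant_id`; the function names are hypothetical.

```python
def _policy(principal_id: str, effect: str, resource: str) -> dict:
    """Build the IAM policy document that API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
    }


def build_authorizer_response(claims: dict, method_arn: str) -> dict:
    """Turn already-verified JWT claims into a Lambda authorizer response.

    Assumption: signature and expiry validation happened before this call.
    """
    tenant_id = claims.get("tenant_id")
    subject = claims.get("sub")
    if not tenant_id or not subject:
        # No tenant context means no access; never fall back to a header value.
        return _policy("anonymous", "Deny", method_arn)
    response = _policy(subject, "Allow", method_arn)
    # Values placed in "context" are made available to the backend integration.
    response["context"] = {"tenant_id": str(tenant_id)}
    return response
```

The point of the design is that the backend only ever reads the tenant ID from the authorizer context, so a caller-supplied header cannot influence which tenant's rows the RLS policy exposes.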
Who This Blueprint Is For
- SaaS Architects designing multi-tenant data isolation for the first time
- Backend Engineers implementing Row-Level Security for tenant data partitioning
- Product Managers who need to offer different isolation tiers (shared, dedicated) to different customer segments
- Compliance Officers who need to demonstrate tenant data isolation for SOC 2 Type II audits
Your First 48 Hours
On day one, deploy the PostgreSQL RDS instance with the provided RLS policy Terraform module. Create two test tenants and insert sample data for each. Connect as Tenant A and verify that queries return only Tenant A data, even SELECT * FROM orders without a WHERE clause. Attempt to access Tenant B data explicitly and confirm the RLS policy blocks it.
On day two, deploy the API Gateway with tenant-aware routing and Cognito user pools. Authenticate as a Tenant A user and verify the JWT contains the correct tenant_id claim that the backend uses for RLS context.
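For the RLS check described above, it can help to see what a tenant policy looks like in SQL. The Python sketch below only generates the statements; it does not connect to a database. It assumes a `tenant_id` column of type `uuid` and a custom session setting named `app.tenant_id`, neither of which is confirmed by the blueprint, and the `%s` placeholder follows the psycopg style.

```python
import re

# Identifiers cannot be passed as query parameters, so they are validated instead.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")

# Sets the tenant for the current transaction only (third argument = is_local).
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true);"


def rls_statements(table: str, tenant_column: str = "tenant_id") -> list:
    """Return the DDL that enables and enforces a tenant policy on one table."""
    for name in (table, tenant_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    # Assumption: the tenant column is a uuid; adjust the cast for other types.
    setting = "current_setting('app.tenant_id')::uuid"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({tenant_column} = {setting}) "
        f"WITH CHECK ({tenant_column} = {setting});",
    ]


if __name__ == "__main__":
    for statement in rls_statements("orders"):
        print(statement)
```

In this sketch the application would run `SET_TENANT_SQL` at the start of each transaction with the tenant ID taken from the verified JWT. If the setting has never been set, `current_setting` raises an error, so a request without tenant context fails instead of returning rows.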
Limitations and Trade-offs
Row-Level Security adds 5-15% query overhead due to policy evaluation on every row access. PostgreSQL RLS does not apply to superusers or to roles with the BYPASSRLS attribute, and table owners bypass it unless FORCE ROW LEVEL SECURITY is set on the table, so the application must connect as a separate, unprivileged role and never as the database admin. The silo model (dedicated infrastructure per tenant) costs 3-5x more per tenant than the pool model; the cost model spreadsheet helps determine the break-even tenant size for dedicated infrastructure. Tenant provisioning for the silo model takes 8-12 minutes (VPC, RDS, and ECS service creation), so set customer expectations for onboarding time.