
Citadel Cloud Management
API Gateway Architecture with Rate Limiting
Architecture Blueprints
Created by Kenny Ogunlowo
Product Description
The Problem This Blueprint Solves
Your microservices expose APIs directly, each with its own authentication logic, rate limiting implementation, and versioning strategy. Clients call 8 different endpoints with 8 different auth headers. A DDoS attack on one service brings down the entire platform because there is no centralized throttling. Your mobile app team spends more time managing API endpoint discovery than building features.
This blueprint is the API gateway architecture I built for a B2B SaaS platform serving 2,300 enterprise clients through a unified API surface, processing 45,000 requests per second with sub-15ms gateway overhead and 99.99% measured availability over 12 months.
What You Get
- Architecture diagrams — Gateway topology with routing rules, authentication flow, rate limiting tiers, caching layers, and backend service mesh integration (Draw.io)
- Terraform modules — API Gateway (HTTP API or REST API), custom domain with ACM certificates, Lambda authorizer for JWT validation, usage plans with API keys, WAF v2 integration, and CloudWatch dashboards
- API design standards — Versioning strategy (URL path vs header), error response format (RFC 7807), pagination patterns, and OpenAPI specification templates
- Developer portal configuration — Auto-generated API documentation from OpenAPI specs, API key self-service, and usage dashboard
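The error response standard in the deliverables above follows RFC 7807 (problem details for HTTP APIs). As a minimal sketch of what a compliant body looks like, the field names come from the RFC; the `type` URL and all values are placeholders:

```python
import json

# Minimal RFC 7807 "problem details" error body. The five field names
# (type, title, status, detail, instance) are defined by the RFC; the
# values and the documentation URL here are placeholders.
problem = {
    "type": "https://api.example.com/errors/rate-limit-exceeded",  # hypothetical docs URL
    "title": "Rate limit exceeded",
    "status": 429,
    "detail": "API key tier allows 100 requests per second.",
    "instance": "/v1/orders/12345",
}

body = json.dumps(problem)
# The RFC registers a dedicated media type for these bodies:
content_type = "application/problem+json"
```

Returning `application/problem+json` (rather than plain `application/json`) lets client SDKs detect and parse structured errors uniformly across every service behind the gateway.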
Key Architecture Decisions
- HTTP API over REST API for new builds — REST API supports more features (request validation, caching, API keys) but HTTP API costs 71% less ($1.00 vs $3.50 per million requests), has 60% lower latency, and supports JWT authorizers natively. Use REST API only when you need request/response transformation or AWS service integrations.
- Lambda Authorizer with JWT caching over Cognito integration — Cognito User Pools provide built-in JWT validation but lock you into Cognito as the identity provider. A Lambda Authorizer with 5-minute response caching supports any OIDC provider (Auth0, Okta, Azure AD) with a single code change and adds less than 1ms after the first request warms the cache.
- Per-client rate limiting over global rate limiting — Global rate limits punish all clients when one client misbehaves. Usage plans with per-API-key throttling let you set 100 req/sec for free tier clients and 10,000 req/sec for enterprise clients. Abusive clients hit their own limits without affecting others.
- Gateway-level caching for read-heavy endpoints — API Gateway response caching (REST API only) serves cached responses for GET requests without hitting your backend. A 60-second TTL on frequently-read endpoints reduces backend load by 80%+ and improves P95 latency from 200ms to 8ms for cached responses.
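The pricing gap in the HTTP-vs-REST decision above is easy to sanity-check. Here is a back-of-envelope sketch at this platform's stated 45,000 requests per second, assuming sustained load and the quoted list prices; real bills apply tiered volume discounts, so treat these as upper bounds:

```python
# Back-of-envelope monthly gateway cost at a sustained 45,000 req/sec,
# using the per-million list prices quoted above. Volume discounts are
# ignored, so these are upper-bound figures.
req_per_sec = 45_000
req_per_month = req_per_sec * 86_400 * 30        # ~116.6 billion requests

http_api_cost = req_per_month / 1e6 * 1.00       # $1.00 per million requests
rest_api_cost = req_per_month / 1e6 * 3.50       # $3.50 per million requests

print(f"HTTP API: ${http_api_cost:,.0f}/month")  # HTTP API: $116,640/month
print(f"REST API: ${rest_api_cost:,.0f}/month")  # REST API: $408,240/month
print(f"Savings: {1 - http_api_cost / rest_api_cost:.0%}")  # Savings: 71%
```

At this volume the price difference is roughly $290,000 per month, which is why the blueprint defaults to HTTP API and reserves REST API for routes that genuinely need caching or transformation.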
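The Lambda Authorizer decision above can be sketched as follows. This is a simplified illustration, not the blueprint's production code: it verifies an HS256 token with a shared secret using only the standard library, whereas a real OIDC deployment (Auth0, Okta, Azure AD) validates RS256 signatures against the provider's published JWKS keys. The handler returns the HTTP API "simple response" shape, which API Gateway caches per token for the configured TTL:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-shared-secret"  # placeholder; production validates RS256 via the IdP's JWKS

def _b64url_decode(seg: str) -> bytes:
    # JWTs strip base64url padding; restore it before decoding.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def verify_jwt_hs256(token: str) -> dict:
    """Verify signature and expiry of an HS256 JWT. Raises ValueError on failure."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(SECRET, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims

def handler(event, context):
    """HTTP API Lambda authorizer (payload format 2.0, simple response).

    API Gateway caches this response keyed on the token for the configured
    TTL (5 minutes in this blueprint), so the JWT is only re-verified once
    per cache window -- the source of the sub-1ms warm-path overhead."""
    token = event.get("headers", {}).get("authorization", "").removeprefix("Bearer ")
    try:
        claims = verify_jwt_hs256(token)
        return {"isAuthorized": True, "context": {"sub": claims.get("sub", "")}}
    except (ValueError, KeyError):
        return {"isAuthorized": False}
```

Swapping identity providers only changes `verify_jwt_hs256` (the issuer, audience, and key source); the gateway-facing response shape stays the same, which is the portability argument made above.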
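The per-client throttling decision above is, conceptually, a token bucket per API key: each key refills at its tier's rate and misbehaving keys drain only their own bucket. API Gateway implements this internally via usage plans; the sketch below just illustrates the mechanism, with tier numbers mirroring the ones quoted above:

```python
import time

# Per-key token bucket. Each API key refills at its tier's rate and can
# burst up to the bucket capacity. An abusive key exhausts only its own
# bucket; every other key's budget is untouched.
TIER_RATES = {"free": 100, "enterprise": 10_000}  # requests/sec, matching the tiers above

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests

buckets: dict[str, TokenBucket] = {}

def check_rate_limit(api_key: str, tier: str) -> bool:
    bucket = buckets.setdefault(api_key, TokenBucket(TIER_RATES[tier], TIER_RATES[tier]))
    return bucket.allow()
```

In the real gateway this bookkeeping happens inside AWS; the point of the sketch is the isolation property: rejecting request 101 from a free-tier key has no effect on any other key's bucket.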
Who This Blueprint Is For
- API Platform Engineers building centralized API management for microservices
- Backend Architects designing API strategies for external developer consumption
- Product Managers launching developer APIs as a revenue channel
- Security Engineers implementing API authentication and rate limiting at scale
Your First 48 Hours
Deploy the HTTP API Gateway with a custom domain and Lambda authorizer Terraform module. Configure one route that proxies to a backend service (or the included mock Lambda). Test JWT authentication end-to-end: a valid token passes, a missing token returns 401 (the authorizer is never invoked when the identity source is absent), and an expired token is rejected with 403 when the authorizer denies it. On day two, configure the usage plan with two tiers and verify that the lower-tier API key gets throttled at its configured rate while the higher-tier key passes through. This validates authentication and rate limiting — the two most critical gateway functions.
Limitations and Trade-offs
HTTP API does not support response caching, request validation, or AWS service integrations — if you need these, use REST API at the higher price point. Lambda authorizer cold starts add 300-800ms to the first request after idle periods; provisioned concurrency ($6/month per instance) eliminates this. API Gateway defaults to 10,000 requests per second per account per region — a soft limit that AWS raises via support request, which this platform required to sustain 45,000 req/sec. WebSocket APIs have different pricing and connection quotas (500 new connections per second per account by default) — size accordingly for real-time features.