
Citadel Cloud Management
Kubernetes RBAC and Policy Templates
DevOps Pipelines
Created by Kenny Ogunlowo
Product Description
Deploying to Kubernetes without a structured pipeline means someone is running kubectl apply from their laptop with a kubeconfig that has cluster-admin privileges. I saw this at three different enterprises before helping them fix it. At one energy-sector client, a developer accidentally applied a staging manifest to production because their kubeconfig context was wrong. The service mesh routed 100% of traffic to an unconfigured pod for 22 minutes. This template makes that class of error structurally impossible.
This pipeline implements GitOps-aligned Kubernetes deployment via GitHub Actions. Every manifest change is version-controlled, reviewed, scanned, and promoted through environments with gates — not kubectl commands typed into terminals.
Pipeline Stages
- manifest-lint — `instrumenta/kubeval@v0.16.1` validates manifests against Kubernetes OpenAPI schemas. Catches invalid field names, wrong API versions, and missing required fields before anything touches a cluster.
- policy-check — `bridgecrewio/checkov-action@v12` enforces security policies: no privileged containers, no host network access, resource limits required, no `latest` image tags, read-only root filesystem.
- build-and-scan — Builds the container image, scans it with Trivy, and signs it with Cosign. The image digest (not the tag) is injected into the Kubernetes manifests via `kustomize edit set image`.
- deploy-dev — `azure/k8s-deploy@v5` or `aws-actions/amazon-eks-kubectl@v1` applies to the dev cluster. Uses namespace isolation. Runs a post-deploy health check: `kubectl rollout status deployment/app --timeout=300s`.
- integration-test — Port-forwards the service and runs the integration test suite against the deployed pods. Tests service mesh routing, database connectivity, and external API mocks.
- deploy-staging — Promotion via environment protection rules. A Kustomize overlay patches the replica count, resource limits, and ingress hostname for staging. Same manifests, different configuration.
- deploy-prod — Canary deployment: 10% traffic shift, 5-minute bake time, automated metric check (error rate < 0.1%, p99 latency < 500ms), then full rollout. Manual approval gate with two required reviewers.
- rollback-on-failure — If the canary metrics breach thresholds, the pipeline runs `kubectl rollout undo` and opens an incident issue with the deployment SHA, metric values, and pod logs attached.
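Wired together, the early stages look roughly like the following GitHub Actions workflow. This is a minimal sketch, not the template itself: the `manifests/` and `overlays/dev` paths, the action input names, and the job wiring are illustrative assumptions.

```yaml
name: k8s-pipeline
on:
  pull_request:
  push:
    branches: [main]

jobs:
  manifest-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Validate manifests against the Kubernetes OpenAPI schema
      - uses: instrumenta/kubeval@v0.16.1
        with:
          files: manifests/   # input name is an assumption

  policy-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Static policy scan: privileged containers, host network, limits, tags
      - uses: bridgecrewio/checkov-action@v12
        with:
          directory: manifests/
          framework: kubernetes

  deploy-dev:
    # Nothing deploys unless both the lint and policy gates pass
    needs: [manifest-lint, policy-check]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: dev
    steps:
      - uses: actions/checkout@v4
      - run: kubectl apply -k overlays/dev
      # Post-deploy health check from the stage description above
      - run: kubectl rollout status deployment/app --timeout=300s
```

The `needs:` edges are what make "structurally impossible" concrete: a deploy job cannot start, even manually, until its gate jobs succeed.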
Security Gates
- Checkov/OPA — Enforces pod security standards. No containers run as root. All images must come from approved registries. NetworkPolicies must exist for every namespace.
- Image digest pinning — Manifests reference images by SHA256 digest, not mutable tags. Prevents supply chain attacks where a tag is overwritten with a compromised image.
- RBAC-scoped service accounts — The GitHub Actions deployer service account has namespace-scoped permissions only. Cannot modify cluster-level resources, RBAC, or other namespaces.
- Admission controller integration — Cosign image signatures are verified by Kyverno or OPA Gatekeeper at admission time. Unsigned images are rejected by the cluster.
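A namespace-scoped deployer identity of the kind described above can be expressed as a Role plus RoleBinding. This is a sketch; the namespace, role name, and service account name are placeholders.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer
  namespace: app-dev          # Role, not ClusterRole: rights end at this namespace
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps", "pods"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  # Deliberately absent: rbac.authorization.k8s.io resources, so the
  # pipeline cannot escalate its own permissions.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer
  namespace: app-dev
subjects:
  - kind: ServiceAccount
    name: github-actions-deployer
    namespace: app-dev
roleRef:
  kind: Role
  name: ci-deployer
  apiGroup: rbac.authorization.k8s.io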
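The admission-time signature check can be expressed as a Kyverno policy along these lines (a sketch: the registry pattern is a placeholder and the public key is elided).

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce   # reject, don't just audit
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # placeholder for the approved registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----
```

With `Enforce` set, an unsigned image fails admission and the corresponding `kubectl rollout status` check in the pipeline fails fast.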
Environment Matrix
Dev namespace auto-deploys on PR merge. Staging requires a release candidate tag and one approval. Production requires two approvals, passing staging integration tests, and a canary deployment window. Each environment runs in a separate cluster (or namespace with NetworkPolicy isolation) with distinct IAM roles and Secrets Manager paths.
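The "same manifests, different configuration" model is typically one Kustomize overlay per environment. A staging overlay might look like this sketch, where the base path, replica count, and hostname are illustrative:

```yaml
# overlays/staging/kustomization.yaml (path is an assumption)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: app-staging
resources:
  - ../../base          # the shared manifests deployed to every environment
replicas:
  - name: app
    count: 3            # staging-specific replica count
patches:
  - patch: |-
      - op: replace
        path: /spec/rules/0/host
        value: staging.example.com   # staging-specific ingress hostname
    target:
      kind: Ingress
      name: app
```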
Top 3 Failures
- ImagePullBackOff from ECR token expiry — EKS nodes cache ECR credentials for 12 hours. Long-running nodes with expired tokens cannot pull new images. Fix: ensure `amazon-k8s-cni` and the ECR credential helper are updated, or use `imagePullSecrets` with a CronJob that refreshes the token.
- Resource quota exceeded in namespace — The deployment specifies resource requests that exceed the namespace ResourceQuota. Fix: right-size resource requests based on actual usage metrics from Prometheus, and set the quota 20% above the expected peak.
- Kustomize overlay merge conflicts — Two PRs modify the same Kustomize patch file. The merge produces invalid YAML that passes GitHub merge checks but fails `kustomize build`. Fix: add a `kustomize build` step to the PR check pipeline that validates the merged output.