
Promotion, Approvals, and Gates

Control the flow of changes through environments; approve releases with appropriate rigor.

TL;DR

Environments: dev → staging → production. An artifact is promoted (automatically or manually) from one environment to the next only if it passes gates. Gates can be automated (tests, security scans) or manual (human approval).

Dev gates are automatic (low risk). Staging gates add moderate approval. Prod gates require strict approval. Automate approvals based on objective criteria (e.g., all tests passed); reserve manual approvals for subjective decisions (high-risk changes, architecture, regulatory requirements).

Learning Objectives

  • Design multi-environment promotion workflows
  • Define appropriate approval gates by environment
  • Distinguish between automated and manual approval triggers
  • Implement audit trails for compliance
  • Reduce toil through approval automation
  • Balance speed with safety and governance

Motivating Scenario

Your team wants to deploy a critical payment service update. Currently, approvals take 4 hours: manual code review, a manual approval request to the ops team, waiting for the on-call lead to review, and only then deploying.

A competitor with approval automation works like this: push code → CI/CD runs tests → if all pass, quality gates grant automatic approval → deploy to staging → staging tests pass → automatic production deployment. Total time: 20 minutes.

Your team has security requirements (code review, compliance checks), but they are manual and slow. The competitor automates its policy checks: SAST scan, dependency vulnerability check, and a code-review bot that catches common issues. Humans review only when automation flags something.

Result: you ship features roughly 10x slower because of approval toil.

Core Concepts

Environment Promotion Architecture

flowchart LR
    Code["Code Pushed<br/>to Main"] --> Build["Build Artifact<br/>Docker image"]
    Build --> Dev["Deploy to Dev<br/>Run tests"]
    Dev --> DevGate{"All Checks<br/>Passed?"}
    DevGate -->|No| Fail["Build Failed<br/>Feedback to Dev"]
    DevGate -->|Yes| Staging["Promote to Staging<br/>Production-like env"]
    Staging --> StagingTests["Run Integration<br/>& Performance Tests"]
    StagingTests --> StagingGate{"Quality<br/>Passed?"}
    StagingGate -->|No| Revert["Investigate<br/>Fix"]
    StagingGate -->|Yes| Approval{"Manual<br/>Approval?"}
    Approval -->|Required| Wait["Wait for PM/Lead<br/>Approval"]
    Approval -->|Auto| Prod
    Wait --> Prod["Deploy to Prod<br/>Rolling/Canary"]
    Prod --> Monitor["Monitor Metrics<br/>Auto-rollback if needed"]

Gate Types by Environment

Development Environment:

  • Automated gates only
  • Unit tests, linting, type checking
  • Fast feedback (seconds to 2 minutes)
  • No human approval
  • Fail early, iterate quickly
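
These dev gates can run as a fast pull-request workflow. The following is a minimal sketch, assuming an npm project whose lint, typecheck, and test:unit scripts implement the gates (script names are illustrative):

# .github/workflows/dev-gates.yml (illustrative; script names assumed)
name: Dev Gates

on:
  pull_request:

jobs:
  dev-gates:
    runs-on: ubuntu-latest
    timeout-minutes: 5           # keep feedback fast; a slow gate defeats the purpose
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint        # linting
      - run: npm run typecheck   # type checking (assumed script)
      - run: npm run test:unit   # unit tests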

Staging Environment:

  • Automated quality gates (integration tests, security scans)
  • Optional human approval for high-risk changes
  • Slow tests (10-30 minutes) are acceptable
  • Production-like data and infrastructure
  • Last chance to catch issues before production

Production Environment:

  • All automated gates from staging
  • Mandatory human approval (usually PM or tech lead)
  • Approval based on: business impact, error budget, deploy window
  • Audit trail required for compliance
  • Deployment strategy (canary, blue-green, rolling) chosen by risk
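
GitHub Environments are one way to implement both the mandatory human approval and the audit trail without building your own approval flow: a job bound to a protected environment pauses until a configured reviewer approves, and the approval (who, when, which commit) is recorded on the workflow run and its deployment. A minimal sketch, assuming a production environment with required reviewers has been set up in the repository settings (the deploy command and URL are placeholders):

# excerpt: production deploy job gated by a protected environment
deploy-production:
  runs-on: ubuntu-latest
  environment:
    name: production                        # required reviewers are configured on this environment
    url: https://payments.example.com       # shown on the deployment record (placeholder)
  steps:
    - uses: actions/checkout@v4
    - name: Deploy
      run: ./scripts/deploy.sh production   # placeholder deploy command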

Approval Strategies

Automatic Approval:

  • Trigger: All quality gates passed
  • Condition: Low-risk change (documentation, internal tools, bug fix)
  • Owner: System/CI/CD
  • Audit: Automatic log of "approval"

Single Approval:

  • Trigger: Quality gates + manual review
  • Reviewer: Tech lead or PM
  • Time to approval: 15 minutes to 2 hours
  • Use for: Feature changes, configuration updates

Multi-stage Approval:

  • Trigger: Quality gates + sequential approvals
  • Approvers: Tech lead → PM → On-call lead (for prod)
  • Time to approval: 1-4 hours
  • Use for: High-impact changes, major refactors, security-critical

Scheduled Release:

  • Trigger: Approved, waiting for release window
  • Release window: Business hours, low-traffic time
  • Approval locked: No changes after approval
  • Use for: Database migrations, breaking API changes
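
The release window itself can be enforced as a gate rather than left to memory. Below is a minimal sketch that fails the pipeline outside a weekday low-traffic window; the hours are assumptions for illustration:

# excerpt: block production deploys outside the release window (UTC hours are illustrative)
check-release-window:
  runs-on: ubuntu-latest
  steps:
    - name: Verify release window
      run: |
        DAY=$(date -u +%u)    # 1 = Monday ... 7 = Sunday
        HOUR=$(date -u +%H)
        if [ "$DAY" -ge 6 ] || [ "$HOUR" -lt 9 ] || [ "$HOUR" -ge 16 ]; then
          echo "Outside the release window (Mon-Fri 09:00-16:00 UTC); blocking deploy."
          exit 1
        fi

The production deploy job then lists this job in its needs, so an approved change still waits for the window.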

Practical Examples

# .github/workflows/promotion-pipeline.yml
name: Promotion Pipeline

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: mycompany/payment-service

jobs:
  # Stage 1: Build and test (Dev environment)
  build-and-test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # required to push the image to ghcr.io
      security-events: write
    outputs:
      image-tag: ${{ github.sha }}

    steps:
      - uses: actions/checkout@v4

      - name: Run unit tests
        run: |
          npm install
          npm run test:unit
          npm run coverage

      - name: Run linter
        run: npm run lint

      - name: Build Docker image
        run: docker build -t ${{ env.IMAGE_NAME }}:${{ github.sha }} .

      - name: Security scanning (SAST)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Check for vulnerabilities
        run: |
          if grep -q '"CRITICAL"' trivy-results.sarif; then
            echo "Critical vulnerabilities found!"
            exit 1
          fi

      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Publish to registry
        run: |
          docker tag ${{ env.IMAGE_NAME }}:${{ github.sha }} \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

  # Stage 2: Deploy to staging (Staging environment)
  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4

      # Assumes the runner already has credentials for the staging cluster
      - name: Deploy to staging
        run: |
          kubectl set image deployment/payment-service \
            payment-service=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            -n staging

      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/payment-service \
            -n staging --timeout=5m

      - name: Run integration tests
        run: |
          npm install
          npm run test:integration -- --env=staging

      - name: Run performance tests
        run: npm run test:performance -- --env=staging

      - name: Check metrics
        run: |
          # Verify the staging error rate is acceptable (internal Prometheus endpoint)
          ERROR_RATE=$(curl -s "http://prometheus-staging:9090/api/v1/query?query=error_rate" | jq -r '.data.result[0].value[1]')
          if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
            echo "Error rate too high in staging: $ERROR_RATE"
            exit 1
          fi

  # Stage 3: Request approval (before prod)
  request-approval:
    needs: deploy-staging
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      issues: write            # required to open the approval issue

    steps:
      - name: Create deployment approval issue
        uses: actions/github-script@v7
        with:
          script: |
            const issue = await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Deploy approval needed: ${{ github.sha }}`,
              body: `**Commit**: ${{ github.sha }}\n**Branch**: main\n**Author**: ${{ github.actor }}\n\nAll checks passed in staging.\n\n**To approve**: Reply with /approve\n**To reject**: Reply with /reject`,
              labels: ['deployment', 'approval-needed']
            });

            console.log(`Created approval issue: ${issue.data.number}`);

  # Stage 4: Deploy to production (Production environment)
  deploy-production:
    needs: request-approval
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch' # Manual trigger only

    steps:
      - uses: actions/checkout@v4

      - name: Verify approval (in a real pipeline, check the approval system)
        run: echo "Approved for production deployment"

      - name: Deploy with canary strategy
        run: |
          # Update the image; Flagger shifts traffic to the new version incrementally
          kubectl set image deployment/payment-service \
            payment-service=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            -n production

          # Use traffic splitting (e.g., Istio, Flagger)
          kubectl apply -f - <<EOF
          apiVersion: flagger.app/v1beta1
          kind: Canary
          metadata:
            name: payment-service
            namespace: production
          spec:
            targetRef:
              apiVersion: apps/v1
              kind: Deployment
              name: payment-service
            progressDeadlineSeconds: 60
            service:
              port: 8080
            analysis:
              interval: 1m
              threshold: 5
              maxWeight: 100
              stepWeight: 20
              metrics:
                # Metric names assume matching Flagger MetricTemplates exist in the cluster
                - name: error-rate
                  thresholdRange:
                    max: 1
                - name: latency
                  thresholdRange:
                    max: 1000
          EOF

      - name: Monitor canary
        run: |
          # Flagger sets the Promoted condition once the canary analysis succeeds
          kubectl wait --for=condition=promoted \
            canary/payment-service -n production \
            --timeout=10m

When to Automate vs. Require Manual Approval

Automate approval when:

  1. Tests are comprehensive and deterministic
  2. Change is low-risk (documentation, logging)
  3. Security scans pass without vulnerabilities
  4. Performance regression tests pass
  5. Deployment is incremental (canary)
  6. Rollback is fast and automatic

Require manual approval when:

  1. High-risk change (breaking API, data migration)
  2. Requires business judgment (feature flags)
  3. Security implications need review
  4. Regulatory or compliance requirements
  5. Affects critical customer journeys
  6. Rollback is manual or risky
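
These criteria are easiest to enforce (and audit) when they live in version control rather than tribal knowledge. The file below is a hypothetical schema, not an existing tool's format; it would need a custom workflow step or policy engine to evaluate, but it shows the shape of a reviewable approval policy:

# approval-policy.yml (hypothetical schema; enforcement tooling not shown)
policies:
  - match:
      change_type: [docs, logging, dependency-patch]
    approval: automatic            # quality gates only, no human
  - match:
      change_type: [feature, config]
    approval: single               # one tech lead or PM, target < 2h
  - match:
      change_type: [api-breaking, data-migration, security]
    approval: multi-stage          # tech lead -> PM -> on-call lead
    release_window: business-hours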

Patterns and Pitfalls

  • Progressive promotion: Dev (instant) → Staging (automatic after gates pass) → Prod (approval required). Each stage tests more thoroughly: staging covers integration and performance; prod validates with real data behind a canary deployment.
  • Policy as code: define approval policies in code, not tribal knowledge ("production changes need tech-lead approval"). Store them in version control so they are reviewable and auditable.
  • Pitfall: manual approval takes 4+ hours because approvers are unavailable, so features queue up. Solution: expand the approver pool, rotate on-call, or automate more gates.
  • Pitfall: the approver always clicks "approve" without reading the details, so approval becomes meaningless toil. Solution: automate the approval if you trust the gates, or require a substantive review.
  • Pitfall: staging is not like production (old data, missing services, different configs), so a change passes staging but fails in prod. Solution: sync staging data regularly and use production-like infrastructure (a scheduled refresh sketch follows this list).
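
One low-effort way to keep staging data from drifting is a scheduled refresh job. A minimal sketch; the cron window and the refresh script are placeholders:

# .github/workflows/staging-data-refresh.yml (script path is a placeholder)
name: Staging Data Refresh

on:
  schedule:
    - cron: '0 3 * * 1'    # every Monday at 03:00 UTC
  workflow_dispatch:        # allow manual refreshes too

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Refresh staging data from an anonymized production snapshot
        run: ./scripts/refresh-staging-data.sh   # placeholder; must anonymize PII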

Design Review Checklist

  • Each environment has defined promotion criteria (which gates must pass?)
  • Automated gates are objective and measurable (tests pass, coverage >80%, no vulns)
  • Manual approvals are required only for high-risk changes (clearly defined)
  • Approval process is documented and includes decision criteria
  • Audit trail exists for all approvals (who, when, change details)
  • Approval time is <2 hours (SLA for time-sensitive deploys)
  • Approvers have clear escalation path if blocked
  • Dev environment is fast-feedback (no manual approval)
  • Prod environment has at least 2-layer approval for breaking changes
  • Approval policies are version-controlled and reviewed

Self-Check

  • Can you deploy a low-risk change from code to production in <30 minutes?
  • What is your average approval wait time for production deployments?
  • Are there subjective approval decisions that could be automated (gates)?
  • Have you ever had approval become a bottleneck to deployment?
  • Is your approval audit trail queryable and compliance-friendly?

Next Steps

  1. Week 1: Document current promotion workflow and approval gates
  2. Week 2: Map gates to "why do we need this?" (remove if not justified)
  3. Week 3: Implement 3 automated quality gates (tests, security scans, lint)
  4. Week 4: Reduce manual approval scope by automating objective decisions
  5. Ongoing: Monitor approval wait times; iterate based on feedback
