Policy as Code and Guardrails

Enforce infrastructure standards and security policies automatically; prevent non-compliant deployments.

TL;DR

Policy as Code transforms compliance and security governance from manual reviews into automated enforcement. Instead of asking teams to "follow best practices," you encode the rules ("all databases must be encrypted," "S3 buckets must have logging enabled," "resources must have cost-center tags") and prevent non-compliant infrastructure from deploying. Tools like Open Policy Agent (OPA), HashiCorp Sentinel, and AWS CloudFormation Guard evaluate policies on every infrastructure change. Violations block deployment, which shifts security and compliance decisions left into the development workflow instead of leaving problems to be discovered during manual audits.

Learning Objectives

  • Understand the business value of policy-as-code enforcement
  • Design clear, implementable infrastructure policies
  • Choose and implement appropriate policy engines
  • Balance strictness with developer productivity
  • Document and communicate policy exceptions
  • Audit and report on policy compliance

Motivating Scenario

Your organization has grown to 50 engineers deploying infrastructure across AWS. Last month, a database was accidentally left unencrypted, exposing customer data. This month, a developer created an S3 bucket with public read access. You've hired security and compliance staff, but they can't review every infrastructure change—there are hundreds per week. Manual reviews are too slow and unreliable.

You need a system that automatically blocks dangerous configurations before they reach production, regardless of who wrote the code.

Core Concepts

What Are Infrastructure Policies?

Policies are rules that define how infrastructure must be configured. They answer questions such as:

  • Encryption: Are databases encrypted at rest? Is data encrypted in transit?
  • Access Control: Are public S3 buckets allowed? Which IAM roles can assume which service roles?
  • Tagging: Do all resources have required tags (cost-center, owner, environment, compliance-level)?
  • Sizing: Are instances sized appropriately for their environment? (Dev environments with prod-level resources waste money.)
  • Compliance: Do configurations meet regulatory requirements (HIPAA, PCI-DSS, SOC2)?
  • Networking: Are databases accessible only from private subnets? Are security groups properly restricted?

Policies operate at the infrastructure-as-code level, evaluating Terraform, CloudFormation, or Kubernetes manifests before they're applied.

Policy Engines and Tools

[Figure: Policy as Code evaluation flow]

Open Policy Agent (OPA): Language-agnostic policy engine using Rego. Evaluates any JSON/YAML data. Popular in Terraform, Kubernetes, and cloud-native ecosystems. Flexible but requires learning Rego.

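HashiCorp Sentinel: Policy-as-code framework embedded in HashiCorp's commercial products, including HCP Terraform and Terraform Enterprise. Integrates tightly with Terraform runs. Often simpler than OPA for Terraform-only use cases, but tied to the HashiCorp ecosystem.

AWS CloudFormation Guard: Evaluates CloudFormation templates against rules written in its own DSL. AWS-native and most commonly used with CloudFormation, although it can also validate other JSON or YAML documents such as Terraform plan output.

Kubernetes Admission Controllers: Intercept requests to the Kubernetes API server and validate or mutate objects before they are persisted. Examples: Pod Security Admission (the replacement for PodSecurityPolicy, which was removed in Kubernetes 1.25) and validating admission webhooks.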

Custom Scripts: Shell, Python, or Go scripts evaluating IaC before deployment. Simple but not standardized.
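
As a rough illustration of the custom-script approach, the sketch below checks the JSON form of a Terraform plan for required tags. The tag names and the function name are illustrative assumptions, and only root-module resources are inspected; a real script would also walk child modules.

```python
# Illustrative required tags; adjust to your organization's standard.
REQUIRED_TAGS = ("Environment", "CostCenter", "Owner")


def find_missing_tags(plan: dict) -> list[str]:
    """Return one message per root-module resource that lacks a required tag."""
    root = plan.get("planned_values", {}).get("root_module", {})
    messages = []
    for resource in root.get("resources", []):
        values = resource.get("values", {})
        if "tags" not in values:
            continue  # this resource type has no tags attribute
        tags = values["tags"] or {}  # an unset tags attribute appears as null
        missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
        if missing:
            messages.append(
                f"{resource['address']} missing required tags: {', '.join(missing)}"
            )
    return messages


# Hand-written sample in the shape of a JSON-formatted Terraform plan.
SAMPLE_PLAN = {
    "planned_values": {
        "root_module": {
            "resources": [
                {
                    "address": "aws_s3_bucket.logs",
                    "type": "aws_s3_bucket",
                    "values": {"tags": {"Environment": "dev", "Owner": "platform"}},
                },
                {
                    "address": "aws_db_instance.main",
                    "type": "aws_db_instance",
                    "values": {"tags": None},
                },
            ]
        }
    }
}

for message in find_missing_tags(SAMPLE_PLAN):
    print(message)
```

A script like this is quick to write, but every team tends to invent its own input handling and output format, which is the standardization gap a dedicated policy engine closes.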

Categories of Policies

Policies typically fall into these categories:

  • Security: Encryption, public access, firewall rules, secrets not hardcoded
  • Compliance: Resource naming conventions, audit logging enabled, data residency
  • Cost: Prevent expensive instance types in dev, require auto-scaling policies
  • Operational: Resource tags mandatory, backups configured, monitoring enabled
  • Governance: Limit who can create certain resources, require approval for changes

Guardrails vs. Gates

Guardrails are automatic, infrastructure-level controls that prevent bad configurations. They're passive: nobody has to invoke them, and they apply regardless of who makes the change or how. Example: a policy that requires all S3 buckets to enable versioning. If you try to create a bucket without it, the policy blocks the deployment.

Gates are checkpoints in the deployment process. A policy engine checks your code, reports findings, and blocks if violations exist. Developers can't proceed until the policy passes.

Both are Policy as Code; they just operate at different points.
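
A gate can be as small as a pipeline step that turns policy findings into an exit code. The sketch below is a minimal illustration, not a specific tool's behavior: run_gate and its enforce flag are hypothetical names, and the flag models running a policy in non-blocking mode before it is enforced.

```python
def run_gate(violations: list[str], enforce: bool = True) -> int:
    """Print findings and return a process exit code for the pipeline step.

    A non-zero code fails the step and blocks the deployment. With
    enforce=False the findings are only reported (non-blocking mode).
    """
    label = "BLOCKED" if enforce else "WARNING"
    for violation in violations:
        print(f"[{label}] {violation}")
    if violations and enforce:
        return 1
    return 0


# Example: the same finding blocks in enforcing mode and only warns otherwise.
findings = ["aws_s3_bucket.logs must have logging configuration"]
blocking_code = run_gate(findings)
advisory_code = run_gate(findings, enforce=False)
```

In a CI job the returned value would typically be passed to sys.exit so the pipeline stops on a blocking violation.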

Practical Example

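Let's implement a practical policy system using OPA and Terraform. The Rego policy below is evaluated against the JSON representation of a Terraform plan (the output of terraform show -json) and produces one denial message per violation.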

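# policies/terraform.rego
# Deny rules evaluated against the JSON output of `terraform show -json`.
# Only root-module resources are inspected; resources declared in child
# modules (planned_values.root_module.child_modules) are not covered here.

package terraform

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Database encryption required
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_rds_cluster"
    # `not ... == true` also fires when the attribute is null or missing
    not resource.values.storage_encrypted == true
    msg := sprintf(
        "RDS cluster '%s' must have storage_encrypted = true",
        [resource.address]
    )
}

# S3 bucket logging required
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_s3_bucket"
    not has_logging(resource)
    msg := sprintf(
        "S3 bucket '%s' must have logging configuration",
        [resource.address]
    )
}

# An inline `logging` block appears as a list in the plan, so "no logging"
# may show up as null, an empty list, or a missing attribute.
# Assumption: logging is configured inline on the bucket; a separate
# aws_s3_bucket_logging resource is not detected by this check.
has_logging(resource) if {
    count(resource.values.logging) > 0
}

# All taggable resources must have required tags
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    # Skip resource types that have no `tags` attribute at all
    "tags" in object.keys(resource.values)
    required_tags := ["Environment", "CostCenter", "Owner"]
    missing := [tag |
        tag := required_tags[_]
        # Undefined when the tag is absent or when `tags` is null
        not resource.values.tags[tag]
    ]
    count(missing) > 0
    msg := sprintf(
        "Resource '%s' missing required tags: %s",
        [resource.address, concat(", ", missing)]
    )
}

# Public S3 buckets not allowed
# Note: this only checks public access blocks that are declared in the plan;
# a bucket with no public access block resource is not flagged.
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_s3_bucket_public_access_block"
    not resource.values.block_public_acls
    msg := sprintf(
        "S3 public access block '%s' must set block_public_acls = true",
        [resource.address]
    )
}

# Database must not be publicly accessible
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    # The Terraform resource type for an RDS instance is aws_db_instance
    resource.type == "aws_db_instance"
    resource.values.publicly_accessible == true
    msg := sprintf(
        "RDS instance '%s' cannot be publicly accessible",
        [resource.address]
    )
}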
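
One way to wire this policy into a pipeline is to call OPA's eval command on the plan JSON and read back the deny set. The sketch below only assembles that command and parses the JSON that OPA prints; the function names and file paths are placeholders, and it assumes the opa binary is installed and the plan was exported with terraform show -json.

```python
import json


def build_opa_command(policy_path: str, plan_json_path: str) -> list[str]:
    """Assemble an `opa eval` invocation that queries the deny set."""
    return [
        "opa", "eval",
        "--format", "json",
        "--data", policy_path,
        "--input", plan_json_path,
        "data.terraform.deny",
    ]


def extract_violations(opa_output: str) -> list[str]:
    """Collect denial messages from the JSON document printed by `opa eval`."""
    document = json.loads(opa_output)
    violations = []
    for item in document.get("result", []):
        for expression in item.get("expressions", []):
            violations.extend(expression.get("value", []))
    return sorted(violations)


# Hand-written sample shaped like `opa eval --format json` output.
SAMPLE_OUTPUT = json.dumps({
    "result": [{
        "expressions": [{
            "value": ["RDS instance 'aws_db_instance.main' cannot be publicly accessible"],
            "text": "data.terraform.deny",
        }]
    }]
})

command = build_opa_command("policies/terraform.rego", "tfplan.json")
violations = extract_violations(SAMPLE_OUTPUT)
```

The command list can be handed to subprocess.run with capture_output=True, and the captured stdout passed to extract_violations; a non-empty result is what a pipeline gate would block on.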

When to Use / When Not to Use

Use When:
  1. Security and compliance are non-negotiable (financial services, healthcare, government)
  2. You have regulatory requirements (SOC2, HIPAA, PCI-DSS) that mandate certain configurations
  3. You've had security incidents from misconfigured infrastructure
  4. Your organization has grown large enough that manual reviews don't scale
  5. You need to enforce consistent tagging and cost governance
  6. Developers frequently deploy without following organizational standards

Avoid When:
  1. Your organization is very small (fewer than 5 engineers) and policies change constantly
  2. You're prototyping or experimenting (add blocking policies once the design has been validated)
  3. Your infrastructure is simple and low-risk
  4. You lack the expertise to write and maintain policies
  5. Your policies change so frequently that they become a maintenance burden rather than protection

Patterns and Pitfalls

Treat policies like code: review them, test them, version control them. When a policy needs to change, socialize the change with teams first. Don't silently change policies and break builds. Use deprecation timelines: announce a policy change 2 weeks in advance, giving teams time to fix violations before the policy becomes blocking.
Policies will have exceptions. Don't hardcode exceptions in policy code. Instead, maintain an exceptions registry: document why the exception exists, who approved it, and when it expires. Review exceptions quarterly. Example: You might allow a legacy database to remain unencrypted because the cost to migrate is high, but set a 6-month deadline for migration.
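
An exceptions registry can be plain data that the pipeline consults after policy evaluation. The following is a minimal sketch under assumed names (PolicyException, is_waived, and expired_exceptions are hypothetical); it records the reason, approver, and expiry date, and treats an exception as inactive once it has expired.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    policy: str        # identifier of the policy being waived
    resource: str      # address of the exempted resource
    reason: str        # why the exception exists
    approved_by: str   # who approved it
    expires: date      # last day the exception is valid


def is_waived(policy: str, resource: str,
              registry: list[PolicyException], today: date) -> bool:
    """True if an unexpired exception covers this policy/resource pair."""
    return any(
        entry.policy == policy
        and entry.resource == resource
        and today <= entry.expires
        for entry in registry
    )


def expired_exceptions(registry: list[PolicyException],
                       today: date) -> list[PolicyException]:
    """Entries to bring to the periodic review because they have lapsed."""
    return [entry for entry in registry if entry.expires < today]


# Example entry for the legacy-database case described above.
REGISTRY = [
    PolicyException(
        policy="rds-encryption",
        resource="aws_rds_cluster.legacy",
        reason="Migration cost is high; migration scheduled",
        approved_by="security-lead",
        expires=date(2025, 6, 30),
    )
]
```

Keeping the registry outside the policy code means an expired exception starts failing the check automatically, with no policy change required.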
Start with high-value, low-friction policies (encryption, tagging). Avoid overly strict policies that make common development tasks impossible. If teams routinely violate a policy, it's either poorly designed or too strict. Adjust rather than enforce by punishment.
When a policy blocks a deployment, the error message must explain why, not just what's wrong. Bad: 'RDS instance fails policy check.' Good: 'RDS instance must have storage_encrypted = true to comply with HIPAA. See https://wiki.example.com/policies/encryption for details.' Clear messages reduce frustration and support tickets.
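
One way to keep messages consistently useful is to build them from structured fields, so no policy can ship without a rationale and a link. This is a small sketch; the Violation type and its field names are illustrative assumptions, and the URL is the example address used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    subject: str      # what was checked, e.g. "RDS instance"
    requirement: str  # what must be true
    rationale: str    # why the rule exists
    docs_url: str     # where to read more and how to fix it

    def render(self) -> str:
        """Render a message that states the rule, the reason, and a link."""
        return (
            f"{self.subject} {self.requirement} {self.rationale}. "
            f"See {self.docs_url} for details."
        )


message = Violation(
    subject="RDS instance",
    requirement="must have storage_encrypted = true",
    rationale="to comply with HIPAA",
    docs_url="https://wiki.example.com/policies/encryption",
).render()
print(message)
```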
Anti-pattern: The security team writes all policies in isolation and teams are forced to comply. This leads to resentment and circumvention. Better: the security team defines high-level policy goals, and teams help turn them into enforceable policies. Make policies collaborative.
Don't just block deployments; measure compliance continuously. What percentage of infrastructure is compliant? Which policies are most violated? Which teams struggle? Use these metrics to identify systemic problems and prioritize policy improvements.
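
The metrics above can be derived from stored evaluation results. The sketch below assumes each result records the owning team, the policy, the resource, and whether the check passed; the record shape and function names are illustrative, not a particular tool's schema.

```python
from collections import Counter
from typing import NamedTuple


class CheckResult(NamedTuple):
    team: str
    policy: str
    resource: str
    passed: bool


def compliance_rate(results: list[CheckResult]) -> float:
    """Percentage of resources with no failing checks."""
    resources = {r.resource for r in results}
    if not resources:
        return 100.0
    failing = {r.resource for r in results if not r.passed}
    return 100.0 * (len(resources) - len(failing)) / len(resources)


def violations_by(results: list[CheckResult], field: str) -> list[tuple[str, int]]:
    """Failed-check counts grouped by 'policy' or 'team', highest first."""
    counts = Counter(getattr(r, field) for r in results if not r.passed)
    return counts.most_common()


# Hand-written sample data for illustration.
RESULTS = [
    CheckResult("payments", "required-tags", "aws_s3_bucket.logs", False),
    CheckResult("payments", "rds-encryption", "aws_rds_cluster.main", True),
    CheckResult("search", "required-tags", "aws_s3_bucket.index", False),
    CheckResult("search", "s3-logging", "aws_s3_bucket.index", False),
    CheckResult("search", "rds-encryption", "aws_rds_cluster.search", True),
]

print(f"{compliance_rate(RESULTS):.1f}% of resources compliant")
print(violations_by(RESULTS, "policy"))
print(violations_by(RESULTS, "team"))
```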

Design Review Checklist

  • Are policy requirements documented and accessible to all engineers?
  • Do policies exist for security-critical concerns (encryption, public access, secrets)?
  • Are policy violations blocking deployment automatically?
  • Do policy error messages explain why and how to fix violations?
  • Is there a documented process for requesting policy exceptions?
  • Are exceptions tracked and reviewed regularly (quarterly)?
  • Do teams understand which policies apply to their infrastructure?
  • Are policies versioned and reviewed before changes?
  • Can compliance be measured and reported on?
  • Are new policies tested in non-blocking mode before enforcement?
  • Do policies have owners responsible for maintenance?
  • Is there a feedback mechanism for teams to suggest policy improvements?

Self-Check Questions

  1. Policy Scope: What are the 5 most important policies for your organization? (Hint: Start with security and compliance.)
  2. Exception Management: How do you track infrastructure that doesn't comply with policies? Do you have a process to resolve exceptions?
  3. Communication: Can a developer understand why a policy blocked their deployment within 2 minutes?
  4. Measurement: Can you report how much of your infrastructure is policy-compliant?
  5. Governance: Who owns each policy? Who can approve exceptions?

Next Steps

  1. Identify High-Value Policies: List the top security and compliance risks in your infrastructure. Start with those.
  2. Choose a Policy Engine: Evaluate OPA, Sentinel, or CloudFormation Guard based on your IaC tooling.
  3. Write Pilot Policies: Start with 3-5 policies that address real risks. Test in non-blocking mode.
  4. Set Up Reporting: Implement compliance dashboards so teams can see their status.
  5. Iterate with Teams: Get feedback from developers and refine policies based on their input.
  6. Document Exceptions: Create a process for exceptions with clear timelines for resolution.

References

  1. Styra OPA - Open Policy Agent
  2. HashiCorp Sentinel
  3. AWS CloudFormation Guard
  4. Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley.
  5. OWASP Cheat Sheets