Image & Artifact Management
TL;DR
Centralized container registries (Docker Hub, ECR, GCR, Artifactory) store versioned images. Scan images for vulnerabilities before deployment. Immutable tagging strategies (semantic versioning, git SHA) enable reproducible rollbacks; mutable tags such as "latest" do not. Garbage collection reclaims disk space. Push attestations (signatures, SBOMs) for supply-chain integrity.
Learning Objectives
- Design registry architecture and access control policies.
- Implement image scanning and vulnerability thresholds.
- Choose tagging strategies for reproducibility and rollback.
- Configure garbage collection to prevent disk exhaustion.
- Integrate attestations and provenance into CI/CD.
Motivating Scenario
A developer accidentally pushes a test build under the "latest" tag. Production pods restart, pull the untested image, and cause an outage. With a proper tagging strategy (production pinned to immutable semver or git-SHA tags, "latest" reserved for development), the accident never reaches production. Image scanning adds a second line of defense, catching known vulnerabilities before deployment.
Core Concepts
Registry: Central repository storing images with metadata (manifests, layers, configs, signatures). Examples: Docker Hub, ECR (AWS), GCR (Google), Harbor (self-hosted), Artifactory. Provides RBAC, webhooks, and vulnerability-scanning integration.
Tagging Strategy: Naming convention for images. Semantic versioning (v1.2.3) ensures predictable rollbacks. Git SHA (abc123d) enables reproducibility: the same commit always maps to the same image. The "latest" tag is dangerous in production: it is mutable, so a pod restart can silently pull a different image than the one you tested. Use immutable tags (semver or SHA) in production; reserve "latest" for development and staging.
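As a minimal sketch, the two immutable tags described above can be derived like this (values are hardcoded for illustration; in CI they would come from git, as the comments show):

```shell
#!/bin/sh
# Sketch: derive immutable tags for one build (illustrative values).
VERSION="v1.2.3"                  # normally: git describe --tags --always
GIT_SHA="abc123d"                 # normally: git rev-parse --short HEAD
REPO="myregistry/myapp"           # hypothetical repository path
echo "$REPO:$VERSION"             # predictable rollback target
echo "$REPO:$GIT_SHA"             # reproducible per-commit reference
```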
Image Scanning: Automated vulnerability detection against CVE databases (Trivy, Clair, Snyk). Scans before push (in CI) and periodically after push (drift detection). Blocks deployment if critical vulns found; requires remediation.
Garbage Collection: Remove unused images and orphaned layers to reclaim disk space. Images not pulled in 30 days, or with zero running deployments, are candidates for deletion. Prevents disk exhaustion on registries managing millions of images.
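The 30-day rule can be sketched as a simple filter. The "image last_pulled_epoch" record format below is hypothetical; real registries expose last-pull timestamps through their APIs:

```shell
#!/bin/sh
# Sketch: select GC candidates older than the retention window.
NOW=1700000000                     # fixed "current" time for the example
CUTOFF=$((NOW - 30*24*3600))       # 30-day retention window
candidates=""
while read -r image last_pulled; do
  if [ "$last_pulled" -lt "$CUTOFF" ]; then
    candidates="$candidates $image"
    echo "candidate: $image"
  fi
done <<EOF
myapp:v1.0.0 1680000000
myapp:v1.2.3 1699990000
EOF
```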
Attestation & Provenance: Cryptographic proof of image origin and integrity. Cosign signatures verify image was built by your CI/CD, not compromised. SBOM (Software Bill of Materials) lists all dependencies; enables supply-chain audit for compliance (SLSA framework, CISA requirements).
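A CI job might generate and attach an SBOM with syft and cosign. The sketch below only prints the commands it would run; the flags reflect common usage of those tools, and the image and key names are assumptions:

```shell
#!/bin/sh
# Dry-run sketch: attach an SBOM attestation to an image in CI.
IMAGE="registry.example.com/myapp:v1.2.3"   # hypothetical image reference
run() { echo "+ $*"; }                      # print the command instead of running it
run syft "$IMAGE" -o spdx-json --file sbom.json                                  # generate the SBOM
run cosign attest --predicate sbom.json --type spdxjson --key cosign.key "$IMAGE"  # sign and attach it
run cosign verify-attestation --key cosign.pub --type spdxjson "$IMAGE"          # consumer-side check
```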
Practical Example
- Registry Access Control
- Image Scanning (Trivy)
- Tagging Strategy (Semver + Git SHA)
# Harbor Registry (self-hosted, secure)
apiVersion: v1
kind: ConfigMap
metadata:
  name: harbor-config
data:
  core-security.conf: |
    # Enforce pulling images from Harbor only
    registry_url: harbor.internal:443
    # RBAC: separate projects per team
    project:
      - name: platform-team
        access: ["read", "write"]
      - name: data-science
        access: ["read", "write"]
      - name: public
        access: ["read"]  # Public images, read-only
    # Webhook: notify on push
    webhook_url: https://ci.example.com/harbor-webhook
    # Image retention (auto-delete old tags)
    retention_policy:
      - pattern: "*.*.*.dev-*"
        keep: 5    # Keep last 5 dev builds
        type: "tag"
      - pattern: "*.*.*.prod-*"
        keep: 20   # Keep last 20 prod builds
        type: "tag"
---
apiVersion: v1
kind: Secret
metadata:
  name: harbor-pull-secret
  namespace: production
type: kubernetes.io/dockerconfigjson
stringData:
  # stringData accepts plain text; Kubernetes base64-encodes it on write
  .dockerconfigjson: |
    {
      "auths": {
        "harbor.internal:443": {
          "auth": "base64-encoded-username:password",
          "email": "k8s@example.com"
        }
      }
    }
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-image-scan
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scanner
          containers:
            - name: trivy
              image: aquasec/trivy:latest
              args:
                - image
                - --severity
                - "HIGH,CRITICAL"
                - --exit-code
                - "1"  # Fail the job if vulnerabilities are found
                - myregistry/myapp:v1.2.3
              env:
                - name: TRIVY_USERNAME
                  valueFrom:
                    secretKeyRef:
                      name: registry-creds
                      key: username
                - name: TRIVY_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: registry-creds
                      key: password
          restartPolicy: OnFailure
---
# Admission webhook: reject images with vulnerabilities
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: image-scan-validator
webhooks:
  - name: scanner.example.com
    clientConfig:
      service:
        name: image-scanner
        namespace: kube-system
        path: "/validate"
      caBundle: ...
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    failurePolicy: Fail
    admissionReviewVersions: ["v1"]  # required in admissionregistration.k8s.io/v1
    sideEffects: None                # required in admissionregistration.k8s.io/v1
#!/bin/bash
# CI/CD: build and tag images consistently
VERSION=$(git describe --tags --always)      # e.g., v1.2.3-45-gabc123d
GIT_SHA=$(git rev-parse --short HEAD)        # e.g., abc123d
BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
REGISTRY=myregistry.azurecr.io
REPO=myapp

# Build with multiple tags
docker build \
  --build-arg VERSION="$VERSION" \
  --build-arg BUILD_DATE="$BUILD_DATE" \
  -t "$REGISTRY/$REPO:$VERSION" \
  -t "$REGISTRY/$REPO:$GIT_SHA" \
  -t "$REGISTRY/$REPO:latest" \
  .

# Push all tags
docker push "$REGISTRY/$REPO:$VERSION"
docker push "$REGISTRY/$REPO:$GIT_SHA"
docker push "$REGISTRY/$REPO:latest"

# Attestation (cosign)
cosign sign --key cosign.key "$REGISTRY/$REPO:$VERSION"

# In production: pin to a specific version
kubectl set image deployment/api api="$REGISTRY/$REPO:$VERSION"
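Because production is pinned to an immutable tag, rolling back is just re-pinning. A dry-run sketch (commands are printed rather than executed; the deployment and registry names are illustrative):

```shell
#!/bin/sh
# Dry-run sketch: roll back by re-pinning to the previous immutable tag.
run() { echo "+ $*"; }                           # print instead of execute
PREVIOUS="myregistry.azurecr.io/myapp:v1.2.2"    # the last known-good version
run kubectl set image deployment/api api="$PREVIOUS"
run kubectl rollout status deployment/api        # wait for the rollback to finish
```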
Decision Checklist
- Centralized registry with RBAC and audit logging?
- Tagging strategy (semver + git-sha) enforced in CI/CD pipeline?
- Images scanned before push (in CI) and periodically after (drift detection)?
- Vulnerability threshold enforced: block deployment if HIGH or CRITICAL vulns?
- Garbage collection configured: auto-delete images not pulled in 30 days?
- Image pull policy set to Always (force fresh) or IfNotPresent (cache)?
- Image attestations (cosign signatures) required for production deployments?
- SBOM generated and stored with image for supply-chain compliance?
- Registry mirrors or caches configured for resilience (DockerHub outages)?
- Image size optimized: multi-stage builds, distroless base images, layer caching?
- Admission webhook blocks unscanned or unsigned images from running?
- Compliance: images stored in compliant regions (GDPR, HIPAA, data residency)?
Self-Check
- Why should the "latest" tag be avoided in production? (Answer: it is mutable; a pod restart can pull a different image than the one tested, and rollback targets become ambiguous.)
- How does image scanning keep vulnerable images out of production? (Answer: it detects known CVEs before deployment; an admission webhook blocks HIGH/CRITICAL findings.)
- What is image garbage collection, and why is it needed? (Answer: registry disk exhaustion after months of builds; GC reclaims space from unused images.)
- How do you implement reproducible deployments? (Answer: pin to a specific semver tag, git SHA, or image digest, never "latest"; a digest reference guarantees the exact same image bytes on every pull.)
- What is an admission webhook, and why does it matter for images? (Answer: validates images before they run; blocks unsigned, unscanned, or vulnerable images automatically.)
One Takeaway
Treat images as immutable, versioned artifacts. Use semantic versioning + git-SHA for reproducibility, scan before deployment to catch vulnerabilities, and sign with cosign for supply-chain integrity. Enforce via admission webhooks; prevent "latest" tag in production. Small tagging/scanning investments prevent major security incidents and deployment surprises.
Next Steps
- Study Supply-Chain Security.
- Explore Cost Controls.
Advanced Patterns
Layer Caching Optimization
# ❌ Inefficient: copies everything; any file change invalidates the layer
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install && npm run build

# ✅ Efficient: cache dependencies separately from code
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]

# Benefits:
# - npm install layer cached until package.json changes
# - Code changes only rebuild the code layer
# - Rebuilds are roughly 5x faster after a code-only change
Image Vulnerability Scanning in CI/CD
# GitHub Actions example
name: Build and Scan Image
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
      - name: Upload to GitHub Security
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'
      # Fail pipeline if HIGH/CRITICAL found (mount the Docker socket so the
      # containerized Trivy can see the locally built image)
      - name: Check vulnerabilities
        run: |
          if ! docker run --rm \
              -v /var/run/docker.sock:/var/run/docker.sock \
              aquasec/trivy:latest image \
              --severity CRITICAL,HIGH \
              --exit-code 1 \
              myapp:${{ github.sha }}; then
            echo "Vulnerabilities found! Push blocked."
            exit 1
          fi
      - name: Push to registry
        if: success()
        run: docker push myapp:${{ github.sha }}
Signed Images with Cosign
#!/bin/bash
# Sign images during build
REGISTRY=myregistry.azurecr.io
VERSION=$(git describe --tags --always)
IMAGE=$REGISTRY/myapp:$VERSION

# Build and push
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Sign image with private key
cosign sign --key cosign.key "$IMAGE"

# Client verifies signature before pulling:
#   cosign verify --key cosign.pub "$IMAGE"

# In Kubernetes, an admission webhook enforces:
#   "Block images without a valid signature"
Real-World Multi-Tenant Scenario
# Enterprise with multiple teams, environments, and compliance needs
Registries:
  Public images: Docker Hub mirror (cache, cost savings)
  Internal images: Harbor (air-gapped, compliance)
  Ephemeral builds: ECR (temporary staging, auto-delete)
Tagging Strategy:
  Development: dev-branch-abc123d (auto-delete after 7 days)
  Staging: staging-v1.2.3-rc1 (auto-delete after 30 days)
  Production: v1.2.3 + v1.2 + v1 (immutable, retained 2 years)
Scanning:
  On push (CI): block if CRITICAL
  On schedule (nightly): detect drift (new CVEs)
  Before deployment (admission webhook): recheck before running
Compliance:
  Base images: signed by security team (vetted, hardened)
  Application images: must be signed by CI/CD (attestation)
  Supply chain: SBOM generated and stored (audit trail)
Lifecycle:
  Development: 7 days
  Staging: 30 days
  Production: 2 years
  Archived: 5 years (compliance retention)
Additional Patterns & Pitfalls
Pattern: Multi-Stage Dockerfile: Build stage installs dependencies; runtime stage uses only compiled artifacts. Result: 500MB build image → 20MB runtime image. Layer caching speeds rebuilds 5-10x.
Pattern: Distroless Base Images: No shell, package manager, or OS tools in runtime image. Reduces attack surface (no exploitable shells) and image size. Trade-off: harder to debug; use alpine for dev, distroless for production.
Pattern: Image Pull Secrets for Private Registries: Create secret with registry credentials; reference in pod imagePullSecrets. Enables deploying proprietary images without leaking credentials in manifests.
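Assuming a Harbor instance at harbor.internal:443 and illustrative credentials, the JSON payload such a secret carries can be sketched like this (in practice `kubectl create secret docker-registry` builds it for you):

```shell
#!/bin/sh
# Sketch: the .dockerconfigjson payload an image pull secret carries.
USERNAME="deploy-bot"          # illustrative credentials, not real ones
PASSWORD="example-password"
AUTH=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64)   # auth is base64 of user:pass
CONFIG=$(printf '{"auths":{"harbor.internal:443":{"auth":"%s"}}}' "$AUTH")
echo "$CONFIG"
```

The pod template then references the resulting secret via spec.imagePullSecrets.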
Pattern: Registry Mirrors: Docker Hub outages and rate limits affect all CI/CD jobs. Configure a pull-through cache or mirror (Docker's registry-mirrors setting, Harbor proxy cache, Artifactory remote repository) to reduce failures and speed up pulls.
Pattern: Content Addressing: Reference images by digest (sha256:abc123...) instead of tag. Ensures reproducible deployments: same digest = same image, always.
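A minimal sketch of building a digest-pinned reference (the digest below is a placeholder; in a pipeline it would come from a tool such as `crane digest` or `docker inspect --format '{{index .RepoDigests 0}}'`):

```shell
#!/bin/sh
# Sketch: pin an image reference by digest instead of tag.
IMAGE="registry.example.com/myapp"   # hypothetical repository
DIGEST="sha256:0000000000000000000000000000000000000000000000000000000000000000"  # placeholder
PINNED="$IMAGE@$DIGEST"              # digest references use @, not :
echo "$PINNED"
```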
Pitfall: "latest" Tag in Production: Pod restarts pull new "latest" image; rolling back requires manual re-tag. Always pin production to specific version (v1.2.3 or abc123d). "latest" only for dev/staging.
Pitfall: Image Scan Finds Vulns, But Deployment Proceeds: Admission webhook not enforcing scan results. Configure ValidatingWebhook to block images with HIGH/CRITICAL vulns automatically.
Pitfall: Base Image Updates Break Backward Compatibility: Alpine 3.16 → 3.18 bumps its musl libc and bundled libraries; compiled binaries can fail. Pin the base image version in the Dockerfile (alpine:3.18.6, not alpine:3.18); plan major version upgrades with testing.
Pitfall: Orphaned Images Exhaust Registry Disk: Daily builds accumulate; 1 year = 365 images per service. Retention policy: keep last 30 images (dev), last 100 (prod). Auto-delete via garbage collection.
Pitfall: No Rollback Plan: Deployed v1.2.3; found critical bug; need v1.2.2. If images auto-deleted, can't rollback. Retain last 20 production images; tag with date for easy rollback.
Pitfall: Secrets in Images: Docker images are files; attacker can extract layers. Never bake secrets into images. Use Kubernetes Secrets, external vaults (HashiCorp Vault), or environment variables at runtime.
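To see why, sketch the attack: an image is just a tarball of layers, so anyone who can pull it can read every file. The commands below are printed rather than executed; the image name and search string are illustrative:

```shell
#!/bin/sh
# Dry-run sketch: how baked-in secrets leak out of image layers.
run() { echo "+ $*"; }                        # print instead of execute
run docker save myapp:v1.2.3 -o image.tar     # export the image as a tarball
run tar -xf image.tar                         # unpack the layer tarballs
run grep -r "API_KEY" .                       # search extracted files for secrets
run docker history --no-trunc myapp:v1.2.3    # build args and ENV values also show here
```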
References
- Harbor Registry: official website
- Trivy Scanner: GitHub repository
- Cosign Supply-Chain Security: GitHub repository
- SLSA Framework: Supply-chain Levels for Software Artifacts
- Kubernetes Admission Webhooks: official documentation
- "Container Security" (Liz Rice, O'Reilly) — comprehensive guide
- Docker Best Practices and security hardening