
Manual Test Runbook — X5: Pipeline Integration Tests

Owner: Sagar  |  Time: ~4 min (Part A offline) · +20–40 min (Part C live consumers)  |  Cloud: none for Part A · sandbox for Part C

Promotes X5 (tests/pipeline-integration/ + .github/workflows/pipeline-integration.yml + it-* consumers) from 🟦 Code Complete → 🟩 Shipped. Part A is the offline reusable-workflow contract gate ($0). Part C runs the live it-* consumers against the X1 sandbox.


Prerequisites

  • Local tooling: python3 + PyYAML (pip install pyyaml)
  • (Part C only) gh authenticated; sandbox secrets/vars configured on the repo:
      • vars: SANDBOX_ACR_LOGIN_SERVER, SANDBOX_NOTATION_CERT_KEY_ID, SANDBOX_STATE_RG, SANDBOX_STATE_SA, SANDBOX_ARGOCD_SERVER
      • secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, ARGOCD_AUTH_TOKEN
  • Working directory: repo root

Steps

Part A — offline contract gate (~4 min, $0)

  1. Run the full offline gate (unit tests + repo contract check):
./tests/pipeline-integration/validate.sh

Expected: the run ends with "==> OK: X5 offline gate passed." All 11 unit tests pass, and the repo contract check reports "OK: every caller honours its reusable-workflow contract." with one expected WARN for the client-only image-scan/I1 workflow.

  2. (Optional) Prove the gate actually catches drift. Temporarily break a caller and confirm a FAIL, then revert:
# Add a bogus input to a template caller.
# Note: the \n in the replacement requires GNU sed; BSD/macOS sed does not treat it as a newline.
sed -i.bak 's/  image_name:/  bogus_input: x\n      image_name:/' \
  templates/client-repo/.github/workflows/container-build-sign.yml
python3 tests/pipeline-integration/contract_check.py; echo "exit: $?"   # expect FAIL + exit 1
mv templates/client-repo/.github/workflows/container-build-sign.yml.bak \
  templates/client-repo/.github/workflows/container-build-sign.yml

Expected: FAIL …: unknown input(s) not declared by the reusable workflow: bogus_input, exit 1.

  3. Confirm the CI gate wiring parses:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/pipeline-integration.yml')); print('OK')"
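
The one-liner above only proves that the file is syntactically valid YAML. A slightly stronger check, sketched here, also confirms that the parsed document has a trigger block and at least one job. One detail worth knowing: PyYAML follows YAML 1.1, so an unquoted on: key loads as the boolean True rather than the string "on". The function names and the sample document are illustrative and are not part of the repo.

```python
def workflow_triggers(doc):
    """Return the trigger names declared by a parsed workflow document."""
    if not isinstance(doc, dict):
        raise ValueError("workflow did not parse to a mapping")
    # PyYAML (YAML 1.1) loads an unquoted `on:` key as True, so check both.
    triggers = doc.get("on", doc.get(True))
    if triggers is None:
        raise ValueError("workflow has no 'on:' trigger block")
    if isinstance(triggers, str):      # short form, e.g. `on: push`
        return [triggers]
    return sorted(triggers)            # list or mapping of event names


def check_workflow(doc):
    """Return (triggers, job ids); raise ValueError if either is missing."""
    triggers = workflow_triggers(doc)
    jobs = doc.get("jobs")
    if not isinstance(jobs, dict) or not jobs:
        raise ValueError("workflow declares no jobs")
    return triggers, sorted(jobs)


# Hypothetical parsed workflow, shaped the way yaml.safe_load would return it.
parsed = {
    True: {"pull_request": None, "workflow_dispatch": None},
    "jobs": {"offline-gate": {"runs-on": "ubuntu-latest"}},
}
print(check_workflow(parsed))
```

To run it against the real file, pass the result of yaml.safe_load(open('.github/workflows/pipeline-integration.yml')) as doc.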

Part C — live consumers against the sandbox (~20–40 min)

Each it-* workflow is dispatch-only and runs the corresponding reusable workflow end-to-end. Run the ones whose sandbox dependencies are available.

  1. C2 — container build/sign/scan (needs sandbox ACR + AKV cert):
gh workflow run it-container-build-sign.yml -f fixture=clean
gh workflow run it-container-build-sign.yml -f fixture=vulnerable -f fail_on_scan_findings=true

Expected: clean succeeds (image pushed + signed + Grype passes); vulnerable fails at the Grype gate (proves fail-on-findings). Clean up the pushed repos per container-build-sign/README.md.

  2. C3 — AKS deploy (needs sandbox AKS + ArgoCD). Install the fixture app once:
kubectl apply -f tests/pipeline-integration/aks-deploy/argocd-app.yaml
gh workflow run it-aks-deploy.yml
gh workflow run it-aks-deploy.yml -f rollback_drill=true

Expected: deploy syncs + smoke probe returns 200; the rollback drill deploys a bad image and asserts auto-rollback. Tear down per aks-deploy/README.md.

  3. C1 — terraform plan (needs sandbox state backend):
gh workflow run it-terraform-plan-apply.yml -f working_directory=sandbox

Expected: OIDC login + backend init + fmt/validate + OPA post-plan gate all pass in plan-only mode; a compliance snapshot artifact is emitted. Nothing is applied (plan_only: true).


Pass criteria

  • Part A — validate.sh passes (11 unit tests + clean repo contract check)
  • Part A — the drift-injection check (step 2) produces a FAIL + exit 1
  • (Part C) at least one it-* consumer runs green against the sandbox
  • (Part C) the C2 vulnerable fixture fails the scan gate (negative path)
  • All sandbox test artifacts cleaned up
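
The Part C criteria mix a positive and a negative path, which is easy to misread when tallying results by hand: a red run of the vulnerable fixture counts as a pass. A minimal sketch of that tally follows. The record shape (consumer, fixture, conclusion) is an assumption made for illustration, as is the choice to treat the negative-path criterion as not applicable when C2 was skipped. The conclusion field uses the GitHub Actions run values success and failure.

```python
def part_c_result(runs):
    """Tally hand-recorded Part C runs against the pass criteria.

    Each run is a dict with hypothetical keys:
      consumer   - workflow name, e.g. "it-aks-deploy"
      fixture    - "clean", "vulnerable", or None when not applicable
      conclusion - GitHub Actions run conclusion ("success", "failure", ...)
    """
    positive = [r for r in runs if r.get("fixture") != "vulnerable"]
    negative = [r for r in runs if r.get("fixture") == "vulnerable"]

    # Criterion: at least one it-* consumer runs green against the sandbox.
    any_green = any(r["conclusion"] == "success" for r in positive)
    # Criterion: the vulnerable fixture must fail (assumed N/A if C2 was skipped).
    negative_ok = all(r["conclusion"] == "failure" for r in negative)

    return {"any_green": any_green, "negative_ok": negative_ok,
            "passed": any_green and negative_ok}


runs = [
    {"consumer": "it-container-build-sign", "fixture": "clean", "conclusion": "success"},
    {"consumer": "it-container-build-sign", "fixture": "vulnerable", "conclusion": "failure"},
    {"consumer": "it-terraform-plan-apply", "fixture": None, "conclusion": "success"},
]
print(part_c_result(runs))
```

A failure conclusion alone does not show where the run failed, so confirm in the job log that the vulnerable run stopped at the Grype gate.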

Failure mode

A reusable workflow's workflow_call interface changes and a caller is not updated. contract_check.py (the PR gate) detects this offline, rather than at run time in a client repo. A caller that targets a workflow at a pinned @ref is still checked against the current definition, so bump the pin when the interface changes (noted in the README).
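
To make the failure mode concrete, the core of such a check can be sketched as a set comparison between the keys in a caller's with: block and the inputs declared under on.workflow_call.inputs of the reusable workflow. This is a simplified illustration, not the actual contract_check.py. The function name, the sample input definitions, and the wording of the required-input message are assumptions; only the unknown-input message mirrors the output quoted in Part A.

```python
def check_caller(caller_with, declared_inputs):
    """Compare a caller's `with:` keys to a reusable workflow's declared inputs.

    caller_with     - mapping taken from the caller job's `with:` block
    declared_inputs - mapping taken from `on.workflow_call.inputs`
    Returns a list of problems; an empty list means the contract holds.
    """
    problems = []

    # Inputs the caller passes that the reusable workflow never declared.
    unknown = sorted(set(caller_with) - set(declared_inputs))
    if unknown:
        problems.append(
            "unknown input(s) not declared by the reusable workflow: "
            + ", ".join(unknown)
        )

    # Inputs marked `required: true` that the caller does not supply.
    missing = sorted(
        name
        for name, spec in declared_inputs.items()
        if (spec or {}).get("required") and name not in caller_with
    )
    if missing:
        problems.append("required input(s) not supplied: " + ", ".join(missing))

    return problems


# Hypothetical interface and caller, for illustration only.
declared = {
    "image_name": {"type": "string", "required": True},
    "fail_on_scan_findings": {"type": "boolean", "default": False},
}
caller = {"image_name": "it-fixture", "bogus_input": "x"}
print(check_caller(caller, declared))
```

Returning a list of problems rather than raising on the first one lets a gate report every drifted caller in a single run.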

Cost impact

Part A is $0 (pure parsing). Part C costs sandbox ACR storage (a few cents, purged by X7) + transient AKS workload; C1 is plan-only ($0).

Removal path

Delete tests/pipeline-integration/ (checker + fixtures), the it-* consumer workflows, and .github/workflows/pipeline-integration.yml. No infra is created by the offline gate; Part C artifacts are sandbox-only and X7-cleaned.


Sign-Off

Field                     | Value
Part A (offline gate)     | ☐ PASS
Part A (drift detection)  | ☐ PASS
Part C (live consumers)   | ☐ PASS / ☐ skipped
Tester                    |
Date                      |
Result                    | ☐ PASS