Manual Test Runbook — S1: Scheduled Terraform Drift Detection
Owner: Sagar | Time: ~6 min (offline) / ~20 min (with live plan) | Sandbox: snowops-sandbox-01
Overview
S1 detects configuration drift: a scheduled terraform plan against a stack
whose committed config hasn't changed should be a no-op, so any proposed action
means the live infrastructure has drifted. Two pieces:
- apps/drift-detector/: the TS tool (read-only: it plans, never applies).
- S1 workflow (.github/workflows/drift-detection.yml): daily cron that plans each stack, runs the tool, and opens/updates one GitHub Issue per drifted stack.
It builds on E0's substrate: a versioned JSON artifact (drift.json) plus a
structured diff between two reports as the signal (diffReports, mirroring
E0's diffSnapshots). Parts A + B verify the classifier + diff + ticket logic
offline; Part C is the live scheduled plan.
Part A — Offline (no cloud, ~5 min)
A1. Build + typecheck
A2. Unit tests
Run npm test. Expect: 3 suites, 18 tests pass (action classifier + report build + diff; ticket construction + GitHub upsert idempotency + dry-run; markdown rendering).
A3. Offline CLI — clean vs. drifted, plus the diff
# Clean plan → no drift, exit 0, no ticket:
node dist/index.js --stack sandbox --input examples/plan.clean.json \
--out-dir /tmp/drift-clean
# Drifted plan → 3 resources, exit 2 with --fail-on-drift, dry-run ticket:
node dist/index.js --stack payments-prod --input examples/plan.drifted.json \
--out-dir /tmp/drift-1 --fail-on-drift true ; echo "exit=$?"
# Re-run diffed against the first report (no new drift this time):
node dist/index.js --stack payments-prod --input examples/plan.drifted.json \
--baseline /tmp/drift-1/drift.json --out-dir /tmp/drift-2
cat /tmp/drift-1/summary.md
Confirm:
- /tmp/drift-clean/drift.json has drifted: false, summary.total: 0.
- The drifted run prints DRIFT — 3 resource(s) and exits 2.
- /tmp/drift-1/summary.md lists the storage-account update, the NSG-rule create, and the storage-container replace (the data-source read is excluded).
- /tmp/drift-1/issue.md carries the <!-- snowops-drift:stack=payments-prod --> dedupe marker.
- schemaVersion: "1.0" in every drift.json.
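The checks above rest on how plan actions are classified. As a minimal sketch (not the tool's actual code), the following assumes the standard Terraform plan JSON convention that a replace appears as a delete/create pair and that no-op and read entries are not drift. The names classifyActions, driftedResources, and exitCode are hypothetical, and the exit-0 result without --fail-on-drift is an assumption.

```typescript
// Sketch only: all names here are hypothetical, not the drift-detector API.
type DriftKind = "create" | "update" | "delete" | "replace";

interface ResourceChange {
  address: string;
  change: { actions: string[] };
}

interface Drifted {
  address: string;
  kind: DriftKind;
}

// Map a plan entry's `actions` array to a drift kind, or null when the
// entry proposes no change to managed infrastructure.
function classifyActions(actions: string[]): DriftKind | null {
  const has = (a: string): boolean => actions.includes(a);
  // Terraform encodes a replace as ["delete","create"] or ["create","delete"].
  if (has("create") && has("delete")) return "replace";
  if (has("create")) return "create";
  if (has("update")) return "update";
  if (has("delete")) return "delete";
  return null; // "no-op" and data-source "read" are not drift
}

// Keep only the entries that classify as drift.
function driftedResources(changes: ResourceChange[]): Drifted[] {
  return changes
    .map((rc) => ({ address: rc.address, kind: classifyActions(rc.change.actions) }))
    .filter((r): r is Drifted => r.kind !== null);
}

// Assumed convention: exit 2 only when drift exists and --fail-on-drift is set.
function exitCode(drifted: boolean, failOnDrift: boolean): number {
  return drifted && failOnDrift ? 2 : 0;
}
```

Under these assumptions, a plan with one update, one create, one delete/create pair, and one data-source read yields three drifted resources, which matches the count expected from the drifted run in A3.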
A4. Diff-as-signal (newly-drifted detection)
# Baseline: only the storage-account update drifts.
cat > /tmp/base-plan.json <<'JSON'
{ "resource_changes": [
{ "address": "azurerm_storage_account.state", "type": "azurerm_storage_account",
"name": "state", "mode": "managed", "change": { "actions": ["update"] } } ] }
JSON
node dist/index.js --stack sandbox --input /tmp/base-plan.json --out-dir /tmp/drift-base
# Current: the storage-account drift plus a NEW NSG rule and a NEW storage container.
node dist/index.js --stack sandbox --input examples/plan.drifted.json \
--baseline /tmp/drift-base/drift.json --out-dir /tmp/drift-cur
grep -A4 "Newly drifted" /tmp/drift-cur/summary.md
Confirm the ## Change since … section shows Drift worsened and the
### Newly drifted list includes the NSG rule + storage container.
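The diff-as-signal idea can be sketched as a set comparison on resource addresses. This is an illustration under assumptions, not the real diffReports: the report shape beyond schemaVersion, drifted, and summary.total is invented for the example, and the rule that any newly drifted resource yields a "worsened" verdict is inferred from the expected output above.

```typescript
// Hypothetical report shape; only schemaVersion, drifted, and summary.total
// are named in this runbook. `resources` and `action` are assumptions.
interface DriftedResource {
  address: string;
  action: string;
}

interface DriftReport {
  schemaVersion: string;
  drifted: boolean;
  summary: { total: number };
  resources: DriftedResource[];
}

interface ReportDiff {
  newlyDrifted: string[]; // drifted now, not in the baseline
  resolved: string[];     // drifted in the baseline, clean now
  persisting: string[];   // drifted in both
  verdict: "worsened" | "improved" | "unchanged";
}

// Compare two reports by resource address (sketch of the diff-as-signal step).
function diffReportsSketch(baseline: DriftReport, current: DriftReport): ReportDiff {
  const before = new Set(baseline.resources.map((r) => r.address));
  const after = new Set(current.resources.map((r) => r.address));

  const currentAddrs = current.resources.map((r) => r.address);
  const baselineAddrs = baseline.resources.map((r) => r.address);

  const newlyDrifted = currentAddrs.filter((a) => !before.has(a)).sort();
  const resolved = baselineAddrs.filter((a) => !after.has(a)).sort();
  const persisting = currentAddrs.filter((a) => before.has(a)).sort();

  // Assumed rule: any new drift outweighs resolved drift.
  const verdict =
    newlyDrifted.length > 0 ? "worsened" : resolved.length > 0 ? "improved" : "unchanged";

  return { newlyDrifted, resolved, persisting, verdict };
}
```

With a baseline holding only the storage account and a current report holding all three resources, this sketch reports two newly drifted addresses and one persisting address. Diffing a report against itself reports nothing new, which corresponds to the second payments-prod run in A3.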
Part B — Workflow lint (~1 min)
Confirm the workflow carries: schedule cron + workflow_dispatch;
permissions.issues: write; the detect job's stack matrix; and the
Detect drift + file/update ticket step invoking dist/index.js --open-issue github.
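The four lint checks above could be scripted roughly as follows. This is a text-level sketch that assumes simple pattern matching is good enough for a smoke check; it does not parse YAML, and the function name missingWorkflowRequirements is hypothetical.

```typescript
// Sketch: report which of the Part B requirements are absent from the
// workflow text. Regex heuristics only; not a YAML parser.
function missingWorkflowRequirements(workflowText: string): string[] {
  const checks: Array<[string, RegExp]> = [
    ["schedule cron", /schedule:\s*\n\s*-\s*cron:/],
    ["workflow_dispatch", /workflow_dispatch/],
    ["permissions.issues: write", /issues:\s*write/],
    ["stack matrix", /matrix:/],
    // The detect step must invoke the tool with --open-issue github;
    // [\s\S]*? tolerates a command split across continuation lines.
    ["detect step (--open-issue github)", /dist\/index\.js[\s\S]*?--open-issue\s+github/],
  ];
  return checks
    .filter(([, pattern]) => !pattern.test(workflowText))
    .map(([label]) => label);
}
```

An empty result means all five markers were found. A non-empty result names what to look for by hand in drift-detection.yml. Because the checks are textual, a commented-out line would still match, so this complements reading the file rather than replacing it.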
Part C — Live scheduled plan (~15 min, ~$0)
Prerequisite: the OIDC SP for the sandbox environment has Reader on the
sandbox subscription, and the stack matrix backend_* values match the real
sandbox backend.
- Trigger the workflow via workflow_dispatch (Actions → drift-detection → Run). With no drift, the run is green and files no issue.
- Seed drift out-of-band: change one sandbox resource in the Azure Portal that Terraform manages (e.g. add a tag, or bump a storage account's min_tls_version).
- Re-run the workflow. Confirm:
  - The Detect drift + file/update ticket step reports DRIFT — N resource(s).
  - A GitHub Issue labelled drift is opened for sandbox with the resource table.
  - A drift-sandbox-<run_id> artifact is attached.
- Re-run again without fixing the drift → the same issue is updated (not duplicated). Verify only one open drift issue exists for the stack.
- Reconcile the drift (revert the portal change or re-apply via C1), re-run, and close the issue by hand.
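The update-in-place behaviour checked above depends on the dedupe marker seen in issue.md. A minimal sketch of that decision, assuming the marker is embedded in the issue body and matched by substring, might look like this. The Issue shape and the planUpsert name are hypothetical, and no GitHub API call is made here.

```typescript
// Hypothetical in-memory view of an issue; the real tool would obtain
// these from GitHub.
interface Issue {
  number: number;
  state: "open" | "closed";
  body: string;
}

type UpsertPlan =
  | { op: "create"; body: string }
  | { op: "update"; issueNumber: number; body: string };

// Marker format as it appears in issue.md (A3).
function dedupeMarker(stack: string): string {
  return `<!-- snowops-drift:stack=${stack} -->`;
}

// Decide whether to open a new issue or update the existing one for a stack.
function planUpsert(stack: string, reportMarkdown: string, issues: Issue[]): UpsertPlan {
  const marker = dedupeMarker(stack);
  const body = `${marker}\n${reportMarkdown}`;
  // Only an open issue carrying this stack's marker counts as a match.
  const existing = issues.find((i) => i.state === "open" && i.body.includes(marker));
  return existing
    ? { op: "update", issueNumber: existing.number, body }
    : { op: "create", body };
}
```

Running the same drifted stack twice under this sketch gives a create followed by an update of the same issue number, which is the idempotency the re-run step verifies. A closed issue with the marker is ignored, so new drift after a manual close opens a fresh issue; whether the real tool behaves this way is an assumption.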
Pass criteria
- npm test green (3 suites, 18 tests)
- Offline clean run: drifted: false, exit 0; drifted run: 3 resources, exit 2 (A3)
- Diff shows the newly-drifted resources against a baseline (A4)
- drift-detection.yml parses with issues: write + the matrix + the detect step (B)
- (Live) seeded drift opens one drift issue; re-run updates it in place (C)
Teardown
- Offline: rm -rf /tmp/drift-clean /tmp/drift-1 /tmp/drift-2 /tmp/drift-base /tmp/drift-cur /tmp/base-plan.json.
- Live: revert the seeded portal change; close the drift issue; artifacts auto-expire (30-day retention).
Sign-off
- Tester: Sagar Chhabra | Date: 3/6/2026 | Result: PASS
- Notes: Only the offline part was run; the live plan (Part C) was not exercised.