Manual Test Runbook — C1: terraform-plan-apply reusable workflow
Owner: Sagar | Time: ~30 min | Sandbox: X1
Purpose
Validate that .github/workflows/terraform-plan-apply.yml (C1) correctly:
- Runs the terraform fmt check on every event.
- Plans against the X1 sandbox stack and posts a PR comment.
- Gates apply behind the sandbox GitHub Environment and runs it only on merge to main when the plan reported changes.
Prerequisites
- X1 sandbox subscription provisioned and sandbox/ applied at least once (DoD criterion 4 for X1 may still be pending; that's OK, because C1 only needs the state backend to exist).
- GitHub repo has these repository variables set (Settings → Secrets and variables → Actions → Variables):
  - SANDBOX_STATE_RG=snowops-sandbox-state-rg
  - SANDBOX_STATE_SA=snowopssandboxstate<random> (from bootstrap.sh output)
  - SANDBOX_STATE_KEY=sandbox.tfstate
- GitHub repo has these repository secrets:
  - AZURE_CLIENT_ID = the Azure AD app registration consuming OIDC (created out-of-band; B2 will codify this)
  - AZURE_TENANT_ID
  - AZURE_SUBSCRIPTION_ID
- GitHub Environment sandbox exists. Optionally configure:
  - Required reviewers (Sagar, Nidhi)
  - Wait timer
  - Deployment branch rule: main only
- Azure AD federated credential on the app registration permits this repo. Two subjects, one each for plan and apply:
  - repo:<org>/snowops-automation:pull_request (for plan-on-PR)
  - repo:<org>/snowops-automation:environment:sandbox (for apply)
- Repo branch protection: main requires the sandbox-plan-apply / terraform / terraform plan check before merge.
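The variable and secret prerequisites above can be spot-checked from a terminal. The following is a minimal sketch, not part of the workflow: missing_names is a hypothetical helper, and the commented usage assumes an authenticated gh CLI whose list commands print tab-separated rows with the name in the first column when piped.

```shell
#!/bin/sh
# missing_names (hypothetical helper): reads configured names on stdin,
# one per line, and prints every required name (given as arguments)
# that is not present in that list.
missing_names() {
  configured=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: literal string, -q: quiet
    printf '%s\n' "$configured" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Assumed usage against this repo (requires gh auth; output format is an assumption):
#   gh variable list | cut -f1 | missing_names SANDBOX_STATE_RG SANDBOX_STATE_SA SANDBOX_STATE_KEY
#   gh secret list   | cut -f1 | missing_names AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID
```

Empty output means every required name was found; any printed name still needs to be set before starting the steps.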
Steps
1. Confirm local prerequisites
terraform -chdir=sandbox fmt -check -recursive
terraform -chdir=sandbox init -backend=false -input=false
terraform -chdir=sandbox validate
- All three commands pass cleanly.
2. Open a no-op PR
Create a branch with a comment-only change in sandbox/:
git checkout -b test/c1-no-op
printf '\n# c1 test\n' >> sandbox/README.md
git commit -am "test: trigger C1 with no-op change"
git push -u origin test/c1-no-op
gh pr create --title "Test C1: no-op" --body "Verifies terraform-plan-apply on a no-op."
- sandbox-plan-apply workflow runs.
- fmt job succeeds.
- plan job succeeds with exit code 0 (no infra changes).
- A PR comment titled "Terraform plan — sandbox" appears reading "✅ No changes — plan matches state."
- No apply job runs (PR event).
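The exit codes checked in this runbook (0 here, 2 in step 4) match the behaviour of terraform plan with -detailed-exitcode: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded with changes. Whether the workflow passes that flag is an assumption inferred from the expected exit codes, and plan_outcome is a hypothetical helper for reading results when reproducing the plan locally.

```shell
#!/bin/sh
# plan_outcome (hypothetical helper): translate the exit status of
# `terraform plan -detailed-exitcode` into a readable label.
plan_outcome() {
  case "$1" in
    0) echo "no-changes" ;;  # plan succeeded, empty diff
    1) echo "error" ;;       # plan failed
    2) echo "changes" ;;     # plan succeeded, non-empty diff
    *) echo "unknown" ;;
  esac
}

# Assumed local usage (needs an initialised backend and Azure credentials):
#   terraform -chdir=sandbox plan -detailed-exitcode -input=false
#   plan_outcome $?
```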
3. Open a change-bearing PR
git checkout -b test/c1-budget-bump
# Bump the sandbox budget cap in sandbox/terraform.tfvars by 50, or
# add a placeholder tag to extra_tags via tfvars.
git commit -am "test: bump sandbox budget for C1 verification"
git push -u origin test/c1-budget-bump
gh pr create --title "Test C1: changes" --body "Verifies plan comment + apply gate."
- PR comment is updated in place (not duplicated) on subsequent pushes.
- Comment shows "📝 Changes proposed" with the expected diff in the <details> block.
- Workflow run shows tfplan-<run-id> artifact uploaded.
- No apply yet (PR event).
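Updating the comment in place usually relies on finding an earlier comment by a stable header. The sketch below shows that lookup from a terminal, which can help confirm by hand that only one plan comment exists. find_comment_id is a hypothetical helper, the marker text is taken from the comment title in step 2, and the exact header the workflow writes is an assumption.

```shell
#!/bin/sh
# find_comment_id (hypothetical helper): stdin carries one line per PR
# comment in the form "<id><TAB><first line of body>". Prints the id of
# the first comment whose first line contains the marker, else nothing.
find_comment_id() {
  awk -F '\t' -v marker="$1" 'index($2, marker) { print $1; exit }'
}

# Assumed usage (requires gh auth; PR is the pull request number):
#   gh api "repos/{owner}/{repo}/issues/$PR/comments" \
#     --jq '.[] | "\(.id)\t\(.body | split("\n") | .[0])"' \
#     | find_comment_id "Terraform plan"
```

If more than one comment matches the marker, the in-place update is not working and the comment is being duplicated.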
4. Merge to main and watch the gate
- Merge the PR (squash). Confirm the push-triggered run starts.
- plan job succeeds with exit 2 (changes).
- apply job enters "Waiting for review" if reviewers required, OR runs immediately if none.
- Apply approval succeeds; terraform apply consumes the saved tfplan.binary and completes without re-planning.
- Azure portal reflects the new state (e.g., budget amount updated).
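The gate exercised in this step reduces to three conditions: the event is a push, the ref is main, and the plan exited with 2. The sketch below expresses that logic as a shell predicate for reasoning about which runs should reach apply. should_apply is a hypothetical name, and the workflow itself may express the same condition differently (for example in a job-level if). Environment approval is enforced by GitHub and is not modelled here.

```shell
#!/bin/sh
# should_apply EVENT REF PLAN_EXIT (hypothetical helper)
# Returns success only when the run is a push to main and the plan
# reported changes (detailed exit code 2).
should_apply() {
  [ "$1" = "push" ] && [ "$2" = "refs/heads/main" ] && [ "$3" = "2" ]
}

# Assumed usage inside a runner, where GitHub sets these variables:
#   should_apply "$GITHUB_EVENT_NAME" "$GITHUB_REF" "$plan_exit" && echo "apply would run"
```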
5. Drift test (manual mutation)
# Manually edit the budget in the portal (or via az CLI) to a different amount.
az consumption budget list --query "[?name=='snowops-sandbox-monthly-budget']"
- Open a no-op PR. Plan now reports the manual drift as a change to revert.
- This validates that state is being read from the remote backend, not from a local file.
6. Failure-mode probe
- Temporarily introduce a syntax error in sandbox/budget.tf. Push to a PR.
- fmt job fails fast (no Azure auth attempted). Revert.
Pass criteria
- Plan comment renders on PRs and updates in place on resync.
- No apply runs on PR events.
- Apply runs only on push-to-main + plan-exit-2 + env approval (when required).
- Plan binary artifact is the one consumed by apply (no replan between approval and apply).
- fmt failure blocks the workflow before any auth attempt.
- Drift surfaces on the next no-op PR.
Failure modes & escalation
| Symptom | Likely cause | Action |
|---|---|---|
| azure/login fails with AADSTS70025 / 70021 | Federated credential subject mismatch | Verify subject strings exactly match repo:<org>/<repo>:pull_request and …:environment:sandbox |
| Plan succeeds but apply hangs at "Waiting" | Reviewers not set on environment | Edit environment → Required reviewers, or remove the requirement |
| Apply re-plans from scratch | tfplan.binary artifact missing or mismatched run | Check artifact upload step ran; ensure same github.run_id between jobs |
| Plan comment posts twice | Existing-comment detection broke (e.g., header changed) | Inspect actions/github-script step in workflow logs |
| Backend reinit required | Lock file drifted | Re-run; cache key includes lock-file hash and refreshes |
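For the subject-mismatch row, it can help to generate the two expected subject strings and compare them character by character with what is registered on the app. This is a minimal sketch: oidc_subject is a hypothetical helper, example-org is a placeholder for the real organisation, and the commented az command is an assumed way to list the registered subjects.

```shell
#!/bin/sh
# oidc_subject ORG REPO KIND [ENV] (hypothetical helper)
# Prints the GitHub OIDC subject expected for a pull_request run or
# for a job bound to a GitHub Environment.
oidc_subject() {
  case "$3" in
    pull_request) printf 'repo:%s/%s:pull_request\n' "$1" "$2" ;;
    environment)  printf 'repo:%s/%s:environment:%s\n' "$1" "$2" "$4" ;;
    *) echo "unknown subject kind: $3" >&2; return 1 ;;
  esac
}

# Assumed comparison against Azure (requires az login):
#   az ad app federated-credential list --id "$AZURE_CLIENT_ID" --query "[].subject" -o tsv
#   oidc_subject example-org snowops-automation pull_request
#   oidc_subject example-org snowops-automation environment sandbox
```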
Sign-off
- Tester: ___ | Date: _ | Result: PASS / FAIL / N/A
- Notes: