Runbook Testing Sequence & Sign-Off Order

Last Updated: 2026-06-04
Status: 5 of 44 runbooks signed off (D1, D3, X3, C4, R1)

Overview

This document defines the canonical order for executing manual test runbooks across all SnowOps assets. The sequence respects dependency flows: foundational tools before consumers, detection before visibility, and setup before ops.

Each runbook takes ~5–15 min to complete. Total time to full sign-off: ~6–8 hours over multiple sessions.
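The ordering rule above (every dependency is tested before its consumer) can be checked mechanically. The following is a minimal sketch, not part of the repo: `check_order` is a hypothetical helper, and the one-asset-per-line input format (asset first, then its dependencies) is an assumption made for illustration.

```shell
# check_order: read lines of "ASSET [DEP ...]" in execution order on stdin and
# report any dependency that has not appeared earlier in the sequence.
# Hypothetical helper; the input format is invented for this sketch.
check_order() {
  awk '
    {
      for (i = 2; i <= NF; i++) {
        if (!($i in seen)) {
          printf "%s runs before its dependency %s\n", $1, $i
          bad = 1
        }
      }
      seen[$1] = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Phases 1-2 of the master sequence: asset first, then its dependencies.
check_order <<'EOF' && echo "order OK"
V2
V3
E0 V2 V3
S1 E0
S2 E0
EOF
```

The function exits non-zero when any asset is listed before something it depends on, so it could gate a pre-session check.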

Master Sequence

Phase 1: Visualization & Detection (25 min)

Foundation: output generation, visualization, and evidence collection.

Order	Asset	Name	Dependencies	Time	Notes
1	V2	Architecture Diagrams	None (offline)	10m	Golden-file test; no cloud auth required
2	V3	Runbook Generator	None (offline)	5m	Markdown generation from TF outputs
3	E0	Evidence Collector	V2, V3 (use outputs)	10m	Snapshot + compliance evidence

Rationale: V2 and V3 are 100% offline; they generate client-facing artifacts from TF outputs (F0 contracts). E0 consumes their outputs, and evidence collection must work before the Phase 2 compliance dashboard can be validated.
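A phase's time estimate is the sum of the Time column in its table. As a rough sketch, assuming the rows are available as tab-separated text with Time in the fifth column (as in the tables here), the total can be computed with awk. `phase_total` is a hypothetical helper name, and the Notes values below are shortened.

```shell
# phase_total: sum the Time column (5th tab-separated field, e.g. "10m")
# of runbook table rows read from stdin and print the total in minutes.
# Rows whose Time field is not of the form "<number>m" are ignored.
phase_total() {
  awk -F '\t' '
    $5 ~ /^[0-9]+m$/ { sub(/m$/, "", $5); total += $5 }
    END { print total + 0 }
  '
}

# Phase 1 rows from the table above, rebuilt as tab-separated lines.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 V2 "Architecture Diagrams" "None (offline)" 10m "Golden-file test" \
  2 V3 "Runbook Generator" "None (offline)" 5m "Markdown generation" \
  3 E0 "Evidence Collector" "V2, V3 (use outputs)" 10m "Snapshot + evidence" |
  phase_total   # prints 25
```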

Phase 2: Detection & Visibility (15 min)

Anomaly detection and compliance visibility.

Order	Asset	Name	Dependencies	Time	Notes
4	S1	Drift Detector	E0 (evidence format)	10m	Detects delta between plan + state → ticket
5	S2	Compliance Dashboard	E0 (snapshots)	5m	Displays evidence history + L4 DR panel

Rationale: S1 detects drift; S2 visualizes compliance. Both depend on E0 generating snapshots.

Phase 3: Reliability & Disaster Recovery (20 min)

Backup, replication, and failover.

Order	Asset	Name	Dependencies	Time	Notes
6	L1	Backup Policy	None	5m	Defines backup schedule, retention
7	L2	Cross-Region Replication	L1 (backup foundation)	8m	Replicates backed-up state across regions
8	L4	Restore Drill Automation	L1, L2 (backups in place)	10m	Restore → validate → teardown → report

Rationale: L1 establishes backup baseline; L2 adds geographic redundancy; L4 validates the entire restore chain end-to-end.

Phase 4: Bootstrap & Registry (10 min)

Client setup and module management.

Order	Asset	Name	Dependencies	Time	Notes
9	B6	Client Bootstrap	None	5m	Prerequisite checker + permission validator
10	F11	Module Registry	B6 (client env validated)	5m	Version manifest + pin audit

Rationale: B6 validates the client environment before we attempt any module operations. F11 depends on a clean environment.

Phase 5: Sandbox Cleanup & Utilities (5 min)

Order	Asset	Name	Dependencies	Time	Notes
11	X7	Ephemeral RG Cleanup	None	5m	Nightly cleanup of sandbox resources

Rationale: Sandbox hygiene; can run in parallel with other work.

Phase 6: Cost & Identity Utilities (10 min)

Order	Asset	Name	Dependencies	Time	Notes
12	U1	Utility: Cost export	B6 (bootstrap)	5m	Pull cost data for billing
13	U2	Utility: Identity policies	B6 (bootstrap)	5m	Generate AAD policy reports

Rationale: Both utilities depend on the client environment that B6 validates.

Phase 7: Azure Networking (20 min)

Foundational cloud infrastructure modules.

Order	Asset	Name	Dependencies	Time	Notes
14	N5	Network Security Groups	None	8m	F-module: NSG rules + test import block
15	N6	Route Tables	N5 (NSG foundation)	8m	F-module: routes + test import block

Rationale: Basic network foundation before cluster/workload modules.

Phase 8: Core Infrastructure Modules (54 min)

F-series (Terraform modules) + monitoring.

Order	Asset	Name	Dependencies	Time	Notes
16	M1	Resource Group	None	5m	F-module: RG + tags
17	M2	Monitoring (Log Analytics)	M1 (RG)	8m	F-module: LA workspace + retention
18	M3	Alert Rules	M2 (LA workspace)	8m	F-module: metric + log alerts
19	M6	Managed Grafana	M2, M3 (LA, alerts)	10m	F-module: Grafana instance + datasources
20	J1	Identity: Service Principal Registry	M1 (RG)	5m	F-module: SP credential rotation automation
21	J2	Identity: AAD Roles	M1 (RG)	8m	F-module: custom roles + assignments
22	J6	Identity: Workload Identity Bindings	M1, J1, J2 (identity foundation)	10m	F-module: pod identity → storage/KV bindings

Rationale: M1 creates the RG; M2/M3/M6 build the observability stack; J1/J2/J6 establish the identity foundation for workloads.
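The Dependencies column is enough to derive a valid test order automatically. The sketch below feeds the Phase 8 pairs to the standard `tsort` utility, which prints one topological order. `phase8_order` is a hypothetical wrapper name. Because `tsort` may print any valid order, the result can differ from the table (for example, J1 before M2) while still respecting every dependency.

```shell
# Each line is "DEPENDENCY DEPENDENT": the left asset must be signed off first.
# Pairs are taken from the Dependencies column of the Phase 8 table.
phase8_order() {
  tsort <<'EOF'
M1 M2
M2 M3
M2 M6
M3 M6
M1 J1
M1 J2
M1 J6
J1 J6
J2 J6
EOF
}

# Print one valid order, one asset per line.
phase8_order
```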

Phase 9: Advanced Infrastructure (30 min)

Credential rotation, automation, and GitOps.

Order	Asset	Name	Dependencies	Time	Notes
23	H5	Service Principal Rotation	J1, J2 (identity)	8m	F-module: automated SP credential rotation
24	H7	Azure Automation Account	M1 (RG)	5m	F-module: automation account for runbooks
25	F8	ArgoCD GitOps	J1, J2, J6 (identity) + M1, M2 (monitoring)	15m	Helm + kyverno policies + app-of-apps

Rationale: H5/H7 are identity/automation ops. F8 (GitOps) depends on identity + monitoring in place.

Phase 10: Compute & Storage Modules (63 min)

Order	Asset	Name	Dependencies	Time	Notes
26	B5	AKS Private Cluster	M1, N5, N6, J1, J2, J6 (all foundation)	15m	F-module: AKS + private link + workload identity
27	B2	Container Registry	M1, B5 (RG + AKS)	8m	F-module: ACR + network rules + purge policy
28	C3	Azure Container Insights	M2, B5 (LA + AKS)	8m	F-module: AKS monitoring → LA workspace
29	C2	Key Vault	M1, J1, J2, J6 (RG + identity)	8m	F-module: KV + RBAC + network rules
30	H1	Storage Account	M1, C2 (RG + KV)	8m	F-module: storage + encryption + access tiers
31	H2	Cosmos DB	M1, C2 (RG + KV)	8m	F-module: Cosmos + encryption + network rules
32	H3	SQL Database	M1, C2 (RG + KV)	8m	F-module: SQL + encryption + audit logging

Rationale: B5 is the compute foundation; B2 stores its images and C3 monitors the cluster. C2 (KV) protects secrets. H1/H2/H3 are stateful data stores that all depend on KV.

Phase 11: Data, Networking & Foundational Policies (35 min)

Order	Asset	Name	Dependencies	Time	Notes
33	F3	Storage Firewall Rules	H1, N5, N6 (storage + NSG)	5m	F-module: fine-grained NSG + storage rules
34	F5	Network Peering	N5, N6 (networks)	5m	F-module: hub ↔ spoke peering
35	F4	Private Endpoints	H1, H2, H3, C2, B2 (services)	8m	F-module: private endpoints for services
36	D4	Kyverno Policies	B5 (AKS)	8m	Policy: pod security + image verification + labels
37	F2	RBAC Role Definitions	J1, J2, J6 (identity)	5m	F-module: custom Azure RBAC roles
38	F0	Cloud-Agnostic Contracts	All F-modules (definition)	5m	Contract: validates all F0 outputs across modules

Rationale: F3–F5 refine networking. D4 enforces pod security on AKS. F2 defines the RBAC contract. F0 is the schema; validate once all modules exist.

Phase 12: CI/CD Pipelines (15 min)

Order	Asset	Name	Dependencies	Time	Notes
39	B1	GitHub Onboarder	B6 (bootstrap), C1–C3 (pipelines exist)	10m	GitHub App + repo provisioning on Closed Won

Rationale: B1 consumes C1–C3 workflows; must come after C2/C3 are tested.

Phase 13: Legacy Runbooks (Optional, ~20 min)

Order	Asset	Name	Dependencies	Time	Notes
40+	M1–M3 era	Legacy modules	Deprecated; sign-off optional	—	Archive or migrate to new naming

Rationale: Only if you have legacy M1-era assets that haven't been migrated.

Quick Reference: Current Sign-Off Status

✅ SHIPPED (5):      D1, D3, X3, C4, R1
🟦 CODE-COMPLETE:    E0, V2, V3, S1, S2, K1, K2, L1, L2, L4, F12, B6, F11, D5, [44 total minus shipped]
⏳ PENDING:          All phases above

Execution Tips

Before You Start

Verify sandbox subscription access (PIM activation if needed)
Clone repo: git clone https://github.com/snowopsdev/snowops-automation.git

Install tooling:

# Terraform, Terragrunt, kubectl, helm, tflint, conftest, gitleaks, checkov
brew install terraform terragrunt kubectl helm checkov gitleaks
brew tap terraform-linters/tflint && brew install tflint
brew tap instrumenta/instrumenta && brew install conftest
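
After installing, it may be worth confirming that every tool is actually on PATH before starting Phase 1. A minimal sketch using the POSIX `command -v` builtin; `check_tools` is a hypothetical helper, not something shipped in the repo.

```shell
# check_tools: print "missing: <name>" for every tool in the argument list
# that is not on PATH, and return non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The tool list mirrors the install comment above.
check_tools terraform terragrunt kubectl helm tflint conftest gitleaks checkov \
  || echo "install the missing tools before starting Phase 1"
```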

During Execution

Work in phases — don't jump around. Phase 1 runbooks must all pass before Phase 2.
Fill sign-off blocks — every runbook has a "Sign-off" section with tester/date/result.
Keep a log — maintain a simple checklist (below) in your local notes.
Link PRs — if you discover a bug, create an issue + PR; reference in the runbook notes.
Reusable workflows — C1–C3 pipeline tests can be batched; verify once, reference in C2/C3.

Sample Execution Log

## Execution Log — June 4, 2026

| Phase | Asset | Tester | Date | Result | Notes |
|-------|-------|--------|------|--------|-------|
| 1 | V2 | Sagar | 2026-06-04 | PASS | Golden-file matches; d2 render skipped (no d2 binary) |
| 1 | V3 | Sagar | 2026-06-04 | PASS | All markdown templates render correctly |
| 1 | E0 | Sagar | 2026-06-04 | FAIL | Vanta adapter missing; opened issue #XXX |
| — | — | — | — | — | (continued next session) |
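
If the log is kept as a Markdown table like the sample above, a short awk filter can tally results and list the assets that need a revisit. This is a sketch under that assumption: `summarize_log` is a hypothetical helper, it expects Asset in the second column and Result in the fifth, and it would miscount rows whose Notes contain a `|` character.

```shell
# summarize_log: read a Markdown execution-log table on stdin, count PASS and
# FAIL rows, and list the assets whose Result column is FAIL.
# Header and separator rows are skipped because their Result is neither value.
summarize_log() {
  awk -F '|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      asset = trim($3); result = trim($6)
      if (result == "PASS") pass++
      else if (result == "FAIL") { fail++; failed = failed " " asset }
    }
    END { printf "PASS=%d FAIL=%d failed:%s\n", pass, fail, failed }
  '
}

summarize_log <<'EOF'
| Phase | Asset | Tester | Date | Result | Notes |
|-------|-------|--------|------|--------|-------|
| 1 | V2 | Sagar | 2026-06-04 | PASS | Golden-file matches |
| 1 | V3 | Sagar | 2026-06-04 | PASS | All markdown templates render correctly |
| 1 | E0 | Sagar | 2026-06-04 | FAIL | Vanta adapter missing |
EOF
# prints: PASS=2 FAIL=1 failed: E0
```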

Dependencies Map (Graphical Reference)

┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: Visualization & Detection                             │
│ V2 (diagrams) → V3 (runbooks) → E0 (evidence)                 │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: Detection & Visibility                                │
│ S1 (drift) → S2 (dashboard)   [depends: E0]                   │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: Reliability (L1 → L2 → L4)                            │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 4–6: Bootstrap, Registry, Utilities                       │
│ B6 → F11, X7, U1, U2                                           │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 7: Networking Foundation                                 │
│ N5 → N6                                                         │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 8: Core Infrastructure                                   │
│ M1 → M2 → M3 → M6                                              │
│ M1 → J1 → J2 → J6                                              │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 9: Advanced Ops                                           │
│ J1, J2 → H5 (SP rotation)                                      │
│ J1, J2, J6 + M1, M2 → F8 (GitOps)                              │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 10: Compute & Data                                        │
│ (M1 + N5, N6 + J1, J2, J6) → B5 (AKS)                          │
│ M1, B5 → B2 (ACR)                                              │
│ M2, B5 → C3 (Container Insights)                               │
│ M1, J1, J2, J6 → C2 (KV)                                       │
│ M1, C2 → H1, H2, H3 (storage/DB)                               │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 11: Data & Policy                                         │
│ H1, N5, N6 → F3 (firewall)                                     │
│ N5, N6 → F5 (peering)                                          │
│ H1, H2, H3, C2, B2 → F4 (private endpoints)                    │
│ B5 → D4 (Kyverno)                                              │
│ J1, J2, J6 → F2 (RBAC)                                         │
│ (All F-modules) → F0 (contracts)                               │
└──────────────────────┬──────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 12: CI/CD                                                │
│ (C1, C2, C3 ready) + B6 → B1 (GitHub Onboarder)               │
└─────────────────────────────────────────────────────────────────┘
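
The map can also be walked downstream: when an asset fails, everything reachable from it is blocked until the fix lands. The sketch below assumes the edges are written as "DEPENDENCY DEPENDENT" pairs (a format chosen for illustration) and uses a hypothetical helper, `blocked_by`, to print every asset downstream of a failed one.

```shell
# blocked_by: given a failed asset ($1) and "DEPENDENCY DEPENDENT" pairs on
# stdin, print every asset downstream of the failure, one per line, sorted.
blocked_by() {
  awk -v failed="$1" '
    { from[NR] = $1; to[NR] = $2 }
    END {
      blocked[failed] = 1
      # Repeat until a full pass over the edges adds nothing new.
      do {
        changed = 0
        for (i = 1; i <= NR; i++)
          if ((from[i] in blocked) && !(to[i] in blocked)) {
            blocked[to[i]] = 1
            changed = 1
          }
      } while (changed)
      delete blocked[failed]
      for (a in blocked) print a
    }
  ' | sort
}

# Dependency pairs from Phases 1-2: a failing E0 blocks S1 and S2.
blocked_by E0 <<'EOF'
V2 E0
V3 E0
E0 S1
E0 S2
EOF
```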

FAQs

Q: Can I skip a runbook?
A: No. Every asset must have its sign-off block filled. If the asset is not applicable (N/A), mark result as "N/A" with a note.

Q: What if a runbook fails?
A: Create a bug issue, link it in the runbook notes, and move to the next asset that does not depend on the failed one. Return to the failed asset, and then to its dependents, after the bug is fixed.

Q: How do I run runbooks in parallel?
A: Within a phase, assets with no inter-dependencies can run in parallel. For example, in Phase 1, you could theoretically run V2 + V3 in parallel, but they're short enough to do sequentially. In Phase 10, B2 and C3 are independent; you could test both at once if you have two terminals.

Q: Who approves the sign-offs?
A: For now, Sagar. Once we ship, community contributors can propose sign-offs; Sagar reviews.

Q: How often do I re-run a signed-off runbook?
A: Only if the asset's code changes. If the code is stable, you don't need to re-run it. The sign-off is a watermark for "this works in sandbox as of this date."

Next Steps

Print or bookmark this page.
Start Phase 1 with V2 (offline, ~10 min).
Update your sign-off as you go — copy the table above into a spreadsheet or Markdown file.
After each phase, update docs/context/06-project-state.md with the runbook backlog progress.
When all 44 are complete, update CLAUDE.md § Machine State block and commit.

Document version: 1.0
Sync with: CLAUDE.md (§ 0. Session Handoff), docs/context/06-project-state.md, docs/context/09-testing-dod.md