
Runbook Testing Sequence & Sign-Off Order

Last Updated: 2026-06-04
Status: 5 of 44 runbooks signed off (D1, D3, X3, C4, R1)


Overview

This document defines the canonical order for executing manual test runbooks across all SnowOps assets. The sequence respects dependency flows: foundational tools before consumers, detection before visibility, and setup before ops.

Each runbook takes ~5–15 min to complete. Total time to full sign-off: ~6–8 hours over multiple sessions.


Master Sequence

Phase 1: Visualization & Detection (25 min)

Foundation: artifact generation from TF outputs, visualization, and evidence collection.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 1 | V2 | Architecture Diagrams | None (offline) | 10m | Golden-file test; no cloud auth required |
| 2 | V3 | Runbook Generator | None (offline) | 5m | Markdown generation from TF outputs |
| 3 | E0 | Evidence Collector | V2, V3 (use outputs) | 10m | Snapshot + compliance evidence |

Rationale: V2 and V3 run fully offline; they generate client-facing artifacts from TF outputs (F0 contracts). E0 comes next because evidence collection must be working before the compliance dashboards in Phase 2 can be validated.


Phase 2: Detection & Visibility (15 min)

Anomaly detection and compliance visibility.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 4 | S1 | Drift Detector | E0 (evidence format) | 10m | Detects delta between plan + state → ticket |
| 5 | S2 | Compliance Dashboard | E0 (snapshots) | 5m | Displays evidence history + L4 DR panel |

Rationale: S1 detects drift; S2 visualizes compliance. Both depend on E0 generating snapshots.


Phase 3: Reliability & Disaster Recovery (20 min)

Backup, replication, and failover.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 6 | L1 | Backup Policy | None | 5m | Defines backup schedule, retention |
| 7 | L2 | Cross-Region Replication | L1 (backup foundation) | 8m | Replicates backed-up state across regions |
| 8 | L4 | Restore Drill Automation | L1, L2 (backups in place) | 10m | Restore → validate → teardown → report |

Rationale: L1 establishes backup baseline; L2 adds geographic redundancy; L4 validates the entire restore chain end-to-end.


Phase 4: Bootstrap & Registry (10 min)

Client setup and module management.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 9 | B6 | Client Bootstrap | None | 5m | Prerequisite checker + permission validator |
| 10 | F11 | Module Registry | B6 (client env validated) | 5m | Version manifest + pin audit |

Rationale: B6 validates the client environment before we attempt any module operations. F11 depends on a clean environment.


Phase 5: Sandbox Cleanup & Utilities (5 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 11 | X7 | Ephemeral RG Cleanup | None | 5m | Nightly cleanup of sandbox resources |

Rationale: Sandbox hygiene; can run in parallel with other work.


Phase 6: Cost & Identity Utilities (10 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 12 | U1 | Utility: Cost export | B6 (bootstrap) | 5m | Pull cost data for billing |
| 13 | U2 | Utility: Identity policies | B6 (bootstrap) | 5m | Generate AAD policy reports |

Rationale: Both utilities depend on the client environment that B6 validates.


Phase 7: Azure Networking (20 min)

Foundational cloud infrastructure modules.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 14 | N5 | Network Security Groups | None | 8m | F-module: NSG rules + test import block |
| 15 | N6 | Route Tables | N5 (NSG foundation) | 8m | F-module: routes + test import block |

Rationale: Basic network foundation before cluster/workload modules.


Phase 8: Core Infrastructure Modules (45 min)

F-series (Terraform modules) + monitoring.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 16 | M1 | Resource Group | None | 5m | F-module: RG + tags |
| 17 | M2 | Monitoring (Log Analytics) | M1 (RG) | 8m | F-module: LA workspace + retention |
| 18 | M3 | Alert Rules | M2 (LA workspace) | 8m | F-module: metric + log alerts |
| 19 | M6 | Managed Grafana | M2, M3 (LA, alerts) | 10m | F-module: Grafana instance + datasources |
| 20 | J1 | Identity: Service Principal Registry | M1 (RG) | 5m | F-module: SP credential rotation automation |
| 21 | J2 | Identity: AAD Roles | M1 (RG) | 8m | F-module: custom roles + assignments |
| 22 | J6 | Identity: Workload Identity Bindings | M1, J1, J2 (identity foundation) | 10m | F-module: pod identity → storage/KV bindings |

Rationale: M1 creates the RG; M2/M3/M6 build the observability stack; J1/J2/J6 establish the identity foundation for workloads.


Phase 9: Advanced Infrastructure (30 min)

Credential rotation, automation accounts, and GitOps.

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 23 | H5 | Service Principal Rotation | J1, J2 (identity) | 8m | F-module: automated SP credential rotation |
| 24 | H7 | Azure Automation Account | M1 (RG) | 5m | F-module: automation account for runbooks |
| 25 | F8 | ArgoCD GitOps | J1, J2, J6 (identity) + M1, M2 (monitoring) | 15m | Helm + kyverno policies + app-of-apps |

Rationale: H5/H7 are identity/automation ops. F8 (GitOps) depends on identity and monitoring being in place.


Phase 10: Compute & Storage Modules (40 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 26 | B5 | AKS Private Cluster | M1, N5, N6, J1, J2, J6 (all foundation) | 15m | F-module: AKS + private link + workload identity |
| 27 | B2 | Container Registry | M1, B5 (RG + AKS) | 8m | F-module: ACR + network rules + purge policy |
| 28 | C3 | Azure Container Insights | M2, B5 (LA + AKS) | 8m | F-module: AKS monitoring → LA workspace |
| 29 | C2 | Key Vault | M1, J1, J2, J6 (RG + identity) | 8m | F-module: KV + RBAC + network rules |
| 30 | H1 | Storage Account | M1, C2 (RG + KV) | 8m | F-module: storage + encryption + access tiers |
| 31 | H2 | Cosmos DB | M1, C2 (RG + KV) | 8m | F-module: Cosmos + encryption + network rules |
| 32 | H3 | SQL Database | M1, C2 (RG + KV) | 8m | F-module: SQL + encryption + audit logging |

Rationale: B5 is the compute foundation; B2 stores its images and C3 monitors the cluster. C2 (KV) protects secrets. H1/H2/H3 are stateful data stores that all depend on KV.


Phase 11: Data, Networking & Foundational Policies (35 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 33 | F3 | Storage Firewall Rules | H1, N5, N6 (storage + NSG) | 5m | F-module: fine-grained NSG + storage rules |
| 34 | F5 | Network Peering | N5, N6 (networks) | 5m | F-module: hub ↔ spoke peering |
| 35 | F4 | Private Endpoints | H1, H2, H3, C2, B2 (services) | 8m | F-module: private endpoints for services |
| 36 | D4 | Kyverno Policies | B5 (AKS) | 8m | Policy: pod security + image verification + labels |
| 37 | F2 | RBAC Role Definitions | J1, J2, J6 (identity) | 5m | F-module: custom Azure RBAC roles |
| 38 | F0 | Cloud-Agnostic Contracts | All F-modules (definition) | 5m | Contract: validates all F0 outputs across modules |

Rationale: F3–F5 refine networking. D4 enforces pod security on AKS. F2 defines the RBAC contract. F0 is the schema; validate once all modules exist.


Phase 12: CI/CD Pipelines (15 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 39 | B1 | GitHub Onboarder | B6 (bootstrap), C1–C3 (pipelines exist) | 10m | GitHub App + repo provisioning on Closed Won |

Rationale: B1 consumes C1–C3 workflows; must come after C2/C3 are tested.


Phase 13: Legacy Runbooks (Optional, ~20 min)

| Order | Asset | Name | Dependencies | Time | Notes |
|-------|-------|------|--------------|------|-------|
| 40+ | M1–M3 era | Legacy modules | Deprecated; signoff optional | | Archive or migrate to new naming |

Rationale: Only if you have legacy M1-era assets that haven't been migrated.


Quick Reference: Current Sign-Off Status

✅ SHIPPED (5):      D1, D3, X3, C4, R1
🟦 CODE-COMPLETE:    E0, V2, V3, S1, S2, K1, K2, L1, L2, L4, F12, B6, F11, D5, [44 total minus shipped]
⏳ PENDING:          All phases above

Execution Tips

Before You Start

  • Verify sandbox subscription access (PIM activation if needed)
  • Clone repo: git clone https://github.com/snowopsdev/snowops-automation.git
  • Install tooling:
    # Terraform, Terragrunt, kubectl, helm, tflint, conftest, gitleaks, checkov
    # Terraform is installed from HashiCorp's tap; the homebrew-core formula is no longer updated
    brew tap hashicorp/tap && brew install hashicorp/tap/terraform
    # The remaining tools are available as homebrew-core formulae
    brew install terragrunt kubectl helm checkov gitleaks tflint conftest
    

During Execution

  1. Work in phases — don't jump around. Phase 1 runbooks must all pass before Phase 2.
  2. Fill sign-off blocks — every runbook has a "Sign-off" section with tester/date/result.
  3. Keep a log — maintain a simple checklist (below) in your local notes.
  4. Link PRs — if you discover a bug, create an issue + PR; reference in the runbook notes.
  5. Reusable workflows — C1–C3 pipeline tests can be batched; verify once, reference in C2/C3.
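
Rule 1 above (finish a phase before starting the next) can be expressed as a small gate check. This is a minimal sketch, not part of the repo: `next_open_phase`, the `phases` mapping, and the choice to treat "N/A" as equivalent to "PASS" are illustrative assumptions based on the FAQ guidance about N/A assets.

```python
# Results that count as "done" for gating purposes (N/A handling is an assumption).
PASSING = {"PASS", "N/A"}


def next_open_phase(phases, results):
    """Return the lowest phase number that still has an unfinished runbook.

    phases:  {phase_number: [asset_id, ...]}
    results: {asset_id: "PASS" | "FAIL" | "N/A"}; missing assets are untested.
    Returns None when every runbook in every phase is PASS or N/A.
    """
    for number in sorted(phases):
        if any(results.get(asset) not in PASSING for asset in phases[number]):
            return number
    return None


phases = {1: ["V2", "V3", "E0"], 2: ["S1", "S2"]}
results = {"V2": "PASS", "V3": "PASS", "E0": "FAIL"}
print(next_open_phase(phases, results))  # Phase 1 is still open because E0 failed
```

Once E0 is re-tested and recorded as PASS, the same call would return 2, which is the phase to work on next.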

Sample Execution Log

## Execution Log — June 4, 2026

| Phase | Asset | Tester | Date | Result | Notes |
|-------|-------|--------|------|--------|-------|
| 1 | V2 | Sagar | 2026-06-04 | PASS | Golden-file matches; d2 render skipped (no d2 binary) |
| 1 | V3 | Sagar | 2026-06-04 | PASS | All markdown templates render correctly |
| 1 | E0 | Sagar | 2026-06-04 | FAIL | Vanta adapter missing; opened issue #XXX |
| — | — | — | — | — | (continued next session) |
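
If the log is kept as a Markdown table like the sample above, it can be summarized with a few lines of Python. This is a rough sketch under stated assumptions: `parse_log` is a hypothetical helper, the column names match the sample header, and no cell contains a literal `|` character.

```python
from collections import Counter

LOG = """\
| Phase | Asset | Tester | Date | Result | Notes |
|-------|-------|--------|------|--------|-------|
| 1 | V2 | Sagar | 2026-06-04 | PASS | Golden-file matches |
| 1 | V3 | Sagar | 2026-06-04 | PASS | All markdown templates render correctly |
| 1 | E0 | Sagar | 2026-06-04 | FAIL | Vanta adapter missing |
"""


def parse_log(text):
    """Parse a pipe table into a list of dicts keyed by the header cells."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_log(LOG)
summary = Counter(row["Result"] for row in rows)
failed = [row["Asset"] for row in rows if row["Result"] == "FAIL"]
print(dict(summary), failed)
```

The list of failed assets is the set to revisit once the linked issues are fixed.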

Dependencies Map (Graphical Reference)

┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: Visualization & Detection                             │
│ V2 (diagrams) → V3 (runbooks) → E0 (evidence)                 │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: Detection & Visibility                                │
│ S1 (drift) → S2 (dashboard)   [depends: E0]                   │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: Reliability (L1 → L2 → L4)                            │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 4–6: Bootstrap, Registry, Utilities                       │
│ B6 → F11, X7, U1, U2                                           │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 7: Networking Foundation                                 │
│ N5 → N6                                                         │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 8: Core Infrastructure                                   │
│ M1 → M2 → M3 → M6                                              │
│ M1 → J1 → J2 → J6                                              │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 9: Advanced Ops                                           │
│ J1, J2 → H5 (SP rotation)                                      │
│ J1, J2, J6 + M1, M2 → F8 (GitOps)                              │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 10: Compute & Data                                        │
│ (M1 + N5, N6 + J1, J2, J6) → B5 (AKS)                          │
│ M1, B5 → B2 (ACR)                                              │
│ M2, B5 → C3 (Container Insights)                               │
│ M1, J1, J2, J6 → C2 (KV)                                       │
│ M1, C2 → H1, H2, H3 (storage/DB)                               │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 11: Data & Policy                                         │
│ H1, N5, N6 → F3 (firewall)                                     │
│ N5, N6 → F5 (peering)                                          │
│ H1, H2, H3, C2, B2 → F4 (private endpoints)                    │
│ B5 → D4 (Kyverno)                                              │
│ J1, J2, J6 → F2 (RBAC)                                         │
│ (All F-modules) → F0 (contracts)                               │
└──────────────────────┬──────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 12: CI/CD                                                │
│ (C1, C2, C3 ready) + B6 → B1 (GitHub Onboarder)               │
└─────────────────────────────────────────────────────────────────┘
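
The ordering constraints in the map above can be checked mechanically. The following is a minimal sketch, assuming the dependencies are transcribed by hand from the phase tables; `DEPS` and `SEQUENCE` cover only Phases 7, 8, and 10, and `order_violations` is a hypothetical helper rather than anything shipped in the repo.

```python
# asset -> prerequisites, transcribed from the Phase 7, 8, and 10 tables
DEPS = {
    "N5": [], "N6": ["N5"],
    "M1": [], "M2": ["M1"], "M3": ["M2"], "M6": ["M2", "M3"],
    "J1": ["M1"], "J2": ["M1"], "J6": ["M1", "J1", "J2"],
    "B5": ["M1", "N5", "N6", "J1", "J2", "J6"],
    "B2": ["M1", "B5"], "C3": ["M2", "B5"],
    "C2": ["M1", "J1", "J2", "J6"],
    "H1": ["M1", "C2"], "H2": ["M1", "C2"], "H3": ["M1", "C2"],
}

# The same assets in master-sequence order (Order 14-22, then 26-32)
SEQUENCE = ["N5", "N6", "M1", "M2", "M3", "M6", "J1", "J2", "J6",
            "B5", "B2", "C3", "C2", "H1", "H2", "H3"]


def order_violations(sequence, deps):
    """Return (asset, prerequisite) pairs where the prerequisite is missing
    from the sequence or scheduled after the asset that needs it."""
    position = {asset: i for i, asset in enumerate(sequence)}
    return [
        (asset, dep)
        for asset in sequence
        for dep in deps.get(asset, [])
        if dep not in position or position[dep] > position[asset]
    ]


print(order_violations(SEQUENCE, DEPS))
```

An empty list means every asset in the subset is tested after all of its prerequisites; any pair returned points at a row that should move.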

FAQs

Q: Can I skip a runbook?
A: No. Every asset must have its sign-off block filled. If the asset is not applicable (N/A), mark result as "N/A" with a note.

Q: What if a runbook fails?
A: Create a bug issue, link it in the runbook notes, and move to the next asset. Return to the failed asset after the bug is fixed.

Q: How do I run runbooks in parallel?
A: Within a phase, assets with no inter-dependencies can run in parallel. For example, in Phase 1, you could theoretically run V2 + V3 in parallel, but they're short enough to do sequentially. In Phase 10, B2 and C3 are independent of each other (both need B5 first); you could test both at once if you have two terminals.
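
One way to find which runbooks in a phase can run side by side is to group them into batches whose in-phase prerequisites are already done. The sketch below uses the standard-library `graphlib.TopologicalSorter`; the `PHASE_10` mapping is hand-transcribed from the Phase 10 table with earlier-phase prerequisites omitted, and `parallel_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# In-phase prerequisites only; M1, N5, J1, etc. are assumed signed off already.
PHASE_10 = {
    "B5": [],
    "B2": ["B5"],
    "C3": ["B5"],
    "C2": [],
    "H1": ["C2"],
    "H2": ["C2"],
    "H3": ["C2"],
}


def parallel_batches(deps):
    """Group assets into batches; everything in a batch can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(PHASE_10))
# [['B5', 'C2'], ['B2', 'C3', 'H1', 'H2', 'H3']]
```

Under these assumptions B5 and C2 form the first batch, and everything else in the phase can follow in a second batch.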

Q: Who approves the sign-offs?
A: For now, Sagar. Once we ship, community contributors can propose sign-offs; Sagar reviews.

Q: How often do I re-run a signed-off runbook?
A: Only if the asset's code changes. If the code is stable, you don't need to re-run it. The sign-off is a watermark for "this works in sandbox as of this date."
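
Whether an asset's code has changed since sign-off can be checked by comparing a content digest. This is only a sketch under assumptions: it presumes each asset lives in its own directory and that a digest was recorded alongside the sign-off, neither of which this document specifies; `tree_digest` and `needs_rerun` are hypothetical names.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """SHA-256 over the relative path and contents of every file under root,
    visited in sorted order so the result is stable across runs."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rerun(asset_dir, signed_off_digest):
    """True when the asset's files no longer match the digest recorded at sign-off."""
    return tree_digest(asset_dir) != signed_off_digest
```

In a Git checkout, comparing the commit that last touched the asset's directory against the commit recorded at sign-off would serve the same purpose.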


Next Steps

  1. Print or bookmark this page.
  2. Start Phase 1 with V2 (offline, ~10 min).
  3. Update your sign-off as you go — copy the table above into a spreadsheet or Markdown file.
  4. After each phase, update docs/context/06-project-state.md with the runbook backlog progress.
  5. When all 44 are complete, update CLAUDE.md § Machine State block and commit.

Document version: 1.0
Sync with: CLAUDE.md (§ 0. Session Handoff), docs/context/06-project-state.md, docs/context/09-testing-dod.md