
SnowOps — Project State

Current Phase

  • M1: Sign-offs in flight
  • M2a: Core Baseline FUNCTIONALLY CODE-COMPLETE (v0.40) — every non-postponed asset is 🟦+. W-series postponed (D35).
  • M2b: 14-asset core CODE-COMPLETE — E0 (v0.42) + V2 + V3 (v0.43) + S1 (v0.44) + S2 (v0.45) + L1 (v0.46) + L2 (v0.47) + L4 (v0.48) + F12 (v0.49) + B6 (v0.50) + F11 (v0.51) shipped; K1 + K2 (PR #12) + D5 (PR #13) landed externally and merged. 14/14 in-repo — nothing external remaining. Focus now: runbook sign-offs.
  • GTM: Track A COMPLETE (v0.34) — 39 files under gtm/. Awaiting human sign-offs.

Last Updated: 2026-05-31 (v0.51)


In Flight

| Item | Status | Notes |
|---|---|---|
| M2b 14-asset core | 🟦 code-complete | all 14 in-repo (E0+V2+V3+S1+S2+L1+L2+L4+F12+B6+F11 + K1+K2+D5); nothing external remaining |
| M2a runbook sign-offs | 🟧 | X7+U+N+M+J offline parts are ~5 min each |
| GTM Track A | 🟦 complete | Human sign-offs pending (see §0) |

Open Issues / Tech Debt (v0.55 repo review — 2026-06-04)

Surfaced by a full-repo review. Full detail + IDs in docs/context/10-gap-register.md (G15–G20). Fixed in the same pass except G18 (seeded, ongoing).

| # | Issue | Severity | Status |
|---|---|---|---|
| G15 | B1 test red: apps/github-onboarder/src/load-template.test.ts asserted a stale workflow file list | High | ✅ Fixed: test now walks disk dynamically; 38/38 pass |
| G16 | No PR CI job ran apps/* unit tests; B1 had no workflow | High | ✅ Fixed: added .github/workflows/app-tests.yml (dynamic matrix, 13 apps) |
| G17 | CLAUDE.md §2 repo layout stale | Med | ✅ Fixed: §2 refreshed |
| G18 | Zero ADRs despite 50 decisions; DoD #5 + §3 require ADRs | Med | 🟧 Seeded: docs/adr/ README + template + ADRs 0001–0004; backfill of the rest ongoing |
| G19 | F7 live/validate.sh false-positive hclfmt warning (deprecated CLI flags) | Low | ✅ Fixed: modern hcl fmt --check, now a hard fail |
| G20 | U3/K3 scaffold READMEs described absent behavior | Low | ✅ Fixed: "SCAFFOLD (postponed)" banners + §2 tags |

Verified passing in this review: terraform fmt -check (modules/live/sandbox) ✅ · Go terratest vet + compile ✅ · all 14 app suites now green (github-onboarder fixed) · live/validate.sh ✅. Tooling: node 26, go 1.26, terraform 1.15.
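
The G15 fix replaces a hard-coded file list with a walk of the files actually on disk. The real test is TypeScript (load-template.test.ts); the sketch below only illustrates the pattern in Python. `discover_workflows`, `load_template`, and `check_loader_matches_disk` are hypothetical names, not the repo's API.

```python
from pathlib import Path


def discover_workflows(template_root: Path) -> list[str]:
    """Walk the template on disk and return workflow paths relative to its root."""
    workflows_dir = template_root / ".github" / "workflows"
    return sorted(
        p.relative_to(template_root).as_posix()
        for p in workflows_dir.rglob("*.yml")
    )


def load_template(template_root: Path) -> dict[str, str]:
    """Hypothetical loader under test: maps relative path -> file contents."""
    return {
        p.relative_to(template_root).as_posix(): p.read_text()
        for p in template_root.rglob("*")
        if p.is_file()
    }


def check_loader_matches_disk(template_root: Path) -> bool:
    """True when every workflow found on disk is present in the loaded template.

    The expected list is derived from disk, so adding or renaming a workflow
    cannot leave a stale assertion behind.
    """
    loaded = load_template(template_root)
    return all(path in loaded for path in discover_workflows(template_root))
```

The trade-off is that such a test no longer pins an exact file list, so it catches loader omissions but not an accidentally deleted workflow.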


Runbook Sign-Off Backlog (Ordered)

Execute offline Parts A+B first (~5 min each). Cloud parts (C/D) are optional promotion-path extenders.

| Priority | Runbook | Cloud? | Time | Cost |
|---|---|---|---|---|
| 1 | V2 docs/runbooks/test/V2.md | No | ~5 min | $0 |
| 2 | V3 docs/runbooks/test/V3.md | No | ~5 min | $0 |
| 3 | E0 docs/runbooks/test/E0.md | Partial (Part C) | ~6 min offline | $0 |
| 4 | X7 docs/runbooks/test/X7.md | No | ~5 min | $0 |
| 5 | U1 docs/runbooks/test/U1.md | Yes (Part C) | ~5 min offline | $0 |
| 6 | U2 docs/runbooks/test/U2.md | Yes (Part C/D) | ~5 min offline | $0 |
| 7 | N5 docs/runbooks/test/N5.md | Yes (Part C/D) | ~5 min offline | $0 |
| 8 | N6 docs/runbooks/test/N6.md | Yes (Part C/D) | ~5 min offline | $0 |
| 9 | M1 docs/runbooks/test/M1.md | Yes | ~5 min offline | $0 |
| 10 | M2 docs/runbooks/test/M2.md | Yes (Part C) | ~5 min offline | ~$1 |
| 11 | M3 docs/runbooks/test/M3.md | Yes | ~5 min offline | $0 |
| 12 | M6 docs/runbooks/test/M6.md | Yes | ~5 min offline | $0 |
| 13 | J1 docs/runbooks/test/J1.md | Yes (Part C/D) | ~5 min offline | $0 |
| 14 | J2 docs/runbooks/test/J2.md | Yes (Part C/D) | ~5 min offline | $0 |
| 15 | J6 docs/runbooks/test/J6.md | Yes (Part C/D) | ~5 min offline | $0 |
| 16 | H5 docs/runbooks/test/H5.md | Yes (Part C) | ~5 min offline | $0 |
| 17 | H7 docs/runbooks/test/H7.md | Yes (Part C/D) | ~5 min offline | $0 (needs P1) |
| 18 | F8 docs/runbooks/test/F8.md | Optional (kind) | ~5 min offline | $0 |
| 19 | B5 docs/runbooks/test/B5.md | Yes (Part C) | ~5 min offline | $0 (needs P2) |
| 20 | B4 docs/runbooks/test/B4.md | Yes (Part C) | ~8 min | $0 |
| 21 | B3 docs/runbooks/test/B3.md | Yes (Part C) | ~12 min | $0 |
| 22 | B2 docs/runbooks/test/B2.md | Yes (Part C) | ~25 min | $0 |
| 23 | C3 docs/runbooks/test/C3.md | Yes (Parts C–F) | ~75 min | $0 |
| 24 | C2 docs/runbooks/test/C2.md | Yes (Parts C–E) | ~40 min | <$1 |
| 25 | H1 docs/runbooks/test/H1.md | Yes | ~25 min | $0 |
| 26 | H2 docs/runbooks/test/H2.md | Yes (needs P1) | ~30 min | $0 |
| 27 | H3 docs/runbooks/test/H3.md | Yes (needs P2) | ~30 min | $0 |
| 28 | F3 docs/runbooks/test/F3.md | Yes (Part C) | ~30 min | ~$10 |
| 29 | F5 docs/runbooks/test/F5.md | Yes (Part C) | ~25 min | ~$2 |
| 30 | F4 docs/runbooks/test/F4.md | Yes (Part C) | ~30 min | ~$5 |
| 31 | D4 docs/runbooks/test/D4.md | Optional (kind) | ~5 min offline | $0 |
| 32 | F2 docs/runbooks/test/F2.md | Yes (Part C) | ~35 min | ~$5 |
| 33 | F0 docs/runbooks/test/F0.md | No | ~15 min | $0 |
| 34 | B1 docs/runbooks/test/B1.md | Yes | ~60 min | $0 |
| 35+ | D2, X1, X2, C1, G0–G6, A1, A5, F1, F6 | Mix | Various | Various |

M2b 14-Asset Core Progress (D36)

| # | Asset | Scope | Status | Est. Time | Cloud Cost |
|---|---|---|---|---|---|
| 1 | ✅ E0 | Compliance snapshot (Policy + Defender score, wired to C1) | 🟦 v0.42 | | |
| 2 | ✅ V2 | Architecture diagram generator (apps/diagram-generator/) | 🟦 v0.43 | | |
| 3 | ✅ V3 | Runbook generator (apps/runbook-generator/) | 🟦 v0.43 | | |
| 4 | ✅ S1 | Drift detection (scheduled terraform plan → ticket via TicketPlatform) | 🟦 v0.44 | | |
| 5 | ✅ S2 | Azure Policy compliance dashboard (apps/compliance-dashboard/) | 🟦 v0.45 | | |
| 6 | ✅ K1 | IR runbook library (docs/runbooks/incident/) | 🟦 (external, merged) | | |
| 7 | ✅ K2 | On-call integration (modules/azure/oncall-integration/) | 🟦 (external, merged) | | |
| 8 | ✅ L1 | Azure Backup policy module (modules/azure/backup-policy/) | 🟦 v0.46 | | |
| 9 | ✅ L2 | Cross-region replication (object replication + SQL failover group) | 🟦 v0.47 | | |
| 10 | ✅ L4 | Automated restore drill (apps/restore-drill/ → S2 DR panel) | 🟦 v0.48 | | |
| 11 | ✅ D5 | Policy waiver engine (waivers/, OPA exception records, CI expiry enforcement) | 🟦 (external, merged PR #13) | | |
| 12 | ✅ F12 | Brownfield import library (modules/azure/import-blocks/, 9 modules) | 🟦 v0.49 | | |
| 13 | ✅ B6 | Client self-service bootstrap (prerequisite checker + validator) (apps/client-bootstrap/) | 🟦 v0.50 | | |
| 14 | ✅ F11 | Module versioning + private registry (apps/module-registry/) | 🟦 v0.51 | | |

Rule (D36): ALL other M2b/M3 assets are POSTPONED until these 14 are code-complete + signed off. Depth before breadth.
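
The D5 row above mentions CI expiry enforcement for policy waivers. As a rough illustration of what such a gate might check, here is a minimal sketch. The `Waiver` record shape and both function names are assumptions made for this example; the actual waivers/ format is not shown in this document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    # Hypothetical field names; the real waiver records may differ.
    waiver_id: str
    policy: str
    expires: date


def expired_waivers(waivers: list[Waiver], today: date) -> list[Waiver]:
    """Return waivers whose expiry date is strictly before `today`."""
    return [w for w in waivers if w.expires < today]


def ci_exit_code(waivers: list[Waiver], today: date) -> int:
    """Return 1 (fail the job) if any waiver has expired, otherwise 0."""
    return 1 if expired_waivers(waivers, today) else 0
```

Passing `today` in explicitly, rather than calling `date.today()` inside the check, keeps the gate deterministic in tests.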


Next 5 Selected (v0.53 — D48)

After the M2b 14-asset core went code-complete, the next 5 most-important postponed/unbuilt items were selected. Reconciliation: C5 (ADO pipelines) was found already built in commit 51c7fc4 — docs were stale; it's now marked 🟦 code-complete and dropped from the candidate set.

| # | Asset | Rationale | Status |
|---|---|---|---|
| 1 | I3: CodeQL SAST | No code-analysis layer existed; D2 covered only IaC/secrets | 🟦 code-complete (v0.53) |
| 2 | I2: Dependency scanning | dependabot.yml existed but no PR gate / alert digest | 🟦 code-complete (v0.53) |
| 3 | I1: Container image scanning | Closes G6 (non-K8s container security); reusable image scan | 🟦 code-complete (v0.53) |
| 4 | E7: TicketPlatform adapters | Closes G8; unblocks E6/I5/K4/P3; generalizes the S1 seed | 🟦 code-complete (v0.54) |
| 5 | F7: Terragrunt live-infra reference | The missing per-env/region module wiring repo; deploy enablement | 🟦 code-complete (v0.55) |

Done (v0.53): I1 + I2 + I3, the M2a CI security-scanning suite (docs/runbooks/test/I1.md, I2.md, I3.md).

Done (v0.54): E7 in apps/ticket-platform/ (GitHub/Jira/Linear/ADO adapters + CLI, 26 tests); S1 repointed (interface-compatible). Runbook docs/runbooks/test/E7.md.

Done (v0.55): F7, the live/ Terragrunt reference (root + _envcommon + bootstrap + per-env/region units; baseline→net/kv/acr DAG; offline validate.sh). Runbook docs/runbooks/test/F7.md.
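
E7 puts several ticketing backends behind one interface, which is what lets S1 be repointed without changing its call site. The sketch below shows that adapter pattern in general terms. `Ticket`, `create_ticket`, `InMemoryPlatform`, and `report_drift` are hypothetical names chosen for illustration and are not taken from apps/ticket-platform/.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Ticket:
    title: str
    body: str
    labels: tuple[str, ...] = ()


class TicketPlatform(Protocol):
    """Assumed minimal interface that every adapter would satisfy."""

    def create_ticket(self, ticket: Ticket) -> str:
        """Create a ticket and return its platform-specific identifier."""
        ...


@dataclass
class InMemoryPlatform:
    """Hypothetical test adapter; a real adapter would call a ticketing API."""

    prefix: str
    created: list[Ticket] = field(default_factory=list)

    def create_ticket(self, ticket: Ticket) -> str:
        self.created.append(ticket)
        return f"{self.prefix}-{len(self.created)}"


def report_drift(platform: TicketPlatform, resource: str) -> str:
    """The caller depends only on the protocol, so adapters are swappable."""
    ticket = Ticket(
        title=f"Drift detected: {resource}",
        body="terraform plan reported changes",
        labels=("drift",),
    )
    return platform.create_ticket(ticket)
```

With this shape, a consumer such as a drift detector is handed whichever adapter the deployment configures and never imports a backend directly.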

✅ Next-5 (D48) COMPLETE. Candidate next batch: runbook sign-offs (in parallel), then M3 tail (W4 client offboarding) / M2b additional (J4 alert pack, X5 pipeline integration tests, X8 synthetic monitoring) / M4 advanced.


Next 5 Selected (v0.56 — D51)

With the next-5 (D48) complete, the next 5 most-important postponed items were selected. Priority rule: depth before breadth (D36) + milestone order — finish the remaining M2b "additional" assets (M2b §84) before advancing to M3 tail / M4. All five are M2b. The heavier network items (N3 WAF, N4 DDoS — CO-owned, cloud-cost) are deferred to a later batch.

| # | Asset | Rationale | Status |
|---|---|---|---|
| 1 | J4: Alert rule pack | No detection-rule layer existed; identity/network/privilege/data-exfil KQL alerts over the J1 LAW, wired to K2 action groups | 🟦 code-complete (v0.56) |
| 2 | I5: Defender → ticket via E7 | Newly unblocked by E7 (D49); first consumer proving the TicketPlatform adapter; closes the Defender-alert→ticket loop | 🟦 code-complete (v0.58) |
| 3 | X5: Pipeline integration tests | M2a CI gates (C1–C3) had no integration test consumers; reusable-workflow test repos | 🟦 code-complete (v0.57) |
| 4 | X8: Synthetic monitoring | No availability/latency synthetic probes; Azure Monitor standard webtests + alert rules | 🟦 code-complete (v0.57) |
| 5 | R2: Production change log | Merged-PR → changelog generator; reuses E7 for change-record tickets where required | 🟦 code-complete (v0.59) |

✅ D51 batch COMPLETE (all 5): J4 (v0.56) · X5 + X8 (v0.57) · I5 (v0.58) · R2 (v0.59).

Done (v0.56): J4 — modules/azure/alert-rule-pack/ (curated scheduled-query alert rules across four threat domains — identity/privilege/network/data-exfil; domain toggles + per-rule overrides + freeform custom rules; consumes the J1 workspace + K2 action groups by ARM ID). Offline TestAlertRulePackValidate green. Runbook docs/runbooks/test/J4.md.

Done (v0.57), X series complete (X5 + X8):
  • X5: tests/pipeline-integration/ reusable-workflow contract gate (contract_check.py + test_contract_check.py, 11 unit tests; offline validate.sh; CI .github/workflows/pipeline-integration.yml) + live it-{container-build-sign,aks-deploy,terraform-plan-apply}.yml consumers driving the existing fixtures against the sandbox. Catches workflow_call interface drift across all callers offline. Runbook docs/runbooks/test/X5.md.
  • X8: modules/azure/synthetic-monitoring/ (App Insights standard availability tests + per-test availability metric alerts; optional workspace-based AI component; consumes the J1 workspace + K2 action groups by ARM ID). Offline TestSyntheticMonitoringValidate green. Runbook docs/runbooks/test/X8.md.
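
To make "workflow_call interface drift" concrete: a caller drifts when it omits a required input or passes one the reusable workflow no longer declares. The sketch below shows one way such a check could work. It is not the contents of contract_check.py; the function name is hypothetical and plain dicts stand in for parsed workflow YAML.

```python
def check_call(
    declared_inputs: dict[str, dict],
    caller_with: dict[str, object],
) -> list[str]:
    """Return contract violations for one caller of a reusable workflow.

    `declared_inputs` mirrors `on.workflow_call.inputs` of the reusable
    workflow; `caller_with` mirrors the `with:` block of the calling job.
    """
    problems = []
    for name, spec in declared_inputs.items():
        # A required input with no default must be supplied by the caller.
        needs_value = spec.get("required", False) and "default" not in spec
        if needs_value and name not in caller_with:
            problems.append(f"missing required input: {name}")
    for name in caller_with:
        # An input the workflow no longer declares indicates drift.
        if name not in declared_inputs:
            problems.append(f"unknown input: {name}")
    return sorted(problems)
```

An empty list means the caller and the reusable workflow agree. Running the same check over every caller is what makes the gate usable offline.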

X series: X1✅ X2✅ X3🟩 X4✅ X5✅ X6 (ongoing runbooks) X7✅ X8✅. All X assets are now code-complete/shipped except the ongoing X6 runbook track.

Done (v0.58): I5 — apps/defender-ticketer/ (Defender for Cloud alerts → idempotent tickets via the E7 snowops-ticket CLI; the first E7 consumer). Pure normalize/filter/dedupe core behind a Collector seam; consumes E7 at run time (no build-coupling, per D49). 21 jest tests; offline dry-run + E7 output-contract verified. Runbook docs/runbooks/test/I5.md.
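
The "pure normalize/filter/dedupe core" can be pictured as a function from raw alerts to a list of tickets to create, with no I/O of its own. The sketch below is an illustration only, not the apps/defender-ticketer/ code. The raw field names (`alertType`, `resourceId`, `severity`) and the key derivation are assumptions.

```python
import hashlib
from dataclasses import dataclass

# Defender for Cloud severity levels, lowest to highest.
SEVERITY_RANK = {"Informational": 0, "Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Alert:
    alert_type: str
    resource_id: str
    severity: str


def normalize(raw: dict) -> Alert:
    """Map a raw alert (assumed field names) onto a canonical shape."""
    return Alert(
        alert_type=raw["alertType"].strip(),
        resource_id=raw["resourceId"].strip().lower(),
        severity=raw.get("severity", "Low").capitalize(),
    )


def dedupe_key(alert: Alert) -> str:
    """Stable key so the same alert never opens a second ticket."""
    material = f"{alert.alert_type}|{alert.resource_id}".encode()
    return hashlib.sha256(material).hexdigest()[:16]


def plan_tickets(
    raw_alerts: list[dict],
    min_severity: str,
    existing_keys: set[str],
) -> list[tuple[str, Alert]]:
    """Normalize, drop alerts below the threshold, and skip known keys."""
    threshold = SEVERITY_RANK[min_severity]
    seen = set(existing_keys)
    planned = []
    for raw in raw_alerts:
        alert = normalize(raw)
        if SEVERITY_RANK.get(alert.severity, 0) < threshold:
            continue
        key = dedupe_key(alert)
        if key in seen:
            continue
        seen.add(key)
        planned.append((key, alert))
    return planned
```

Because the core takes the already-known keys as input, a second run over the same alerts plans nothing, which is the idempotency property described above.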

Done (v0.59): R2 — apps/change-log/ (production change log: merged PRs / squash commits → categorized Keep-a-Changelog markdown; pure categorize/render core behind a collector seam — git log / gh pr list / fixture; optional E7 change-record ticket per release via the same run-time bridge as I5). 32 jest tests; offline + live git log + prepend verified. Runbook docs/runbooks/test/R2.md. D51 batch complete — next: open backlog (runbook sign-offs in parallel; M4 advanced; heavier net N3/N4).
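
A categorize/render core of the kind described for R2 can be sketched as two pure functions: one maps a commit or PR subject to a Keep a Changelog section, the other renders the grouped entries. The prefix-to-section mapping below is an assumption made for this example, and the function names are hypothetical, not those of apps/change-log/.

```python
import re
from collections import defaultdict

# Section names and order follow the Keep a Changelog convention.
SECTION_ORDER = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]

# Assumed mapping from conventional-commit style prefixes to sections.
PREFIX_TO_SECTION = {
    "feat": "Added",
    "fix": "Fixed",
    "security": "Security",
    "remove": "Removed",
    "deprecate": "Deprecated",
}

SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s*(?P<summary>.+)$")


def categorize(subject: str) -> tuple[str, str]:
    """Return (section, summary); unrecognized subjects fall back to Changed."""
    match = SUBJECT_RE.match(subject.strip())
    if not match:
        return "Changed", subject.strip()
    section = PREFIX_TO_SECTION.get(match.group("type"), "Changed")
    return section, match.group("summary").strip()


def render(version: str, release_date: str, subjects: list[str]) -> str:
    """Render one release block in Keep a Changelog markdown."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        section, summary = categorize(subject)
        grouped[section].append(summary)
    lines = [f"## [{version}] - {release_date}"]
    for section in SECTION_ORDER:
        if grouped[section]:
            lines += ["", f"### {section}"]
            lines += [f"- {summary}" for summary in grouped[section]]
    return "\n".join(lines) + "\n"
```

Keeping both functions free of git or API calls means a collector (git log, gh pr list, or a fixture) only has to supply a list of subject strings.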


Sequenced Full Roadmap (Track B — Sagar)

| Priority | Action | Status | Milestone |
|---|---|---|---|
| Next code | K1 + K2 (IR + on-call) | 🟦 external (merged) | M2b |
| After K-series | L1 + L2 + L4 (backup + DR + restore drill) | 🟦 v0.46–v0.48 | M2b |
| After L-series | D5 (policy waivers) | 🟦 external (merged PR #13) | M2b |
| After D5 | F12 (brownfield imports) | 🟦 v0.49 | M3 |
| After F12 | B6 (self-service bootstrap) | 🟦 v0.50 | M3 |
| After B6 | F11 (module versioning) | 🟦 v0.51 | M3 |
| Then | C5, E7, W4 (ADO + ticket adapters + client offboarding) | C5 🟦 · E7 🟦 v0.54 · W4 candidate | M3 |
| Then | Advanced package (E1–E6 full, J3, O, P, Q, T, V1, V4) | | M4 |
| Then | Multi-cloud (F9, F10, W5, U3) | | M5 |
| Last | W1–W3 (multi-tenant) | ⏸️ postponed | after M2b/M3 |

GTM Track A — Status

| Batch | Assets | Status | Notes |
|---|---|---|---|
| A1 | Y0, Y1, Y2, §3.8 | 🟦 drafted (v0.33) | Awaiting Nidhi (Y1 claims) + Sagar (Y2 real numbers) |
| A2 | Y3, Y4 | 🟦 drafted (v0.33) | Awaiting Sagar's 50-account seed list |
| A3 | Y5, Y6, Y7 | 🟦 drafted (v0.34) | |
| A4 | Y8, Y9 | 🟦 drafted (v0.34) | Y8 needs brand assets; Y9 is synthetic |
| A5 | Z0, Z1 | 🟦 drafted (v0.34) | |
| A6 | Y10, Y11, Y12, Y13 | 🟦 drafted (v0.34) | Y12 needs counsel; Y13 needs HubSpot config |
| A7 | Z2, Z3 | 🟦 drafted (v0.34) | Unshipped delta assets flagged with milestone |

Human prerequisites before going live with outbound:

  • Nidhi: compliance-claim review on Y1/Y5/Y7/Y9/Z2/Z3 + Y9 sanitization
  • Sagar: Y2 real numbers, Y3 50-account seed list, Y13 HubSpot pipeline config
  • Counsel: Y12 contract pack
  • Brand: Y8 deck design assets