
Compliance Monitoring & Full Baseline — Handover

Package: Baseline "Cloud Secure" [B] (part 2 of 2)
Milestone: M2b
Scenario: Same client as M2a; adds continuous compliance monitoring, backup/DR, an evidence floor, and auto-generated documentation.
Ownership: Mostly [CO] Client-Owned, with [SH] Shared monitoring outputs (you consume reports that SnowOps helps operate).

This is the compliance-monitoring and documentation half of the Baseline package. It assumes the M2a Greenfield Baseline is in place.


What You Now Get

The M2a baseline made your platform compliant by construction. M2b makes it audit-defensible over time — continuous proof that it stays compliant, recovers from disaster, and is documented.

| Capability | Asset | Where it lives |
|---|---|---|
| Compliance evidence snapshot | E0 | apps/evidence-collector/compliance/snapshots/ |
| Drift detection | S1 | apps/drift-detector/ + .github/workflows/drift-detection.yml |
| Compliance dashboard | S2 | apps/compliance-dashboard/ + .github/workflows/compliance-dashboard.yml |
| Policy waivers (exceptions) | D5 | waivers/exceptions.yaml + OPA waiver engine |
| Architecture diagram (auto) | V2 | apps/diagram-generator/ |
| Operational runbook (auto) | V3 | apps/runbook-generator/ |
| Backup policies | L1 | modules/azure/backup-policy/ |
| Cross-region replication / DR | L2 | modules/azure/cross-region-replication/ |
| Automated restore drill | L4 | apps/restore-drill/ + .github/workflows/restore-drill.yml |
| Incident response runbooks | K1 | docs/runbooks/incident/ |
| On-call integration | K2 | modules/azure/oncall-integration/ (PagerDuty/Opsgenie + Slack) |
| Brownfield import library | F12 | modules/azure/import-blocks/ |
| Self-service prerequisite checker | B6 | apps/client-bootstrap/ |
| Module versioning / private registry | F11 | apps/module-registry/ + modules/registry.json |

The Continuous-Compliance Loop

These assets form a closed feedback loop that runs without you having to remember to look:

        ┌──────────────────────────────────────────────────┐
        │  E0  evidence-collector (scheduled + post-apply) │
        │  → queries Azure Policy + Defender secure score  │
        │  → writes a versioned snapshot to                │
        │     compliance/snapshots/                        │
        └───────────────┬──────────────────────────────────┘
          ┌─────────────┴──────────────┐
          ▼                            ▼
  S2 compliance-dashboard       diffSnapshots regression signal
  → trend + framework rollup    → flags when posture degrades
  → static HTML + markdown
        ┌───────────────┴────────────────┐
        │  S1 drift-detector (daily cron)│
        │  → terraform plan per stack    │
        │  → files ONE ticket per stack  │
        │     of drift via TicketPlatform│
        └────────────────────────────────┘

  • E0 is your evidence floor — a machine-generated compliance snapshot on every apply and on schedule. This is what you show an auditor when they ask "prove your controls were operating on date X."
  • S1 catches anyone making manual ("click-ops") changes to managed infrastructure and opens a ticket. It only plans, never applies.
  • S2 renders the snapshot history into a dashboard with a SOC 2 / ISO 27001 / CIS Azure / HIPAA rollup and a trend line. It also surfaces the L4 restore-drill results in a "DR restore drills" panel.
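
The regression signal in the loop above can be pictured as a comparison of two consecutive snapshots. The following is a minimal sketch only: the snapshot shape (`secure_score`, `policies`), the state strings, and the policy names are assumptions made for illustration, and `diff_snapshots` is a stand-in, not the shipped diffSnapshots implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    """Result of comparing two compliance snapshots (hypothetical shape)."""
    newly_non_compliant: list[str] = field(default_factory=list)
    secure_score_delta: float = 0.0

    @property
    def regressed(self) -> bool:
        # Posture degrades if any control flipped to non-compliant
        # or the secure score dropped.
        return bool(self.newly_non_compliant) or self.secure_score_delta < 0


def diff_snapshots(previous: dict, current: dict) -> SnapshotDiff:
    """Compare two snapshots and report what got worse."""
    prev_policies = previous.get("policies", {})
    curr_policies = current.get("policies", {})
    # A policy counts as a regression when it is non-compliant now and was
    # not non-compliant before (this includes policies seen for the first time).
    flipped = sorted(
        name
        for name, state in curr_policies.items()
        if state == "NonCompliant" and prev_policies.get(name) != "NonCompliant"
    )
    delta = current.get("secure_score", 0.0) - previous.get("secure_score", 0.0)
    return SnapshotDiff(newly_non_compliant=flipped, secure_score_delta=delta)


# Illustrative data only; real snapshots come from compliance/snapshots/.
previous = {
    "secure_score": 82.0,
    "policies": {"storage-https-only": "Compliant", "kv-purge-protection": "Compliant"},
}
current = {
    "secure_score": 79.5,
    "policies": {"storage-https-only": "NonCompliant", "kv-purge-protection": "Compliant"},
}
result = diff_snapshots(previous, current)
print(result.regressed, result.newly_non_compliant, result.secure_score_delta)
```

Comparing only adjacent snapshots keeps the signal cheap to compute; the trend line in S2 covers the longer history.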

Disaster Recovery — The Three Legs

DR is delivered as three complementary assets. Understand which does what:

| Leg | Asset | What it provides |
|---|---|---|
| Recoverability | L1 backup-policy | GeoRedundant Recovery Services + Data Protection vaults; per-env retention policies for VM / Files / SQL / AKS. |
| Active replication | L2 cross-region-replication | Blob object replication + SQL failover group across regions. |
| Proof | L4 restore-drill | Monthly automated drill: restore/failover into an ephemeral sandbox RG → validate → tear down → record a dated RestoreDrillReport. |

The L4 drill is your RTO evidence. It runs monthly via restore-drill.yml, classifies each outcome passed/partial/failed, measures actual RTO, and commits the report to compliance/restore-drills/ — which S2 then displays. An auditor asking "do you test your backups?" gets a dated, machine-generated answer.
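
To make the passed/partial/failed classification and the RTO measurement concrete, here is a small sketch. The `DrillRun` fields and the classification rules are assumptions chosen to match the description in this document (a failed teardown yields `partial`); the real RestoreDrillReport schema and rules may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DrillRun:
    """Hypothetical timings and step results for one drill run."""
    started_at: datetime
    validated_at: Optional[datetime]  # None when the restored copy never passed validation
    teardown_ok: bool


def actual_rto(run: DrillRun) -> Optional[timedelta]:
    """Measured recovery time: from drill start to successful validation."""
    if run.validated_at is None:
        return None
    return run.validated_at - run.started_at


def classify(run: DrillRun) -> str:
    """Map a drill run to passed / partial / failed (assumed rules)."""
    if run.validated_at is None:
        return "failed"   # nothing was proven recoverable
    if not run.teardown_ok:
        return "partial"  # restore worked, but the sandbox RG was left behind
    return "passed"


# Illustrative run: validation succeeded after 42 minutes, teardown failed.
start = datetime(2026, 1, 5, 2, 0, tzinfo=timezone.utc)
run = DrillRun(started_at=start, validated_at=start + timedelta(minutes=42), teardown_ok=False)
print(classify(run), actual_rto(run))
```

Measuring RTO up to validation rather than up to teardown keeps cleanup time out of the recovery figure, which is one reasonable reading of "actual RTO".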

Backup policies (L1) define retention; binding a specific VM/DB/share to a policy is a per-instance action your team owns. The vault managed identities are exported so your team can perform that binding.


Documentation That Generates Itself

| Tool | Input | Output |
|---|---|---|
| V2 diagram-generator | terraform output -json | A d2lang architecture diagram (SVG/PNG/PDF) of what was actually deployed |
| V3 runbook-generator | terraform output -json | Per-domain operational runbooks (identity/network/compute/registry/secrets/storage/observability) with key facts, Day-Zero hardening posture, and failure modes |

Re-run these after any significant change so your architecture diagram and operational runbook never drift from reality. Both are zero-cloud — they read Terraform outputs, not your live environment.
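
As an illustration of the zero-cloud approach, the sketch below turns the JSON printed by `terraform output -json` into a list of "key facts" lines. The `name → {sensitive, type, value}` layout is Terraform's own output format; the output names, the `key_facts` helper, and the line format are hypothetical and not taken from V2 or V3.

```python
import json


def key_facts(terraform_output_json: str) -> list[str]:
    """Render Terraform outputs as runbook bullet lines, redacting sensitive values."""
    outputs = json.loads(terraform_output_json)
    facts = []
    for name in sorted(outputs):
        entry = outputs[name]
        # Terraform marks sensitive outputs; never copy those into documentation.
        value = "(sensitive)" if entry.get("sensitive") else entry["value"]
        facts.append(f"- {name}: {value}")
    return facts


# Illustrative sample standing in for real `terraform output -json` stdout.
sample = json.dumps({
    "aks_cluster_name": {"sensitive": False, "type": "string", "value": "aks-prod-example"},
    "sql_admin_password": {"sensitive": True, "type": "string", "value": "not-a-real-secret"},
})
for line in key_facts(sample):
    print(line)
```

In practice the input string would come from running `terraform output -json` in the stack directory (for example via `subprocess.run` with `capture_output=True`), so no cloud credentials are involved at generation time.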


Policy Waivers — Handling Exceptions Correctly

When a real, justified exception to a D3 OPA rule is unavoidable (common during brownfield migration), do not disable the rule. File a time-boxed waiver:

# waivers/exceptions.yaml
- rule_prefix: snowops.network
  resource_address: azurerm_storage_account.legacy_public
  expiry_date: "2026-09-01"
  owner: platform-team@client.example
  justification: "Legacy public endpoint; migration to private endpoint tracked in JIRA-1234"

The waiver suppresses the matching finding until expiry_date; once that date has passed, CI hard-fails with snowops.waiver_expired. This gives you a PR-linked, auditable exception trail with a built-in deadline, which is exactly what an auditor wants to see instead of a silently disabled control.
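
The waiver lifecycle can be sketched in a few lines. The shipped engine runs in OPA; this Python version only illustrates the decision logic under assumptions: the rule id `snowops.network.public_access` is made up, prefix matching is assumed to respect dot boundaries, and the waiver is assumed to stay valid through its expiry date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    rule_prefix: str
    resource_address: str
    expiry_date: date


def evaluate(rule_id: str, resource_address: str, waivers: list[Waiver], today: date) -> str:
    """Return 'suppressed', 'waiver_expired', or 'violation' for one finding."""
    for waiver in waivers:
        # Assumed matching: exact rule or a dotted child of the prefix, same resource.
        rule_matches = rule_id == waiver.rule_prefix or rule_id.startswith(waiver.rule_prefix + ".")
        if rule_matches and resource_address == waiver.resource_address:
            if today > waiver.expiry_date:
                return "waiver_expired"  # hard-fail: the exception has outlived its deadline
            return "suppressed"
    return "violation"


waivers = [Waiver("snowops.network", "azurerm_storage_account.legacy_public", date(2026, 9, 1))]
address = "azurerm_storage_account.legacy_public"
print(evaluate("snowops.network.public_access", address, waivers, date(2026, 8, 31)))
print(evaluate("snowops.network.public_access", address, waivers, date(2026, 9, 2)))
```

Returning a distinct `waiver_expired` result, rather than falling back to a plain violation, keeps the expired exception visible in CI output so the owner knows which waiver to renew or remove.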


Incident Response

  • K1 ships a runbook library in docs/runbooks/incident/ covering compromise, ransomware, data leak, DDoS, and vendor breach. Review these with your team and adapt the contact/escalation details.
  • K2 wires Sentinel/Defender incidents to your on-call tool (PagerDuty/Opsgenie) and Slack via modules/azure/oncall-integration/. Configure your action groups and test an alert end-to-end at handover.

What This Delivers for Compliance

| Control theme | Framework reference | Asset / Evidence |
|---|---|---|
| Monitoring of controls | SOC 2 CC4.1 · ISO 27001 A.8.16 | E0 snapshots, S2 dashboard trend |
| Change detection / unauthorized change | SOC 2 CC7.1/CC8.1 · ISO 27001 A.8.32 | S1 drift tickets |
| Availability / backup | SOC 2 A1.2/A1.3 · ISO 27001 A.8.13/A.8.14 | L1 policies, L4 monthly restore reports |
| Incident response | SOC 2 CC7.3/CC7.4 · ISO 27001 A.5.24–A.5.26 | K1 runbooks, K2 on-call |
| Exception management | SOC 2 CC3.4 | D5 waiver records with expiry |

Verification at Handover

  • E0 produces a snapshot in compliance/snapshots/ (scheduled run + post-apply).
  • S2 renders a dashboard HTML from the snapshot history with a framework rollup.
  • S1 opens a drift ticket after you deliberately make a manual change to a managed resource.
  • L4 dry-run drill completes and writes a RestoreDrillReport; the live monthly run restores into a sandbox RG and tears it down.
  • An expired waiver fails CI; an unexpired one suppresses its finding.
  • V2/V3 generate a diagram and runbook from terraform output -json.
  • A test incident routes to on-call (K2) and Slack.
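
When verifying the S1 drift check, it helps to know why repeated runs should update one ticket per stack instead of opening new ones. The sketch below shows a marker-based idempotent upsert against an in-memory stand-in. `InMemoryTickets` and its method are hypothetical and do not reflect the real TicketPlatform interface; the marker is assumed to carry the stack name.

```python
def drift_marker(stack: str) -> str:
    """Hidden marker embedded in the ticket body to identify the stack (assumed format)."""
    return f"<!-- snowops-drift:stack={stack} -->"


class InMemoryTickets:
    """Stand-in for a ticket platform; stores tickets as dicts in a list."""

    def __init__(self) -> None:
        self.tickets: list[dict] = []

    def upsert_drift_ticket(self, stack: str, plan_summary: str) -> int:
        """Update the open ticket for this stack if one exists, else create it."""
        marker = drift_marker(stack)
        body = f"{marker}\nDrift detected in stack {stack}:\n{plan_summary}"
        for index, ticket in enumerate(self.tickets):
            if ticket["open"] and marker in ticket["body"]:
                ticket["body"] = body  # refresh the existing ticket in place
                return index
        self.tickets.append({"open": True, "body": body})
        return len(self.tickets) - 1


platform = InMemoryTickets()
platform.upsert_drift_ticket("network", "1 to change")
platform.upsert_drift_ticket("network", "2 to change")  # same stack: updates, no new ticket
platform.upsert_drift_ticket("storage", "1 to add")     # different stack: new ticket
print(len(platform.tickets))
```

Because the lookup key is the marker, two stacks that share a name would overwrite each other's ticket, and a stack whose name changes between runs would open a second one.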

Failure Modes You Should Know

| Symptom | Cause | Response |
|---|---|---|
| S1 opens duplicate drift tickets | Marker mismatch / multiple stacks sharing a key | Each stack uses an embedded <!-- snowops-drift:stack=… --> marker for idempotent upsert; verify each matrix entry has a unique stack name. |
| E0 snapshot empty / missing fields | Collector lacks Reader + Security Reader | Grant the read-only roles E0 requires; it never needs write access. |
| L4 drill leaves a sandbox RG behind | Teardown step failed | X7 nightly cleanup backstops it (the drill RG is tagged ephemeral=true); the report is classified partial. |
| Dashboard framework rollup shows "Unmapped" | Policy names don't match the name-based matcher | Expected and honest: unmatched controls are bucketed explicitly, not dropped. Refine names or accept the bucket. |
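
The "Unmapped" behaviour in the last row can be illustrated with a toy name-based rollup. Everything here is an assumption for illustration: the keyword rules in `NAME_RULES`, the policy names, and the counting shape are invented, and the real matcher's rules are not described in this document.

```python
from collections import defaultdict

# Hypothetical keyword -> framework control rules.
NAME_RULES = {
    "backup": "SOC 2 A1.2",
    "diagnostic": "SOC 2 CC4.1",
}


def rollup(policy_states: dict[str, bool]) -> dict[str, dict[str, int]]:
    """Count compliant / non-compliant policies per control, keeping unmatched ones visible."""
    buckets: dict[str, dict[str, int]] = defaultdict(lambda: {"compliant": 0, "non_compliant": 0})
    for name, compliant in policy_states.items():
        # First keyword found in the policy name wins; no match goes to "Unmapped".
        control = next(
            (mapped for keyword, mapped in NAME_RULES.items() if keyword in name.lower()),
            "Unmapped",
        )
        buckets[control]["compliant" if compliant else "non_compliant"] += 1
    return dict(buckets)


# Illustrative policy names and states.
states = {
    "Require VM backup": True,
    "Enable diagnostic settings": False,
    "Custom tagging standard": True,
}
summary = rollup(states)
print(summary)
```

Keeping an explicit "Unmapped" bucket means the totals across buckets always equal the number of policies evaluated, so a gap in the name rules shows up in the dashboard instead of silently shrinking the denominator.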

Removal / Offboarding Path

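  • The monitoring apps (E0/S1/S2/L4/V2/V3) are standalone and read-only toward your managed infrastructure (L4 writes only to its ephemeral sandbox RG, which it tears down). Deleting the app directories and scheduled workflows removes them with zero residual cloud cost.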
  • Backup/DR modules (L1/L2) have verified terraform destroy paths. Decide retention deliberately — destroying a backup vault destroys recovery points.
  • Evidence already collected in compliance/snapshots/ and compliance/restore-drills/ is yours to keep for your audit record.

Next

  • Have existing infrastructure to bring under management, or running Azure DevOps? See Brownfield Adoption.
  • Pursuing formal SOC 2 / ISO 27001 / HIPAA certification? That is the Advanced "Certification-Ready" package (M4) — automated evidence platform (Vanta/Drata), SIEM, trust center, vendor/HR controls. Ask SnowOps for the Advanced engagement scope.