
Manual Test Runbook — J4: Detection Alert Rule Pack

Owner: Sagar  |  Time: ~8 min (Parts A + B offline) · +15 min (optional Part C integration apply)  |  Sandbox: snowops-sandbox-01

Promotes J4 (modules/azure/alert-rule-pack/) from 🟦 Code Complete → 🟩 Shipped. Parts A + B are offline ($0). Part C applies the curated pack against a sandbox Log Analytics workspace and then destroys it (~$0: scheduled-query rules cost a few cents per month, and the apply lasts only minutes).


Prerequisites

  • Sandbox subscription access active (PIM activated if required)
  • az login done; sandbox subscription selected
  • Identity has Contributor on the sandbox sub (workspace + alert-rule create)
  • SNOWOPS_SANDBOX_SUBSCRIPTION_ID + SNOWOPS_SANDBOX_TENANT_ID exported
  • Local tooling: terraform >= 1.6, go >= 1.22, az CLI >= 2.50
  • Working directory: repo root

Steps

Part A — terraform fmt + validate (offline, ~3 min)

  1. Module + example:
terraform -chdir=modules/azure/alert-rule-pack fmt -recursive -check
terraform -chdir=modules/azure/alert-rule-pack init -backend=false -input=false
terraform -chdir=modules/azure/alert-rule-pack validate

terraform -chdir=modules/azure/alert-rule-pack/examples/basic init -backend=false -input=false
terraform -chdir=modules/azure/alert-rule-pack/examples/basic validate

Expected: fmt -check prints nothing and exits 0; both validate runs print Success!

  2. Offline Terratest case (run from repo root; the subshell leaves your working directory unchanged):
(cd tests/terratest && go test -v -timeout 5m ./modules/azure/... -run TestAlertRulePackValidate)

Expected: PASS — exercises the full curated pack across all four domains, a per-rule override (severity + threshold + frequency), one disabled rule, a custom rule, and the action-group wiring, offline.
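
To make the override behaviour the offline test exercises more concrete, here is a minimal Go sketch of how per-rule overrides (severity, threshold, frequency, enabled) might be merged over curated defaults before custom rules are appended. The Rule and Override types, the unprefixed rule names, and the default values are hypothetical illustrations; the module does its merge in Terraform, and its real variable schema may differ.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified view of one scheduled-query rule.
type Rule struct {
	Name      string
	Severity  int // Azure Monitor severity: 0 (critical) .. 4 (verbose)
	Threshold float64
	Frequency string // ISO 8601 duration, e.g. "PT15M"
	Enabled   bool
}

// Override holds optional per-rule changes; a nil field means "keep the default".
type Override struct {
	Severity  *int
	Threshold *float64
	Frequency *string
	Enabled   *bool
}

// effectiveRules applies overrides to the curated defaults, then appends custom rules.
func effectiveRules(curated []Rule, overrides map[string]Override, custom []Rule) []Rule {
	out := make([]Rule, 0, len(curated)+len(custom))
	for _, r := range curated { // r is a copy, so the curated slice is never mutated
		if o, ok := overrides[r.Name]; ok {
			if o.Severity != nil {
				r.Severity = *o.Severity
			}
			if o.Threshold != nil {
				r.Threshold = *o.Threshold
			}
			if o.Frequency != nil {
				r.Frequency = *o.Frequency
			}
			if o.Enabled != nil {
				r.Enabled = *o.Enabled
			}
		}
		out = append(out, r)
	}
	return append(out, custom...)
}

// ptr returns a pointer to v, which keeps Override literals short.
func ptr[T any](v T) *T { return &v }

func main() {
	// Hypothetical defaults; only PT15M is stated in the runbook.
	curated := []Rule{
		{Name: "identity-failed-signin-spike", Severity: 2, Threshold: 20, Frequency: "PT15M", Enabled: true},
		{Name: "network-nsg-deny-spike", Severity: 3, Threshold: 100, Frequency: "PT15M", Enabled: true},
	}
	overrides := map[string]Override{
		"identity-failed-signin-spike": {Severity: ptr(1), Threshold: ptr(10.0), Frequency: ptr("PT5M")},
		"network-nsg-deny-spike":       {Enabled: ptr(false)},
	}
	custom := []Rule{{Name: "app-5xx-spike", Severity: 2, Threshold: 50, Frequency: "PT15M", Enabled: true}}

	for _, r := range effectiveRules(curated, overrides, custom) {
		fmt.Printf("%-30s sev=%d thr=%g freq=%s enabled=%t\n", r.Name, r.Severity, r.Threshold, r.Frequency, r.Enabled)
	}
}
```

Pointer fields make "not overridden" distinguishable from a zero value, so severity 0 or threshold 0 can still be set deliberately.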

Part B — full Terratest suite (offline, ~5 min)

  3. Full suite (run from repo root; -count=1 bypasses Go's test cache so every test actually re-runs):
(cd tests/terratest && go test -count=1 -timeout 15m ./...)

Expected: the full suite is green, including the new TestAlertRulePackValidate.

Part C — integration apply (sandbox, ~15 min, ~$0)

No build-tagged integration test ships for J4: a real scheduled-query evaluation needs log data flowing into the workspace, and the "alert fires within N min of a planted signal" criterion is checked here by hand, not in a test that tears everything down on completion.

  4. Stand up a throwaway workspace + the curated pack. Reuse the J1 fixture for the workspace, then apply the J4 example against it (or hand-write a small root module wiring log-analytics → alert-rule-pack). Minimal path:
cd tests/terratest/fixtures/alert-rule-pack
terraform init -input=false
# Point workspace_id at a REAL sandbox workspace ARM ID first (edit main.tf
# or wrap with a workspace module); action_group_ids may stay synthetic or be
# set to a real K2 action group.
terraform apply -auto-approve \
  -var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
  -var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID"
  5. Confirm the rules exist with the expected shape:
RG=snowops-j4-test-rg
az monitor scheduled-query list --resource-group "$RG" \
  --query "[].{name:name, severity:severity, enabled:enabled, freq:evaluationFrequency}" -o table

Expected: 9 enabled rules + 1 disabled (*-network-nsg-deny-spike, disabled via the fixture override) + the custom *-app-5xx-spike. The *-identity-failed-signin-spike rule shows severity 1 and frequency PT5M (override applied).

  6. (Optional, signal test) Plant a matching signal and confirm an alert fires. Easiest: temporarily lower a rule's threshold to 0 against a table that has data (e.g. point a custom rule at Heartbeat | take 1, which needs at least one agent reporting to the workspace), wait one evaluation window, and confirm an alert appears:
# Check Monitor > Alerts in the portal, filtered to resource group $RG.
# Note: 'az monitor activity-log alert list' lists activity-log alert RULES, not fired scheduled-query alerts.

Expected: an alert for the rule appears with monitor condition Fired and, if a real action group was supplied, the action group sends its notification.
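
Why threshold 0 only proves something with the right operator: a scheduled-query condition compares an aggregated value (here, the row count of Heartbeat | take 1) against the threshold. This Go sketch mirrors that comparison for the condition operator names of the Azure scheduled-query rules API (GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Equals); which operator each J4 rule uses is not stated in this runbook. As a pure comparison, GreaterThanOrEqual 0 is satisfied by a count of 0, so it would "fire" without the planted row; GreaterThan 0 needs the row to be present.

```go
package main

import "fmt"

// fires reports whether an aggregated value breaches the threshold under the given operator.
func fires(value float64, operator string, threshold float64) (bool, error) {
	switch operator {
	case "GreaterThan":
		return value > threshold, nil
	case "GreaterThanOrEqual":
		return value >= threshold, nil
	case "LessThan":
		return value < threshold, nil
	case "LessThanOrEqual":
		return value <= threshold, nil
	case "Equals":
		return value == threshold, nil
	}
	return false, fmt.Errorf("unknown operator %q", operator)
}

func main() {
	// Heartbeat | take 1 yields at most one row: count 1 with data, 0 without.
	for _, count := range []float64{1, 0} {
		gt, _ := fires(count, "GreaterThan", 0)
		ge, _ := fires(count, "GreaterThanOrEqual", 0)
		fmt.Printf("count=%g  GreaterThan 0: %t  GreaterThanOrEqual 0: %t\n", count, gt, ge)
	}
}
```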

  7. Destroy:
terraform destroy -auto-approve \
  -var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
  -var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID"

Expected: clean destroy — rules are stateless detection content; the RG and every rule are removed. (The workspace/action group, if owned elsewhere, are untouched.)


Pass criteria

  • Part A — module + example validate; TestAlertRulePackValidate passes
  • Part B — full offline suite passes
  • (Part C) fixture applies the pack; rule count + severities + the disabled rule + the override match; destroys clean
  • (Part C, optional) a planted signal fires the rule and notifies the action group
  • All test resources removed

Failure mode

A rule that can never fire because its source table isn't flowing into the workspace (most common operational failure). Documented in the module README data-source matrix; mitigation is the per-domain enable toggle. A noisy rule is tuned via rule_overrides (raise threshold) or a custom_rules replacement.
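
To spot rules that can never fire, one approach is to compare each rule's source table with the tables that actually receive data. A minimal Go sketch under stated assumptions: the rule-to-table matrix below is hypothetical (the real one is the module README's data-source matrix), and the set of flowing tables would come from a workspace query such as Usage | where TimeGenerated > ago(1d) | distinct DataType run through az monitor log-analytics query. A domain in which every rule is dead is a candidate for its per-domain enable toggle.

```go
package main

import (
	"fmt"
	"sort"
)

// source is one row of a hypothetical data-source matrix:
// the domain a rule belongs to and the table its query reads.
type source struct {
	Domain string
	Table  string
}

// deadRules returns, per domain, the sorted names of rules whose table is not flowing.
func deadRules(matrix map[string]source, flowing map[string]bool) map[string][]string {
	out := map[string][]string{}
	for rule, src := range matrix {
		if !flowing[src.Table] {
			out[src.Domain] = append(out[src.Domain], rule)
		}
	}
	for d := range out {
		sort.Strings(out[d])
	}
	return out
}

// fullyDeadDomains lists domains in which no rule can fire.
func fullyDeadDomains(matrix map[string]source, dead map[string][]string) []string {
	total := map[string]int{}
	for _, src := range matrix {
		total[src.Domain]++
	}
	var out []string
	for d, rules := range dead {
		if len(rules) == total[d] {
			out = append(out, d)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical matrix; domain and table names are illustrative only.
	matrix := map[string]source{
		"identity-failed-signin-spike": {Domain: "identity", Table: "SigninLogs"},
		"network-nsg-deny-spike":       {Domain: "network", Table: "AzureNetworkAnalytics_CL"},
		"app-5xx-spike":                {Domain: "app", Table: "AppRequests"},
	}
	// Assumed input: tables that returned rows in the last day.
	flowing := map[string]bool{"SigninLogs": true, "AppRequests": true}

	dead := deadRules(matrix, flowing)
	domains := make([]string, 0, len(dead))
	for d := range dead {
		domains = append(domains, d)
	}
	sort.Strings(domains)
	for _, d := range domains {
		fmt.Printf("%s: cannot fire: %v\n", d, dead[d])
	}
	fmt.Println("domains with no flowing source:", fullyDeadDomains(matrix, dead))
}
```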

Cost impact

A few cents per rule per month plus log-query cost per evaluation (data scanned × frequency). Full pack at the default PT15M < $1/day on a typical Baseline workspace. No standing compute. Part C apply is ~$0.

Removal path

terraform destroy (Part C step 7) removes every rule and the RG if J4 created it. Verified clean in Part C.


Sign-Off

| Field | Value |
|---|---|
| Part A (validate) | ☐ PASS |
| Part B (offline suite) | ☐ PASS |
| Part C (integration apply) | ☐ PASS / ☐ skipped |
| Part C signal test | ☐ PASS / ☐ skipped |
| Tester | |
| Date | |
| Result | ☐ PASS |