Manual Test Runbook — F5: Azure Key Vault
Owner: Sagar | Time: ~30 min (Parts A + B + C) · +25 min (optional Part D secret-CRUD probe) | Sandbox: snowops-sandbox-01
Promotes F5 (modules/azure/key-vault/) from 🟦 Code Complete → 🟩 Shipped. Part C costs ~$2 (PE hourly + a few vault ops). Skip Part D if not iterating on the secret data-plane / Workload Identity wiring.
Prerequisites
- Sandbox subscription access active (PIM activated if required)
- az login done; az account show confirms the sandbox subscription is selected
- Identity has Contributor + User Access Administrator on the sandbox subscription (UAA only needed if kv_*_principal_ids is non-empty)
- Local tooling: terraform >= 1.6, go >= 1.22, az CLI >= 2.50, jq
- SNOWOPS_SANDBOX_SUBSCRIPTION_ID and SNOWOPS_SANDBOX_TENANT_ID env vars set
- Working directory: repo root
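The tooling and env-var prerequisites can be checked in one pass before starting Part A. The following is a minimal sketch, not a script from the repo: the function name preflight and its messages are assumptions, and it only checks that each tool is on PATH and each variable is non-empty (it does not verify the minimum versions listed above).

```shell
# Hypothetical preflight helper: reports every missing tool or env var,
# then returns non-zero if anything was missing.
preflight() {
  missing=0
  # Every argument is treated as a tool name that must be on PATH.
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  # The two env vars named in the prerequisites must be set and non-empty.
  for var in SNOWOPS_SANDBOX_SUBSCRIPTION_ID SNOWOPS_SANDBOX_TENANT_ID; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing env var: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage from the repo root: preflight terraform go az jq && echo "preflight OK".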
Steps
Part A — terraform fmt + validate (offline, ~1 min)
- Confirm formatting + structural validity of the module on its own:
terraform -chdir=modules/azure/key-vault fmt -check
terraform -chdir=modules/azure/key-vault init -backend=false -input=false
terraform -chdir=modules/azure/key-vault validate
Expected: Success! The configuration is valid.
- Run the F5-relevant offline Terratest cases:
cd tests/terratest
go test -v -timeout 5m ./modules/azure/... \
-run 'TestKeyVaultValidate|TestF5KVContractConformance|TestContractsRejectBadLiterals'
Expected: 3 top-level tests pass; the kv-missing-uri sub-test under
TestContractsRejectBadLiterals is the F5-relevant negative case.
Part B — full Terratest suite (offline, ~3 min)
- Run the whole offline suite to confirm F5 hasn't regressed F1/F2/F4/F6:
Expected: 13 top-level tests pass (TestNoopHarness,
TestBaselineValidate, TestStateBackendValidate, TestSandboxValidate,
TestF1ContractConformance, TestF6ObjectStoreContractConformance,
TestF2NetworkContractConformance, TestNetworkHubValidate,
TestACRValidate, TestF4RegistryContractConformance,
TestKeyVaultValidate, TestF5KVContractConformance,
TestContractsRejectBadLiterals with 7 sub-tests).
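Counting the top-level passes by eye is error-prone once sub-tests are in the output. A minimal sketch, assuming the suite's verbose output was saved to a log file (the file name suite.log and both function names are hypothetical): go test -v prints top-level results starting at column 0 and indents sub-test results, so anchoring the pattern at the start of the line counts only top-level tests.

```shell
# Prints the number of top-level "--- PASS" lines in a saved go test -v log.
# Sub-test results are indented by go test, so the ^ anchor excludes them.
count_top_level_pass() {
  grep -c '^--- PASS: ' "$1"
}

# Succeeds only if the log holds at least the expected number of passes
# and no top-level failure.
expect_passes() {
  log="$1"; want="$2"
  got=$(count_top_level_pass "$log")
  if grep -q '^--- FAIL: ' "$log"; then
    echo "top-level failure present in $log" >&2
    return 1
  fi
  [ "$got" -ge "$want" ]
}
```

Usage: expect_passes suite.log 13. How suite.log is produced depends on the Part B command; piping the go test -v run through tee is one option.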
Part C — integration test (real Azure apply + destroy, ~25 min, ~$2)
Skip if iterating on offline changes only. Cost is dominated by the F2 hub-spoke + Private Endpoint, not the vault itself.
- Export sandbox env vars (same as F1 / F2 / F4 / F6):
export SNOWOPS_SANDBOX_SUBSCRIPTION_ID="<sandbox-subscription-guid>"
export SNOWOPS_SANDBOX_TENANT_ID="<sandbox-tenant-guid>"
- Run the F5 integration test:
cd tests/terratest
go test -v -tags integration -timeout 60m ./modules/azure/... -run TestKeyVaultModule
- Watch for key milestones:
  - Plan: ~14 to add, 0 to change, 0 to destroy. F2 (RG + hub vnet + 1 hub subnet + 1 spoke vnet + 2 spoke subnets + 2 NSGs + 2 NSG associations + 2 peerings + 1 Private DNS zone + 2 vnet links) + F5 (RG + KV + PE + PE psc + PE dns zone group) = ~14 resources.
  - azurerm_key_vault.this: Creation complete after ~30s (typically 30–60 seconds depending on region).
  - azurerm_private_endpoint.this: Still creating... (usually ~2 min).
  - All output assertions PASS, including the kv_contract shape check (rbac_mode = true, purge_protection = true).
  - Destroy complete! (clean teardown).
Vault-name reuse caution. The fixture provider config sets purge_soft_delete_on_destroy = false and recover_soft_deleted_key_vaults = true, so destroy leaves the vault in soft-deleted state. The integration test uses random.UniqueId() so this never collides. If you re-run with a fixed name within the 90-day window, Azure will refuse the apply.
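Before re-running with a fixed vault name, it may help to check whether a soft-deleted record already holds that name. A minimal sketch, assuming az is logged in to the sandbox subscription; the function names are hypothetical, and the JMESPath filter relies on the name field that az keyvault list-deleted returns for each deleted vault.

```shell
# Prints how many soft-deleted vaults in the current subscription carry
# the given name (expected to be 0 or 1).
soft_deleted_count() {
  az keyvault list-deleted --query "length([?name=='$1'])" -o tsv
}

# Succeeds when no soft-deleted vault with that name exists.
name_is_free() {
  [ "$(soft_deleted_count "$1")" = "0" ]
}
```

Usage: name_is_free "<vault_name>" || echo "name is held by a soft-deleted vault".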
Part D — secret CRUD via Workload Identity (optional, ~25 min)
Verifies the SnowOps end-to-end secret-fetch story: a workload identity (proxy for AKS Workload Identity / F3) writes + reads a secret over the Private Endpoint, while public access stays disabled.
- After Part C apply but before destroy, capture the vault name + outputs from the test state:
cd tests/terratest/fixtures/key-vault
VAULT_NAME=$(terraform output -raw vault_name)
KV_RG=$(terraform output -raw kv_resource_group_name)
NET_RG=$(terraform output -raw net_resource_group_name)
echo "vault=$VAULT_NAME kv_rg=$KV_RG net_rg=$NET_RG"
- Create a User-Assigned Managed Identity to play the Workload Identity role, then grant it Key Vault Secrets Officer on the vault scope:
az identity create \
--resource-group "$KV_RG" \
--name "f5-probe-mi" \
--location "eastus"
PROBE_MI_OID=$(az identity show --resource-group "$KV_RG" --name "f5-probe-mi" --query principalId -o tsv)
VAULT_ID=$(az keyvault show --name "$VAULT_NAME" --query id -o tsv)
az role assignment create \
--assignee-object-id "$PROBE_MI_OID" \
--assignee-principal-type ServicePrincipal \
--role "Key Vault Secrets Officer" \
--scope "$VAULT_ID"
- Deploy a tiny Linux VM into the F2 spoke workload subnet and attach the MI to it. The VM is the only way to reach the vault: public access is off, so a control-plane shell on your laptop cannot.
SPOKE_VNET=$(terraform output -json spoke_subnet_ids | jq -r '."apps/workload"' | awk -F/ '{print $(NF-2)}')
az vm create --resource-group "$NET_RG" \
--name "f5-probe-vm" \
--image "Ubuntu2204" \
--vnet-name "$SPOKE_VNET" \
--subnet "workload" \
--public-ip-address "" \
--admin-username "snowops" \
--generate-ssh-keys \
--assign-identity "/subscriptions/$SNOWOPS_SANDBOX_SUBSCRIPTION_ID/resourceGroups/$KV_RG/providers/Microsoft.ManagedIdentity/userAssignedIdentities/f5-probe-mi" \
--size "Standard_B2s"
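The SPOKE_VNET line above depends on the shape of an Azure subnet resource ID: split on "/", the last field is the subnet name, the one before it is the literal "subnets", and the third from the end is the vnet name. A small sketch of that extraction in isolation; the function name and the sample ID are hypothetical and only illustrate the field positions.

```shell
# Prints the vnet name embedded in a subnet resource ID.
# With "/" as the separator, $(NF-2) is the segment before "/subnets/<name>".
vnet_from_subnet_id() {
  printf '%s\n' "$1" | awk -F/ '{print $(NF-2)}'
}

# Hypothetical ID, used only for illustration.
sample_id="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-net-rg/providers/Microsoft.Network/virtualNetworks/example-spoke-vnet/subnets/workload"
vnet_from_subnet_id "$sample_id"   # prints example-spoke-vnet
```

If SPOKE_VNET comes back empty or as "null", the "apps/workload" key is probably missing from the spoke_subnet_ids output, so check that first.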
- From the VM, fetch a token using the attached MI and write + read a test secret. The DNS lookup must resolve to the PE private IP (not the public CNAME).
az vm run-command invoke --resource-group "$NET_RG" --name "f5-probe-vm" \
  --command-id "RunShellScript" \
  --scripts "
set -euxo pipefail
sudo apt-get update -y && sudo apt-get install -y curl dnsutils jq
# Confirm DNS resolves to PE (10.10.2.x range from fixture).
dig +short $VAULT_NAME.vault.azure.net
# Token via IMDS.
TOKEN=\$(curl -s 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net' \
  -H 'Metadata: true' | jq -r .access_token)
# Write.
curl -s -X PUT \"https://$VAULT_NAME.vault.azure.net/secrets/f5-probe-secret?api-version=7.4\" \
  -H \"Authorization: Bearer \$TOKEN\" \
  -H 'Content-Type: application/json' \
  -d '{\"value\":\"f5-probe-value-001\"}'
# Read back.
curl -s \"https://$VAULT_NAME.vault.azure.net/secrets/f5-probe-secret?api-version=7.4\" \
  -H \"Authorization: Bearer \$TOKEN\"
"
Expected: dig returns a 10.10.2.x address (the PE private IP). The write returns the secret manifest. The read returns {"value":"f5-probe-value-001",...}.
- Confirm public access is denied: run the same read from outside the spoke (your laptop) and assert a 403:
TOKEN=$(az account get-access-token --resource https://vault.azure.net --query accessToken -o tsv)
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://$VAULT_NAME.vault.azure.net/secrets/f5-probe-secret?api-version=7.4" \
  -H "Authorization: Bearer $TOKEN"
Expected: 403 (PublicNetworkAccessDenied). If you get 200, F5 has silently allowed public access; fail the runbook and investigate.
- Cleanup before the integration test's deferred terraform destroy fires: delete the probe VM + MI (the role assignment is scoped to the vault and goes when the vault does, but the MI must be removed manually). The integration test's terraform destroy won't know about these out-of-band resources:
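The cleanup commands for this step are not shown here. What follows is a hedged sketch, not the author's procedure: it assumes the resource names used earlier in Part D (f5-probe-vm in $NET_RG, f5-probe-mi in $KV_RG), and the run and cleanup_probe helpers are hypothetical. With DRY_RUN=1 it only prints what it would execute.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Deletes the probe VM and the probe managed identity created in Part D.
# $1 = networking resource group, $2 = Key Vault resource group.
cleanup_probe() {
  net_rg="$1"; kv_rg="$2"
  run az vm delete --resource-group "$net_rg" --name "f5-probe-vm" --yes
  run az identity delete --resource-group "$kv_rg" --name "f5-probe-mi"
}
```

Usage: DRY_RUN=1 cleanup_probe "$NET_RG" "$KV_RG" to preview, then drop DRY_RUN to execute. az vm create also provisions a NIC and an OS disk that az vm delete may leave behind, so listing what remains with az resource list --resource-group "$NET_RG" -o table before the destroy runs is a reasonable extra check.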
Pass criteria
- Part A: terraform validate passes for the module
- Part B: full offline Terratest suite passes (13+ top-level tests)
- Part C: TestKeyVaultModule integration test passes end-to-end
- Vault created at premium SKU
- public_network_access_enabled = false confirmed via portal, or az keyvault show --name $VAULT_NAME --query properties.publicNetworkAccess returns Disabled
- RBAC mode on: az keyvault show --name $VAULT_NAME --query properties.enableRbacAuthorization returns true
- Purge protection on: ... --query properties.enablePurgeProtection returns true
- Soft-delete retention = 90: ... --query properties.softDeleteRetentionInDays returns 90
- Private Endpoint reachable; private IP non-empty; A-record present in F2's privatelink.vaultcore.azure.net zone
- (Part D) VM in spoke resolves <vault>.vault.azure.net to a 10.10.2.x address
- (Part D) Workload Identity writes + reads a secret over the PE
- (Part D) External (laptop) read returns 403 PublicNetworkAccessDenied
- All Destroy calls complete without error
- No orphaned RGs / soft-deleted vault collisions remain (verify with az group list -o table and az keyvault list-deleted -o table)
- All test resources tagged ephemeral = true (X7 cleanup safety net)
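The four az keyvault show checks above can be run as one pass while the vault still exists. A minimal sketch under stated assumptions: the helper names are hypothetical, the property paths are the ones quoted in the criteria, and the comparison expects the lowercase true / Disabled / 90 strings that -o tsv output is assumed to produce. The premium SKU and Private Endpoint criteria are not covered.

```shell
# Compares one vault property against an expected value and prints PASS/FAIL.
check_vault_prop() {
  vault="$1"; prop="$2"; want="$3"
  got=$(az keyvault show --name "$vault" --query "properties.$prop" -o tsv)
  if [ "$got" = "$want" ]; then
    echo "PASS $prop = $got"
  else
    echo "FAIL $prop: want $want, got $got"
    return 1
  fi
}

# Runs every property check, so all failures are reported in a single run.
check_vault() {
  rc=0
  check_vault_prop "$1" publicNetworkAccess Disabled || rc=1
  check_vault_prop "$1" enableRbacAuthorization true || rc=1
  check_vault_prop "$1" enablePurgeProtection true || rc=1
  check_vault_prop "$1" softDeleteRetentionInDays 90 || rc=1
  return "$rc"
}
```

Usage: check_vault "$VAULT_NAME", run after the Part C apply and before destroy.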
Teardown
The integration test runs terraform destroy automatically. If a failure
mid-run orphans resources, clean up manually:
# Two RGs: <name_prefix>-net-rg (F2) and <name_prefix>-kv-rg (F5).
az group delete --name "<name_prefix>-net-rg" --yes --no-wait
az group delete --name "<name_prefix>-kv-rg" --yes --no-wait
# The vault soft-delete record outlives its RG. List the deleted vaults, then
# purge. Purge only succeeds for a vault created without purge protection;
# with purge_protection = true (as asserted in Part C) Azure rejects the purge
# and the record stays until the 90-day retention expires. Production NEVER purges.
az keyvault list-deleted -o table
az keyvault purge --name "<vault_name>" --location "<region>"
Vault destroy takes ~10 seconds; the PE detach can stretch to ~2 minutes.
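Because the az group delete calls above use --no-wait, they return before the resource groups are actually gone. A minimal sketch of a polling check, assuming az group exists prints true or false; the function name and the default retry count and delay are hypothetical.

```shell
# Polls until the resource group no longer exists.
# $1 = resource group name, $2 = max checks (default 30), $3 = seconds between checks (default 10).
wait_rg_gone() {
  rg="$1"; tries="${2:-30}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(az group exists --name "$rg")" = "false" ]; then
      echo "$rg deleted"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$rg still present after $tries checks" >&2
  return 1
}
```

Usage: wait_rg_gone "<name_prefix>-net-rg" && wait_rg_gone "<name_prefix>-kv-rg".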
Sign-off
- Tester: _ | Date: _ | Result: PASS / FAIL / N/A
- Notes: