Manual Test Runbook — L2: Cross-Region Replication Module
Owner: Sagar | Time: ~8 min (Parts A + B offline) · +25 min (optional Part C integration apply) | Sandbox: snowops-sandbox-01
Promotes L2 (modules/azure/cross-region-replication/) from 🟦 Code Complete → 🟩 Shipped. Parts A + B are offline ($0). Part C applies two storage accounts + two SQL servers + the replication links to the sandbox (small egress/SQL cost), then destroys them. The live failover drill is L4.
Prerequisites
- Sandbox subscription access active (PIM activated if required)
- az login done; sandbox subscription selected
- Identity has Contributor on the sandbox sub (storage + SQL create)
- An Azure AD object ID to use as the SQL AAD admin (your own user works)
- SNOWOPS_SANDBOX_SUBSCRIPTION_ID + SNOWOPS_SANDBOX_TENANT_ID exported
- Local tooling: terraform >= 1.6, go >= 1.22, az CLI >= 2.50
- Working directory: repo root
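The environment-variable and tooling prerequisites can be checked mechanically before starting. The following is a minimal sketch, not a script from the repo: the function names missing_vars, missing_tools, and preflight_gaps are hypothetical, and the check covers presence only, not versions.

```shell
# Hypothetical preflight helpers; these names are illustrative, not part of the repo.

# Print the name of every listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Print the name of every listed command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report gaps without exiting, so this is safe to paste into an interactive shell.
preflight_gaps() {
  missing_vars SNOWOPS_SANDBOX_SUBSCRIPTION_ID SNOWOPS_SANDBOX_TENANT_ID
  missing_tools terraform go az
}
```

Running preflight_gaps prints nothing when both variables are exported and all three tools are installed. Version minimums still need a manual look at terraform version, go version, and az version.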
Steps
Part A — terraform fmt + validate (offline, ~3 min)
- Module + example:
terraform -chdir=modules/azure/cross-region-replication fmt -recursive -check
terraform -chdir=modules/azure/cross-region-replication init -backend=false -input=false
terraform -chdir=modules/azure/cross-region-replication validate
terraform -chdir=modules/azure/cross-region-replication/examples/basic init -backend=false -input=false
terraform -chdir=modules/azure/cross-region-replication/examples/basic validate
Expected: fmt -check exits 0 with no output, and validate prints Success! for both the module and the example.
- Offline Terratest case:
cd tests/terratest
go test -v -timeout 5m ./modules/azure/... -run TestCrossRegionReplicationValidate
Expected: PASS — exercises blob object replication (primary GZRS account → secondary DR-region account, module-created destination container) and the SQL failover group on the Automatic (prod, 120-min grace) path, plus every cross-variable precondition, offline.
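The module and example checks in Part A repeat the same init + validate pair, so they can be wrapped in a small function that stops at the first failure. This is only a sketch: validate_dir and part_a are hypothetical names, and the paths are the ones used in the commands above, run from the repo root.

```shell
# Hypothetical wrappers; validate_dir and part_a are illustrative names, not repo scripts.

# Run the offline init + validate pair from Part A against one directory.
validate_dir() {
  terraform -chdir="$1" init -backend=false -input=false &&
    terraform -chdir="$1" validate
}

# Run fmt once on the module (-recursive also covers examples/), then validate both roots.
part_a() {
  module=modules/azure/cross-region-replication
  terraform -chdir="$module" fmt -recursive -check &&
    validate_dir "$module" &&
    validate_dir "$module/examples/basic"
}
```

Calling part_a from the repo root returns non-zero as soon as any of the five terraform invocations fails, which makes it easy to chain in front of the Terratest step.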
Part B — full Terratest suite (offline, ~5 min)
cd tests/terratest && go test -count=1 -timeout 15m ./...
Expected: the full suite green (the new TestCrossRegionReplicationValidate included).
Part C — integration apply (sandbox, ~25 min)
The fixture creates a primary + secondary (DR-region) storage account, a source container, a primary + partner SQL server (AAD-only auth), and a database, then wires object replication + a SQL failover group across them.
- Apply the fixture (use globally-unique storage names + your AAD object ID):
cd tests/terratest/fixtures/cross-region-replication
SUFFIX=$RANDOM
terraform init -input=false
terraform apply -auto-approve \
-var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
-var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID" \
-var "name_prefix=l2-drill-$SUFFIX" \
-var "primary_sa_name=snol2p$SUFFIX" \
-var "secondary_sa_name=snol2s$SUFFIX" \
-var "sql_admin_object_id=$(az ad signed-in-user show --query id -o tsv)"
SQL server + failover group creation is the slow step (~10-15 min). The geo-secondary database seeds before the failover group reports healthy.
- Confirm the failover group + object replication exist with the expected posture:
FOG=$(terraform output -raw sql_failover_group_id | sed -E 's#.*/failoverGroups/##')
PRI=$(terraform state show 'azurerm_mssql_server.primary' | awk '/ name /{print $3; exit}' | tr -d '"')
RG="l2-drill-$SUFFIX-rg"
az sql failover-group show --name "$FOG" --server "$PRI" --resource-group "$RG" \
--query "{role:replicationRole, mode:readWriteEndpoint.failoverPolicy, grace:readWriteEndpoint.failoverWithDataLossGracePeriodMinutes}" -o json
terraform output replication_summary
Expected: role = Primary, mode = Automatic, grace = 120;
replication_summary shows secondary_location ≠ primary_location,
container_pair_count = 1, sql_database_count = 1.
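The three expected failover-group values can also be compared by script instead of by eye. The sketch below assumes a hypothetical helper named check_fog_posture that takes role, mode, and grace as three arguments. The list-form --query in the usage comment is an assumption swapped in for the object-form query above so that -o tsv yields bare values.

```shell
# Hypothetical check; check_fog_posture is an illustrative name, not a repo script.
# Arguments: $1 = replication role, $2 = failover policy, $3 = grace period in minutes.
check_fog_posture() {
  rc=0
  [ "$1" = "Primary" ]   || { echo "role: expected Primary, got '$1'" >&2; rc=1; }
  [ "$2" = "Automatic" ] || { echo "mode: expected Automatic, got '$2'" >&2; rc=1; }
  [ "$3" = "120" ]       || { echo "grace: expected 120, got '$3'" >&2; rc=1; }
  return "$rc"
}

# Usage against the live sandbox (command substitution left unquoted on purpose,
# so the shell splits the output into three arguments):
#   check_fog_posture $(az sql failover-group show --name "$FOG" --server "$PRI" \
#     --resource-group "$RG" \
#     --query "[replicationRole, readWriteEndpoint.failoverPolicy, readWriteEndpoint.failoverWithDataLossGracePeriodMinutes]" \
#     -o tsv)
```

The function reports every mismatching field on stderr rather than stopping at the first, which is handy when pasting the result into the sign-off notes.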
- Destroy:
terraform destroy -auto-approve \
-var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
-var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID" \
-var "name_prefix=l2-drill-$SUFFIX" \
-var "primary_sa_name=snol2p$SUFFIX" \
-var "secondary_sa_name=snol2s$SUFFIX" \
-var "sql_admin_object_id=$(az ad signed-in-user show --query id -o tsv)"
Expected: clean destroy — the failover group is removed first (the geo-link breaks), then the servers, accounts, and RG.
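Apply and destroy must receive identical variable values, and a new shell loses $SUFFIX. One way to avoid retyping the six -var flags is to use Terraform's TF_VAR_<name> environment variables, which it reads as input variable values. A minimal sketch follows, assuming a hypothetical helper named export_drill_vars and the variable names used in the commands above.

```shell
# Hypothetical helper; export_drill_vars is an illustrative name, not a repo script.
# Arguments: $1 = suffix used for this drill, $2 = AAD object ID for the SQL admin.
export_drill_vars() {
  export TF_VAR_subscription_id="$SNOWOPS_SANDBOX_SUBSCRIPTION_ID"
  export TF_VAR_tenant_id="$SNOWOPS_SANDBOX_TENANT_ID"
  export TF_VAR_name_prefix="l2-drill-$1"
  export TF_VAR_primary_sa_name="snol2p$1"
  export TF_VAR_secondary_sa_name="snol2s$1"
  export TF_VAR_sql_admin_object_id="$2"
}

# Usage, from the fixture directory:
#   SUFFIX=$RANDOM
#   export_drill_vars "$SUFFIX" "$(az ad signed-in-user show --query id -o tsv)"
#   terraform apply -auto-approve
#   terraform destroy -auto-approve
```

Writing the chosen SUFFIX in the sign-off notes makes it possible to re-export the same values in a fresh shell if the destroy has to be retried later.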
Pass criteria
- Part A: module + example validate; TestCrossRegionReplicationValidate passes
- Part B: full offline suite passes
- (Part C) fixture applies object replication + failover group; failover group reports Automatic / grace = 120, cross-region; destroys clean
- All test resources removed
Teardown
If Part C left anything behind (e.g. a destroy interrupted while the failover group was still seeding), delete the drill resource group (l2-drill-$SUFFIX-rg, the RG value from Part C).
A failover group must be deleted (or its geo-link broken) before its servers can be removed. terraform destroy orders this for you, but a manual RG delete handles a partial state.
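The runbook does not spell out the delete command. One possible form is sketched below, assuming the resource group follows the l2-drill-$SUFFIX-rg pattern from Part C. The helper names drill_rg_name and delete_drill_rg are hypothetical.

```shell
# Hypothetical helpers; these names are illustrative, not part of the repo.

# Build the drill RG name; mirrors RG="l2-drill-$SUFFIX-rg" from Part C.
drill_rg_name() {
  printf 'l2-drill-%s-rg\n' "$1"
}

# Delete the drill RG if it still exists. $1 = the SUFFIX used for the apply.
delete_drill_rg() {
  rg=$(drill_rg_name "$1")
  if [ "$(az group exists --name "$rg")" = "true" ]; then
    # --yes skips the interactive confirmation prompt.
    az group delete --name "$rg" --yes
  else
    echo "$rg not found; nothing to delete"
  fi
}
```

Running delete_drill_rg "$SUFFIX" a second time after the delete finishes should report that nothing is left, which doubles as the check for the "All test resources removed" criterion.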
Sign-off
- Tester: _ | Date: _ | Result: PASS / FAIL / N/A
- Notes: