Manual Test Runbook — L2: Cross-Region Replication Module

Owner: Sagar  |  Time: ~8 min (Parts A + B offline) · +25 min (optional Part C integration apply)  |  Sandbox: snowops-sandbox-01

Promotes L2 (modules/azure/cross-region-replication/) from 🟦 Code Complete → 🟩 Shipped. Parts A + B are offline ($0). Part C applies two storage accounts, two SQL servers, and the replication links to the sandbox (small egress/SQL cost), then destroys them. The live failover drill is covered by L4.


Prerequisites

  • Sandbox subscription access active (PIM activated if required)
  • az login done; sandbox subscription selected
  • Identity has Contributor on the sandbox sub (storage + SQL create)
  • An Azure AD object ID to use as the SQL AAD admin (your own user works)
  • SNOWOPS_SANDBOX_SUBSCRIPTION_ID + SNOWOPS_SANDBOX_TENANT_ID exported
  • Local tooling: terraform >= 1.6, go >= 1.22, az CLI >= 2.50
  • Working directory: repo root
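The environment and tooling prerequisites above can be checked before starting. A minimal sketch: the helper names version_ge and require_env are hypothetical (not part of the repo), and version_ge assumes a sort that supports -V (GNU coreutils or a recent BSD).

```shell
# Preflight helpers (hypothetical names, not part of the repo).

# version_ge HAVE WANT: succeeds when HAVE >= WANT, comparing dotted
# versions with `sort -V` (assumed available).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# require_env NAME...: reports every variable that is not exported (or is
# empty) and fails if any is missing.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_env SNOWOPS_SANDBOX_SUBSCRIPTION_ID SNOWOPS_SANDBOX_TENANT_ID covers the exported IDs, and version_ge "$(az version --query '"azure-cli"' -o tsv)" 2.50 covers the az CLI floor.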

Steps

Part A — terraform fmt + validate (offline, ~3 min)

  1. Module + example:
terraform -chdir=modules/azure/cross-region-replication fmt -recursive -check
terraform -chdir=modules/azure/cross-region-replication init -backend=false -input=false
terraform -chdir=modules/azure/cross-region-replication validate

terraform -chdir=modules/azure/cross-region-replication/examples/basic init -backend=false -input=false
terraform -chdir=modules/azure/cross-region-replication/examples/basic validate

Expected: fmt -check exits 0 with no output, and both validate runs print Success! The configuration is valid.

  2. Offline Terratest case:
# Subshell keeps the working directory at the repo root for the later parts.
(cd tests/terratest && go test -v -timeout 5m ./modules/azure/... -run TestCrossRegionReplicationValidate)

Expected: PASS — exercises blob object replication (primary GZRS account → secondary DR-region account, module-created destination container) and the SQL failover group on the Automatic (prod, 120-min grace) path, plus every cross-variable precondition, offline.

Part B — full Terratest suite (offline, ~5 min)

  1. Run the full suite from the repo root:
(cd tests/terratest && go test -count=1 -timeout 15m ./...)

Expected: the full suite green (the new TestCrossRegionReplicationValidate included).

Part C — integration apply (sandbox, ~25 min)

The fixture creates a primary + secondary (DR-region) storage account, a source container, a primary + partner SQL server (AAD-only auth), and a database, then wires object replication + a SQL failover group across them.
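Storage account names must be 3-24 characters of lowercase letters and digits, and globally unique. The names this runbook builds from $RANDOM (a prefix plus at most five digits) fit that rule, but a changed prefix may not. A small sketch of a format check, using a hypothetical helper name valid_sa_name:

```shell
# valid_sa_name NAME: succeeds when NAME is 3-24 characters drawn only from
# lowercase letters and digits (the Azure storage account naming rule).
# Hypothetical helper, not part of the repo.
valid_sa_name() {
  local LC_ALL=C   # keep the [a-z] range byte-based regardless of locale
  case "$1" in
    "" | *[!a-z0-9]*) return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}
```

This covers the format only. Global availability can be checked separately with az storage account check-name --name "$NAME" --query nameAvailable -o tsv.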

  1. Apply the fixture (use globally-unique storage names + your AAD object ID):
cd tests/terratest/fixtures/cross-region-replication
SUFFIX=$RANDOM
terraform init -input=false
terraform apply -auto-approve \
  -var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
  -var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID" \
  -var "name_prefix=l2-drill-$SUFFIX" \
  -var "primary_sa_name=snol2p$SUFFIX" \
  -var "secondary_sa_name=snol2s$SUFFIX" \
  -var "sql_admin_object_id=$(az ad signed-in-user show --query id -o tsv)"

SQL server + failover group creation is the slow step (~10-15 min). The geo-secondary database seeds before the failover group reports healthy.

  2. Confirm the failover group + object replication exist with the expected posture:
FOG=$(terraform output -raw sql_failover_group_id | sed -E 's#.*/failoverGroups/##')
PRI=$(terraform state show -no-color 'azurerm_mssql_server.primary' | awk '/ name /{print $3; exit}' | tr -d '"')
RG="l2-drill-$SUFFIX-rg"

az sql failover-group show --name "$FOG" --server "$PRI" --resource-group "$RG" \
  --query "{role:replicationRole, mode:readWriteEndpoint.failoverPolicy, grace:readWriteEndpoint.failoverWithDataLossGracePeriodMinutes}" -o json

terraform output replication_summary

Expected: role = Primary, mode = Automatic, grace = 120; replication_summary shows secondary_location ≠ primary_location, container_pair_count = 1, sql_database_count = 1.
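The posture comparison can be scripted rather than read off the JSON by eye. A sketch with hypothetical helper names (fog_name_from_id, check_fog_posture), where the expected values are the ones stated in this runbook:

```shell
# fog_name_from_id ID: strips everything up to and including the last
# "/failoverGroups/" segment, matching the sed expression used in this step.
fog_name_from_id() {
  printf '%s\n' "${1##*/failoverGroups/}"
}

# check_fog_posture ROLE MODE GRACE: compares observed values with the
# expected Primary / Automatic / 120 posture and reports each mismatch.
check_fog_posture() {
  bad=0
  [ "$1" = "Primary" ]   || { echo "role: got '$1', want Primary" >&2; bad=1; }
  [ "$2" = "Automatic" ] || { echo "mode: got '$2', want Automatic" >&2; bad=1; }
  [ "$3" = "120" ]       || { echo "grace: got '$3', want 120" >&2; bad=1; }
  return "$bad"
}
```

Each argument can be read with the az sql failover-group show command from this step narrowed to a single field, for example --query replicationRole -o tsv.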

  3. Destroy:
terraform destroy -auto-approve \
  -var "subscription_id=$SNOWOPS_SANDBOX_SUBSCRIPTION_ID" \
  -var "tenant_id=$SNOWOPS_SANDBOX_TENANT_ID" \
  -var "name_prefix=l2-drill-$SUFFIX" \
  -var "primary_sa_name=snol2p$SUFFIX" \
  -var "secondary_sa_name=snol2s$SUFFIX" \
  -var "sql_admin_object_id=$(az ad signed-in-user show --query id -o tsv)"

Expected: clean destroy — the failover group is removed first (the geo-link breaks), then the servers, accounts, and RG.


Pass criteria

  • Part A — module + example validate; TestCrossRegionReplicationValidate passes
  • Part B — full offline suite passes
  • (Part C) fixture applies object replication + failover group; failover group reports Automatic / grace = 120, cross-region; destroys clean
  • All test resources removed

Teardown

If Part C left anything behind (e.g. a destroy interrupted while the failover group was still seeding), delete the drill RG:

az group delete --name "l2-drill-<suffix>-rg" --yes --no-wait

A failover group must be deleted (or its geo-link broken) before its servers can be removed. terraform destroy orders this for you; if a destroy was interrupted, deleting the RG manually cleans up the partial state.
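Because the RG delete runs with --no-wait, it returns before the resources are actually gone. To back the "All test resources removed" criterion, one option is to poll az group exists, which prints true or false. A sketch: the helper name wait_for_rg_gone and its defaults (60 checks, 30 s apart) are assumptions, not values from the repo.

```shell
# wait_for_rg_gone RG [TRIES] [SLEEP_SECONDS]: polls `az group exists` until
# it prints "false" or the attempts run out. Hypothetical helper; the
# defaults are illustrative.
wait_for_rg_gone() {
  local rg="$1" tries="${2:-60}" pause="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(az group exists --name "$rg")" = "false" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "resource group $rg still exists after $tries checks" >&2
  return 1
}
```

Usage: wait_for_rg_gone "l2-drill-<suffix>-rg", with the same suffix used for the apply.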


Sign-off

  • Tester: _  |  Date: _  |  Result: PASS / FAIL / N/A
  • Notes: