Every time I've seen a "least-privilege" IAM policy described in an architecture document and then looked at the actual implementation, the gap between intention and reality has been instructive.
The intentions are good. The process is manual. Manual processes degrade over time. The gap widens.
This is what I did about it at Craneware.
## The problem in concrete terms
We had multiple Azure SQL databases supporting different products across multiple environments. Each database had a set of users — application service accounts, read-only reporting users, DBA access, and a handful of legacy accounts that existed for reasons nobody could fully remember.
Access was provisioned manually. An engineer would run a SQL script, or update a spreadsheet, or raise a ticket. There was no single source of truth. Access reviews were painful and slow because the first step was figuring out what access actually existed — which meant querying each database individually, or trusting the spreadsheet was current (it wasn't).
When I audited the current state, I found:
- Several service accounts with `db_owner` that only needed `db_datareader`
- Three legacy accounts that hadn't authenticated in over 180 days
- No rotation schedule for any service account credentials
- No automated mechanism to detect drift between intended and actual access
## The design
The goal was to replace the manual spreadsheet with a code-defined, automatically-applied access model. Infrastructure as code for database access.
### 1. Desired state in code
Access definitions lived in a YAML file committed to the infrastructure repo:
```yaml
# access/azure-sql.yml
databases:
  - name: craneware-prod-billing
    server: craneware-prod-sql.database.windows.net
    users:
      - identity: billing-api-prod      # Azure managed identity name
        role: db_datareader
        schemas: [billing, shared]
      - identity: billing-worker-prod
        role: db_datawriter
        schemas: [billing]
      - identity: reporting-reader-prod
        role: db_datareader
        schemas: [billing, reporting]
        read_only: true
      - identity: dba-group             # AAD group
        role: db_owner
        environment_restriction: [dev, staging]   # not prod
```
This file was the source of truth. It lived in version control, so every access change had a commit, a PR, and a reviewer.
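Parsing that file into typed objects is straightforward. A minimal sketch of the loading step, assuming the YAML structure shown above (the `UserConfig` field names and the `parse_access_config` helper are my reconstruction, not the original code; `DatabaseConfig` is the type the reconciliation script consumes):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserConfig:
    identity: str
    role: str
    schemas: list = field(default_factory=list)
    read_only: bool = False
    environment_restriction: Optional[list] = None


@dataclass
class DatabaseConfig:
    name: str
    server: str
    users: list


def parse_access_config(raw: dict) -> list:
    """Turn the parsed YAML mapping into a list of DatabaseConfig objects."""
    databases = []
    for db in raw["databases"]:
        users = [
            UserConfig(
                identity=u["identity"],
                role=u["role"],
                schemas=u.get("schemas", []),
                read_only=u.get("read_only", False),
                environment_restriction=u.get("environment_restriction"),
            )
            for u in db["users"]
        ]
        databases.append(DatabaseConfig(name=db["name"], server=db["server"], users=users))
    return databases
```

In practice you would feed it `yaml.safe_load(path.read_text())`; keeping the parser separate from the file read makes the config layer trivially testable.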
### 2. Automation that applied the desired state
I wrote a Python script that read the YAML definition and reconciled it against the actual database state:
```python
def reconcile_database_access(
    db_config: DatabaseConfig,
    conn: pyodbc.Connection,
) -> ReconcileResult:
    # Current state from the database, keyed by identity name
    actual = get_current_users(conn)
    desired = db_config.users
    desired_ids = {d.identity for d in desired}

    to_create = [u for u in desired if u.identity not in actual]
    to_remove = [name for name in actual if name not in desired_ids]
    to_update = [u for u in desired
                 if u.identity in actual and actual[u.identity].role != u.role]

    return ReconcileResult(
        create=to_create,
        remove=to_remove,
        update=to_update,
        drift_detected=bool(to_remove or to_update),
    )
```
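The `get_current_users` helper isn't shown above. A plausible sketch, assuming it reads Azure AD principals and their fixed-role memberships from SQL Server's catalog views (the `CurrentUser` shape and the exact query are my reconstruction):

```python
from dataclasses import dataclass


@dataclass
class CurrentUser:
    identity: str
    role: str


# T-SQL: external (Azure AD) users/groups and the database roles they hold.
CURRENT_USERS_SQL = """
SELECT p.name AS [identity], r.name AS [role]
FROM sys.database_principals p
JOIN sys.database_role_members rm ON rm.member_principal_id = p.principal_id
JOIN sys.database_principals r ON r.principal_id = rm.role_principal_id
WHERE p.type IN ('E', 'X')  -- E = external user, X = external group
"""


def get_current_users(conn) -> dict:
    """Map identity name -> CurrentUser for every AAD principal in the DB.

    If a principal holds several roles, the last row wins here; a fuller
    version would collect a set of roles per identity.
    """
    cursor = conn.cursor()
    cursor.execute(CURRENT_USERS_SQL)
    return {row[0]: CurrentUser(identity=row[0], role=row[1])
            for row in cursor.fetchall()}
```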
The script ran in two modes:
- Plan mode (on every PR): showed what would change, posted a summary as a PR comment
- Apply mode (post-merge to main): applied the changes, logged everything
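The PR comment in plan mode can be rendered straight from the `ReconcileResult` fields. A hypothetical sketch (the markdown layout and the `render_plan_comment` name are mine, not the original tooling's):

```python
def render_plan_comment(db_name: str, result) -> str:
    """Render a ReconcileResult as a markdown summary for a PR comment.

    Expects result.create/update to hold objects with .identity and .role,
    and result.remove to hold identity names, matching the reconcile snippet.
    """
    lines = [f"### Access plan: {db_name}", ""]
    for u in result.create:
        lines.append(f"- **create** `{u.identity}` as `{u.role}`")
    for name in result.remove:
        lines.append(f"- **remove** `{name}`")
    for u in result.update:
        lines.append(f"- **update** `{u.identity}` -> `{u.role}`")
    if len(lines) == 2:  # header only: nothing to do
        lines.append("No changes. Actual access matches the desired state.")
    return "\n".join(lines)
```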
Managed identity authentication meant service accounts never had passwords to rotate. The Azure AD identity was the credential.
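The reconciliation script itself also connects without a password: it acquires an Azure AD access token and hands it to the ODBC driver. A sketch of the token-packing step, which follows the byte layout the Microsoft ODBC driver expects for the `SQL_COPT_SS_ACCESS_TOKEN` connection attribute (the connection string below is illustrative):

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql pre-connect attribute


def pack_access_token(token: str) -> bytes:
    """Pack an AAD access token for the ODBC driver: a 4-byte little-endian
    length prefix followed by the token encoded as UTF-16-LE."""
    raw = token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw


# Usage (assumes azure-identity and ODBC Driver 18 are installed):
# from azure.identity import DefaultAzureCredential
# import pyodbc
# token = DefaultAzureCredential().get_token(
#     "https://database.windows.net/.default").token
# conn = pyodbc.connect(
#     "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...",
#     attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
# )
```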
### 3. C# for the RBAC provisioning at scale
For the Azure AD side — assigning roles to service principals, creating managed identities, configuring Entra ID groups — I used C# with the Azure SDK, since it gave cleaner async patterns for the volume of API calls involved:
```csharp
public async Task ProvisionServiceIdentityAsync(
    ServiceIdentityConfig config,
    CancellationToken ct = default)
{
    // Create the managed identity if it doesn't exist
    var identity = await _managedIdentityClient
        .GetOrCreateAsync(config.Name, config.ResourceGroup, ct);

    // Assign SQL roles via Azure AD
    await _roleAssignmentClient.AssignAsync(
        principalId: identity.PrincipalId,
        roleDefinitionId: RoleDefinitions.SqlDbContributor,
        scope: config.DatabaseResourceId,
        ct: ct);

    _logger.LogInformation(
        "Provisioned {Name} with role {Role} on {Database}",
        config.Name, config.Role, config.DatabaseName);
}
```
### 4. Drift detection as a scheduled pipeline
A pipeline ran nightly, re-ran the plan step, and alerted if drift was detected — meaning actual database access had diverged from the desired state definition. Drift in a production database's access model is always worth knowing about.
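The scheduling itself is a few lines of pipeline config. A sketch assuming Azure DevOps (the original CI system isn't named here; the file path, script name, and `--fail-on-drift` flag are placeholders):

```yaml
# pipelines/access-drift.yml — illustrative nightly drift check
schedules:
  - cron: "0 2 * * *"          # every night at 02:00 UTC
    displayName: Nightly access drift check
    branches:
      include: [main]
    always: true               # run even if main has no new commits

steps:
  - script: python reconcile.py --mode plan --fail-on-drift
    displayName: Detect drift against access/azure-sql.yml
```

Failing the pipeline on drift, rather than just logging it, is what turns the check into an alert.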
## Results
- ~85% reduction in manual access-change effort — most changes were now a PR, not a ticket-and-spreadsheet workflow
- Access reviews became fast — the YAML file was the access review. What's in the file is what's in the database. Auditors loved it.
- Three immediate remediations found during the first automated audit: `db_owner` downgraded to `db_datareader` where appropriate, 180-day-inactive accounts removed
- No more access ambiguity — "who has access to this database?" became a five-second question
## The pattern generalises
The specific implementation here is Azure SQL, but the pattern — desired state in YAML, reconciliation script, plan-on-PR / apply-on-merge, drift detection — applies to almost any access management problem. I've used variants of it for Azure AD group membership, Auth0 role assignments, and pipeline service connection permissions.
The key insight is that access management is just another form of infrastructure. It deserves the same version control, the same review process, and the same automation as your compute resources.
The alternative — good intentions and a spreadsheet — degrades predictably.