Every time I've seen a "least-privilege" IAM policy described in an architecture document and then looked at the actual implementation, the gap between intention and reality has been instructive.
The intentions are good. The process is manual. Manual processes degrade over time. The gap widens.
This is what I did about it at Craneware.
## The problem in concrete terms
We had multiple Azure SQL databases supporting different products across multiple environments. Each database had a set of users — application service accounts, read-only reporting users, DBA access, and a handful of legacy accounts that existed for reasons nobody could fully remember.
Access was provisioned manually. An engineer would run a SQL script, or update a spreadsheet, or raise a ticket. There was no single source of truth. Access reviews were painful and slow because the first step was figuring out what access actually existed — which meant querying each database individually, or trusting the spreadsheet was current (it wasn't).
When I audited the current state, I found:
- Several service accounts with `db_owner` that only needed `db_datareader`
- Three legacy accounts that hadn't authenticated in over 180 days
- No rotation schedule for any service account credentials
- No automated mechanism to detect drift between intended and actual access
## The design
The goal was to replace the manual spreadsheet with a code-defined, automatically-applied access model. Infrastructure as code for database access.
### 1. Desired state in code
Access definitions lived in a YAML file committed to the infrastructure repo:
```yaml
# access/azure-sql.yml
databases:
  - name: craneware-prod-billing
    server: craneware-prod-sql.database.windows.net
    users:
      - identity: billing-api-prod      # Azure managed identity name
        role: db_datareader
        schemas: [billing, shared]
      - identity: billing-worker-prod
        role: db_datawriter
        schemas: [billing]
      - identity: reporting-reader-prod
        role: db_datareader
        schemas: [billing, reporting]
        read_only: true
      - identity: dba-group             # AAD group
        role: db_owner
        environment_restriction: [dev, staging]   # not prod
```
This file was the source of truth. It lived in version control, so every access change had a commit, a PR, and a reviewer.
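Parsing that file into typed objects is straightforward. A minimal sketch of the loading step, assuming the YAML structure shown above (the `UserConfig` field names and the `parse_access_config` helper are my reconstruction, not the original code; `DatabaseConfig` is the type the reconciliation script consumes):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserConfig:
    identity: str
    role: str
    schemas: list = field(default_factory=list)
    read_only: bool = False
    environment_restriction: Optional[list] = None


@dataclass
class DatabaseConfig:
    name: str
    server: str
    users: list


def parse_access_config(raw: dict) -> list:
    """Turn the parsed YAML mapping into a list of DatabaseConfig objects."""
    databases = []
    for db in raw["databases"]:
        users = [
            UserConfig(
                identity=u["identity"],
                role=u["role"],
                schemas=u.get("schemas", []),
                read_only=u.get("read_only", False),
                environment_restriction=u.get("environment_restriction"),
            )
            for u in db["users"]
        ]
        databases.append(DatabaseConfig(name=db["name"], server=db["server"], users=users))
    return databases
```

In practice you would feed it `yaml.safe_load(path.read_text())`; keeping the parser separate from the file read makes the config layer trivially testable.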
### 2. Automation that applied the desired state
I wrote a Python script that read the YAML definition and reconciled it against the actual database state:
```python
def reconcile_database_access(
    db_config: DatabaseConfig,
    conn: pyodbc.Connection,
) -> ReconcileResult:
    # Current state from the database, keyed by identity name
    actual = get_current_users(conn)
    desired = db_config.users
    desired_ids = {d.identity for d in desired}

    to_create = [u for u in desired if u.identity not in actual]
    to_remove = [name for name in actual if name not in desired_ids]
    to_update = [u for u in desired
                 if u.identity in actual and actual[u.identity].role != u.role]

    return ReconcileResult(
        create=to_create,
        remove=to_remove,
        update=to_update,
        drift_detected=bool(to_remove or to_update),
    )
```
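The `get_current_users` helper isn't shown above. A plausible sketch, assuming it reads Azure AD principals and their fixed-role memberships from SQL Server's catalog views (the `CurrentUser` shape and the exact query are my reconstruction):

```python
from dataclasses import dataclass


@dataclass
class CurrentUser:
    identity: str
    role: str


# T-SQL: external (Azure AD) users/groups and the database roles they hold.
CURRENT_USERS_SQL = """
SELECT p.name AS [identity], r.name AS [role]
FROM sys.database_principals p
JOIN sys.database_role_members rm ON rm.member_principal_id = p.principal_id
JOIN sys.database_principals r ON r.principal_id = rm.role_principal_id
WHERE p.type IN ('E', 'X')  -- E = external user, X = external group
"""


def get_current_users(conn) -> dict:
    """Map identity name -> CurrentUser for every AAD principal in the DB.

    If a principal holds several roles, the last row wins here; a fuller
    version would collect a set of roles per identity.
    """
    cursor = conn.cursor()
    cursor.execute(CURRENT_USERS_SQL)
    return {row[0]: CurrentUser(identity=row[0], role=row[1])
            for row in cursor.fetchall()}
```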
The script ran in two modes:
- Plan mode (on every PR): showed what would change, posted a summary as a PR comment
- Apply mode (post-merge to main): applied the changes, logged everything
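The PR comment in plan mode can be rendered straight from the `ReconcileResult` fields. A hypothetical sketch (the markdown layout and the `render_plan_comment` name are mine, not the original tooling's):

```python
def render_plan_comment(db_name: str, result) -> str:
    """Render a ReconcileResult as a markdown summary for a PR comment.

    Expects result.create/update to hold objects with .identity and .role,
    and result.remove to hold identity names, matching the reconcile snippet.
    """
    lines = [f"### Access plan: {db_name}", ""]
    for u in result.create:
        lines.append(f"- **create** `{u.identity}` as `{u.role}`")
    for name in result.remove:
        lines.append(f"- **remove** `{name}`")
    for u in result.update:
        lines.append(f"- **update** `{u.identity}` -> `{u.role}`")
    if len(lines) == 2:  # header only: nothing to do
        lines.append("No changes. Actual access matches the desired state.")
    return "\n".join(lines)
```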
Managed identity authentication meant service accounts never had passwords to rotate. The Azure AD identity was the credential.
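The reconciliation script itself also connects without a password: it acquires an Azure AD access token and hands it to the ODBC driver. A sketch of the token-packing step, which follows the byte layout the Microsoft ODBC driver expects for the `SQL_COPT_SS_ACCESS_TOKEN` connection attribute (the connection string below is illustrative):

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql pre-connect attribute


def pack_access_token(token: str) -> bytes:
    """Pack an AAD access token for the ODBC driver: a 4-byte little-endian
    length prefix followed by the token encoded as UTF-16-LE."""
    raw = token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw


# Usage (assumes azure-identity and ODBC Driver 18 are installed):
# from azure.identity import DefaultAzureCredential
# import pyodbc
# token = DefaultAzureCredential().get_token(
#     "https://database.windows.net/.default").token
# conn = pyodbc.connect(
#     "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...",
#     attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
# )
```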
### 3. C# for the RBAC provisioning at scale
For the Azure AD side — assigning roles to service principals, creating managed identities, configuring Entra ID groups — I used C# with the Azure SDK, since it gave cleaner async patterns for the volume of API calls involved:
```csharp
public async Task ProvisionServiceIdentityAsync(
    ServiceIdentityConfig config,
    CancellationToken ct = default)
{
    // Create the managed identity if it doesn't exist
    var identity = await _managedIdentityClient
        .GetOrCreateAsync(config.Name, config.ResourceGroup, ct);

    // Assign SQL roles via Azure AD
    await _roleAssignmentClient.AssignAsync(
        principalId: identity.PrincipalId,
        roleDefinitionId: RoleDefinitions.SqlDbContributor,
        scope: config.DatabaseResourceId,
        ct: ct);

    _logger.LogInformation(
        "Provisioned {Name} with role {Role} on {Database}",
        config.Name, config.Role, config.DatabaseName);
}
```
### 4. Drift detection as a scheduled pipeline
A pipeline ran nightly, re-ran the plan step, and alerted if drift was detected — meaning actual database access had diverged from the desired state definition. Drift in a production database's access model is always worth knowing about.
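The scheduling itself is a few lines of pipeline config. A sketch assuming Azure DevOps (the original CI system isn't named here; the file path, script name, and `--fail-on-drift` flag are placeholders):

```yaml
# pipelines/access-drift.yml — illustrative nightly drift check
schedules:
  - cron: "0 2 * * *"          # every night at 02:00 UTC
    displayName: Nightly access drift check
    branches:
      include: [main]
    always: true               # run even if main has no new commits

steps:
  - script: python reconcile.py --mode plan --fail-on-drift
    displayName: Detect drift against access/azure-sql.yml
```

Failing the pipeline on drift, rather than just logging it, is what turns the check into an alert.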
## Results
- ~85% reduction in manual access-change effort — most changes were now a PR, not a ticket-and-spreadsheet workflow
- Access reviews became fast — the YAML file was the access review. What's in the file is what's in the database. Auditors loved it.
- Three immediate remediations found during the first automated audit: `db_owner` downgraded to `db_datareader` where appropriate, 180-day-inactive accounts removed
- No more access ambiguity — "who has access to this database?" became a five-second question
## The pattern generalises
The specific implementation here is Azure SQL, but the pattern — desired state in YAML, reconciliation script, plan-on-PR / apply-on-merge, drift detection — applies to almost any access management problem. I've used variants of it for Azure AD group membership, Auth0 role assignments, and pipeline service connection permissions.
The key insight is that access management is just another form of infrastructure. It deserves the same version control, the same review process, and the same automation as your compute resources.
The alternative — good intentions and a spreadsheet — degrades predictably.