There's a specific failure mode that platform engineers fall into when they start caring about governance: they become the bottleneck.
Every pull request needs your sign-off. Every team waits for you to review their Terraform. Every deployment gate sits in your queue. You've successfully centralised risk reduction — and accidentally centralised delivery too.
I've been on the receiving end of that as a developer. I've also been the engineer who caused it. This is what I eventually learned to do differently.
The context
When I joined Craneware's platform team, we had 15+ Azure DevOps pipelines in various states of repair. Some were originally built by engineers who'd since left. Some used ARM templates. Some used Terraform — but inconsistently, with no shared module strategy, no standardised naming, and no enforced security controls.
Every team had their own interpretation of how infrastructure should look. A few teams were doing it well. Most were doing it fine. A couple had accumulated the kind of debt that makes auditors nervous.
My job was to bring coherence to this without slowing anyone down.
The framing that changed everything
I stopped thinking about governance as control and started thinking about it as environment design.
In a well-designed environment, the easy path and the correct path are the same. Engineers don't reach for the secure option because they've been told to — they reach for it because it's the only option that's actually easy.
This reframe changes what you build. Instead of building guardrails that block wrong things, you build paved roads that make right things effortless.
What we built
1. A versioned Terraform module library
We created a private Azure DevOps Artifacts feed with versioned Terraform modules for our most-used resource types: App Services, Azure SQL, storage accounts, networking components.
Each module had security decisions baked in rather than exposed as parameters:
```hcl
module "app_service" {
  source  = "azuredevops://craneware/tf-modules//app-service"
  version = "~> 2.0"

  name                = var.service_name
  resource_group_name = module.resource_group.name
  environment         = var.environment

  # What teams configure:
  sku_name     = "P1v3"
  app_settings = var.app_settings
}

# What they don't see — baked into the module:
# - https_only = true (enforced)
# - minimum_tls_version = "1.2" (enforced)
# - managed_identity_type = "SystemAssigned" (default)
# - ftps_state = "Disabled" (enforced)
```
TLS enforcement, HTTPS-only, and managed identity weren't toggles. You couldn't turn them off without forking the module — which would immediately flag in code review.
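Catching forks in review doesn't have to rely on human attention either. A minimal sketch of an automated check, assuming modules are expected to come from the `craneware/tf-modules` feed (the prefix and function names here are illustrative, not the production tooling):

```python
import re

# Illustrative prefix for the approved module feed (an assumption,
# matching the source string shown in the example above).
APPROVED_PREFIX = "azuredevops://craneware/tf-modules//"

# Matches: source = "..." in Terraform configuration.
SOURCE_RE = re.compile(r'source\s*=\s*"([^"]+)"')

def find_unapproved_sources(tf_text: str) -> list[str]:
    """Return module sources that don't come from the approved feed,
    e.g. local paths pointing at a forked copy of a module."""
    return [
        src for src in SOURCE_RE.findall(tf_text)
        if not src.startswith(APPROVED_PREFIX)
    ]
```

Run against a PR's changed `.tf` files, anything this returns is a candidate fork and gets flagged before a reviewer ever opens the diff.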
2. Shared pipeline templates
The second piece was a pipeline-templates repository that any team could reference. The core template bundled everything we wanted to happen on every deployment:
```yaml
# In any service's azure-pipelines.yml
resources:
  repositories:
    - repository: templates
      type: git
      name: craneware/pipeline-templates
      ref: refs/tags/v1.4.0  # pinned version

stages:
  - template: stages/standard-deploy.yml@templates
    parameters:
      serviceName: $(Build.Repository.Name)
      environment: $(ENVIRONMENT)
      terraformVersion: '1.7.2'
```
By referencing a pinned template version, teams got:
- Snyk dependency and container scanning
- `terraform validate` and `tflint` on every PR
- TLS certificate validation before deployment
- Approval gates for production
- Standardised tagging enforcement
None of this required individual teams to think about it. It just happened.
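Pinning only helps if teams actually pin. A small sketch of a check that could enforce this, assuming the convention shown above (tag refs like `refs/tags/v1.4.0`; the regexes are illustrative):

```python
import re

# A pinned reference looks like: ref: refs/tags/v1.4.0
TAG_REF = re.compile(r'ref:\s*refs/tags/v\d+\.\d+\.\d+')
# A floating reference looks like: ref: refs/heads/main
BRANCH_REF = re.compile(r'ref:\s*refs/heads/\S+')

def is_pinned_to_tag(pipeline_yaml: str) -> bool:
    """True if the pipeline references the templates repo via a
    version tag rather than a branch that can move underneath it."""
    return bool(TAG_REF.search(pipeline_yaml)) and not BRANCH_REF.search(pipeline_yaml)
```

A branch ref means every merge to the templates repo silently changes every consuming pipeline; a tag ref means teams upgrade deliberately, one version bump at a time.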
3. Automated PR feedback — not just blocking
The part I'm most pleased with is how we handled code review at scale.
I wrote a Python script that ran as a pipeline task and posted structured feedback directly on pull requests as inline comments. For routine compliance issues — missing required tags, naming convention violations, hardcoded secrets patterns — the bot caught them before a human ever looked at the PR.
```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ReviewIssue:
    severity: Literal['error', 'warning', 'info']
    file: str
    line: int
    message: str
    suggestion: str

def check_required_tags(resource_block: dict, file: str) -> list[ReviewIssue]:
    required = {'environment', 'cost_centre', 'team', 'managed_by'}
    missing = required - set(resource_block.get('tags', {}).keys())
    return [
        ReviewIssue(
            severity='error',
            file=file,
            line=resource_block['line'],
            message=f"Missing required tag: {tag}",
            suggestion=f'Add `{tag} = var.{tag}` to the tags block',
        )
        for tag in missing
    ]
```
This meant my manual reviews could focus on architectural decisions, not catching missing `cost_centre` tags.
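The hardcoded-secrets check mentioned above can follow the same shape. A minimal sketch, with the caveat that these two patterns and their names are assumptions for illustration, not the production rule set:

```python
import re

# Illustrative secret patterns (assumed, not the real rule set):
# a storage-account key in a connection string, and a quoted literal
# password that isn't a Terraform interpolation like "${var.password}".
SECRET_PATTERNS = {
    "connection string key": re.compile(r'AccountKey\s*=\s*[A-Za-z0-9+/=]{20,}'),
    "hardcoded password": re.compile(r'password\s*=\s*"[^"$]{8,}"', re.IGNORECASE),
}

def check_hardcoded_secrets(lines: list[str], file: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for likely hardcoded secrets."""
    issues = []
    for lineno, text in enumerate(lines, start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                issues.append((lineno, f"Possible {label} in {file}"))
    return issues
```

Each hit becomes an inline PR comment at the offending line, so the author sees exactly what to fix without waiting for a human reviewer.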
The results
After rolling this out over roughly six months:
- 100+ PRs reviewed with significantly less per-PR time spent on routine compliance issues
- ~85% reduction in manual access-change effort through automated RBAC provisioning
- Zero security regressions in HITRUST-scoped environments during the period
- Engineers from multiple squads reported faster onboarding because the patterns were self-documenting
The bottleneck problem essentially disappeared. Teams could ship infrastructure independently, confident that the pipeline would catch anything important before it reached production.
What I'd do differently
Start the module library before you need it. We were retrofitting — migrating existing pipelines to the new standard while also trying to support new work. Doing both simultaneously stretched the effort out longer than it needed to be.
The other thing: document the why, not just the what. A module that enforces TLS 1.2 without explanation creates compliance without understanding. Engineers who understand why the control exists are much less likely to try to work around it.
The paved road only works if engineers know it was built for them, not against them.