Platform Engineering as a Product (2/3): Architecture, Operating Model, and Governance

Part 2 of 3 — Platform Engineering as a Product

Unni Pillai
  • Part 1: Why Platform Engineering Matters (and Why Most Get It Wrong)
  • Part 2: Inside the Platform: Architecture, Operating Model, and Governance (you are here)
  • Part 3: Building It — The Maturity Roadmap and Tech Stack

From Vision to Internals

Most platform engineering efforts die in the architecture phase. Not because the technology is wrong — because no one defines who owns what, who’s on-call, or how governance actually works without becoming a bottleneck.

In Part 1, we covered the why. Now let’s open the hood. This article covers three things:

  1. The 5-layer architecture — what each layer does and how they compose
  2. The operating model — who owns what, who’s on-call, and how the platform team functions as a product team
  3. Governance-as-code — how to embed compliance without creating bottlenecks

1. The 5-Layer Platform Architecture

The platform is modular, extensible, and interface-consistent. Every capability flows through the same API — whether your developers use the CLI, UI, or API directly.

flowchart TB
    subgraph UI["1. Developer Interface Layer"]
        A1["CLI"]
        A2["UI Portal"]
        A3["API Gateway"]
    end
    subgraph Core["2. Platform Core Services"]
        B1["Service Blueprint Engine"]
        B2["Infra Blueprint Engine"]
        B3["Pipeline Orchestrator"]
    end
    subgraph Policy["3. Policy & Compliance Layer"]
        C1["OPA Policy Engine"]
        C2["Input Validators"]
        C3["Security Scanners"]
    end
    subgraph Runtime["4. Provisioning & Runtime Layer"]
        D1["Infra Provisioning"]
        D2["Service Deployment"]
        D3["Environment Orchestrator"]
    end
    subgraph Obs["5. Observability & Audit Layer"]
        E1["Telemetry Hooks"]
        E2["Audit Trail"]
        E3["Cost Monitor"]
    end
    A1 --> A3
    A2 --> A3
    A3 --> B1
    A3 --> B2
    A3 --> B3
    B1 --> C1
    B2 --> C1
    B3 --> C2
    B3 --> C3
    C1 --> D1
    C2 --> D1
    C3 --> D2
    B1 --> D2
    D1 --> E3
    D2 --> E1
    D2 --> E2

Layer 1: Developer Interface

This is the surface your developers interact with. Three interfaces, one underlying API.

  • API Gateway — the primary surface. All capabilities are exposed via versioned APIs. This is the backbone that ensures consistency across all interfaces.
  • CLI Tool — for power users and CI/CD integrations. Think dip create, dip deploy, dip monitor. Uses the same API as everything else.
  • UI Portal — for visual workflows, service discovery, scaffolding, and dashboards. Ideal for onboarding and service browsing.

The guiding principle: every capability is API-accessible, CLI-controllable, and UI-visible.
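
To make "three interfaces, one underlying API" concrete, here is a minimal sketch of a CLI that is nothing more than a thin client over a shared API object. Everything in it is an assumption for illustration: `PlatformAPI` is an in-memory stand-in rather than a real client, and the `--blueprint` and `--env` flags are hypothetical. Only the `dip create` / `dip deploy` command names come from the text above.

```python
import argparse


class PlatformAPI:
    """Hypothetical in-memory stand-in for the versioned platform API."""

    def __init__(self):
        self.services = {}

    def create_service(self, name, blueprint):
        self.services[name] = {"blueprint": blueprint, "status": "created"}
        return self.services[name]

    def deploy_service(self, name, env):
        service = self.services[name]
        service["status"] = f"deployed:{env}"
        return service


def run_cli(api, argv):
    """Parse `dip`-style arguments and forward them to the shared API."""
    parser = argparse.ArgumentParser(prog="dip")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create")
    create.add_argument("name")
    create.add_argument("--blueprint", default="rest-api")  # assumed flag

    deploy = sub.add_parser("deploy")
    deploy.add_argument("name")
    deploy.add_argument("--env", default="dev")  # assumed flag

    args = parser.parse_args(argv)
    # The CLI holds no business logic: each command maps to one API call.
    if args.command == "create":
        return api.create_service(args.name, args.blueprint)
    return api.deploy_service(args.name, args.env)


if __name__ == "__main__":
    api = PlatformAPI()
    print(run_cli(api, ["create", "payments", "--blueprint", "rest-api"]))
    print(run_cli(api, ["deploy", "payments", "--env", "staging"]))
```

A UI portal would call the same two methods, which is what keeps behavior consistent across all three surfaces.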

Layer 2: Platform Core Services

This is where the intelligence lives.

  • Service Blueprint Engine — provides golden templates for common service types. A developer says “I need a REST API service” and gets a fully scaffolded project with CI/CD, logging, tracing, and compliance hooks pre-wired.
  • Infrastructure Blueprint Engine — manages reusable, versioned modules for infrastructure provisioning. Databases, queues, caches, storage — all defined as curated blueprints that abstract away cloud-specific details.
  • Pipeline Orchestrator — standardizes CI/CD pipelines across teams. Shared templates with hooks for security scanning, policy checks, and automated testing.
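
As a rough illustration of what a versioned golden template could look like, the sketch below models a blueprint as data and scaffolds a service from it. The registry contents, file paths, and version number are invented for the example. A real blueprint engine would render actual files rather than return a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    name: str
    version: str
    files: tuple  # path templates rendered per service
    hooks: tuple  # capabilities that come pre-wired


# Hypothetical registry keyed by (blueprint name, version).
REGISTRY = {
    ("rest-api", "1.2.0"): Blueprint(
        name="rest-api",
        version="1.2.0",
        files=(
            "{service}/main.py",
            "{service}/Dockerfile",
            "{service}/.ci/pipeline.yaml",
        ),
        hooks=("ci-cd", "logging", "tracing", "compliance"),
    ),
}


def scaffold(service, blueprint, version):
    """Return a description of the project a pinned blueprint would produce."""
    bp = REGISTRY[(blueprint, version)]
    return {
        "service": service,
        "blueprint": f"{bp.name}@{bp.version}",
        "files": [path.format(service=service) for path in bp.files],
        "hooks": list(bp.hooks),
    }


if __name__ == "__main__":
    print(scaffold("payments", "rest-api", "1.2.0"))
```

Pinning the version in the lookup key is the point: a service scaffolded from `rest-api@1.2.0` keeps a record of exactly which template it came from.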

Layer 3: Policy & Compliance

Governance is built into the platform, not bolted on top.

  • Policy-as-Code Engine — uses OPA (Open Policy Agent) or similar tools for runtime policy evaluation. Rules like “all storage must be encrypted at rest” or “no public-facing resources without TLS” are evaluated automatically during provisioning and deployment.
  • Input Validators — catch bad requests early. Before a blueprint even executes, parameters are validated against schemas, allowed values, and organizational constraints.
  • Security Scanners — embedded into every pipeline. SAST, container scanning, IaC scanning — all running automatically as part of the golden path.

Layer 4: Provisioning & Runtime

This layer executes the actual infrastructure and service deployments.

  • Infra Provisioning Controller — manages infrastructure lifecycle via Terraform, Crossplane, or Pulumi. Handles creation, updates, drift detection, and teardown.
  • Service Deployment Controller — integrates with GitOps tools (ArgoCD, Flux) or CI/CD runners for application deployment across environments.
  • Environment Orchestrator — handles multi-environment support, cloud-region mapping, and on-prem orchestration. Developers spin up environments through the platform; this layer handles the complexity underneath.
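
Drift detection is the least obvious item in that list, so here is a minimal sketch of the idea: compare what the blueprint declared with what is actually running. This is a plain dictionary diff with made-up settings. Tools like Terraform derive the same comparison from state files and provider APIs, which this sketch does not attempt.

```python
def detect_drift(desired, actual):
    """Return {setting: (desired, actual)} for every setting that differs.

    A setting present on only one side is reported with None on the other.
    """
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"instance_type": "db.m5.large", "encrypted": True}
    actual = {"instance_type": "db.m5.large", "encrypted": False, "public": True}
    print(detect_drift(desired, actual))
```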

Layer 5: Observability & Audit

Every service deployed through the platform comes with observability pre-configured.

  • Telemetry Hooks — logging, metrics, and tracing are baked into blueprints via OpenTelemetry, Prometheus, and Grafana. Developers don’t configure monitoring — it’s already there.
  • Audit Trail Collector — every action through the platform is logged. Who provisioned what, when, from where. Critical for compliance and debugging.
  • Cost & Usage Monitor — auto-tags resources and links them to cost dashboards. Teams see their spend in real time, not at the end of the quarter.
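
A small sketch of the last two bullets, assuming a resource is a dictionary and an audit record is a JSON line. The field names (`actor`, `source_ip`, and so on) and the `team` tag are my own choices for illustration, not a fixed schema.

```python
import json
from datetime import datetime, timezone


def auto_tag(resource, team, cost_center, env):
    """Return a copy of the resource with ownership and cost tags applied."""
    tags = dict(resource.get("tags", {}))
    tags.update({"team": team, "cost-center": cost_center, "env": env})
    return {**resource, "tags": tags}


def audit_event(actor, action, resource_id, source_ip, now=None):
    """Serialize who did what, when, and from where as one JSON line."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "actor": actor,
            "action": action,
            "resource": resource_id,
            "source_ip": source_ip,
            "timestamp": now.isoformat(),
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    bucket = auto_tag({"id": "bucket-1"}, "payments", "cc-1234", "prod")
    print(bucket)
    print(audit_event("alice", "provision", bucket["id"], "10.0.0.7"))
```

Because tagging happens inside the platform rather than in each team's code, cost dashboards can rely on the tags being present.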

2. The Operating Model

Architecture is one thing. How people operate within it is another. This is where “you build it, you run it” either works or falls apart.

Role-Based Ownership

| Role | Responsibilities | Owns |
| --- | --- | --- |
| Developer Teams | Build, deploy, and operate services end-to-end. Consume platform APIs, templates, and pipelines. Define SLOs, monitor performance, manage incidents | Service code, infra blueprints used by their service, alerts, dashboards |
| Platform Engineering Team | Build and maintain the internal developer platform. Curate and version blueprints. Own API, CLI, and UI layers. Implement policy-as-code. Maintain developer docs and onboarding | Platform core, shared templates, policy bundles, governance tooling |
| Security & Risk | Define compliance, security, and data protection policies. Review and audit policy bundles with platform team | Regulatory inputs, escalation triggers, risk scoring logic |
| SRE (Optional / Embedded) | Guide dev teams in adopting SLOs and resilience patterns. Help establish observability standards. Coach teams through incident reviews | Reliability practices, SLO/SLI models, observability standards |

The platform team is not an operations team. They don’t operate your services. They don’t provision infrastructure for your teams. They build and maintain the platform that enables everyone else to do those things autonomously. Get this distinction wrong, and you’ve just renamed your ops team.

On-Call & Incident Ownership

| Layer | On-Call Owner | Notes |
| --- | --- | --- |
| App / Service Runtime | Developer team | Alerts configured via platform, tied to service-level SLOs |
| Blueprint / Platform Issues | Platform team | Bugs in templates, deploy flow, policy logic, API failures |
| Infra Failures | Platform team (initial), cloud escalation if needed | Provisioning failures, limits, misconfiguration |
| Security Incidents | Security team, with dev team coordination | Alerts from scanning, policies, or external events |

When something breaks at 2am, the team that built the service owns the first response. The platform team handles platform-level issues — blueprint bugs, provisioning failures, policy engine outages.
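
The ownership split above is simple enough to encode as a routing table. This is a sketch only: the layer keys, team identifiers, and the `cloud-provider-support` escalation target are placeholders I made up, and a real setup would live in your paging tool's configuration rather than in application code.

```python
# Hypothetical mapping from failure layer to first responder and escalation.
ONCALL = {
    "service-runtime": {"primary": "dev-team", "secondary": None},
    "platform": {"primary": "platform-team", "secondary": None},
    "infrastructure": {"primary": "platform-team", "secondary": "cloud-provider-support"},
    "security": {"primary": "security-team", "secondary": "dev-team"},
}


def route_alert(layer, service_team):
    """Resolve who gets paged first and who is pulled in next."""
    if layer not in ONCALL:
        raise ValueError(f"unknown layer: {layer}")

    def resolve(target):
        # "dev-team" is a placeholder for whichever team owns the service.
        return service_team if target == "dev-team" else target

    entry = ONCALL[layer]
    return {
        "primary": resolve(entry["primary"]),
        "secondary": resolve(entry["secondary"]),
    }


if __name__ == "__main__":
    print(route_alert("service-runtime", "team-payments"))
    print(route_alert("infrastructure", "team-payments"))
```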

Proactive Support Model

“You build it, you run it” doesn’t mean “you’re on your own.” The platform team provides:

  • Office hours — regular sessions for onboarding and problem-solving
  • Support channels — #ask-platform, #policy-help, #blueprint-feedback
  • Embedded champions — platform team members temporarily embedded with key product squads during early adoption phases

Platform as Product: Ways of Working

The platform team operates as a product team, not a service desk.

| Practice | How It’s Done |
| --- | --- |
| Customer Discovery | Biweekly feedback sessions with dev teams, friction log reviews |
| Backlog Management | Kanban or dual-track agile, prioritized by adoption, internal demand, and risk |
| Release Management | Versioned blueprint releases, changelogs, backwards compatibility guarantees |
| Adoption Metrics | Track services onboarded, time-to-deploy, pipeline usage, developer satisfaction |
| Documentation | First-class artifact, built into CLI/UI flows and auto-generated from blueprints |

flowchart TD
    A["Developer Teams"] -->|"Uses"| B["Platform APIs & CLI/UI"]
    A -->|"Owns"| C["Services & Pipelines"]
    B -->|"Built by"| D["Platform Team"]
    C -->|"Integrated with"| E["Observability Layer"]
    D -->|"Collaborates with"| F["Security Team"]
    F -->|"Defines"| G["Policy-as-Code Bundles"]
    D -->|"Maintains"| G
    A -->|"Consumes"| G
    A -->|"Alerts to"| H["On-call Rotations"]
    D -->|"Supports"| H

3. Governance-as-Code

If you’re in a regulated industry — fintech, banking, insurance — governance isn’t optional. But it also shouldn’t be a bottleneck. I’ve seen organizations where a compliance review adds two weeks to every deployment. That’s not governance. That’s a queue.

The platform adopts a governance-as-code model that enforces security, compliance, and best practices at multiple layers, automatically. Developers move fast within safe guardrails. Security teams get assurance. Auditors get logs. No manual gates.

Three Enforcement Layers

| Layer | Description | Tools / Methods |
| --- | --- | --- |
| Input Validation | Parameters validated before infra or service creation | JSON schema, regex, CLI validators, UI constraints |
| Policy-as-Code | Dynamic rules evaluated during blueprint execution and deployment | OPA / Gatekeeper, Conftest |
| Pipeline Hooks | All CI/CD pipelines include required scanning and logging | Snyk, Trivy, Checkov, custom admission controllers |
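
The first of those layers is the cheapest to build. Here is a minimal sketch of input validation using only regex and allow-lists. It is a hand-rolled stand-in for a JSON Schema validator, and the rule set itself (name pattern, environments, regions) is invented for the example.

```python
import re

# Hypothetical rules: each field is checked against a pattern or an allow-list.
SCHEMA = {
    "name": {"pattern": r"^[a-z][a-z0-9-]{2,30}$"},
    "env": {"allowed": {"dev", "staging", "prod"}},
    "region": {"allowed": {"eu-west-1", "us-east-1"}},
}


def validate_request(params):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field, rule in SCHEMA.items():
        if field not in params:
            errors.append(f"{field}: missing")
            continue
        value = params[field]
        if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
            errors.append(f"{field}: does not match {rule['pattern']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} not allowed")
    return errors


if __name__ == "__main__":
    print(validate_request({"name": "payments-api", "env": "prod", "region": "eu-west-1"}))
    print(validate_request({"name": "Payments", "env": "qa"}))
```

Returning every error at once, instead of failing on the first, means a developer fixes a bad request in one round trip.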

Policy Bundle Design

Policies are organized as modular bundles and managed like application code — versioned, tested, and deployed with changelogs.

policy-bundle/
├── terraform/
│   ├── enforce-encryption.rego
│   └── restrict-instance-types.rego
├── kubernetes/
│   ├── restrict-host-paths.rego
│   └── enforce-resource-limits.rego
├── cicd/
│   ├── block-default-branch-push.yaml
│   └── enforce-security-scan.yaml
└── metadata/
    ├── version.json
    └── policy-owners.yaml

Governance Lifecycle

| Step | Description |
| --- | --- |
| Authoring | Platform team and security team collaborate to define policy logic |
| Testing | Policies tested against sandbox blueprints and sample inputs |
| Releasing | Versioned and rolled out via changelog and flagging system |
| Monitoring | Violations logged with dashboards showing policy hits/failures |
| Reviewing | Periodically audited and reviewed with stakeholders |
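
One way to read the "flagging system" in the Releasing step is a per-policy enforcement mode: a new policy ships in `warn` mode, where violations are only logged, and is switched to `enforce` once the dashboards look sane. That interpretation is my assumption, and the sketch below uses Python callables as stand-ins for Rego rules. The policy names and versions are made up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    name: str
    version: str
    mode: str  # "warn" only logs violations, "enforce" also blocks
    check: Callable[[dict], bool]  # returns True when the resource complies


def evaluate(policies, resource, violation_log):
    """Log every violation; return False only if an enforced policy failed."""
    allowed = True
    for policy in policies:
        if policy.check(resource):
            continue
        violation_log.append(
            {"policy": policy.name, "version": policy.version, "mode": policy.mode}
        )
        if policy.mode == "enforce":
            allowed = False
    return allowed


if __name__ == "__main__":
    policies = [
        Policy("enforce-encryption", "2.0.0", "enforce", lambda r: r.get("encrypted", False)),
        Policy("require-probes", "0.1.0", "warn", lambda r: "liveness" in r.get("probes", ())),
    ]
    log = []
    print(evaluate(policies, {"encrypted": True}, log), log)
```

The violation log doubles as the data source for the Monitoring step: policy hits are recorded whether or not they blocked anything.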

Policy Enforcement Flow

flowchart TD
    A["Dev initiates service / infra request"] --> B["Input Validator"]
    B -->|"Valid"| C["Policy Engine (OPA)"]
    B -->|"Invalid"| X["Error: Missing or Invalid Input"]
    C -->|"Compliant"| D["Provision Infra / Deploy App"]
    C -->|"Violation"| Y["Reject + Policy Violation Log"]
    D --> E["Trigger CI/CD Pipeline"]
    E --> F["Run Security Scanners"]
    F -->|"Passed"| G["Deploy to Environment"]
    F -->|"Failed"| Z["Block + Report to Dev + Audit Log"]
    D --> H["Auto-tagging + Audit Trail"]
    style X fill:#7f1d1d,color:#e7e5e4
    style Y fill:#7f1d1d,color:#e7e5e4
    style Z fill:#7f1d1d,color:#e7e5e4
    style G fill:#14532d,color:#e7e5e4

This flow enforces policies without manual gates. Compliant requests pass straight through, and non-compliant ones fail fast with a reason attached. Developers get fast feedback. Security teams get visibility. Auditors get logs. No one waits in a queue.
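
The flow can be sketched as a single function with one early exit per failure branch. The stage functions are passed in as plain callables, so the sketch makes no claim about how validation, policy evaluation, or scanning is implemented. The status strings and audit tuple format are my own invention.

```python
def handle_request(request, validate, policy_check, scan, audit_log):
    """Run a request through validation, policy, provisioning, and scanning.

    Each stage callable returns a list of problems; an empty list means pass.
    """
    errors = validate(request)
    if errors:
        return {"status": "invalid_input", "details": errors}

    violations = policy_check(request)
    if violations:
        audit_log.append(("policy_violation", violations))
        return {"status": "rejected", "details": violations}

    # Provisioning itself is out of scope here; only the audit entry is modeled.
    audit_log.append(("provisioned", request["name"]))

    findings = scan(request)
    if findings:
        audit_log.append(("scan_failed", findings))
        return {"status": "blocked", "details": findings}

    return {"status": "deployed", "details": []}


if __name__ == "__main__":
    log = []
    result = handle_request(
        {"name": "payments"},
        validate=lambda r: [],
        policy_check=lambda r: [],
        scan=lambda r: ["CVE found in base image"],
        audit_log=log,
    )
    print(result, log)
```

Every exit returns the reason alongside the status, which is what turns a rejection into feedback rather than a ticket.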

Common Policies

| Category | Policy Example |
| --- | --- |
| Security | All storage must be encrypted at rest; no public-facing resources without TLS |
| Cost Management | All resources must be tagged with cost-center and env |
| Resilience | All services must define liveness and readiness probes |
| Deployment | No pushes directly to default branch; PR checks must pass before deploy |
| Infra Provisioning | Only approved regions, instance types, and services may be used |
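
To show how little logic most of these policies need, here is a sketch of several of them as plain Python checks over a resource dictionary. In practice these would be Rego rules evaluated by OPA. The resource shape, field names, and the region allow-list are assumptions made for the example, and the Deployment row is left out because it applies to repositories rather than resources.

```python
APPROVED_REGIONS = {"eu-west-1", "us-east-1"}  # hypothetical allow-list
REQUIRED_TAGS = {"cost-center", "env"}


def check_resource(resource):
    """Return a list of policy violations; an empty list means compliant."""
    violations = []

    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))

    if resource.get("kind") == "storage" and not resource.get("encrypted_at_rest", False):
        violations.append("storage must be encrypted at rest")

    if resource.get("public", False) and not resource.get("tls", False):
        violations.append("public-facing resource requires TLS")

    if resource.get("region") not in APPROVED_REGIONS:
        violations.append(f"region {resource.get('region')!r} is not approved")

    if resource.get("kind") == "service":
        probes = resource.get("probes", {})
        for probe in ("liveness", "readiness"):
            if probe not in probes:
                violations.append(f"service must define a {probe} probe")

    return violations


if __name__ == "__main__":
    bucket = {"kind": "storage", "region": "ap-south-1", "public": True, "tags": {"env": "dev"}}
    for violation in check_resource(bucket):
        print(violation)
```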

What’s Next

We’ve covered the architecture (how it’s built), the operating model (how it’s run), and governance (how compliance is embedded). The platform is designed. Now it needs to be built.

In Part 3, we cover the capability evolution roadmap — how to go from a bootstrap MVP to an autonomous, AI-assisted platform — and the tech stack recommendations for each layer.


/ Unni