ARCHITECTURE · 2025-03-27 · BY ELI NAVARRO

Q&A: centralize or embed platform capabilities?

How we decide whether capabilities like auth, flags, or logging live in shared platforms or in individual services.

architecture · platforms · ownership · reliability

Q&A

Why centralize anything at all?

Centralizing capabilities like authentication, feature flags, or logging:

reduces duplicate work
creates consistent behavior across services
gives us one place to enforce policy and standardize observability

Without some centralization, every team re-learns the same hard lessons.

Why not centralize everything?

Because platforms have costs:

they become critical dependencies
they can become bottlenecks for change
they require dedicated ownership and on-call

If we centralize too aggressively, we:

slow teams down
widen the impact of outages when a platform fails

How do we decide what belongs in a shared platform?

We ask a few questions:

Is this capability naturally cross-cutting? (Auth, flags, and logging usually are.)
Do multiple teams need the same behavior?
Does centralization make operations safer or more observable?

If the answer to all of these is "yes", we lean toward a shared platform.
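The three questions can be read as a simple checklist. This is a hypothetical illustration rather than a tool we run: the `Capability` fields and `recommend_placement` are names invented for this sketch, and "multiple teams" is assumed to mean two or more.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Hypothetical summary of a candidate capability; field names are illustrative."""
    name: str
    cross_cutting: bool        # Is this capability naturally cross-cutting?
    teams_needing_it: int      # How many teams need the same behavior?
    safer_when_central: bool   # Does centralization make operations safer or more observable?


def recommend_placement(cap: Capability) -> str:
    """Lean toward a shared platform only when all three questions are answered "yes"."""
    multiple_teams = cap.teams_needing_it >= 2
    if cap.cross_cutting and multiple_teams and cap.safer_when_central:
        return "shared platform"
    return "embed in service"


print(recommend_placement(Capability("feature flags", True, 6, True)))   # shared platform
print(recommend_placement(Capability("pricing cache", False, 1, False)))  # embed in service
```

In practice these answers feed a discussion rather than a hard gate; a single "no" is mainly a signal to look more closely before centralizing.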

When is it better to embed capability in a service?

We keep things embedded when:

the behavior is highly specific to one domain
the blast radius of mistakes is small and well-understood
the service can own its SLOs without a central dependency

Examples:

domain-specific caching strategies
one-off data transforms that don’t generalize

How do SLOs influence the decision?

Shared platforms need:

clear SLOs that match or exceed dependent services’ needs
a plan for degradation when they’re unhealthy
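A quick way to check whether a platform's SLO "matches or exceeds" what its consumers need is to multiply availabilities along the hard-dependency path: a request succeeds only if the service and every hard dependency are up. This rough sketch assumes failures are independent (correlated outages make real numbers worse), and `composite_availability` and `platform_fits_slo` are illustrative names.

```python
from math import prod


def composite_availability(own: float, hard_dependencies: list[float]) -> float:
    """Estimate availability when a request needs the service AND every hard dependency up.

    Assumes independent failures; correlated outages make the real number worse.
    """
    return own * prod(hard_dependencies)


def platform_fits_slo(target: float, own: float, platform: float) -> bool:
    """True if a service with `own` availability can still meet `target` on this platform."""
    return composite_availability(own, [platform]) >= target


print(round(composite_availability(0.9995, [0.999]), 4))  # 0.9985
print(platform_fits_slo(0.999, 0.9995, 0.999))    # False
print(platform_fits_slo(0.999, 0.9995, 0.9999))   # True
```

For example, a service that is 99.95% available on its own and hard-depends on a 99.9% platform lands at roughly 99.85%, which misses a 99.9% target; the platform would need to be roughly 99.95% available or better for that target to be reachable.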

If we can’t meet those SLOs, we:

reconsider centralization
or design services to degrade gracefully when the platform is down
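One common way to degrade gracefully is to fall back to a last-known value, or to a default the service owns, when the platform cannot be reached. The sketch below assumes a feature-flag platform reached through a hypothetical `fetch` callable that raises `ConnectionError` or `TimeoutError` on failure; `DegradingFlagClient` and its defaults are illustrative, not a real client library.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger("flags")


class DegradingFlagClient:
    """Hypothetical wrapper that keeps answering when the flag platform is unreachable."""

    def __init__(self, fetch: Callable[[str], bool], defaults: Dict[str, bool]):
        self._fetch = fetch            # call into the (hypothetical) flag platform
        self._defaults = defaults      # conservative values owned by this service
        self._last_known: Dict[str, bool] = {}

    def is_enabled(self, flag: str) -> bool:
        try:
            value = self._fetch(flag)
        except (ConnectionError, TimeoutError):
            # Platform is down or slow: prefer the last value we saw, then the local default.
            log.warning("flag platform unavailable; degrading for %s", flag)
            if flag in self._last_known:
                return self._last_known[flag]
            return self._defaults.get(flag, False)
        self._last_known[flag] = value
        return value
```

Choosing conservative defaults (for example, new features off) keeps a flag-platform outage from becoming a behavior change in every consumer at once.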

How do we keep platforms from becoming "one more big ball of mud"?

We:

define clear boundaries and responsibilities
version and document contracts (APIs, data, guarantees)
avoid stuffing every unrelated capability into the same platform
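Versioning a contract only helps if consumers can tell whether a given version is safe for them. A minimal sketch, assuming the platform follows semantic-versioning conventions (breaking changes bump the major version); `parse_version` and `is_compatible` are hypothetical helpers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a "MAJOR.MINOR.PATCH" string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(provided: str, required: str) -> bool:
    """Same major version, and at least the minor/patch level the consumer relies on."""
    p, r = parse_version(provided), parse_version(required)
    return p[0] == r[0] and p[1:] >= r[1:]


print(is_compatible("2.3.1", "2.1.0"))  # True: additive changes within major 2
print(is_compatible("3.0.0", "2.1.0"))  # False: a major bump signals a breaking change
```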

If a proposed feature doesn’t fit the platform’s mission, we:

build it closer to the consuming service
or create a new, focused platform if it truly is cross-cutting

What about incident response?

Platforms change how we respond to incidents:

platform issues can affect many services at once
platform dashboards and runbooks become central tools

We make sure platforms have:

their own on-call rotation
clear communication channels with consumers
"blast radius" docs that explain who is impacted when they fail

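A "blast radius" doc can be backed by the dependency graph itself: walk from the failed platform to everything that calls it, directly or indirectly. The service names and the `DEPENDS_ON` map below are hypothetical, and the sketch treats every edge as a hard dependency, so it overstates impact for consumers that degrade gracefully.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services/platforms it calls.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout": ["auth-platform", "flags-platform", "payments"],
    "payments": ["auth-platform"],
    "search": ["flags-platform"],
    "reports": ["logging-platform"],
}


def blast_radius(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from the failed component to its consumers.
    consumers: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            consumers.setdefault(dep, []).append(service)

    # Breadth-first search over consumers, visiting each service once.
    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


print(sorted(blast_radius("auth-platform", DEPENDS_ON)))  # ['checkout', 'payments']
```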
Takeaways

Centralization makes sense for truly cross-cutting capabilities with many consumers.
Platforms need strong SLOs, ownership, and clear contracts to be worth the dependency.
Not everything belongs in a platform; small, domain-specific behaviors often should stay embedded.
Thinking about SLOs and incident blast radius early helps avoid building platforms that are bigger risks than the problems they solve.