Decision record: Moving secrets out of env vars
We decided to move application secrets out of long-lived environment variables and into a managed secrets system.
Context
Like many applications, ours started out with all configuration in environment variables.
This included:
- non-sensitive configuration (feature toggles, URLs, numeric limits)
- sensitive data (API keys, database passwords, encryption keys)
Over time, this pattern accumulated sharp edges:
- secrets were long-lived and rotated infrequently
- it was hard to audit where a secret was used
- changing a secret often required a full redeploy
- different teams managed environment variables in slightly different ways
The operational cost showed up during incidents:
- we could not quickly rotate a compromised credential
- we didn’t have a single view of which services depended on which secrets
- local development and staging sometimes used production-like secrets by accident
We needed a more disciplined approach.
Decision
We decided to move sensitive application secrets out of long-lived environment variables and into a managed secrets system.
Concretely:
- Secrets are stored in a dedicated secrets manager, not in application configs or deployment manifests.
- Applications retrieve secrets at startup (or on demand) via authenticated calls to the secrets system.
- Environment variables remain for non-sensitive configuration and for references (e.g., secret names), not for secret values themselves.
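As a minimal sketch of this split: the environment holds only a secret's name, and the application resolves the value at startup. The `SecretsClient` interface, the `InMemorySecretsClient` stand-in, the `DB_PASSWORD_SECRET_NAME` variable, and the secret name are all hypothetical and do not correspond to any particular vendor's API. A real client would make an authenticated call instead of a dictionary lookup.

```python
from typing import Mapping, Protocol


class SecretsClient(Protocol):
    """Hypothetical interface; a real client would make an authenticated call."""

    def get_secret(self, name: str) -> str: ...


class InMemorySecretsClient:
    """Stand-in for the secrets manager, for illustration and tests only."""

    def __init__(self, values: Mapping[str, str]) -> None:
        self._values = dict(values)

    def get_secret(self, name: str) -> str:
        try:
            return self._values[name]
        except KeyError:
            raise LookupError(f"secret not found: {name}") from None


def resolve_secret(client: SecretsClient, env: Mapping[str, str], ref_var: str) -> str:
    """Read a secret *name* from the environment, then fetch its value."""
    name = env.get(ref_var)
    if not name:
        raise RuntimeError(f"missing secret reference in environment: {ref_var}")
    return client.get_secret(name)


# The environment carries only the reference, never the secret value.
client = InMemorySecretsClient({"billing/db-password": "example-value"})
env = {"DB_PASSWORD_SECRET_NAME": "billing/db-password"}
db_password = resolve_secret(client, env, "DB_PASSWORD_SECRET_NAME")
```

In a running service, `env` would typically be `os.environ`. Passing it in as a parameter keeps the function easy to test.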
We evaluated several options and chose one that:
- integrated with our existing identity and access management
- supported per-service access control
- provided auditing and rotation primitives
The specific tool matters less than its properties: it must be auditable, revocable, and scriptable.
Consequences
Upsides
- Easier rotation. We can rotate secrets centrally and roll them out without rebuilding images or editing multiple config files.
- Better auditing. We have logs of when and where secrets are accessed.
- Tighter access control. Each service has a scoped identity that grants access only to the secrets it needs.
- Safer development environments. We can provision lower-privilege secrets for non-production use without copying production values.
Downsides / costs
- Operational complexity. Applications must handle the failure modes of the secrets system (e.g., transient unavailability).
- Migration work. We needed to:
  - identify all existing secrets in environment variables
  - move them into the secrets system
  - update applications to fetch them correctly
- Bootstrap questions. The system that fetches secrets needs its own trust path (e.g., an instance identity or initial credential).
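One way to handle the transient unavailability mentioned above is to retry with exponential backoff and, if every attempt fails, fall back to the last value that was fetched successfully. This is a sketch under stated assumptions, not a description of our implementation: `SecretUnavailable`, `ResilientSecretFetcher`, and the `fetch` callable are hypothetical names, and whether serving a stale value is acceptable is a policy choice for each service.

```python
import time
from typing import Callable, Dict


class SecretUnavailable(Exception):
    """Transient failure reported by the (hypothetical) fetch function."""


class ResilientSecretFetcher:
    """Retry transient failures with exponential backoff; fall back to the
    last successfully fetched value if every attempt fails."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        attempts: int = 3,
        base_delay: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._fetch = fetch
        self._attempts = attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._last_known: Dict[str, str] = {}

    def get(self, name: str) -> str:
        last_error = None
        for attempt in range(self._attempts):
            try:
                value = self._fetch(name)
            except SecretUnavailable as exc:
                last_error = exc
                # Back off before the next attempt, but not after the final one.
                if attempt < self._attempts - 1:
                    self._sleep(self._base_delay * (2 ** attempt))
            else:
                self._last_known[name] = value
                return value
        if name in self._last_known:
            # Stale fallback: this value may be invalid right after a rotation.
            return self._last_known[name]
        raise SecretUnavailable(f"could not fetch {name!r}") from last_error
```

The stale fallback trades correctness for availability. A service that must never use a rotated-out credential could drop the fallback and fail fast instead.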
Guardrails
To keep the new system from becoming another source of drift, we set a few rules:
- New services must use the secrets system from the start.
- Adding or changing a secret requires updating a small inventory document that maps secrets to services.
- Incident runbooks include a "secrets" section:
  - where the secret lives
  - how to rotate it
  - how to verify the new value is in use
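The inventory rule is easier to keep if it is checked mechanically. The sketch below assumes the inventory can be loaded as a mapping from secret name to the set of services that use it, and that each service declares the secret names it references. Both data shapes, the function name, and the example names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_inventory_gaps(
    inventory: Dict[str, Set[str]],
    service_refs: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (service, secret) pairs that a service references
    but the inventory does not record for that service."""
    gaps = []
    for service, secrets in sorted(service_refs.items()):
        for secret in sorted(secrets):
            if service not in inventory.get(secret, set()):
                gaps.append((service, secret))
    return gaps


# Inventory: secret name -> services recorded as using it.
inventory = {"billing/db-password": {"billing-api"}}
# Declared references: service -> secret names it fetches.
service_refs = {"billing-api": {"billing/db-password", "billing/payment-api-key"}}

gaps = find_inventory_gaps(inventory, service_refs)
# gaps == [("billing-api", "billing/payment-api-key")]
```

Run as a pre-merge check, a non-empty result would block the change until the inventory is updated. During an incident, the same mapping answers which services depend on a given secret.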
We also decided not to move everything:
- Non-sensitive configuration stays in environment variables for simplicity.
- We avoid using the secrets system for values that change constantly or are better represented as data in a database.
This decision adds explicit steps to some workflows, but it pays off in the rare, critical moments when a secret must change quickly and safely.