DESIGN 2023-03-06 BY MARA SABOGAL

Design: making retries visible in the UI

Patterns we use so automatic retries feel predictable and honest instead of random and frustrating.

design reliability retries ux

Retries are one of our favorite reliability tools.

They are also a common source of user confusion.

From the system’s perspective:

we see transient errors and timeouts
we add retries with backoff
graphs smooth out

From the user’s perspective, retries can look like:

a button that sometimes works and sometimes doesn’t
a spinner that hangs for a while and then resolves
duplicate actions they didn’t mean to trigger twice

This post is about how we design UI around retries so they feel predictable and honest.

Constraints

We already had retries implemented in many places.
We didn’t want to expose all the internal complexity.
Some flows were more sensitive (payments, irreversible actions) than others.

What we changed

We started from three questions, which map to the first three patterns below; the last two patterns support them:

What does the user see while we’re retrying?
How do we avoid doing the same work twice?
What do we tell the user when we stop?

1. Show that work is in progress, not stuck

For actions where we retry automatically, we:

show a clear "in progress" state (e.g., "Saving…" rather than a generic spinner)
keep feedback visible: buttons may be disabled, but we still show status text and, when appropriate, a small status indicator

If the action might take more than a couple of seconds, we:

use language that sets expectations ("This can take up to a minute")
consider signaling that we are retrying ("Still trying…") without attempt counts or countdowns that imply guarantees
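
As a rough illustration of the copy rules above, here is a minimal sketch of a function that picks the status text. The `Progress` shape, the `progressCopy` name, and the 2-second threshold are assumptions made for this example, not our production code.

```typescript
// Hypothetical threshold for "more than a couple of seconds"; tune per flow.
const SLOW_AFTER_MS = 2000;

type Progress = {
  verb: string;       // user-facing verb, e.g. "Saving"
  elapsedMs: number;  // time since the user triggered the action
  attempt: number;    // 1 for the first try, 2 or more once we are retrying
};

// Returns the text shown next to the (possibly disabled) button.
function progressCopy(p: Progress): string {
  if (p.attempt > 1) {
    // Signal that we are retrying without exposing attempt counts.
    return "Still trying…";
  }
  if (p.elapsedMs >= SLOW_AFTER_MS) {
    // Set expectations once the action is visibly slow.
    return `${p.verb}… This can take up to a minute.`;
  }
  return `${p.verb}…`;
}
```

The point of keeping this in one function is that every retrying flow ends up with the same vocabulary, so users learn what "Still trying…" means once.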

2. Make actions idempotent from the user’s point of view

We design flows so that:

pressing a button twice doesn’t run the action twice in a harmful way
automatic retries reuse the same request identifier under the hood, so the backend can recognize and deduplicate them

In the UI, this means:

we show the action as "pending" rather than making the user guess whether clicking again is safe
we prevent obvious double-submits by:
- disabling the primary action while work is active
- showing a clear way to cancel, when safe

3. Be honest when we give up

Retries can’t last forever.

When we stop, we:

tell the user what happened in plain language
make a concrete suggestion ("You can try again" or "We’ll keep trying and email you")
avoid leaving it ambiguous whether the action partially succeeded; we say what we know and what we don’t

Where possible, we:

show the current state of the underlying entity ("Your change has not been saved")
log enough context that support can tell what happened later
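
To make this concrete, here is a minimal sketch of how the final message and the support log entry could be derived from the same facts. The `GiveUpInfo` shape, the function names, the `retries_stopped` event name, and the copy for the "unknown" case are all assumptions for illustration.

```typescript
// What we know about the underlying entity when retries stop.
// "unknown" covers cases such as a timeout where the request may have landed.
type KnownState = "not_saved" | "unknown";

type GiveUpInfo = {
  thing: string;            // user-facing noun, e.g. "change"
  knownState: KnownState;
  willKeepTrying: boolean;  // true if a background retry continues
  requestId: string;        // the identifier reused across attempts
  attempts: number;
};

function giveUpMessage(info: GiveUpInfo): string {
  if (info.knownState === "unknown") {
    // Hypothetical copy: say plainly that we do not know.
    return `We could not confirm whether your ${info.thing} was saved. Check its current state before trying again.`;
  }
  const next = info.willKeepTrying
    ? "We’ll keep trying and email you."
    : "You can try again.";
  return `Your ${info.thing} has not been saved. ${next}`;
}

// Context for support: enough to tell later what happened.
function supportLogEntry(info: GiveUpInfo, lastError: string) {
  return {
    event: "retries_stopped",
    requestId: info.requestId,
    attempts: info.attempts,
    knownState: info.knownState,
    willKeepTrying: info.willKeepTrying,
    lastError,
  };
}
```

Building both outputs from one `GiveUpInfo` value keeps what the user was told and what support sees from drifting apart.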

4. Separate user errors from system errors

Retries don’t help with user mistakes.

We distinguish:

errors where another attempt might succeed (network, transient server issues)
errors where the input is invalid or the action is blocked (permissions, validation)

The UI reflects this:

user errors get specific, actionable messages and no automatic retries
system errors get bounded retries and clear status copy

This keeps users from watching the system "just try again" on input that only they can fix.
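
The split can be sketched as a single decision function. Everything specific here is an assumption for illustration: the `Failure` and `Handling` types, the mapping of HTTP status codes (408, 429, and 5xx treated as transient, everything else as something retries cannot fix), and the bounds. A production version would usually also add jitter to the delay.

```typescript
type Failure =
  | { kind: "network" }                // the request never got a response
  | { kind: "http"; status: number };  // the server responded with an error

type Handling =
  | { kind: "retry"; delayMs: number }
  | { kind: "show_user_error" }
  | { kind: "give_up" };

// Hypothetical bounds for the example.
const MAX_ATTEMPTS = 4;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;

// Assumed mapping: network failures, 408, 429, and 5xx may succeed on retry.
function isTransient(f: Failure): boolean {
  if (f.kind === "network") return true;
  return f.status === 408 || f.status === 429 || f.status >= 500;
}

// `attempt` is the 1-based number of the attempt that just failed.
function decideHandling(f: Failure, attempt: number): Handling {
  if (!isTransient(f)) return { kind: "show_user_error" };
  if (attempt >= MAX_ATTEMPTS) return { kind: "give_up" };
  // Exponential backoff, capped so the wait stays bounded.
  const delayMs = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt - 1));
  return { kind: "retry", delayMs };
}
```

With these values, a validation failure (for example a 422) goes straight to a specific message on the first attempt, while a 503 is retried after 500 ms, 1 s, and 2 s before the UI gives up.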

5. Make retry behavior discoverable for support

Support often gets the first report when retries behave badly.

We surfaced retry behavior in internal tools:

a short note in the support view for key flows ("We retry this action up to N times over M minutes")
recent status for retry-heavy actions ("Last attempt at HH:MM", "Next scheduled retry")

This helps support:

set expectations with users
know when it’s safe to advise "try again" vs escalate
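
The support note does not have to be written by hand; it can be derived from the retry policy itself so the two never disagree. The following is a sketch under assumptions: the `RetryPolicy` shape, the function names, and the example values are hypothetical, and it assumes capped exponential backoff.

```typescript
type RetryPolicy = {
  maxRetries: number;   // retries after the first attempt
  baseDelayMs: number;  // wait before the first retry
  maxDelayMs: number;   // cap on any single wait
};

// Waits before each retry: base, 2*base, 4*base, ... capped at maxDelayMs.
function retryDelays(p: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let i = 0; i < p.maxRetries; i++) {
    delays.push(Math.min(p.maxDelayMs, p.baseDelayMs * Math.pow(2, i)));
  }
  return delays;
}

// The short note shown in the support view, rounded up to whole minutes.
function supportNote(p: RetryPolicy): string {
  const totalMs = retryDelays(p).reduce((sum, d) => sum + d, 0);
  const minutes = Math.ceil(totalMs / 60000);
  const unit = minutes === 1 ? "minute" : "minutes";
  return `We retry this action up to ${p.maxRetries} times over ${minutes} ${unit}`;
}

// When the next retry is due (epoch ms), or null once retries are exhausted.
function nextRetryAtMs(p: RetryPolicy, lastAttemptAtMs: number, retriesDone: number): number | null {
  const delay = retryDelays(p)[retriesDone];
  return delay === undefined ? null : lastAttemptAtMs + delay;
}
```

For a policy of 5 retries starting at 15 seconds and capped at 2 minutes, the waits are 15 s, 30 s, 60 s, 120 s, and 120 s, so the note reads "We retry this action up to 5 times over 6 minutes".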

Results / Measurements

We looked at a few signals after updating our retry patterns in key flows:

support tickets that boiled down to "did my click work?" dropped
we saw fewer duplicate submissions in flows where we added clearer pending states and idempotent handling
user research sessions showed less confusion around long-running operations once the UI set better expectations

None of these shifts was dramatic, but together they were enough to confirm that the design changes were moving us in the right direction.

Takeaways

Retries are a UX concern, not just a backend implementation detail.
Users need to see that the system is trying, not stuck.
Idempotent actions and clear pending states prevent double work.
Honest messages when we stop retrying build more trust than ambiguous failures.