Can't fetch cached JWT-SVIDs when spire-server is unresponsive #5994

sorindumitru · 2025-04-07T15:06:02Z

spire-agent maintains a cache of JWT-SVIDs (and X509-SVIDs, but those work differently) to serve repeated requests for the same JWT-SVID and to deal with spire-server unavailability.

When a cached JWT-SVID is past 50% of its TTL, the agent first tries to request a new one from the server before falling back to the cached SVID. All (or most) RPCs the agent makes have a global timeout of 30 seconds applied, so the request to fetch a JWT-SVID may take up to 30 seconds to time out if the server is unresponsive.
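To make the "past 50% of its TTL" condition concrete, here is a minimal sketch in Go. The helper name `pastHalfLifetime` and the `example` function are hypothetical and only illustrate the check described above; this is not the agent's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// pastHalfLifetime reports whether now is beyond the midpoint between
// issuedAt and expiresAt. Hypothetical helper, not SPIRE's real API.
func pastHalfLifetime(issuedAt, expiresAt, now time.Time) bool {
	half := expiresAt.Sub(issuedAt) / 2
	return now.After(issuedAt.Add(half))
}

// example evaluates the check for a 10-minute SVID at 4 and 6 minutes
// after issuance.
func example() (early, late bool) {
	issued := time.Date(2025, 4, 7, 15, 0, 0, 0, time.UTC)
	expires := issued.Add(10 * time.Minute)
	early = pastHalfLifetime(issued, expires, issued.Add(4*time.Minute))
	late = pastHalfLifetime(issued, expires, issued.Add(6*time.Minute))
	return early, late
}

func main() {
	early, late := example()
	fmt.Println(early, late) // prints "false true"
}
```

With a 10-minute TTL, a request at 4 minutes would be served straight from the cache, while a request at 6 minutes would first go to the server.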

This is not ideal, but I think there's a bigger issue here. If a client connects with a timeout smaller than 30 seconds, the whole request ends up being cancelled when the client disconnects, so the agent doesn't get a chance to respond to the client. A client connecting with a timeout of 5 seconds, for example, will just see all its requests time out even though the cached SVID has enough lifetime left to be useful.

Some ways to deal with this:

- Apply a smaller timeout to the NewJWTSVID request when a cached entry exists and is valid, for example 1 second. It won't help every client, but it should help a lot of them. (small change)
- Always return the cached SVID and schedule the JWT-SVID for asynchronous refresh. (bigger change)
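The first option could look roughly like the sketch below. All names here (`jwtSVID`, `fetchFunc`, `fetchJWTSVID`, `unresponsive`, `demo`) are hypothetical stand-ins for illustration, not the agent's real types, and the timeout values are shortened so the example finishes quickly. The idea is only that the server call gets a shorter deadline when a still-valid cached entry exists, so the agent can fall back to the cache before the client's own deadline expires.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// jwtSVID is a simplified stand-in for a cached JWT-SVID.
type jwtSVID struct {
	Token     string
	ExpiresAt time.Time
}

// fetchFunc stands in for the agent's NewJWTSVID call to the server.
type fetchFunc func(ctx context.Context) (*jwtSVID, error)

// fetchJWTSVID applies shortTimeout to the server call only when a
// still-valid cached SVID exists, and falls back to it on failure.
func fetchJWTSVID(ctx context.Context, cached *jwtSVID, fetch fetchFunc,
	shortTimeout time.Duration, now time.Time) (*jwtSVID, error) {
	usable := cached != nil && now.Before(cached.ExpiresAt)
	if usable {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, shortTimeout)
		defer cancel()
	}
	svid, err := fetch(ctx)
	if err == nil {
		return svid, nil
	}
	if usable {
		// Server failed or timed out: serve the cached SVID instead.
		return cached, nil
	}
	return nil, err
}

// unresponsive simulates a server that never answers.
func unresponsive(ctx context.Context) (*jwtSVID, error) {
	<-ctx.Done()
	return nil, ctx.Err()
}

// demo models a client with a 500ms deadline against an unresponsive
// server, with a 50ms short timeout on the agent side.
func demo() (string, error) {
	now := time.Now()
	cached := &jwtSVID{Token: "cached-token", ExpiresAt: now.Add(time.Minute)}
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()
	svid, err := fetchJWTSVID(ctx, cached, unresponsive, 50*time.Millisecond, now)
	if err != nil {
		return "", err
	}
	return svid.Token, nil
}

func main() {
	token, err := demo()
	fmt.Println(token, err)
}
```

As noted above, this only helps clients whose own timeout is longer than the short timeout. The asynchronous refresh option would avoid the wait entirely, at the cost of a larger change.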

amartinezfayo · 2025-04-10T19:24:06Z

Thank you @sorindumitru for raising this.

> Apply a smaller timeout to the NewJWTSVID request when a cached entry exists and is valid, for example 1 second. It won't help every client, but it should help a lot of them. (small change)

I agree this could be a good improvement. I personally think that a 1-second timeout might be a bit too aggressive in some environments. I'd suggest something in the range of 3–5 seconds instead.

sorindumitru added the triage/in-progress Issue triage is in progress label Apr 10, 2025

amartinezfayo added priority/backlog Issue is approved and in the backlog and removed triage/in-progress Issue triage is in progress labels Apr 10, 2025
