[spire-agent] improve "outdated entries" detection logic in LRU cache #6011
Labels
help wanted
Issues with this label are ready to start work but are in need of someone to do it
priority/backlog
Issue is approved and in the backlog
Problem
We're seeing issues with the SPIRE agent requesting an updated SVID for the same entry multiple times within a few minutes, with the agent logs showing "Renewing X.509 SVID".
Our SPIRE server deployment -
Our SPIRE agent deployment -
Since we have event-based cache hydration enabled, whenever an entry is updated, GetAuthorizedEntries calls from the agent to different server instances end up returning different revisions of the updated entry. Because the agent receives a toggling revision (old/new) from call to call, it repeatedly marks the entry as outdated: the agent currently only checks for a revision mismatch, rather than for a newer revision -
existingEntry.RevisionNumber != newEntry.RevisionNumber
This outdated entry is then marked as stale, since the callback function uses the same logic. The stale entry is then force-renewed by the cache, resulting in multiple signings of the same SVID on the server. This continues until all event-based caches reconcile to the new revision.
Proposed solution
To accommodate a split cache on the server side, the agent should expect a server instance to return an older revision for an entry, and should only mark the current entry as outdated if it receives a higher revision. Updates are needed in the following places, changing the revision check from 'existing != new' to 'existing < new' -
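As a rough sketch of the proposed comparison (this is not the list of code locations referred to above, and not SPIRE's actual implementation), the check might look like the following. The `entry` type and the `isOutdated` helper are hypothetical names introduced for illustration; only `RevisionNumber` and the 'existing < new' rule come from this issue.

```go
package main

import "fmt"

// entry is a hypothetical, stripped-down stand-in for a registration entry.
type entry struct {
	ID             string
	RevisionNumber int64
}

// isOutdated reports whether the cached entry should be replaced by the
// incoming one. Only a strictly higher incoming revision counts; an equal
// or older revision from a lagging server instance is ignored.
func isOutdated(existing, incoming entry) bool {
	return existing.RevisionNumber < incoming.RevisionNumber
}

func main() {
	cached := entry{ID: "workload-a", RevisionNumber: 1}
	renewals := 0
	// Assumed sequence: servers alternate between revision 2 and 1
	// until every instance has caught up to revision 2.
	for _, rev := range []int64{2, 1, 2, 1, 2, 2} {
		incoming := entry{ID: cached.ID, RevisionNumber: rev}
		if isOutdated(cached, incoming) {
			renewals++
			cached = incoming
		}
	}
	fmt.Println(renewals) // prints 1
}
```

With the strict comparison, the same alternating sequence of responses leads to a single renewal, because older revisions returned by lagging server instances never replace the newer cached one.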