Trust bundle missing active signing keys on some clusters #6083
For completeness, persisting the /data directory is irrelevant: I tested in DEV and hit the same issue whenever the server is restarted, even when keys are found on startup.

The correct bundle contains 29 keys on any server; the restarted server is missing one. As @sorindumitru and @evan2645 suspected, it stays broken until the server prepares its first JWT key. @sorindumitru suggested upgrading to v1.11 and, on the server missing keys, issuing a command that prepares a new JWT key. I can confirm this fixes it. However, it might not be a valid workaround for our use case: that command injects the prepared key into the bundle, making the bundle larger on each restart. The problem is that we use SPIRE for OIDC federation with AWS (https://spiffe.io/docs/latest/keyless/oidc-federation-aws/), and the /keys endpoint can't return more than 100 keys, otherwise AWS won't be able to verify the token (see https://repost.aws/knowledge-center/iam-sts-invalididentitytoken).

Is there a way to reduce the number of keys in the JWKS?
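(In case it helps anyone else, this is how we count the keys actually served by the OIDC Discovery Provider; the issuer URL is a placeholder and jq is assumed to be installed:)

$ curl -s https://oidc-discovery.example.org/keys | jq '.keys | length'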
Thanks @IvMdlc for confirming that restarts trigger the issue. I'm working on a fix and will open a PR to address it. I'm afraid there's no good way to guarantee you don't end up with more than 100 keys in the bundle. As you mentioned on Slack, you may be able to keep the count down by revoking old keys; just make sure you wait some time after you prepare a new one so that the bundle update propagates to workloads.
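For reference, a sketch of that rotation/revocation flow, assuming the localauthority CLI available in recent SPIRE releases (authority IDs are placeholders; check spire-server localauthority jwt -h for the exact flags):

$ spire-server localauthority jwt show                            # list active/prepared/old JWT authorities and their IDs
$ spire-server localauthority jwt prepare                         # prepare a new JWT authority and publish it to the bundle
$ spire-server localauthority jwt activate -authorityID <new-ID>  # start signing with the prepared authority
$ spire-server localauthority jwt taint -authorityID <old-ID>     # flag the old authority so workloads stop trusting it
$ spire-server localauthority jwt revoke -authorityID <old-ID>    # drop it from the bundle once the update has propagated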
Thanks @sorindumitru for working on a fix. While experimenting with localauthority, I observed that when I taint and revoke a key on a spire-server, the key is immediately removed from the bundle visible to the cluster hosting that spire-server. However, other clusters continue to see a bundle that includes the key, and it is only removed some hours after the key expires. This isn't an issue for us, although I can raise a separate issue if you think it's worth investigating.
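(For what it's worth, this is how I was checking whether a cluster still serves a given key; the kid is a placeholder and jq is assumed:)

$ spire-server bundle show -format spiffe | jq -r '.keys[] | select(.use == "jwt-svid") | .kid' | grep <revoked-kid>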
Hi, coming from this Slack thread (https://spiffe.slack.com/archives/C7XDP01HB/p1747864066874729), where @evan2645 and @sorindumitru asked me to raise an issue.
We have a root cluster and 8 regional clusters, all running on EC2 instances with AWS Postgres as the datastore. A few days ago we terminated and recreated all the EC2 instances in a rolling fashion, one node at a time, as we've done before; the databases are always untouched. The /data directory is gone when the servers start up, since we don't persist it, so the servers can't find their keys and create new ones.
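For context, the servers keep their signing keys on disk under /data, so wiping it discards them; the relevant configuration looks roughly like this (a sketch, not our exact config; the path is a placeholder):

$ grep -A 4 'KeyManager "disk"' conf/server/server.conf
KeyManager "disk" {
    plugin_data {
        keys_path = "/data/keys.json"
    }
}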
We've noticed that some clusters are missing signing keys:
on root cluster:
$ spire-server bundle show -format spiffe | grep kid | wc -l
58
on 5 out of 8 regional clusters:
$ spire-server bundle show -format spiffe | grep kid | wc -l
58
and on the other 3:
$ spire-server bundle show -format spiffe | grep kid | wc -l
53
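To pin down exactly which kids are missing (rather than just counting), the kid lists can be diffed, assuming jq is available (file paths are placeholders; copy both files to one host before diffing):

$ spire-server bundle show -format spiffe | jq -r '.keys[].kid' | sort > /tmp/kids.root      # on the root cluster
$ spire-server bundle show -format spiffe | jq -r '.keys[].kid' | sort > /tmp/kids.regional  # on an affected cluster
$ diff /tmp/kids.root /tmp/kids.regional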
We are running SPIRE 1.10.4.
Thank you.