
Graduate KEP-2340 to Stable #5330


Merged
merged 1 commit into kubernetes:master on May 22, 2025

Conversation

serathius
Contributor

@serathius serathius commented May 22, 2025

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 22, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 22, 2025
@deads2k
Contributor

deads2k commented May 22, 2025

Thank you for the detailed qualification notes.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 22, 2025
@k8s-ci-robot k8s-ci-robot merged commit 397f701 into kubernetes:master May 22, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone May 22, 2025
@serathius
Contributor Author

/cc @liggitt

@k8s-ci-robot k8s-ci-robot requested a review from liggitt May 22, 2025 18:24
Member

@wojtek-t wojtek-t left a comment


The PRR looks OK to me, but I'm not fully aligned with the remove-fallback part:

> With qualification results showing that fallback is not needed, we can go back to the original design:
> we should fail the requests and rely on rate limiting to prevent cascading failure, i.e. the `Retry-After` HTTP header (for
> well-behaved clients) and [Apiserver Priority and Fairness](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
> The main reason for that is the added complexity and the incorrect handling in APF, which assumes that request cost doesn't change.
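
For illustration, a minimal Go sketch of the "original design" quoted above: reject a consistent LIST with 429 and `Retry-After` when the watch cache cannot catch up, instead of falling back to etcd. The `watchCache` interface, `WaitUntilFreshAndList`, and the 3-second deadline are assumptions made for this sketch, not the actual apiserver code.

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// watchCache stands in for the apiserver watch cache; the real interface differs.
type watchCache interface {
	// WaitUntilFreshAndList blocks until the cache has observed at least
	// the given etcd revision, or the context expires.
	WaitUntilFreshAndList(ctx context.Context, requiredRevision int64) ([]interface{}, error)
}

// serveConsistentList sketches the original (no-fallback) design: if the cache
// cannot catch up to the latest etcd revision in time, fail the request with
// 429 and Retry-After rather than listing from etcd.
func serveConsistentList(w http.ResponseWriter, r *http.Request, cache watchCache, latestEtcdRevision int64) {
	ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
	defer cancel()

	items, err := cache.WaitUntilFreshAndList(ctx, latestEtcdRevision)
	if err != nil {
		// No fallback: well-behaved clients honor Retry-After, and API
		// Priority and Fairness throttles the rest.
		w.Header().Set("Retry-After", "1")
		http.Error(w, "watch cache has not caught up to the required revision", http.StatusTooManyRequests)
		return
	}

	// Encoding the response is elided; items came from the cache, not etcd.
	_ = items
	w.WriteHeader(http.StatusOK)
}

func main() {}
```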
Member


I'm actually not 100% convinced here. I agree that incorrect APF handling poses some risk, but on the other hand, if the watchcache is lagging by 3s+ all the time, then after removing the fallback you will not be able to make a consistent list at all.

I would personally be more willing to take the APF risk here (especially since we haven't seen it materialize) than to expose us to the different risk of not being able to perform consistent lists at all.

@deads2k @jpbetz

Member


Also @liggitt

Contributor Author

@serathius serathius May 23, 2025


My perspective is that the APF handling presents a more immediate and critical threat. Fallbacks are predominantly observed during etcd upgrades, which break watches and force the watch cache to reinitialize. Requests caught mid-processing fall back to etcd before the cache starts blocking requests (because it is reinitializing), while their cost estimate is still based on the cache.

Consider a scenario where a significant number of these requests fall back during an upgrade. If these requests, instead of being properly deferred by APF, each drive LIST memory usage up to hundreds of megabytes, they can easily trigger a massive memory consumption spike. I've personally identified cases with over 100 such fallbacks to etcd. Upgrades are already disruptive; the fallback risks adding further instability due to OOMs.

Therefore, I advocate that we prioritize APF safety and API server availability during planned operations like upgrades, rather than accept the potential for degraded LIST availability.
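
To make the memory arithmetic concrete, here is a rough, self-contained Go illustration of the mismatch described above; the seat formula, per-object size, and request counts are made-up numbers for illustration only, not the real APF work estimator.

```go
package main

import "fmt"

// estimateSeats stands in for the APF work estimator: the cost is derived up
// front from how many objects the watch cache is expected to return.
func estimateSeats(expectedObjects int) int {
	const objectsPerSeat = 100 // made-up ratio
	seats := expectedObjects / objectsPerSeat
	if seats < 1 {
		seats = 1
	}
	return seats
}

func main() {
	// Estimate made while the request was still expected to be served from
	// the watch cache (cheap, already-decoded objects).
	const objects = 500
	seats := estimateSeats(objects)

	// If the request then falls back to etcd, every object is fetched and
	// decoded again; assume ~1 MiB per decoded object for a large resource.
	const bytesPerObject = 1 << 20
	actualMiB := objects * bytesPerObject >> 20

	fmt.Printf("APF reserved %d seats, but the etcd fallback allocates ~%d MiB\n", seats, actualMiB)
	fmt.Printf("100 concurrent fallbacks of this size need ~%d GiB that APF never accounted for\n", 100*actualMiB>>10)
}
```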

Member


I'm not saying there is no risk due to APF. I'm saying that you're completely ignoring the other side.

If my watchcache is lagging by 3s+ (and you can get into this state purely because of a thundering herd of writes), then you can no longer perform any consistent list. You can't debug, you can't reinitialize components.
I don't think we can just ignore that.

If you're worried about planned operations, I'm fairly sure there are other options available to us.
We can generally distinguish those cases, because then the watch breaks and the watchcache switches to a not-ready state.
We could easily detect this situation and, when the watchcache is not ready, fail the request instead of falling back.
Or some variation of that.

I just don't think we can remove the fallback in every case.
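
As a sketch of that variation (purely illustrative; the state names and the decision function are assumptions, not code from the KEP or the apiserver), the idea could look roughly like this in Go: fail fast only when the watchcache is explicitly not ready, and keep the etcd fallback when it is merely lagging.

```go
package main

import (
	"errors"
	"fmt"
)

type cacheState int

const (
	cacheReady    cacheState = iota // watch cache healthy and caught up
	cacheNotReady                   // watch broke (e.g. during an etcd upgrade), cache reinitializing
	cacheLagging                    // watch healthy but behind the latest etcd revision
)

var errRetryLater = errors.New("watch cache reinitializing, retry later")

// serveConsistentList sketches the suggested middle ground: reject only when
// the cache is not ready, fall back to etcd only when it is merely lagging.
func serveConsistentList(state cacheState) (source string, err error) {
	switch state {
	case cacheReady:
		return "watch cache", nil // normal consistent list from the cache
	case cacheNotReady:
		// Planned disruption: fail fast so a burst of expensive etcd
		// lists cannot blow the memory budget (client gets 429 + Retry-After).
		return "", errRetryLater
	case cacheLagging:
		// Unplanned lag (e.g. thundering herd of writes): fall back so
		// consistent lists remain possible for debugging and recovery.
		return "etcd", nil
	default:
		return "", errors.New("unknown cache state")
	}
}

func main() {
	for _, s := range []cacheState{cacheReady, cacheNotReady, cacheLagging} {
		src, err := serveConsistentList(s)
		fmt.Printf("state=%d source=%q err=%v\n", s, src, err)
	}
}
```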

Contributor Author

@serathius serathius May 23, 2025


> I'm not saying there is no risk due to APF. I'm saying that you're completely ignoring the other side.

I'm prioritizing APF because I'm not comfortable leaving a loophole. Overall, I think we should be stricter about memory management in the apiserver, and by leaving such a fallback in place we will never be able to keep apiserver memory in check. I have seen cases where apiserver memory grew from 10GB to over 400GB in seconds. Such cases are worse than the one you described, because they lead to OOMs and full unavailability; there is nothing left to debug. I agree with the original proposal that unavailability of consistent reads has less impact than cascading failure.

Member


> I agree with the original proposal that unavailability of consistent reads has less impact than cascading failure.

What has changed since then is that there was supposed to be a way to force a list from etcd, so you had a way to list in case of issues. With no explicit way to request a list from etcd, we have lost that.

I'm not comfortable with the APF issue either, but I know what the first AI (action item) would be if we faced the incident I described: "add fallback". The fallback isn't great, but I think we need some solution rather than just ignoring the problem.

Contributor Author


OK, I re-analyzed the fallbacks again. It looks like upgrades are not the main trigger, and it's hard to identify a single cause. Based on that, I think we should just graduate what we have, since it works well, and leave further revisions for the future.
