Skip to content

[Query Cache] Add a dynamic cluster setting to change skip cache factor #17736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sgup432 opened this issue Mar 31, 2025 · 8 comments · May be fixed by #18351
Open

[Query Cache] Add a dynamic cluster setting to change skip cache factor #17736

sgup432 opened this issue Mar 31, 2025 · 8 comments · May be fixed by #18351
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance v3.0.0 Issues and PRs related to version 3.0.0

Comments

@sgup432
Copy link
Contributor

sgup432 commented Mar 31, 2025

Is your feature request related to a problem? Please describe

Recently, a change was added in lucene LRU cache where we can dynamically change skip cache factor to increase the cache utilization. Someone had posted performance benefits here by relaxing skip cache factor limits and utilizing cache more.

We can add a cluster setting at our end to achieve this. Also explore in what cases(to what values) we can use this to achieve benefits.

Describe the solution you'd like

Add a dynamic cluster setting

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response

@sgup432 sgup432 added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 31, 2025
@kkhatua kkhatua added the v3.0.0 Issues and PRs related to version 3.0.0 label Mar 31, 2025
@sandeshkr419
Copy link
Contributor

[Search Triage] That will be great, lets see if someone from community is interested in working on this.

@kkewwei
Copy link
Contributor

kkewwei commented Apr 3, 2025

[Search Triage] That will be great, lets see if someone from community is interested in working on this.

I'd great to support this feature when the new lucene is released, If @sgup432 agrees.

@sgup432
Copy link
Contributor Author

sgup432 commented May 19, 2025

Also, in addition to this, we need to expose a dynamic setting to update the frequency based caching policy for QueryCache.
As of now, it only caches query when it has seen the same query for 5 times, see here.

Exposing this setting will give us an opportunity to update the value to a lower number, thereby increasing cache utilization if needed.

@kkewwei
Copy link
Contributor

kkewwei commented May 20, 2025

@sgup432 I attempted to make the frequency value dynamic but failed. While I can override the minFrequencyToCache method, I'm unable to reuse the isCostly method since it's not declared as public.

If we want to make the frequency value dynamic, there seems one way to implement:

  • Disregard the "costly" classification.
           cachingPolicy = new UsageTrackingQueryCachingPolicy() {
                @Override
                protected int minFrequencyToCache(Query query) {
                    int minFrequency = this.minFrequency;
                    if (query instanceof BooleanQuery || query instanceof DisjunctionMaxQuery) {
                        --minFrequency;
                    }

                    return Math.max(0, minFrequency);
                }
            };

I'm uncertain whether this is the optimal implementation path or if maintaining the frequency status quo is preferable.

@sgup432
Copy link
Contributor Author

sgup432 commented May 20, 2025

@kkewwei I also believe that is the right/only approach ie passing our implementation of UsageTrackingQueryCachingPolicy along with frequency, which essentially would be defined as a dynamic cluster setting. If you want, you can raise a PR, otherwise I will get to this in few days.

@sgup432
Copy link
Contributor Author

sgup432 commented May 20, 2025

@kkewwei While checking more, I see there are two ways to get around isCostly method

  1. Skip it, and have a single frequency value for all types of queries via setting.
  2. Define our own way in OpenSearch to define a costly query and provide extension point if needed. We can also try asking lucene community to make it public/extensible in parallel.

Option#2 might be a better way to keep the existing functionality.

@kkewwei
Copy link
Contributor

kkewwei commented May 21, 2025

@kkewwei While checking more, I see there are two ways to get around isCostly method

  1. Skip it, and have a single frequency value for all types of queries via setting.
  2. Define our own way in OpenSearch to define a costly query and provide extension point if needed. We can also try asking lucene community to make it public/extensible in parallel.

Option#2 might be a better way to keep the existing functionality.

@sgup432 when definingisCostly, I previously worried that it would not align with lucene, for example, MultiTermQueryConstantScoreWrapper and MultiTermQueryConstantScoreWrapper are not public, making it inaccessible in OpenSearch.

However, I now realize we can simply check the class name instead of directly referencing MultiTermQueryConstantScoreWrapper and MultiTermQueryConstantScoreWrapper.

If you think it's feasible, I'll do it this way.

@sgup432
Copy link
Contributor Author

sgup432 commented May 21, 2025

Yeah I personally think that having a custom caching policy for OpenSearch should be fine, given that lucene doesn't give the desired extensibility at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance v3.0.0 Issues and PRs related to version 3.0.0
Projects
Status: 🆕 New
5 participants