Fix docCount does not increase when visiting all Docs in `ApproximatePointRangeQuery` #18337
70c008c to 7400899
❌ Gradle check result for 7400899: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@@ -177,9 +177,10 @@ public void visit(DocIdSetIterator iterator) throws IOException {

@Override
public void visit(IntsRef ref) {
Thanks @kkewwei. This change might affect the results of desc sorts, because we no longer add the entire collected batch of docs.
curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
"query": {
"match_all": {}
},
"sort" : [
{"@timestamp" : "desc"}
]
}'
I just tested whether my understanding was right, and yes, this impacts the desc
sort results. If we want to restrict the doc count in visit(IntsRef ref),
we should consider removing from the end of the batch for desc sorts.
When using `desc`, only a few `BKD` leaves are visited, whereas `asc` triggers a scan of all leaves in the `BKD` tree, leading to significant slowness. I believe we should leverage `docCount` to minimize excessive document accesses, which aligns with the original purpose of `docCount`. This approach ensures that both `desc` and `asc` access the same documents in the `BKD`.
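To make the point about `docCount` concrete, here is a minimal sketch of a leaf scan with a document budget. It is a simplified model and not the OpenSearch or Lucene code: the class `LeafScanSketch`, the `scan` method, the `countDocs` flag, and the `int[][]` stand-in for BKD leaves are all assumptions made for illustration. With the counter incremented, the scan stops after the same number of leaves in either direction; with the counter left untouched, every leaf is opened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a BKD leaf scan; not the OpenSearch or Lucene implementation.
class LeafScanSketch {
    /** Result of a scan: how many leaves were opened and which doc IDs were collected. */
    static final class Result {
        final int leavesVisited;
        final List<Integer> docs;

        Result(int leavesVisited, List<Integer> docs) {
            this.leavesVisited = leavesVisited;
            this.docs = docs;
        }
    }

    /**
     * Scans leaves left to right (ascending) or right to left (descending) and stops
     * once at least `size` docs have been counted. When `countDocs` is false the counter
     * is never incremented, which models a scan that cannot terminate early.
     */
    static Result scan(int[][] leaves, int size, boolean descending, boolean countDocs) {
        List<Integer> docs = new ArrayList<>();
        int docCount = 0;
        int leavesVisited = 0;
        for (int i = 0; i < leaves.length && docCount < size; i++) {
            int[] leaf = leaves[descending ? leaves.length - 1 - i : i];
            leavesVisited++;
            for (int doc : leaf) {
                docs.add(doc);
            }
            if (countDocs) {
                docCount += leaf.length;
            }
        }
        return new Result(leavesVisited, docs);
    }

    public static void main(String[] args) {
        int[][] leaves = { {0, 1}, {2, 3}, {4, 5}, {6, 7} };
        System.out.println(scan(leaves, 3, false, true).leavesVisited);  // prints 2
        System.out.println(scan(leaves, 3, true, true).leavesVisited);   // prints 2
        System.out.println(scan(leaves, 3, false, false).leavesVisited); // prints 4
    }
}
```

Note that this sketch collects whole leaves, so it can gather more than `size` docs; how to trim a batch to the budget is a separate question.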
Yes, we can control the excessive document access, but my point is that this code change alters the search results for desc sorts. I tested with the big5 workload using the following query:
curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
"query": {
"match_all": {}
},
"sort" : [
{"@timestamp" : "desc"}
]
}' | jq '.'
The search hits differ between the original code and this change.
`intersectRight` visits tree nodes from right to left, but each leaf node still contains documents sorted in ascending order. To avoid changing the search results, when processing a batch via visit(IntsRef ref) we should ideally process from the end of the batch for descending order (if we really want to limit the docs without changing the search results).
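The idea of taking docs from the end of a batch for descending order can be sketched as follows. This is only an illustration under stated assumptions: `BatchTrimSketch`, `take`, and the `remaining` budget are hypothetical names, and a plain `int[]` with `offset` and `length` stands in for Lucene's `IntsRef`, whose `ints`, `offset`, and `length` fields have the same layout. It is not the fix that was eventually merged.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative and the int[] batch stands in for IntsRef.
class BatchTrimSketch {
    /**
     * Returns at most `remaining` doc IDs from batch[offset, offset + length).
     * Ascending keeps the head of the batch. Descending keeps the tail, on the
     * assumption stated in the review that each batch is in ascending order even
     * when leaves are visited right to left.
     */
    static int[] take(int[] batch, int offset, int length, int remaining, boolean descending) {
        int n = Math.max(0, Math.min(length, remaining));
        int from = descending ? offset + length - n : offset;
        return Arrays.copyOfRange(batch, from, from + n);
    }

    public static void main(String[] args) {
        int[] batch = {10, 11, 12, 13, 14};
        System.out.println(Arrays.toString(take(batch, 0, 5, 2, false))); // prints [10, 11]
        System.out.println(Arrays.toString(take(batch, 0, 5, 2, true)));  // prints [13, 14]
    }
}
```

Keeping the head for ascending and the tail for descending means the budget trims the docs that are furthest from the requested end of the sort, which is the behavior the review comment asks for.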
Adding @msfroh
The reason it differs from the original code is that I haven't yet controlled the order of visiting `BKD` nodes. Yes, it should be as you described:
> we should ideally process from the end of the batch for descending order (if we really want to control the docs and not change the search results).

If this approach seems acceptable to you, I will proceed with the fix accordingly.
…tePointRangeQuery` Signed-off-by: kkewwei <[email protected]> Signed-off-by: kkewwei <[email protected]>
7400899 to 062fe64
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #18337    +/-   ##
============================================
+ Coverage     72.51%   72.58%   +0.06%
- Complexity    67335    67427     +92
============================================
  Files          5488     5488
  Lines        311069   311070      +1
  Branches      45219    45219
============================================
+ Hits         225569   225782    +213
+ Misses        67069    66887    -182
+ Partials      18431    18401     -30
Closing in favor of #18358.
Description
[Describe what this change achieves]
Related Issues
Resolves #18313
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.