Skip to content

Fix docCount does not increase when visiting all Docs in `ApproximatePointRangeQuery #18337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kkewwei
Copy link
Contributor

@kkewwei kkewwei commented May 19, 2025

Description

[Describe what this change achieves]

Related Issues

Resolves #18313

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 7400899: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei marked this pull request as ready for review May 19, 2025 15:57
@kkewwei
Copy link
Contributor Author

kkewwei commented May 19, 2025

It seems not easy to add unit test, Lucene will delegate PointValues.IntersectVisitor to AssertingIntersectVisitor, but AssertingIntersectVisitor doesn't delegate visit(IntsRef ref) to ApproximatePointRangeQuery$PointValues.IntersectVisitor, the change will never be covered in the ut.

image

@@ -177,9 +177,10 @@ public void visit(DocIdSetIterator iterator) throws IOException {

@Override
public void visit(IntsRef ref) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkewwei, this change might impact the results of desc_sort's as now we are not adding the entire collected batch of docs.

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
  "query": {
    "match_all": {}
  },
  "sort" : [
    {"@timestamp" : "desc"}
  ]
}'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just testing if my understanding was right and yes this impacts the desc sort results. If we want to restrict the doc count with visit(IntsRef ref) we should consider removing from the end of the batch for desc sorts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using desc, only a few BKD leaves are visited, whereas asc triggers a scan of all leaves in the BKD tree, leading to significant slowness. I believe we should leverage docCount to minimize excessive document accesses, which aligns with the original purpose of docCount. This approach ensures that both desc and asc access the same documents in the BKD.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can control the excessive document access, but what I was saying is this code change changes the search results for desc sorts. I have tested with big5 workload using the following query

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/big5/_search -d '{
  "query": {
    "match_all": {}
  },
  "sort" : [
    {"@timestamp" : "desc"}
  ]
}' | jq '.'

The output of search hits is not the same with original code and with this change.

The intersectRight visits tree nodes from right to left and each leaf node still contains documents sorted in ascending order. To fix the search results change issue, when processing a batch via visit(IntsRef ref), we should ideally process from the end of the batch for descending order (if we really want to control the docs and not change the search results).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding @msfroh

Copy link
Contributor Author

@kkewwei kkewwei May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why it is not same with original code is that I haven't controlled the order of visiting BKD nodes yet. Yes, it should be as you described:

we should ideally process from the end of the batch for descending order (if we really want to control the docs and not change the search results).

If this approach seems acceptable to you, I will proceed with the fix accordingly.

…tePointRangeQuery`

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
@kkewwei kkewwei force-pushed the fix_not_count_inApproximatePointRangeQuery branch from 7400899 to 062fe64 Compare May 20, 2025 08:03
Copy link
Contributor

✅ Gradle check result for 062fe64: SUCCESS

Copy link

codecov bot commented May 20, 2025

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.58%. Comparing base (370cd8c) to head (062fe64).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
...search/approximate/ApproximatePointRangeQuery.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18337      +/-   ##
============================================
+ Coverage     72.51%   72.58%   +0.06%     
- Complexity    67335    67427      +92     
============================================
  Files          5488     5488              
  Lines        311069   311070       +1     
  Branches      45219    45219              
============================================
+ Hits         225569   225782     +213     
+ Misses        67069    66887     -182     
+ Partials      18431    18401      -30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kkewwei
Copy link
Contributor Author

kkewwei commented May 23, 2025

closing in favor of #18358

@kkewwei kkewwei closed this May 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search:Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Regression in Range and sort queries with Lucene 10.2.1
2 participants