[BUG] Regression in Range and sort queries with Lucene 10.2.1 #18313

harshavamsi · 2025-05-15T23:54:26Z

Describe the bug

From #17961 (comment), it is evident that range and sort queries show a heavy regression. I suspect the cause is the bulk scorer changes that Lucene now uses. We should fix this in the upgrade PR and then run benchmarks to confirm that we are not regressing.

Related component

Search:Performance

To Reproduce

Please look at the lucene 10.2.1 upgrade PR for more info around benchmark numbers.

Expected behavior

No regression expected


harshavamsi · 2025-05-16T19:16:43Z

@prudhvigodithi can you help investigate? This will be a blocker for the 3.1 release. Since we are also seeing a general regression from the approximation framework in http_logs, we should add more debug logging to the approximation path to collect metrics on how we are optimizing the counts. I will open a separate issue for that.

cc: @getsaurabh02

prudhvigodithi · 2025-05-16T20:38:39Z

Yes, I noticed a regression in 3.0.0 (and the same in 3.1.0) on the http_logs dataset for asc_sort queries that use approximation. The non-approximation path is faster.

{
  "name": "sort_size_asc",
  "operation-type": "search",
  "index": "logs-241998",
  "body": {
    "track_total_hits": false,
    "query": {
      "match_all": {}
    },
    "sort": [
      {
        "size": "asc"
      }
    ]
  }
}

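With approximation on http_logs:

|                                                         Metric |          Task |       Value |   Unit |
|---------------------------------------------------------------:|--------------:|------------:|-------:|
|                                    Heap used for stored fields |               |           0 |     MB |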
|                                                  Segment count |               |          35 |        |
|                                                 Min Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                Mean Throughput | sort_size_asc |         0.5 |  ops/s |
|                                              Median Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                 Max Throughput | sort_size_asc |         0.5 |  ops/s |
|                                        50th percentile latency | sort_size_asc |     31.8261 |     ms |
|                                        90th percentile latency | sort_size_asc |     34.9839 |     ms |
|                                        99th percentile latency | sort_size_asc |     65.7444 |     ms |
|                                       100th percentile latency | sort_size_asc |     74.1878 |     ms |
|                                   50th percentile service time | sort_size_asc |     29.1671 |     ms |
|                                   90th percentile service time | sort_size_asc |      32.277 |     ms |
|                                   99th percentile service time | sort_size_asc |     64.4171 |     ms |
|                                  100th percentile service time | sort_size_asc |     70.8772 |     ms |
|                                                     error rate | sort_size_asc |           0 |      % |

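Without approximation on http_logs:

|                                                         Metric |          Task |       Value |   Unit |
|---------------------------------------------------------------:|--------------:|------------:|-------:|
|                                    Heap used for stored fields |               |           0 |     MB |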
|                                                  Segment count |               |          35 |        |
|                                                 Min Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                Mean Throughput | sort_size_asc |         0.5 |  ops/s |
|                                              Median Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                 Max Throughput | sort_size_asc |         0.5 |  ops/s |
|                                        50th percentile latency | sort_size_asc |     7.37754 |     ms |
|                                        90th percentile latency | sort_size_asc |     8.18209 |     ms |
|                                        99th percentile latency | sort_size_asc |     9.12302 |     ms |
|                                       100th percentile latency | sort_size_asc |     9.26709 |     ms |
|                                   50th percentile service time | sort_size_asc |     4.50927 |     ms |
|                                   90th percentile service time | sort_size_asc |     5.46192 |     ms |
|                                   99th percentile service time | sort_size_asc |      6.3236 |     ms |
|                                  100th percentile service time | sort_size_asc |     6.53081 |     ms |
|                                                     error rate | sort_size_asc |           0 |      % |
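To put the two runs side by side, the slowdown can be computed directly from the numbers in the tables above. The following is only a small arithmetic sketch; the class and method names are made up for illustration and are not part of OpenSearch or OpenSearch Benchmark.

```java
public class SortSizeAscSlowdown {

    /** Ratio of the approximation run to the non-approximation run for one metric. */
    public static double slowdown(double withApproximationMs, double withoutApproximationMs) {
        return withApproximationMs / withoutApproximationMs;
    }

    public static void main(String[] args) {
        // Values copied from the two http_logs tables above (milliseconds).
        System.out.printf("p50 latency:      %.1fx%n", slowdown(31.8261, 7.37754));
        System.out.printf("p90 latency:      %.1fx%n", slowdown(34.9839, 8.18209));
        System.out.printf("p50 service time: %.1fx%n", slowdown(29.1671, 4.50927));
        System.out.printf("p90 service time: %.1fx%n", slowdown(32.277, 5.46192));
    }
}
```

On these numbers the approximation path is roughly 4x slower on latency and roughly 6x slower on service time for sort_size_asc.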

The nightly benchmark dashboard also shows the regression for asc_sort_timestamp.

We should first prioritize identifying the cause of the regression with ApproximatePointRangeQuery for ascending sorts, and then fix the regression with Lucene 10.1.0.

kkewwei · 2025-05-19T14:36:57Z

@prudhvigodithi there seems to be a bug in ApproximatePointRangeQuery. When ApproximatePointRangeQuery is used for match_all, we don't increment docCount[0], so we visit all the docs in the BKD tree. cc @harshavamsi

OpenSearch/server/src/main/java/org/opensearch/search/approximate/ApproximatePointRangeQuery.java

Line 179 in 93d5356

public void visit(IntsRef ref) {

It should be like this:

                    @Override
                    public void visit(IntsRef ref) {
                        // Add only as many docs as are still needed to reach size.
                        long toAdd = Math.min(ref.length, Math.max(0, size - docCount[0]));
                        for (int i = 0; i < toAdd; i++) {
                            adder.add(ref.ints[ref.offset + i]);
                        }
                        // Count the added docs so the traversal can stop early.
                        docCount[0] += toAdd;
                    }
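To illustrate why the missing increment matters, here is a minimal, self-contained sketch. It does not use Lucene or the real ApproximatePointRangeQuery: the leaf blocks, the `collect` method, and the stop check are simplified stand-ins, and the "buggy" branch is only an assumed model of a bulk visit that never touches the counter.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminationSketch {

    /**
     * Walks hypothetical leaf blocks of doc IDs and stops once `size` docs are counted.
     * When incrementCounter is false the counter never moves, so the stop check never fires.
     */
    public static List<Integer> collect(int totalDocs, int blockSize, int size, boolean incrementCounter) {
        List<Integer> collected = new ArrayList<>();
        long[] docCount = { 0 };
        for (int start = 0; start < totalDocs; start += blockSize) {
            if (docCount[0] >= size) {
                break; // enough docs gathered, skip the remaining blocks
            }
            int length = Math.min(blockSize, totalDocs - start);
            if (incrementCounter) {
                // Patched behaviour: take only what is still needed and count it.
                long toAdd = Math.min(length, Math.max(0, size - docCount[0]));
                for (int i = 0; i < toAdd; i++) {
                    collected.add(start + i);
                }
                docCount[0] += toAdd;
            } else {
                // Modelled buggy behaviour (assumption): whole block collected, counter untouched.
                for (int i = 0; i < length; i++) {
                    collected.add(start + i);
                }
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println("with counter:    " + collect(100_000, 512, 10_000, true).size());
        System.out.println("without counter: " + collect(100_000, 512, 10_000, false).size());
    }
}
```

In this toy model, the run that updates the counter stops after 10,000 docs, while the run that never updates it walks all 100,000. That is the same shape as the "visit all the docs in the BKD tree" behaviour described above.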

prudhvigodithi · 2025-05-20T17:30:20Z

These are the big5 queries that currently go through the approximation path:

asc_sort_timestamp
desc_sort_timestamp
range
range-numeric
sort_numeric_desc
sort_numeric_asc

Seeing regression with:

asc_sort_timestamp
range
sort_numeric_asc

harshavamsi added bug Something isn't working untriaged labels May 15, 2025

github-actions bot added the Search:Performance label May 15, 2025

github-project-automation bot added this to Search Project Board May 15, 2025

github-project-automation bot moved this to 🆕 New in Search Project Board May 15, 2025

kkewwei mentioned this issue May 19, 2025

Fix docCount does not increase when visiting all Docs in ApproximatePointRangeQuery #18337

Closed

3 tasks

This was referenced May 19, 2025

[Feature Request] Improve ApproximatePointRangeQuery Traversal for Skewed Datasets with DFS Strategy #18341

Open

Add http_logs search only test procedure #18343

Merged

sandeshkr419 removed the untriaged label May 21, 2025

sandeshkr419 assigned prudhvigodithi May 21, 2025

This was referenced May 21, 2025

Update big5 (id_4) to test against Lucene 10.2.1 #18353

Merged

Fix performance regression in ApproximatePointRangeQuery with Lucene 10.2.1 change #18358

Open

prudhvigodithi added this to Performance Roadmap May 22, 2025

github-project-automation bot moved this to Todo in Performance Roadmap May 22, 2025

prudhvigodithi added the v3.1.0 label May 22, 2025
