[BUG] Regression in Range and sort queries with Lucene 10.2.1 #18313

harshavamsi opened this issue May 15, 2025 · 4 comments
Labels
bug Something isn't working Search:Performance v3.1.0

Comments

@harshavamsi
Contributor

Describe the bug

From #17961 (comment), it is evident that range and sort queries are regressing heavily. I suspect the cause is the bulk scorer changes that Lucene now uses. We should try to fix them in the upgrade PR and then run benchmarks to confirm that we are not regressing.

Related component

Search:Performance

To Reproduce

Please look at the Lucene 10.2.1 upgrade PR for more info on the benchmark numbers.

Expected behavior

No regression expected

@harshavamsi
Contributor Author

@prudhvigodithi can you help investigate? This will be a blocker for 3.1 release. Given that we are seeing general regression from the approximate framework in http_logs as well, we should expand and add debug logging to the approximation to collect metrics around how we are optimizing the counts. I will open a separate issue for that.

cc: @getsaurabh02

@prudhvigodithi
Member

prudhvigodithi commented May 16, 2025

Yes, I noticed a regression in 3.0.0 (and see the same in 3.1.0) on the http_logs dataset for asc_sort queries that use approximation. The non-approximation path is faster.

{
  "name": "sort_size_asc",
  "operation-type": "search",
  "index": "logs-241998",
  "body": {
    "track_total_hits": false,
    "query": {
      "match_all": {}
    },
    "sort": [
      {
        "size": "asc"
      }
    ]
  }
}

With Approximation on http_logs

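|                                                         Metric |          Task |       Value |   Unit |
|---------------------------------------------------------------:|--------------:|------------:|-------:|
|                                    Heap used for stored fields |               |           0 |     MB |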
|                                                  Segment count |               |          35 |        |
|                                                 Min Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                Mean Throughput | sort_size_asc |         0.5 |  ops/s |
|                                              Median Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                 Max Throughput | sort_size_asc |         0.5 |  ops/s |
|                                        50th percentile latency | sort_size_asc |     31.8261 |     ms |
|                                        90th percentile latency | sort_size_asc |     34.9839 |     ms |
|                                        99th percentile latency | sort_size_asc |     65.7444 |     ms |
|                                       100th percentile latency | sort_size_asc |     74.1878 |     ms |
|                                   50th percentile service time | sort_size_asc |     29.1671 |     ms |
|                                   90th percentile service time | sort_size_asc |      32.277 |     ms |
|                                   99th percentile service time | sort_size_asc |     64.4171 |     ms |
|                                  100th percentile service time | sort_size_asc |     70.8772 |     ms |
|                                                     error rate | sort_size_asc |           0 |      % |

Without Approximation on http_logs

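|                                                         Metric |          Task |       Value |   Unit |
|---------------------------------------------------------------:|--------------:|------------:|-------:|
|                                    Heap used for stored fields |               |           0 |     MB |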
|                                                  Segment count |               |          35 |        |
|                                                 Min Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                Mean Throughput | sort_size_asc |         0.5 |  ops/s |
|                                              Median Throughput | sort_size_asc |         0.5 |  ops/s |
|                                                 Max Throughput | sort_size_asc |         0.5 |  ops/s |
|                                        50th percentile latency | sort_size_asc |     7.37754 |     ms |
|                                        90th percentile latency | sort_size_asc |     8.18209 |     ms |
|                                        99th percentile latency | sort_size_asc |     9.12302 |     ms |
|                                       100th percentile latency | sort_size_asc |     9.26709 |     ms |
|                                   50th percentile service time | sort_size_asc |     4.50927 |     ms |
|                                   90th percentile service time | sort_size_asc |     5.46192 |     ms |
|                                   99th percentile service time | sort_size_asc |      6.3236 |     ms |
|                                  100th percentile service time | sort_size_asc |     6.53081 |     ms |
|                                                     error rate | sort_size_asc |           0 |      % |
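
For a rough sense of scale, the slowdown can be computed directly from the two tables above. This is only a minimal sketch: the class and method names are illustrative, and the inputs are the latency and service-time values copied from the tables.

```java
// Illustrative helper, not part of OpenSearch or OpenSearch Benchmark.
class SortSizeAscSlowdown {

    /** Ratio of the run with approximation to the run without it. */
    static double ratio(double withApprox, double withoutApprox) {
        return withApprox / withoutApprox;
    }

    public static void main(String[] args) {
        String[] labels = {
            "p50 latency", "p90 latency", "p99 latency",
            "p50 service time", "p90 service time", "p99 service time"
        };
        // Values in ms, copied from the sort_size_asc rows of the two tables.
        double[] withApprox = { 31.8261, 34.9839, 65.7444, 29.1671, 32.277, 64.4171 };
        double[] withoutApprox = { 7.37754, 8.18209, 9.12302, 4.50927, 5.46192, 6.3236 };

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-18s %.2fx slower%n", labels[i], ratio(withApprox[i], withoutApprox[i]));
        }
    }
}
```

On these numbers the 50th percentile latency is a little over 4x higher with approximation, and the 50th percentile service time is between 6x and 7x higher.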

The nightly benchmark dashboard also shows the regression for asc_sort_timestamp.

We should first prioritize identifying the cause of the regression with ApproximatePointRangeQuery for ascending sorts, and then fix the regression with Lucene 10.1.0.

@kkewwei
Collaborator

kkewwei commented May 19, 2025

@prudhvigodithi there seems to be a bug in ApproximatePointRangeQuery. When ApproximatePointRangeQuery is used for match_all, we don't increment docCount[0], so we end up visiting all the docs in the BKD tree. cc @harshavamsi

It should be like this:

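@Override
public void visit(IntsRef ref) {
    // Number of docs we may still collect from this block, never negative.
    long toAdd = Math.min(ref.length, Math.max(0, size - docCount[0]));
    for (int i = 0; i < toAdd; i++) {
        adder.add(ref.ints[ref.offset + i]);
    }
    // Count the collected docs so the traversal can stop once size is reached.
    docCount[0] += toAdd;
}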
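
To illustrate why the missing increment matters, here is a toy model in plain Java. It is only a sketch under stated assumptions: `CappedCollector`, `done()`, `leavesVisited`, and `makeLeaves` are hypothetical names, and the stop condition is an assumption about how the traversal terminates. It is not the actual ApproximatePointRangeQuery or Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these classes exist in OpenSearch or Lucene.
class CappedVisitorSketch {

    /** Hypothetical collector that wants at most `size` docs. */
    static final class CappedCollector {
        final int size;
        final boolean countBulkVisits; // false models the missing increment
        final long[] docCount = { 0 };
        final List<Integer> collected = new ArrayList<>();

        CappedCollector(int size, boolean countBulkVisits) {
            this.size = size;
            this.countBulkVisits = countBulkVisits;
        }

        /** Bulk visit of one leaf block, shaped like visit(IntsRef). */
        void visit(int[] ints, int offset, int length) {
            long toAdd = Math.min(length, Math.max(0, size - docCount[0]));
            for (int i = 0; i < toAdd; i++) {
                collected.add(ints[offset + i]);
            }
            if (countBulkVisits) {
                docCount[0] += toAdd;
            }
        }

        /** Assumed stop condition: enough docs have been counted. */
        boolean done() {
            return docCount[0] >= size;
        }
    }

    /** Walks the leaves in order and returns how many were visited before done() held. */
    static int leavesVisited(CappedCollector collector, int[][] leaves) {
        int visited = 0;
        for (int[] leaf : leaves) {
            if (collector.done()) {
                break;
            }
            collector.visit(leaf, 0, leaf.length);
            visited++;
        }
        return visited;
    }

    /** Builds leafCount blocks of consecutive doc IDs. */
    static int[][] makeLeaves(int leafCount, int docsPerLeaf) {
        int[][] leaves = new int[leafCount][docsPerLeaf];
        int doc = 0;
        for (int[] leaf : leaves) {
            for (int i = 0; i < leaf.length; i++) {
                leaf[i] = doc++;
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        int[][] leaves = makeLeaves(100, 512);
        CappedCollector withoutIncrement = new CappedCollector(1000, false);
        CappedCollector withIncrement = new CappedCollector(1000, true);
        System.out.println("leaves visited without increment: " + leavesVisited(withoutIncrement, leaves));
        System.out.println("leaves visited with increment: " + leavesVisited(withIncrement, leaves));
    }
}
```

In this model the collector that never updates its count walks every leaf, while the one that counts bulk visits stops as soon as `size` docs are collected.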

@prudhvigodithi
Member

The following big5 queries go through the approximation path today:

  • asc_sort_timestamp
  • desc_sort_timestamp
  • range
  • range-numeric
  • sort_numeric_desc
  • sort_numeric_asc

Seeing regression with:

  • asc_sort_timestamp
  • range
  • sort_numeric_asc
