Service Becomes Unhealthy Under Load After Upgrading from Quarkus 3.13 to 3.15+ #48052

Open
joao6verde6 opened this issue May 26, 2025 · 6 comments
Labels
area/health area/smallrye kind/bug Something isn't working triage/needs-reproducer We are waiting for a reproducer.

@joao6verde6

Describe the bug

After upgrading Quarkus from 3.13 to 3.15 LTS, our application becomes unhealthy under load shortly after startup.

We operate a high-throughput backend service that performs dynamic SQL queries against an RDS Postgres database. Caching is not viable due to the high variability of requests.

The service starts fine and reports healthy, but as soon as requests start hitting the application (around 150 RPS, with query latency typically under 200ms), the service quickly becomes unresponsive and unhealthy.

This behavior was not present in 3.13, but starts in 3.15 and persists through 3.20 LTS. Our tests suggest the regression is introduced between 3.13 and 3.15.

JDBC config (Agroal):

{
  "QUARKUS_DATASOURCE_JDBC_ACQUISITION_TIMEOUT": "20S",
  "QUARKUS_DATASOURCE_JDBC_MIN_SIZE": "50",
  "QUARKUS_DATASOURCE_JDBC_MAX_SIZE": "90",
  "QUARKUS_DATASOURCE_JDBC_INITIAL_SIZE": "10"
}
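
As a rough sanity check on these pool sizes, Little's law (concurrent work = arrival rate × time in system) gives an estimate of how many connections should be busy at once. The sketch below assumes each request holds exactly one connection for the duration of its query and uses 200 ms as the latency bound; the class and method names are illustrative and not part of our service.

```java
/** Back-of-the-envelope pool sizing via Little's law (L = lambda * W). */
final class PoolSizingEstimate {

    /**
     * Estimates how many connections are busy at once in steady state.
     * Assumption: each request holds exactly one connection for latencyMillis.
     */
    static int busyConnections(int requestsPerSecond, int latencyMillis) {
        // Integer milliseconds keep the arithmetic exact before rounding up.
        return (int) Math.ceil(requestsPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        int maxPoolSize = 90; // QUARKUS_DATASOURCE_JDBC_MAX_SIZE from the config above
        for (int rps : new int[] {150, 200}) {
            int busy = busyConnections(rps, 200); // 200 ms: upper bound on typical query latency
            System.out.printf("%d RPS -> ~%d busy connections (max pool size %d)%n",
                    rps, busy, maxPoolSize);
        }
    }
}
```

If these assumptions hold, steady-state demand stays well below the configured maximum of 90, which would suggest that connections are being held longer than the query time or that threads are blocked somewhere else.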

Notable changes between 3.13 and 3.15
We suspect the issue may be linked to internal changes in these versions:

  • Upgrade to Hibernate ORM 6.6
  • Agroal upgraded from 2.4 to 2.5
  • Possibly other relevant updates to connection pool behavior or thread/concurrency models

We would appreciate guidance on whether these changes could impact connection handling or startup load performance.

Expected behavior

The application should remain healthy and performant under load (e.g., 200 RPS), as it does on Quarkus 3.13.

Actual behavior

On Quarkus 3.15+:

  • The service becomes unhealthy shortly after startup under moderate-to-high load.
  • Observed a drastic drop in successful and completed requests during load testing compared to 3.13.
  • It appears the connection pool may not be handling demand properly or is being exhausted/blocked.

How to Reproduce?

The issue can be observed by:

  • Running a Quarkus service with dynamic SQL queries using the Agroal config above.
  • Load testing with ~150 RPS immediately after startup.
  • Monitoring app health and connection pool behavior.

Load test comparison:

Quarkus 3.13
VUs: 25–40
Duration: ~1m30s
Successful iterations: 4330, 6244, 5229
Application remains healthy

Quarkus 3.15
VUs: 25–40
Duration: ~1m30s
Successful iterations: 44, 50, 224
Service becomes unhealthy
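
To put the gap in numbers, the three runs per version can be averaged and compared. This is only a rough comparison, since the runs used between 25 and 40 VUs; the class name is illustrative.

```java
/** Compares the successful-iteration counts reported above. */
final class LoadTestComparison {

    /** Arithmetic mean of the given run results. */
    static double mean(int... values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        double v313 = mean(4330, 6244, 5229); // Quarkus 3.13 runs
        double v315 = mean(44, 50, 224);      // Quarkus 3.15 runs
        System.out.printf("3.13 mean: %.0f, 3.15 mean: %.0f, ratio: %.1fx%n",
                v313, v315, v313 / v315);
    }
}
```

On these figures, 3.15 completes roughly 50 times fewer successful iterations than 3.13.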

Output of uname -a or ver

No response

Output of java -version

OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)

Quarkus version or git rev

3.15

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.9

Additional information

No response

@joao6verde6 joao6verde6 added the kind/bug Something isn't working label May 26, 2025

quarkus-bot bot commented May 26, 2025

/cc @jmartisk (health), @xstefank (health)

@geoand
Contributor

geoand commented May 30, 2025

Although I understand it's a tall order, I don't see a way for us to be able to get to the root of the problem unless we have a way to easily reproduce the issue.

@geoand geoand added the triage/needs-reproducer We are waiting for a reproducer. label May 30, 2025
@gsmet
Member

gsmet commented May 30, 2025

It would also help a lot if you could try to pinpoint the exact version in which things are starting to go wrong.

@joao6verde6
Author

Versions where the problem started:

  • Quarkus 3.13 → Service behaves well under high load
  • Quarkus 3.14 → This is the version where the issue starts

To clarify: no changes were made to our business logic, configurations, or database schemas between these versions. We only bumped the Quarkus BOM parent in our pom.xml.

This particular service:

  • Relies heavily on dynamic SQL reads using Hibernate ORM
  • Uses Amazon RDS (PostgreSQL engine) for data storage
  • Also integrates with Amazon S3, though that's not part of the critical path
  • Does not use caching, as most queries are unique

Due to internal constraints and the sensitivity of our infrastructure, we unfortunately cannot provide a public reproducer.

This issue is currently blocking us from upgrading this service to the latest Quarkus version. We’ve already migrated around 40 other services to 3.20 without issue, but this one behaves differently due to its heavier reliance on Hibernate and high-load DB reads.

At this point, we’re trying to identify whether the root cause is more likely related to:

  • The Hibernate ORM upgrade
  • A change in Vert.x concurrency/reactive behavior
  • The Agroal connection pool update (2.4 → 2.5)
  • Or something else introduced in Quarkus 3.14

Here is the list of Quarkus dependencies used in this app:
[INFO] | +- io.quarkiverse.amazonservices:quarkus-amazon-s3:jar:2.16.2:compile
[INFO] | | +- io.quarkiverse.amazonservices:quarkus-amazon-common:jar:2.16.2:compile
[INFO] +- io.quarkus:quarkus-flyway:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-core:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-ide-launcher:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-development-mode-spi:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-bootstrap-runner:jar:3.14.0:compile
[INFO] | | - io.quarkus:quarkus-fs-util:jar:0.0.10:compile
[INFO] | +- io.quarkus:quarkus-agroal:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-datasource:jar:3.14.0:compile
[INFO] | | - io.quarkus:quarkus-credentials:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-narayana-jta:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-transaction-annotations:jar:3.14.0:compile
[INFO] | - io.quarkus:quarkus-datasource-common:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-hibernate-orm:jar:3.14.0:compile
[INFO] | +- org.hibernate:quarkus-local-cache:jar:0.3.0:compile
[INFO] | - io.quarkus:quarkus-caffeine:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-hibernate-orm-panache:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-hibernate-orm-panache-common:jar:3.14.0:compile
[INFO] | | - io.quarkus:quarkus-panache-hibernate-common:jar:3.14.0:compile
[INFO] | - io.quarkus:quarkus-panache-common:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-jdbc-postgresql:jar:3.14.0:compile
[INFO] +- io.quarkiverse.loggingjson:quarkus-logging-json:jar:3.1.0:compile
[INFO] +- io.quarkus:quarkus-arc:jar:3.14.0:compile
[INFO] | +- io.quarkus.arc:arc:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-hibernate-validator:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-resteasy:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-vertx-http:jar:3.14.0:compile
[INFO] | | +- io.quarkus.security:quarkus-security:jar:2.1.0:compile
[INFO] | +- io.quarkus.vertx.utils:quarkus-vertx-utils:jar:3.14.0:compile
[INFO] | - io.quarkus:quarkus-resteasy-server-common:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-resteasy-common:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-resteasy-jackson:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-jackson:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-resteasy-multipart:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-smallrye-openapi:jar:3.14.0:compile
[INFO] | - io.quarkus:quarkus-swagger-ui:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-smallrye-fault-tolerance:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-mutiny:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-smallrye-context-propagation:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-opentelemetry:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-tls-registry:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-vertx:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-netty:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-virtual-threads:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-vertx-latebound-mdc-provider:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-security-runtime-spi:jar:3.14.0:compile
[INFO] | +- io.quarkus:quarkus-grpc-common:jar:3.14.0:compile
[INFO] +- io.quarkiverse.micrometer.registry:quarkus-micrometer-registry-datadog:jar:3.2.4:compile
[INFO] | +- io.quarkus:quarkus-micrometer:jar:3.14.0:compile
[INFO] +- io.quarkus:quarkus-smallrye-health:jar:3.14.0:compile
[INFO] | - io.quarkus:quarkus-jsonp:jar:3.14.0:compile
[INFO] +- io.quarkiverse.unleash:quarkus-unleash:jar:1.10.0:compile
[INFO] +- io.quarkus:quarkus-jacoco:jar:3.14.0:test
[INFO] +- io.quarkus:quarkus-junit5:jar:3.14.0:test
[INFO] | +- io.quarkus:quarkus-bootstrap-core:jar:3.14.0:test
[INFO] | | +- io.quarkus:quarkus-classloader-commons:jar:3.14.0:compile
[INFO] | | +- io.quarkus:quarkus-bootstrap-app-model:jar:3.14.0:test
[INFO] | +- io.quarkus:quarkus-test-common:jar:3.14.0:test
[INFO] | | +- io.quarkus:quarkus-core-deployment:jar:3.14.0:test
[INFO] | | | +- io.quarkus.gizmo:gizmo:jar:1.8.0:test
[INFO] | | | +- io.quarkus:quarkus-hibernate-validator-spi:jar:3.14.0:test
[INFO] | | | +- io.quarkus:quarkus-class-change-agent:jar:3.14.0:test
[INFO] | | | +- io.quarkus:quarkus-devtools-utilities:jar:3.14.0:test
[INFO] | | | +- io.quarkus:quarkus-builder:jar:3.14.0:test
[INFO] | | +- io.quarkus:quarkus-bootstrap-maven-resolver:jar:3.14.0:test
[INFO] | | - io.quarkus:quarkus-bootstrap-gradle-resolver:jar:3.14.0:test
[INFO] | +- io.quarkus:quarkus-junit5-properties:jar:3.14.0:test
[INFO] +- io.quarkus:quarkus-junit5-mockito:jar:3.14.0:test
[INFO] | +- io.quarkus:quarkus-junit5-mockito-config:jar:3.14.0:test
[INFO] | +- io.quarkus:quarkus-arc-deployment:jar:3.14.0:test
[INFO] | | +- io.quarkus:quarkus-smallrye-context-propagation-spi:jar:3.14.0:test
[INFO] | | +- io.quarkus:quarkus-vertx-http-dev-ui-spi:jar:3.14.0:test
[INFO] | | +- io.quarkus.arc:arc-processor:jar:3.14.0:test
[INFO] | | - io.quarkus:quarkus-arc-test-supplement:jar:3.14.0:test

@geoand
Contributor

geoand commented May 30, 2025

Thanks for the additional input, but as you can guess, it's impossible to make any kind of educated guess about what the cause could be using only this information.

@gsmet
Member

gsmet commented May 30, 2025

Yeah, it will require some detective work on your side.

First, try to find the micro version that is problematic: try 3.14.0.CR1, then 3.14.0, 3.14.1... That will narrow things down a bit.
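
This narrowing-down can be treated as a bisection over an ordered list of releases. Below is a minimal sketch, assuming the first entry is known good, the last is known bad, and the problem persists once introduced. The `isHealthy` check is hypothetical (in practice it would mean rerunning the load test against a build using that BOM version), and the version list is illustrative rather than the full set of published releases.

```java
import java.util.List;
import java.util.function.Predicate;

/** Bisects an ordered release list to find the first version that misbehaves. */
final class VersionBisect {

    /**
     * Returns the first unhealthy version.
     * Assumptions: versions.get(0) is healthy, the last entry is unhealthy,
     * and every version after the first unhealthy one is unhealthy too.
     */
    static String firstBad(List<String> versions, Predicate<String> isHealthy) {
        int lo = 0;                   // index of a known-good version
        int hi = versions.size() - 1; // index of a known-bad version
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (isHealthy.test(versions.get(mid))) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return versions.get(hi);
    }

    public static void main(String[] args) {
        // Illustrative list only; take the real one from the published releases.
        List<String> versions = List.of("3.13", "3.14.0.CR1", "3.14.0", "3.14.1");
        // Placeholder outcome, not a measured result: replace with a real load-test check.
        Predicate<String> isHealthy = v -> v.equals("3.13");
        System.out.println("First bad version: " + firstBad(versions, isHealthy));
    }
}
```

With only a handful of candidates a linear walk works just as well; bisection starts to pay off if the search has to extend to individual commits or dependency versions.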

Then when the service starts going berserk, it might be good to get a thread dump (using kill -3 <pid>) and have a look at what's going on.

I don't think it will give us the answer, but it might help narrow down what's wrong and could lead to you being able to assemble a reproducer.
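
When reading a dump, a quick first pass is to count threads per state before digging into individual stacks. The sketch below takes an in-process snapshot through the standard `java.lang.management` API, which exposes similar information to `kill -3` in a different format. The class name is illustrative, and this is not something Quarkus provides.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Summarizes live JVM threads by state as a first look at a thread dump. */
final class ThreadStateSummary {

    /** Counts live platform threads per Thread.State using the JVM's ThreadMXBean. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked monitor/synchronizer details to keep the snapshot cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A large number of worker threads in WAITING or TIMED_WAITING, with stack frames inside the connection pool's acquire path, would be consistent with pool exhaustion; many BLOCKED threads would point toward lock contention instead.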
