Provide a way of dealing with rate limiting #1407

Open
iocanel opened this issue Apr 4, 2025 · 3 comments

Comments

iocanel (Collaborator) commented Apr 4, 2025

I often hit rate limits when playing with LLMs. The most common case is implementing a RAG pipeline that ingests a large volume of data.

The problem is that once the rate limit is hit, the ingestion fails and, by extension, the application fails.

@geoand suggested looking into smallrye-fault-tolerance. The issue is that the @RateLimit annotation counts requests per time window, and there is no way to weight each request (e.g. by how many tokens it consumes).
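To illustrate the gap, here is a minimal sketch of a limiter where each call carries a weight (e.g. an estimated token count) against a budget per sliding window, rather than counting every call as 1 the way @RateLimit does. All names here are hypothetical and not part of smallrye-fault-tolerance:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a rate limiter whose permits carry a weight
// (e.g. estimated tokens), unlike @RateLimit, which counts each call as 1.
class WeightedRateLimiter {
    private final long capacityPerWindow;   // e.g. tokens per minute
    private final long windowMillis;
    private final Deque<long[]> events = new ArrayDeque<>(); // [timestamp, weight]
    private long usedInWindow = 0;

    WeightedRateLimiter(long capacityPerWindow, long windowMillis) {
        this.capacityPerWindow = capacityPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if a request of the given weight may proceed now.
    synchronized boolean tryAcquire(long weight, long nowMillis) {
        // Evict events that fell out of the sliding window.
        while (!events.isEmpty() && nowMillis - events.peekFirst()[0] >= windowMillis) {
            usedInWindow -= events.removeFirst()[1];
        }
        if (usedInWindow + weight > capacityPerWindow) {
            return false; // would exceed the budget; caller should wait and retry
        }
        events.addLast(new long[] { nowMillis, weight });
        usedInWindow += weight;
        return true;
    }
}
```

For example, with a budget of 1000 tokens per minute, a 600-token request followed by a 500-token request in the same window is rejected, even though it is only the second *request*.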

@maxandersen suggested looking into the other features of smallrye-fault-tolerance for implementing a retry mechanism. This could work, but it would require us to parse the exception to determine when we should retry. Besides the amount of work this requires, it is also LLM-specific.
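A rough sketch of that retry approach: extract a provider-specific "retry after" hint from the rate-limit error and wait that long, falling back to exponential backoff when no hint is found. The error-message format below is made up, which is exactly the LLM-specific parsing problem mentioned above:

```java
import java.util.concurrent.Callable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of retry-with-backoff driven by a rate-limit hint.
// Real providers encode the hint differently (headers, error bodies, ...).
class RateLimitRetry {
    private static final Pattern RETRY_AFTER = Pattern.compile("retry after (\\d+)s");

    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(parseRetryAfterMillis(e.getMessage(), attempt));
            }
        }
    }

    // Prefer the server's hint; otherwise use exponential backoff.
    static long parseRetryAfterMillis(String message, int attempt) {
        if (message != null) {
            Matcher m = RETRY_AFTER.matcher(message);
            if (m.find()) return Long.parseLong(m.group(1)) * 1000L;
        }
        return (1L << attempt) * 100L; // 200ms, 400ms, 800ms, ...
    }
}
```

The parsing step is the fragile part: each provider's exception would need its own pattern, which is why a generic solution probably needs both pieces (a client-side budget plus retry).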

So I think we most probably need a bit of both.

iocanel (Collaborator, Author) commented Apr 4, 2025

@cescoffier I am wondering if we should create ... you know what 😉

cescoffier (Collaborator) commented:

Oh, yes!

geoand (Collaborator) commented Apr 4, 2025

👍🏽
