Provide a way of dealing with rate limiting #1407

Open
iocanel opened this issue Apr 4, 2025 · 3 comments

Comments

iocanel (Collaborator) commented Apr 4, 2025

I often hit rate limits when playing with LLMs. The most common case is implementing a RAG pipeline that ingests a large volume of data.

The problem is that once the rate limit is hit, the ingestion fails and, by extension, the application fails.

@geoand suggested looking into smallrye-fault-tolerance. The issue is that the @RateLimit annotation counts requests per time window, and there is no way to weight each request (e.g. by how many tokens it consumes).
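To illustrate the gap, here is a minimal sketch of a limiter where each call carries a weight (e.g. an estimated token count) against a budget per sliding window, rather than counting every call as 1 the way @RateLimit does. All names here are hypothetical and not part of smallrye-fault-tolerance:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a rate limiter whose permits carry a weight
// (e.g. estimated tokens), unlike @RateLimit, which counts each call as 1.
class WeightedRateLimiter {
    private final long capacityPerWindow;   // e.g. tokens per minute
    private final long windowMillis;
    private final Deque<long[]> events = new ArrayDeque<>(); // [timestamp, weight]
    private long usedInWindow = 0;

    WeightedRateLimiter(long capacityPerWindow, long windowMillis) {
        this.capacityPerWindow = capacityPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if a request of the given weight may proceed now.
    synchronized boolean tryAcquire(long weight, long nowMillis) {
        // Evict events that fell out of the sliding window.
        while (!events.isEmpty() && nowMillis - events.peekFirst()[0] >= windowMillis) {
            usedInWindow -= events.removeFirst()[1];
        }
        if (usedInWindow + weight > capacityPerWindow) {
            return false; // would exceed the budget; caller should wait and retry
        }
        events.addLast(new long[] { nowMillis, weight });
        usedInWindow += weight;
        return true;
    }
}
```

For example, with a budget of 1000 tokens per minute, a 600-token request followed by a 500-token request in the same window is rejected, even though it is only the second *request*.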

@maxandersen suggested looking into the other features of smallrye-fault-tolerance for implementing a retry mechanism. This could work, but it would require us to parse the exception to determine when we should retry. Besides the amount of work this requires, it is also LLM-specific.
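A rough sketch of that retry approach: extract a provider-specific "retry after" hint from the rate-limit error and wait that long, falling back to exponential backoff when no hint is found. The error-message format below is made up, which is exactly the LLM-specific parsing problem mentioned above:

```java
import java.util.concurrent.Callable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of retry-with-backoff driven by a rate-limit hint.
// Real providers encode the hint differently (headers, error bodies, ...).
class RateLimitRetry {
    private static final Pattern RETRY_AFTER = Pattern.compile("retry after (\\d+)s");

    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(parseRetryAfterMillis(e.getMessage(), attempt));
            }
        }
    }

    // Prefer the server's hint; otherwise use exponential backoff.
    static long parseRetryAfterMillis(String message, int attempt) {
        if (message != null) {
            Matcher m = RETRY_AFTER.matcher(message);
            if (m.find()) return Long.parseLong(m.group(1)) * 1000L;
        }
        return (1L << attempt) * 100L; // 200ms, 400ms, 800ms, ...
    }
}
```

The parsing step is the fragile part: each provider's exception would need its own pattern, which is why a generic solution probably needs both pieces (a client-side budget plus retry).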

So I think we most probably need a bit of both.

iocanel (Collaborator, Author) commented Apr 4, 2025

@cescoffier I am wondering if we should create ... you know what 😉

cescoffier (Collaborator) commented:

Oh, yes!

geoand (Collaborator) commented Apr 4, 2025

👍🏽
