I often hit rate limits when playing with LLMs. The most common case is implementing a RAG pipeline that ingests a large volume of data.
The problem is that once the rate limit is hit, the RAG ingestion fails and, by extension, the app fails.
@geoand suggested looking into smallrye-fault-tolerance. The issue is that the @RateLimit annotation limits the number of requests per time window, and there is no way to weight individual requests (for example by token count).
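A minimal sketch of the `@RateLimit` approach (the class and method names are hypothetical) to illustrate the limitation: every invocation consumes exactly one permit, no matter how large the underlying LLM request is.

```java
import java.time.temporal.ChronoUnit;

import jakarta.enterprise.context.ApplicationScoped;

import io.smallrye.faulttolerance.api.RateLimit;

@ApplicationScoped
public class DocumentIngestor {

    // Allows at most 50 invocations per minute. Each call counts as one
    // permit, so a chunk that expands into thousands of tokens is treated
    // the same as a tiny one; token-based provider limits cannot be expressed.
    @RateLimit(value = 50, window = 1, windowUnit = ChronoUnit.MINUTES)
    public void ingestChunk(String chunk) {
        // call the embedding / chat model here
    }
}
```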
@maxandersen suggested looking into the other features of smallrye-fault-tolerance to implement a retry mechanism. This could work, but it would require us to parse the exception to find out when we should retry. Besides the amount of work this requires, it is also LLM specific.
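A rough sketch of the retry side, assuming the LLM client surfaces HTTP 429 as some exception type (`RateLimitException` below is a hypothetical stand-in). The delay here is a blind guess; honoring the provider's actual "retry after" hint would require parsing the exception, which is exactly the provider-specific work mentioned above.

```java
import java.time.temporal.ChronoUnit;

import jakarta.enterprise.context.ApplicationScoped;

import org.eclipse.microprofile.faulttolerance.Retry;

@ApplicationScoped
public class EmbeddingClient {

    // Hypothetical exception representing an HTTP 429 from the LLM provider.
    public static class RateLimitException extends RuntimeException {
    }

    // Retries with a fixed 10s delay plus jitter. It does not read the
    // provider's Retry-After value, so it may retry too early or wait
    // longer than necessary.
    @Retry(maxRetries = 5,
           delay = 10, delayUnit = ChronoUnit.SECONDS,
           jitter = 2000,
           retryOn = RateLimitException.class)
    public float[] embed(String text) {
        // call the embedding endpoint here; throw RateLimitException on 429
        return new float[0];
    }
}
```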
So, I think we most probably need a bit of both.