
[Question]: Running the agent, the first conversation returns an error message and the next conversation directly returns it #7897


Closed
4 tasks done
chenlanqing opened this issue May 28, 2025 · 5 comments
Labels
🐞 bug Something isn't working, pull request that fixes a bug. 🙋‍♀️ question Further information is requested

Comments

@chenlanqing

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

When running the agent, during the first conversation, if the "invoke" component call times out, it returns the message:
ERROR: HTTPConnectionPool(host='172.31.7.174', port=8080): Read timed out. (read timeout=5)

If you continue the conversation, the second response directly returns:
{'content': {'0': "HTTPConnectionPool(host='172.31.7.174', port=8080): Read timed out. (read timeout=5)"}

And in the third round of conversation, it again returns:
ERROR: HTTPConnectionPool(host='172.31.7.174', port=8080): Read timed out. (read timeout=5)

We tried increasing the invoke timeout to 60 seconds since we're unsure how long the API might take, but this issue can still occur.
Do you have any suggestions on how to fix this?

Here's what the three conversation rounds look like:

[Screenshots: three conversation rounds, each showing the same read-timeout error]

@chenlanqing chenlanqing added the 🙋‍♀️ question Further information is requested label May 28, 2025
@dosubot dosubot bot added the 🐞 bug Something isn't working, pull request that fix bug. label May 28, 2025

dosubot bot commented May 28, 2025

The timeout errors you’re seeing with the "invoke" component indicate that the remote server at 172.31.7.174:8080 is not responding within the specified timeout window. Increasing the timeout to 60 seconds is a good first step, but if the server is overloaded, unresponsive, or there are network issues, the problem can persist.

The "invoke" component is designed to make HTTP requests to remote endpoints and has a configurable timeout parameter (default 60 seconds) for each call. However, it does not currently implement a retry mechanism for failed or timed-out requests, so a single failure will immediately propagate the error to the user in subsequent conversation rounds, as you observed. The error message is simply passed along as the response content in later rounds, which is why you see the same timeout error repeatedly [source].

Other parts of RAGFlow, such as the LLM chat model, implement retry logic with exponential backoff for transient errors (including timeouts and server errors). This approach retries failed requests a set number of times, waiting longer between each attempt, which can help mitigate temporary network or server issues [source].

To improve resilience against timeouts in the "invoke" component, consider implementing a similar retry mechanism. You can refer to the retry logic in the LLM chat model as a template. Alternatively, if the remote service supports asynchronous processing, you could adapt the call pattern to poll for results or use a task queue with retries, as seen in the TaskService class [source].
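For illustration, a minimal retry wrapper around the outgoing HTTP call could look like the sketch below, assuming the component posts with `requests`; the helper name, defaults, and exception set are hypothetical and not taken from RAGFlow's code:

```python
import time

import requests


def call_with_retries(url, payload, timeout=60, max_retries=3, backoff_base=2):
    """Post to a remote endpoint, retrying transient failures with exponential backoff.

    Illustrative sketch only; not RAGFlow's actual invoke implementation.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            response = requests.post(url, json=payload, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError) as exc:
            last_error = exc
            # Wait 1s, 2s, 4s, ... before the next attempt.
            time.sleep(backoff_base ** attempt)
    raise RuntimeError(f"Request failed after {max_retries} attempts: {last_error}")
```

A wrapper like this would sit around the component's request, letting you tune the attempt count and backoff to match how long the remote API realistically takes.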

In summary:

  • The timeout is likely due to slow or unresponsive remote services.
  • Increasing the timeout helps only if the server eventually responds.
  • Implementing a retry mechanism with exponential backoff in the "invoke" component can help handle transient failures more gracefully.
  • If possible, investigate the health and performance of the remote service at 172.31.7.174:8080 to ensure it can handle requests reliably.

There are no existing issues or discussions in the repository that address this specific problem. If you need guidance on how to implement retry logic similar to the LLM chat model, let me know!

To reply, just mention my name @dosu.



@chenlanqing
Author

What other options do we have here? And are you planning to build in retries for the invoke component down the line?
@dosu


dosubot bot commented May 28, 2025

Currently, there are no documented plans or existing discussions in the RAGFlow repository about adding retries to the "invoke" component. The component only supports a configurable timeout, and if the remote service does not respond within that period, it returns an error without retrying. This is different from other parts of RAGFlow, like the LLM chat model, which already implement retry logic with exponential backoff for transient errors.

Beyond implementing retries, your options are to investigate and optimize the remote service at 172.31.7.174:8080 for better responsiveness, check for network stability between your agent and the remote endpoint, or consider redesigning the integration to use asynchronous processing or a task queue if the remote service supports it. These approaches can help reduce the likelihood or impact of timeouts, but ultimately, without retries in the "invoke" component, transient failures will continue to propagate as errors.
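If the remote service can accept work asynchronously, a polling loop along these lines avoids holding a single long-running request open. This is only a sketch against a hypothetical submit/status API; the endpoints, response fields, and intervals are assumptions, not RAGFlow code:

```python
import time

import requests


def submit_and_poll(submit_url, status_url, payload, poll_interval=5, max_wait=300):
    """Submit work to a hypothetical async endpoint, then poll for the result
    instead of keeping one long HTTP request open until it times out."""
    job_id = requests.post(submit_url, json=payload, timeout=10).json()["id"]  # assumed response shape
    deadline = time.time() + max_wait
    while time.time() < deadline:
        status = requests.get(f"{status_url}/{job_id}", timeout=10).json()
        if status.get("state") == "done":
            return status["result"]
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")
```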

If you need guidance on how to implement retry logic similar to the LLM chat model, let me know. At this time, there is no indication that retries for the "invoke" component are planned or in progress in the official project sources.

To reply, just mention my name @dosu.



@chenlanqing
Author

@dosu
I saw what you mentioned earlier: "so a single failure will immediately propagate the error to the user in subsequent conversation rounds, as you observed."
Is there a way to handle this so that errors from components don’t get passed back to users in follow-up conversations?


dosubot bot commented May 28, 2025

There isn’t currently a built-in mechanism in RAGFlow to prevent errors from components like "invoke" from being passed back to users in follow-up conversations. When a component fails (for example, due to a timeout), the error is returned as the response and stored in the conversation history. This means that unless the error is handled or filtered out, it will continue to appear in subsequent conversation rounds, as you’ve observed.

To avoid this, you can adapt error handling patterns used elsewhere in RAGFlow. For example, components like email.py and baidufanyi.py use try-except blocks to catch errors and return custom error messages or fallback responses instead of raw exceptions. The "invoke" component currently does not implement such error handling in its _run method—errors simply propagate up the stack and become part of the conversation state [source].

To prevent errors from being passed to users in follow-up conversations, you could:

  • Add try-except blocks in the "invoke" component to catch exceptions and return a controlled, user-friendly message or a default value instead of the raw error.
  • In the conversation management logic (such as in the completion or structure_answer functions), filter out or mask error responses before updating the conversation history, so that failed outputs do not persist into future rounds [source].
  • Optionally, implement logic to retry failed requests or to only update the conversation history with successful responses.

For inspiration, see how error handling is done in other components like email.py (which returns custom messages for different error types) or baidufanyi.py (which checks for error codes and handles exceptions gracefully).
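As a rough starting point, a sketch combining both ideas (catching exceptions inside the component and only persisting successful responses into the conversation history) could look like the following; the helper names assume a `requests`-based call and are illustrative, not RAGFlow's actual `_run` implementation:

```python
import logging

import requests


def run_invoke_safely(url, payload, timeout=60):
    """Turn failures of the remote call into a controlled fallback message
    instead of letting a raw exception reach the user."""
    try:
        resp = requests.post(url, json=payload, timeout=timeout)
        resp.raise_for_status()
        return {"ok": True, "content": resp.json()}
    except requests.RequestException as exc:
        # Log the real error for debugging, but return a friendly fallback.
        logging.warning("invoke call failed: %s", exc)
        return {"ok": False, "content": "The remote service did not respond; please try again later."}


def update_history(history, result):
    """Persist only successful responses so errors don't leak into later rounds."""
    if result["ok"]:
        history.append(result["content"])
    return history
```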

If you need a concrete code example or guidance on where to add these changes, let me know!

To reply, just mention my name @dosu.


