Summary
When an outbound kad substream times out (10s), it is removed from the substream list, and a new outbound substream can be opened.
But on the inbound side, there appears to be no timeout, so the node only drops new inbound substreams when they are over its substream limit. (If all other substreams are waiting for the first message, or in another state, those substreams can't be re-used. So the new substream gets dropped.)
This causes thousands of "substream limit exceeded" warnings on the inbound side. It can also slow down syncing a lot, in some cases making it impossible.
This bug is self-triggering, because the dropped inbound substreams also time out on the outbound side.
Edit: this is not a duplicate of #3236. The cause is different, and it only happens under specific load conditions.
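To illustrate the effect described above, here is a toy model of the inbound substream table. It is only a sketch and not the `libp2p_kad` handler: `InboundSlots`, `count_dropped`, the limit of 4 and the one-slot-per-silent-peer behaviour are all assumptions made for illustration. Each new substream is opened by a remote that then goes silent, which is what the inbound side sees after the outbound side has timed out.

```rust
/// Toy model of the inbound substream table (hypothetical, not libp2p_kad code).
pub struct InboundSlots {
    /// Maximum number of inbound substreams held at once.
    limit: usize,
    /// Inbound timeout in seconds; `None` models "no inbound timeout".
    timeout_secs: Option<u64>,
    /// Opening time (in seconds) of every substream still holding a slot.
    opened_at: Vec<u64>,
    /// Number of new substreams dropped because the table was full.
    dropped: usize,
}

impl InboundSlots {
    pub fn new(limit: usize, timeout_secs: Option<u64>) -> Self {
        Self { limit, timeout_secs, opened_at: Vec::new(), dropped: 0 }
    }

    /// A remote opens a substream at time `now` and then never sends anything.
    pub fn on_new_substream(&mut self, now: u64) {
        // With a timeout configured, stale substreams release their slot first.
        if let Some(timeout) = self.timeout_secs {
            self.opened_at.retain(|&opened| now.saturating_sub(opened) < timeout);
        }
        if self.opened_at.len() < self.limit {
            self.opened_at.push(now);
        } else {
            // Table is full of stale substreams: the new one is dropped.
            self.dropped += 1;
        }
    }
}

/// Feeds `arrivals` substreams, one every `interval_secs`, and counts the drops.
pub fn count_dropped(
    limit: usize,
    timeout_secs: Option<u64>,
    arrivals: u64,
    interval_secs: u64,
) -> usize {
    let mut slots = InboundSlots::new(limit, timeout_secs);
    for i in 0..arrivals {
        slots.on_new_substream(i * interval_secs);
    }
    slots.dropped
}
```

In this model, `count_dropped(4, None, 100, 5)` drops every substream after the first four, while `count_dropped(4, Some(10), 100, 5)` drops none, because each stale substream frees its slot before the table fills up.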
Expected behavior
Inbound substreams time out after approximately 10 seconds.
Ideally the inbound timeout is slightly shorter, because the timeout starts on the outbound side immediately, but only starts on the inbound side after the network transmission delay. If there is a long network delay for earlier substreams, but a short network delay for later substreams, this warning can still happen occasionally.
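The timing argument above can be written out as a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, the replacement outbound substream is assumed to be opened at the exact moment the first one times out, and the delays are one-way network delays.

```rust
use std::time::Duration;

/// Returns `true` if the stale inbound slot has already expired when the
/// replacement substream reaches the inbound side.
///
/// Times are measured from the moment the first outbound substream was opened.
pub fn slot_free_on_arrival(
    outbound_timeout: Duration,
    inbound_timeout: Duration,
    first_delay: Duration,
    second_delay: Duration,
) -> bool {
    // The inbound timer only starts once the first substream has arrived.
    let inbound_expiry = first_delay + inbound_timeout;
    // The replacement is opened at the outbound timeout, then crosses the network.
    let replacement_arrival = outbound_timeout + second_delay;
    inbound_expiry <= replacement_arrival
}

/// Largest inbound timeout for which the slot is still free on arrival.
pub fn max_safe_inbound_timeout(
    outbound_timeout: Duration,
    first_delay: Duration,
    second_delay: Duration,
) -> Duration {
    (outbound_timeout + second_delay).saturating_sub(first_delay)
}
```

For example, with a 10 s outbound timeout, a 500 ms delay for the earlier substream and a 50 ms delay for the later one, a 10 s inbound timeout is too long, and anything up to 9.55 s frees the slot in time.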
Actual behavior
Inbound substreams that have timed out on the outbound side seem to hang around for much longer than 10s. Maybe they are only removed when a read on them fails, or when some other error happens?
Relevant log output
2025-04-08T06:24:27.293722Z WARN Consensus: libp2p_kad::handler: New inbound substream to peer exceeds inbound substream limit. No older substream waiting to be reused. Dropping new substream. peer=PeerId("12D3KooWN6kFp2Ev181UGq3BUDfk1jfjaNu6sDTqxCZUBpmp8kRQ")
Possible Solution
On the sending side, outbound substreams only count towards the limit until they time out:
rust-libp2p/protocols/kad/src/handler.rs, line 614 in b56b47a
rust-libp2p/protocols/kad/src/handler.rs, line 819 in b56b47a
And the outbound timeout is 10 seconds:
rust-libp2p/protocols/kad/src/handler.rs, line 476 in b56b47a
rust-libp2p/protocols/kad/src/handler.rs, line 815 in b56b47a
rust-libp2p/protocols/kad/src/handler.rs, line 938 in b56b47a
rust-libp2p/protocols/kad/src/handler.rs, line 1013 in b56b47a
and can't be re-used if the sender times out on the first message:
rust-libp2p/protocols/kad/src/handler.rs, line 542 in b56b47a
rust-libp2p/protocols/kad/src/handler.rs, line 573 in b56b47a
There is no inbound timeout:
https://github.com/libp2p/rust-libp2p/blob/b56b47aa6510ab4af0ae797a7f036364d414ae3e/protocols/kad/src/handler.rs#L75C5-L75C23
Here is how other protocols implement matching inbound and outbound timeouts:
rust-libp2p/protocols/relay/src/behaviour/handler.rs, line 384 in 1206fef
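As a rough sketch of what a matching inbound timeout could look like, the following bounded set gives every inbound substream its own deadline and evicts expired entries before applying the limit. This is an illustration only: `BoundedInbound`, `take_expired` and `try_push` are hypothetical names, not part of libp2p, and the sketch assumes that `now` never goes backwards between calls so that deadlines stay in arrival order.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical bounded set of inbound substreams with per-entry deadlines.
pub struct BoundedInbound<S> {
    capacity: usize,
    timeout: Duration,
    /// Entries in arrival order, so deadlines are non-decreasing.
    entries: VecDeque<(Instant, S)>,
}

impl<S> BoundedInbound<S> {
    pub fn new(capacity: usize, timeout: Duration) -> Self {
        Self { capacity, timeout, entries: VecDeque::new() }
    }

    /// Removes and returns every substream whose deadline has passed.
    pub fn take_expired(&mut self, now: Instant) -> Vec<S> {
        let mut expired = Vec::new();
        while self.entries.front().map_or(false, |(deadline, _)| *deadline <= now) {
            if let Some((_, stream)) = self.entries.pop_front() {
                expired.push(stream);
            }
        }
        expired
    }

    /// Admits a substream, or hands it back if the set is still full after
    /// expired entries have been evicted (and dropped).
    pub fn try_push(&mut self, now: Instant, stream: S) -> Result<(), S> {
        self.take_expired(now);
        if self.entries.len() >= self.capacity {
            return Err(stream);
        }
        self.entries.push_back((now + self.timeout, stream));
        Ok(())
    }

    /// Number of substreams currently holding a slot.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With this shape, a substream abandoned by the remote only occupies a slot until its deadline, so the limit is reached only when that many substreams are genuinely active within one timeout window.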
Version
Latest
main
back to at least 0.54.2
Would you like to work on fixing this bug?
Yes