We're experiencing intermittent issues with gossip data propagation in our libp2p network using Rust-libp2p 0.54. The problem occurs on both local development machines and test servers, where nodes sometimes fail to receive gossiped messages despite appearing to be connected.
Symptoms:
Nodes sometimes fail to receive gossiped data
Bootstrap connections occasionally fail with HandshakeTimedOut errors
Gossipsub mesh reports needing more peers even when nodes are connected
Kademlia bootstrap queries complete but don't always result in stable connections
Configuration:
// Network setup
let mut config = libp2p::quic::Config::new(&keypair.unwrap().clone());
config.max_idle_timeout = 10 * 1000; // 10 seconds
config.keep_alive_interval = Duration::from_secs(5);

// Gossipsub config
let gossipsub_config = gossipsub::ConfigBuilder::default()
    .heartbeat_interval(Duration::from_secs(HEARTBEAT_INTERVAL)) // 5 seconds
    .validation_mode(gossipsub::ValidationMode::Strict)
    .duplicate_cache_time(Duration::from_secs(DUPLICATE_CACHE_DURATION)) // 10 seconds
    .max_transmit_size(1_000_000)
    .message_id_fn(message_id_fn)
    .max_messages_per_rpc(Some(MAX_MESSAGES_PER_RPC)) // 100
    .mesh_n_low(4)
    .mesh_n_high(10)
    .mesh_n(8)
    .build()?;

// Swarm setup
#[tracing::instrument(skip(keypair))]
pub async fn setup_swarm_network(
    keypair: Option<Keypair>,
    bootstrap_addresses: Option<Vec<(PeerId, Multiaddr)>>,
    port: String,
) -> Result<Swarm<SwarmBehaviour>, Box<dyn Error>> {
    // Set up the SwarmBuilder based on whether a keypair is provided or not.
    let builder = if let Some(keypair) = keypair.clone() {
        // Use the provided keypair for the swarm identity.
        SwarmBuilder::with_existing_identity(keypair)
    } else {
        // Generate a new identity if no keypair is provided.
        SwarmBuilder::with_new_identity()
    };

    let mut config = libp2p::quic::Config::new(&keypair.unwrap().clone());
    // config.max_idle_timeout = 300;
    config.max_idle_timeout = 10 * 1000;
    // config.keep_alive_interval = Duration::from_millis(100);
    config.keep_alive_interval = Duration::from_secs(5);

    // Build the libp2p swarm with a specific transport (TCP and QUIC), and relay client.
    let mut swarm = builder
        .with_tokio() // Use Tokio for asynchronous execution.
        .with_quic_config(|_| config)
        .with_behaviour(|keypair| {
            // If no bootstrap addresses are provided, print the peer ID for informational purposes.
            if bootstrap_addresses.is_none() {
                info!("Bootstrap Peer ID :{}", keypair.public().to_peer_id());
            }
            // Initialize the custom MyBehaviour which includes Gossipsub and Kademlia behaviors.
            SwarmBehaviour::new(keypair.clone()).unwrap()
        })?
        .with_swarm_config(|c| {
            // Configure idle connection timeout.
            c.with_idle_connection_timeout(Duration::from_secs(60))
        })
        .build();

    // If bootstrap nodes are provided, add them to the Kademlia behavior.
    if let Some(ref bootstrap_addresses) = bootstrap_addresses {
        for (peer_id, multi_addr) in bootstrap_addresses {
            // Add each bootstrap node's address to the Kademlia DHT.
            swarm
                .behaviour_mut()
                .kademlia
                .add_address(peer_id, multi_addr.clone());
            swarm.dial(multi_addr.clone())?;
            // Trigger the Kademlia bootstrap process to find more peers.
        }
        swarm.behaviour_mut().kademlia.bootstrap()?;
    }

    // Subscribe to the primary Gossipsub topic for network-wide communication.
    swarm
        .behaviour_mut()
        .gossipsub
        .subscribe(&IdentTopic::new(NETWORK_TOPIC))?;

    // Define the address to listen on for incoming connections (QUIC over UDP).
    let listen_address = format!("/ip4/0.0.0.0/udp/{}/quic-v1", port);
    swarm.listen_on(listen_address.parse()?)?;

    // Return the initialized swarm.
    Ok(swarm)
}
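One detail in the configuration above is easy to misread: `max_idle_timeout` is assigned a bare integer (`10*1000`, commented as 10 seconds, so presumably a millisecond count), while `keep_alive_interval` is a `Duration`. The following is a minimal, std-only sketch that normalizes both values to `Duration` and checks that a keep-alive would be sent before the idle timeout elapses. `QuicTimeouts` and `keep_alive_fits` are hypothetical names used only for illustration; they are not part of libp2p.

```rust
use std::time::Duration;

/// Hypothetical mirror of the two QUIC timing values from the report.
/// Assumption: the integer idle timeout is expressed in milliseconds.
struct QuicTimeouts {
    max_idle_timeout_ms: u32,
    keep_alive_interval: Duration,
}

impl QuicTimeouts {
    /// Idle timeout converted to a `Duration` so both values share one unit.
    fn max_idle_timeout(&self) -> Duration {
        Duration::from_millis(u64::from(self.max_idle_timeout_ms))
    }

    /// True when the keep-alive interval is strictly shorter than the idle timeout.
    fn keep_alive_fits(&self) -> bool {
        self.keep_alive_interval < self.max_idle_timeout()
    }
}

fn main() {
    // Values as reported: 10 * 1000 ms idle timeout, 5 s keep-alive.
    let reported = QuicTimeouts {
        max_idle_timeout_ms: 10 * 1000,
        keep_alive_interval: Duration::from_secs(5),
    };
    println!(
        "idle timeout = {:?}, keep-alive = {:?}, keep-alive fits = {}",
        reported.max_idle_timeout(),
        reported.keep_alive_interval,
        reported.keep_alive_fits()
    );
}
```

Under that unit assumption the two reported values are consistent with each other (5 s keep-alive against a 10 s idle timeout), so this check alone does not explain the `HandshakeTimedOut` errors.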
Logs:
From Node 1 (working):
[TRACE] Sending message to peer 16Uiu2HAmR6ogo4eHfXuz28HNS2XJUGcB1R9Wf4UzHh7go18LQX3v
[TRACE] Sending message to peer 16Uiu2HAmGjjk8mDH5F1Y3FVW68tenMNWMNkTcZXceLMnXUEJDoSx
[TRACE] Sending message to peer 16Uiu2HAmMwshLKvkHnMsgJ5MPxcLeVkkSxRK8Rm6cFRaCCTkhhEd
@jxs Yes, we have currently disabled QUIC and enabled TCP. Also, our network is very small, about 4 nodes.
We also spotted this:
May 29 07:53:04 guardian-testnet-2 sh[25552]: {"v":0,"name":"DUCAT_NODE","msg":"[SWARM::POLL - EVENT] Request to peer in query failed with Io(Custom { kind: ConnectionRefused, error: \"protocol not supported\" })","level":20,"hostname":"guardian-testnet-2","pid":25552,"time":"2025-05-29T07:53:04.078781109Z","target":"libp2p_kad::behaviour","line":2358,"file":"/home/admin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-kad-0.46.2/src/behaviour.rs","peer":"16Uiu2HAm4QSE1Q6jvtbq4NZjmBdUb4k34KG17vS5bRRfv5ZYtZyE","query":"QueryId(0)"}
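On the network-size point: with about 4 nodes, each node can have at most 3 peers in its mesh, which is below the configured `mesh_n_low(4)`. That would be consistent with Gossipsub reporting that the mesh needs more peers even when every node is connected, independent of the transport. The arithmetic can be sketched as follows, assuming every node subscribes to the topic and is connected to all others. `MeshParams`, `max_mesh_peers`, and `mesh_shortfall` are hypothetical helpers for illustration, not libp2p APIs.

```rust
/// Hypothetical summary of the Gossipsub mesh bounds quoted in the report.
struct MeshParams {
    mesh_n_low: usize,
    mesh_n: usize,
    mesh_n_high: usize,
}

impl MeshParams {
    /// The bounds only make sense when low <= target <= high.
    fn is_ordered(&self) -> bool {
        self.mesh_n_low <= self.mesh_n && self.mesh_n <= self.mesh_n_high
    }
}

/// Largest number of mesh peers one node can have in a network of `nodes` nodes.
fn max_mesh_peers(nodes: usize) -> usize {
    nodes.saturating_sub(1)
}

/// How many peers short of `mesh_n_low` a fully connected node remains.
fn mesh_shortfall(nodes: usize, params: &MeshParams) -> usize {
    params.mesh_n_low.saturating_sub(max_mesh_peers(nodes))
}

fn main() {
    // Values from the reported configuration.
    let params = MeshParams { mesh_n_low: 4, mesh_n: 8, mesh_n_high: 10 };
    println!("bounds ordered: {}", params.is_ordered());
    for nodes in [4, 5, 9] {
        println!(
            "{} nodes: at most {} mesh peers (target {}), shortfall vs mesh_n_low = {}",
            nodes,
            max_mesh_peers(nodes),
            params.mesh_n,
            mesh_shortfall(nodes, &params)
        );
    }
}
```

If this reasoning applies, lowering the mesh bounds to fit the network size (or adding nodes) would be one thing to try before attributing the warning to connection problems.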
Expected behavior
Actual behavior
1.log
2.log
Relevant log output
Possible Solution
No response
Version
0.54
Would you like to work on fixing this bug?
Yes