-
@guillaumemichel maybe something you can answer?
-
Thanks for the heads up @elenaf9 @maqi, your summary seems correct; it is a valid concern. I am not sure why this design choice was adopted, but I agree that it is suboptimal.

About refreshing buckets
How the refresh process can be improved
-
Hey @guillaumemichel, that's some good feedback. I think the pre-split buckets approach of libp2p is good in some ways, but it does skew the algorithm to create some targeting-like behaviour which is deep and non-intuitive. Dragons and all that live there :-) However the approach seems reasonable. In traditional KAD the number of buckets is more correct in terms of network size, due to the splitting-buckets-when-full approach, and in some ways can be easier to reason with (i.e. most likely the close bucket is the non-full one and so on). Dead nodes aside.

A pre-image for search seems something to be careful about. If we search in a bucket for more nodes but use a precalculated fixed address, we are likely to get the nodes closest to that address as the result for nodes found "in that bucket" from our perspective. So using a computed pre-image of addresses to query for may give a more constant set of very close nodes in a bucket, if we are not careful, i.e. the address space of the knowledge we have about that part of the network is confined to a smaller, less distributed group than a random search in that bucket would give.

I think it is worth looking at using the bucket address (common leading bits to you) and then a random number to fill out the rest of the address, i.e. in bucket n-100 create a (255-100 = 155) bit random number and append that to the bucket address (rough sketch after this comment). In that way we can probe across the address range of the bucket each time, as opposed to a more constant target within the bucket. It would be safe to use the same random number (so create it once per pass) and use the common leading bits from each bucket we wish to test as the beginning of our target address, so that it's more efficient in CPU terms. That would be akin to trad KAD using the closest bucket as a target for a random find node. I think this is what you mean, but just being sure here.

The other issue is random nodes appearing in buckets further down the bucket list. Where in KAD they would all usually fit in the first non-full bucket, here they could appear at random in a low-numbered bucket (i.e. in a 20K network we find a node in bucket 90 and one in bucket 140 and then a couple in another low bucket, etc.). This gives a potentially sparse-bucket situation, which is another thing we should consider IMO. It would be smashing to have entirely even distributions, but I think we need to consider the case where they are not, especially with sybil-type attacks and more; sparse buckets do perhaps need some consideration. It's all pretty deep IMO, but interesting for sure.
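A rough sketch of that prefix-plus-random-tail idea, just to make it concrete (illustrative only, assuming 256-bit keys and buckets indexed by common prefix length; `random_bucket_target` is not the actual routing-table code):

```rust
use rand::RngCore;

// Rough sketch only: 256-bit keys as [u8; 32], buckets indexed by common
// prefix length (cpl). Not the real routing-table code.
//
// Build a target that shares the first `cpl` bits with `local_key`, differs at
// bit `cpl`, and is random in the remaining 255 - cpl bits, so each refresh
// probes a different point of that bucket's address range.
fn random_bucket_target(local_key: &[u8; 32], cpl: usize) -> [u8; 32] {
    assert!(cpl < 256);

    // Start fully random so the tail bits are uniformly distributed.
    let mut target = [0u8; 32];
    rand::thread_rng().fill_bytes(&mut target);

    // Copy the whole bytes of the shared prefix from the local key.
    let full = cpl / 8;
    target[..full].copy_from_slice(&local_key[..full]);

    // In the boundary byte: keep the remaining prefix bits from the local key,
    // force the bit at position `cpl` to differ, leave the rest random.
    let rem = cpl % 8;
    let prefix_mask: u8 = if rem == 0 { 0 } else { 0xff << (8 - rem) };
    let flip_bit: u8 = 0x80 >> rem;
    target[full] = (local_key[full] & prefix_mask)
        | (!local_key[full] & flip_bit)
        | (target[full] & !(prefix_mask | flip_bit));

    target
}
```

Reusing the same random tail across all buckets in one pass, as suggested above, would just mean generating the random bytes once and re-applying the prefix/flip step per bucket.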
-
Nice @guillaumemichel, appreciate the input.
I think perhaps it's worth checking whether nodes use the first found in the bucket (in the first beta replies etc.) as opposed to the closest? It would be more random/spread across that range, perhaps?
-
Rust-libp2p's Kademlia protocol periodically refreshes its routing table (RT) to maintain an updated view of the network.
The process involves generating random target keys for each bucket, attempting to ensure that each non-empty bucket is refreshed.
However, due to constraints in target generation and the probabilistic nature of hitting specific buckets, the actual distribution of refresh targets is skewed towards the farthest buckets:

- a uniformly random key lands in the bucket whose peers share exactly `b` leading bits with the local key with probability `2^-(b+1)`, so the probability of hitting that bucket within 16 attempts is only `1 - (1 - 2^-(b+1))^16`, which is vanishingly small for the closer (long shared prefix) buckets;
- if the bucket hasn't been hit after 16 attempts, the last generated target still gets used, and that target most likely falls into one of the farthest buckets.

The final generated targets therefore cluster in the farthest buckets, which end up over-queried unnecessarily, while the closer buckets are rarely actually hit.

Wondering if this has been considered previously, and why such a scheme was chosen?
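For concreteness, here is a simplified sketch of the scheme as described above (illustrative only, not the actual rust-libp2p implementation; the 32-byte keys, the helper names and the 16-attempt loop just follow the description):

```rust
use rand::RngCore;

// Simplified sketch of the refresh-target generation scheme as described above
// (not rust-libp2p's actual code): draw up to 16 random 256-bit keys; if none
// of them lands in the intended bucket, the last draw is used anyway.
fn refresh_target(local: &[u8; 32], target_cpl: usize) -> [u8; 32] {
    let mut rng = rand::thread_rng();
    let mut key = [0u8; 32];
    for _ in 0..16 {
        rng.fill_bytes(&mut key);
        if common_prefix_len(local, &key) == target_cpl {
            return key; // hit the intended bucket
        }
    }
    // Fallback after 16 misses: this key is uniformly random, so with high
    // probability it belongs to one of the farthest (short-prefix) buckets.
    key
}

// Number of leading bits shared by two 256-bit keys.
fn common_prefix_len(a: &[u8; 32], b: &[u8; 32]) -> usize {
    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
        let diff = x ^ y;
        if diff != 0 {
            return i * 8 + diff.leading_zeros() as usize;
        }
    }
    256
}
```

For close buckets (large `target_cpl`) the hit probability `1 - (1 - 2^-(target_cpl+1))^16` is tiny, so the fallback key is used almost every time, and being uniformly random it almost always belongs to one of the farthest buckets; hence the skew.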