Values expiring with permanent=true #599

Open
ghenry opened this issue Mar 30, 2022 · 24 comments

Collaborator

ghenry commented Mar 30, 2022

On 2.4.0 should this be happening?

Screenshot from 2022-03-30 19-42-02

I'm getting all my own puts back off the DHT too, which I skip because I check which node put them there.

Thanks.

Collaborator Author

ghenry commented Mar 30, 2022

This also results in a big rush of inbound traffic and one of the opendht threads going to 100% CPU.

@aberaud aberaud self-assigned this Mar 30, 2022
@aberaud aberaud added the bug label Mar 30, 2022
Member

aberaud commented Mar 30, 2022

This looks like a catastrophic bug

  • What's the lifetime of the nodes performing the put?
  • How long do you wait before seeing that?

Thanks

Collaborator Author

ghenry commented Mar 30, 2022

This looks like a catastrophic bug

  • What's the lifetime of the nodes performing the put?

Ah, I'm stopping and starting this development node. What triggers it? A node going offline?

  • How long do you wait before seeing that?

Hmmm, I'll add a timer or something when I first see an expired value.
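
One way to answer the "how long" question is a small timer that records the delay between start-up and the first expired value. This is only a sketch: the class name `FirstExpiryTimer` is hypothetical, and it assumes the listen callback passes a boolean `expired` flag that can be forwarded to `on_value`.

```python
import time


class FirstExpiryTimer:
    """Records how long after start-up the first expired value is seen."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be tested without sleeping.
        self._clock = clock
        self._started_at = clock()
        self.seconds_to_first_expiry = None

    def on_value(self, expired):
        """Call once per received value; `expired` mirrors the callback flag."""
        if expired and self.seconds_to_first_expiry is None:
            self.seconds_to_first_expiry = self._clock() - self._started_at
        return self.seconds_to_first_expiry


timer = FirstExpiryTimer()
timer.on_value(expired=False)  # a normal value: nothing is recorded
timer.on_value(expired=True)   # the first expiry: the elapsed time is stored
print(timer.seconds_to_first_expiry)
```

Only the first expiry is recorded, so later floods of expired values do not overwrite the measurement.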

Collaborator Author

ghenry commented Mar 30, 2022

So, my sentrypeer node has been up since 16:10 today, and you can see that the event timestamps of the bad_actors values were all created after that, so they shouldn't have expired:

Screenshot from 2022-03-30 20-21-15

You can prove that, because in this screenshot I don't process them as they have the same node_id (uuid) as the currently running node:

Screenshot from 2022-03-30 20-20-40
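
The "skip my own puts" check described above could look roughly like the sketch below. It assumes each value is a JSON object carrying a `created_by_node_id` field, as in the expired-value log line posted later in this thread. The function name `is_own_event` is hypothetical and this is not SentryPeer's actual code.

```python
import json


def is_own_event(raw_value, own_node_id):
    """Return True when the JSON event was created by the running node."""
    event = json.loads(raw_value)
    return event.get("created_by_node_id") == own_node_id


# Sample data: the node id and source IP are copied from the log line in
# this thread; the rest of the event is trimmed for brevity.
own_id = "04ec5965-4e4b-415a-b464-ecf1cc1fa90d"
sample = json.dumps({
    "app_name": "sentrypeer",
    "created_by_node_id": own_id,
    "source_ip": "118.123.237.29",
})

print(is_own_event(sample, own_id))
```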

In this console screenshot you can see I'm saving them all permanently (https://github.com/SentryPeer/SentryPeer/blob/main/src/peer_to_peer_dht.c#L345):

Screenshot from 2022-03-30 20-19-10

Then they all flood in again:

Screenshot from 2022-03-30 20-21-09

When you see the flood of expires and the flood of values for the bad_actors key, that thread's CPU usage sits at 100%. Right now it's doing this. I could take a video if it helps :-)

Thanks.

Collaborator Author

ghenry commented Mar 30, 2022

This is a vanilla dhtnode built from the master branch today, on the same box, bootstrapped to bootstrap.sentrypeer.org and up since 16:04. It shows the same set of values coming in again via l bad_actors. Note the timestamps too:

Screenshot from 2022-03-30 20-33-00

and then I see them expire again on my sentrypeer node, then they go round and round :-)

Collaborator Author

ghenry commented Mar 30, 2022

I've stopped all my nodes now and restarted, but the values are probably living on other DHT nodes that I'm not running, since port 4222 is open... let's see.

Member

aberaud commented Mar 30, 2022

Thanks. How many values are on this key approximately?

Member

aberaud commented Mar 30, 2022

Ah, I'm stopping and starting this development node. What triggers it? A node going offline?

For the "permanent put" feature to work, the node would need to stay online for the duration of the value lifetime
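
A toy model of why the putting node has to stay online, under the assumption that a permanent put works by the node re-announcing the value before it expires on the storing nodes. The function name `value_alive_at`, the 300 s refresh interval and the 600 s lifetime are all illustrative and not taken from OpenDHT.

```python
def value_alive_at(check_at, online_until, refresh_every=300.0, lifetime=600.0):
    """Return True if the value is still stored at time `check_at` (seconds).

    The putter announces at t = 0 and then every `refresh_every` seconds,
    but only while it is online. Storing nodes drop the value `lifetime`
    seconds after the last announce they received.
    """
    last_announce = 0.0  # the initial put
    t = 0.0
    while t + refresh_every <= min(online_until, check_at):
        t += refresh_every
        last_announce = t
    return check_at - last_announce < lifetime


# Putter online the whole time: the value is still there.
print(value_alive_at(check_at=3500.0, online_until=3600.0))
# Putter went offline after 900 s: the value has expired by 3500 s.
print(value_alive_at(check_at=3500.0, online_until=900.0))
```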

Collaborator Author

ghenry commented Mar 30, 2022

Ah, I'm stopping and starting this development node. What triggers it? A node going offline?

For the "permanent put" feature to work, the node would need to stay online for the duration of the value lifetime

Yep, going by my screenshots the same node was.

Collaborator Author

ghenry commented Mar 30, 2022

Thanks. How many values are on this key approximately?

No idea. Can I see that on dhtnode if I do a get?

Collaborator Author

ghenry commented Mar 30, 2022

Just done that and pasted into a text file. 210 values at the moment.

Collaborator Author

ghenry commented Apr 1, 2022

If there are connectivity issues too, I guess this can happen? Who is expiring these values? The server I bootstrapped to or other nodes? Just trying to understand. This node is running on broadband:

Screenshot from 2022-04-01 08-57-29

Screenshot from 2022-04-01 08-57-33

Member

aberaud commented Apr 12, 2022

I updated dhtcnode so it now does something similar to the C++ node.

Still investigating this issue. Some standalone code to reproduce the problem would be useful.

Member

aberaud commented Apr 12, 2022

I made a few tests on my side and had no issue with get, put and listen on this key using the C bindings and dhtnode.

Collaborator Author

ghenry commented Apr 12, 2022 via email

Member

aberaud commented Apr 19, 2022

Could you please try with 2.4.1?

I could reproduce some issues when the value count was reaching the value limit per key (for a given node). At least some of these issues should now be solved. The limit has also been raised from 1024 to 64k.

Collaborator Author

ghenry commented Apr 19, 2022 via email

Member

aberaud commented Apr 19, 2022

A middle-click-paste introduced a typo in the 2.4.1 release commit -_-
Made a 2.4.2

Collaborator Author

ghenry commented Apr 20, 2022

Is 2.4.2 coming out today? Just done - Homebrew/homebrew-core#99672

Collaborator Author

ghenry commented Apr 20, 2022

Alpine submitted too.

Member

aberaud commented Apr 20, 2022

The tag is there, just the GitHub release is not documented yet

Collaborator Author

ghenry commented Apr 23, 2022

Still seeing this, but my bootstrap node is on 2.4.0 via Homebrew. The test node gets restarted often and is on 2.4.2. Need to get this 2.4.3 out so I can test on that.

Value callback expired: {"app_name":"sentrypeer","app_version":"1.4.1","event_timestamp":"2022-04-23 01:06:19.505170365","event_uuid":"f93e6871-2dd7-45fc-b1ac-526a575898ac","created_by_node_id":"04ec5965-4e4b-415a-b464-ecf1cc1fa90d","collected_method":"responsive","transport_type":"UDP","source_ip":"118.123.237.29","destination_ip":"xxx","called_number":"","sip_method":"","sip_user_agent":"","sip_message":""}

Collaborator Author

ghenry commented Jun 8, 2022

Hi @aberaud

Any thoughts on this still? I'd really like to get this expired thing sorted and a bandwidth limiter in place for UDP traffic in the lib.

They flood in from peers as per https://twitter.com/ghenry/status/1534525326155554817 but

Thanks,
Gavin.

Member

aberaud commented Jun 12, 2022

The problem is that every sentrypeer DHT node puts every value on the same key (bad_actors). The Kademlia design of the DHT distributes load across different keys. When a single key carries significant load, best-effort applies and there is no guarantee of obtaining all the values. Instead, the load spreads to adjacent nodes, because nodes taking too much traffic stop responding. (For BitTorrent, this means a popular torrent won't flood the same nodes, and peers later exchange peer lists directly with PEX.)

The best way to handle this might be to put values on the hash of the host to verify, and perform a get when needed to check this host. This would ensure the load of sentrypeer would better spread on the distributed network and greatly improve the reliability of both sentrypeer and the public opendht network, at the cost of added latency when verifying a new host.
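
The difference between a single shared key and per-host keys can be illustrated with a short sketch. It uses SHA-1 from Python's `hashlib` as a stand-in for the library's own key derivation, which is an assumption. The helper `host_key` is hypothetical, and every address other than the one from the log line in this thread is a made-up documentation address.

```python
import hashlib
from collections import Counter


def host_key(source_ip):
    """Derive a per-host key from the address of the host to verify."""
    return hashlib.sha1(source_ip.encode("utf-8")).hexdigest()


# Current scheme: every event goes to the one shared key.
SHARED_KEY = hashlib.sha1(b"bad_actors").hexdigest()

# Source IPs of four events; the first host is reported twice.
events = ["118.123.237.29", "203.0.113.7", "198.51.100.23", "118.123.237.29"]

shared = Counter(SHARED_KEY for _ in events)
per_host = Counter(host_key(ip) for ip in events)

print("shared key:    %d key(s), busiest holds %d values"
      % (len(shared), max(shared.values())))
print("per-host keys: %d key(s), busiest holds %d values"
      % (len(per_host), max(per_host.values())))
```

With per-host keys, checking a host becomes a get on `host_key(ip)` rather than a listen on one busy key, which is where the added latency mentioned above comes from.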
