feat(net): disconnect from malicious nodes if necessary #5899

317787106 · 2024-07-03T09:38:05Z

What does this PR do?

To maintain the stability of the network node and prevent isolation, we need to disconnect from malicious nodes on a regular basis. Three strategies are adopted:

If the node is within a LAN and the number of peers is equal to or greater than minConnection, disconnect from the earliest malicious peer.
If the node is isolated and the number of peers is equal to or greater than minConnection, disconnect from the earliest malicious peer. Active connection is preferred.
If the number of peers is equal to or greater than maxConnection, disconnect from the earliest malicious peer.
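
The three strategies above can be sketched roughly as follows. This is only an illustrative sketch, not the code in this PR: the class DisconnectStrategySketch, its Peer type, the maliciousSince field, and the method pickPeerToDisconnect are all hypothetical names. It also assumes that "active connection is preferred" means active connections are picked ahead of passive ones in the isolated case.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the three strategies; names and types are illustrative only.
final class DisconnectStrategySketch {

  /** Minimal stand-in for a peer connection. */
  static final class Peer {
    final String id;
    final boolean active;       // true if this node initiated the connection
    final long maliciousSince;  // ms timestamp when flagged malicious, -1 if not flagged

    Peer(String id, boolean active, long maliciousSince) {
      this.id = id;
      this.active = active;
      this.maliciousSince = maliciousSince;
    }
  }

  /** Returns the malicious peer to drop, or empty if no strategy applies. */
  static Optional<Peer> pickPeerToDisconnect(List<Peer> peers, boolean inLan,
      boolean isolated, int minConnection, int maxConnection) {
    int count = peers.size();
    boolean lanCase = inLan && count >= minConnection;
    boolean isolatedCase = isolated && count >= minConnection;
    boolean fullCase = count >= maxConnection;
    if (!lanCase && !isolatedCase && !fullCase) {
      return Optional.empty();
    }
    Comparator<Peer> earliestFirst = Comparator.comparingLong(p -> p.maliciousSince);
    // Assumed reading of "active connection is preferred": in the isolated case,
    // active connections sort ahead of passive ones, then by earliest flag time.
    Comparator<Peer> order = isolatedCase
        ? Comparator.comparing((Peer p) -> !p.active).thenComparing(earliestFirst)
        : earliestFirst;
    return peers.stream().filter(p -> p.maliciousSince >= 0).min(order);
  }
}
```

Peers that were never flagged (maliciousSince of -1) are never selected, so the method returns empty when a strategy applies but no malicious peer exists.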

We also optimize the strategy of random disconnection.

Why are these changes required?

This PR has been tested by:

Unit Tests
Manual Testing

Follow up

Extra details

xxo1shine · 2024-07-04T01:51:38Z

chainbase/src/main/java/org/tron/core/ChainBaseManager.java

+  @Getter
+  @Setter
+  private long latestSaveBlockTime = System.currentTimeMillis();
+


Why define this variable? Can it be determined by the header block time?

It cannot be determined from the header block time. The variable records how long ago this node last saved a block. If the node is far behind the newest block, the header block time is useless in that scenario.

common/src/main/java/org/tron/common/parameter/CommonParameter.java

xxo1shine · 2024-07-04T01:56:20Z

common/src/main/java/org/tron/common/parameter/ResilienceConfig.java

+
+  @Getter
+  @Setter
+  private int checkInterval = 60;


This does not need to be defined as a parameter.

The specification requires that the value of check interval is configurable.

xxo1shine · 2024-07-04T01:59:50Z

common/src/main/java/org/tron/common/parameter/ResilienceConfig.java

+
+  @Getter
+  @Setter
+  private int blockNotChangeThreshold = 300;


It is recommended to use inactive instead, and merge the two parameters into one parameter inactiveThreshold.

They have completely different definitions. blockNotChangeThreshold is the config parameter meaning that if the latest block has stayed unchanged for blockNotChangeThreshold seconds, we consider the node isolated by its peers.
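
As a rough illustration of that definition, the isolation check could look like the sketch below. The class IsolationCheckSketch and the method isIsolated are hypothetical, and the strict "greater than" comparison is an assumption; only latestSaveBlockTime and blockNotChangeThreshold come from the PR.

```java
// Hypothetical sketch; only the two parameter names come from the PR.
final class IsolationCheckSketch {

  /**
   * Returns true when no block has been saved locally for longer than the threshold.
   *
   * @param latestSaveBlockTimeMs wall-clock time (ms) of the last local block save
   * @param nowMs current wall-clock time (ms)
   * @param blockNotChangeThresholdSec threshold in seconds (the PR's default is 300)
   */
  static boolean isIsolated(long latestSaveBlockTimeMs, long nowMs,
      int blockNotChangeThresholdSec) {
    // Multiply by 1000L so the arithmetic is done in long, not int.
    return nowMs - latestSaveBlockTimeMs > blockNotChangeThresholdSec * 1000L;
  }
}
```

Because the check uses the local save time rather than the header block time, a node that is still syncing old blocks is not treated as isolated as long as it keeps saving them.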

xxo1shine · 2024-07-04T02:09:53Z

framework/src/main/java/org/tron/core/net/TronNetService.java

@@ -178,6 +184,7 @@ private P2pConfig updateConfig(P2pConfig config) {
    config.setPort(parameter.getNodeListenPort());
    config.setNetworkId(parameter.getNodeP2pVersion());
    config.setDisconnectionPolicyEnable(parameter.isOpenFullTcpDisconnect());
+    config.setNotActiveInterval(parameter.peerNoBlockTime * 1000L);


What does libp2p need this parameter for?

The random disconnection strategy uses this parameter to filter for peers with no block activity. If we have sent blocks to or received blocks from a peer within peerNoBlockTime, that peer cannot be disconnected.
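
A minimal sketch of that filter, assuming each peer exposes the time of its last block exchange. The class RandomDisconnectFilterSketch, its Peer type, and the lastBlockActivityMs field are hypothetical and are not libp2p's actual API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; this is not libp2p's actual implementation.
final class RandomDisconnectFilterSketch {

  static final class Peer {
    final String id;
    final long lastBlockActivityMs;  // last time a block was sent to or received from the peer

    Peer(String id, long lastBlockActivityMs) {
      this.id = id;
      this.lastBlockActivityMs = lastBlockActivityMs;
    }
  }

  /** Keeps only peers with no block activity during the last notActiveIntervalMs. */
  static List<Peer> eligibleForRandomDisconnect(List<Peer> peers, long nowMs,
      long notActiveIntervalMs) {
    return peers.stream()
        .filter(p -> nowMs - p.lastBlockActivityMs >= notActiveIntervalMs)
        .collect(Collectors.toList());
  }
}
```

Random disconnection would then choose only from the returned list, so a peer that recently exchanged blocks is never picked.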

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

xxo1shine · 2024-07-04T02:36:03Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+  public class MaliciousFeature {
+
+    @Setter
+    private long advStartTime = System.currentTimeMillis();


Why is advStartTime the current time? If the peer is always in synchronization, will it affect other judgments?

It has a precondition that the peer must be in adv status before judgments:

public void updateBadFeature3() {
  long tempTime = Math.max(channel.getLastActiveTime(), advStartTime);
  if (!needSyncFromPeer && !needSyncFromUs
      && System.currentTimeMillis() - tempTime
      > resilienceConfig.getPeerNotActiveThreshold() * 1000L) {
    zombieBeginTime = tempTime;
  }
}

This does not make sense logically. It also does not affect the following function.

xxo1shine · 2024-07-04T02:41:15Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+    @Setter
+    private long stopBlockInvTime = -1;
+    @Setter
+    private long lastRecBlockInvTime = System.currentTimeMillis();


Why is it defined as the current time? It is possible that the block has never been received.

We can use lastRecBlockInvTime = -1 instead.

xxo1shine · 2024-07-04T02:45:57Z

common/src/main/java/org/tron/common/parameter/ResilienceConfig.java

+
+  @Getter
+  @Setter
+  private boolean testStopInv = false;


This is only a field in a feature. The feature should be configurable. If the feature is not enabled, defining this field is useless.

It specifies whether we test if the peer is malicious. There is an alternative method, and only one of the two can be used. It can be configured in config.conf.

This is just a field in the rule. What I mean is that you should configure it according to the rules, not the fields in the rules. When you want to enable or disable rules, you can just change the configuration without changing the code.
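
One way to read this suggestion is to configure a list of enabled rules rather than a flag per field. The sketch below is purely illustrative: the class RuleConfigSketch, the Rule enum, and its constant names are hypothetical and do not appear in the PR or in config.conf.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of rule-level configuration; all names are illustrative.
final class RuleConfigSketch {

  /** Detection rules that could be switched on or off from configuration. */
  enum Rule { PEER_NOT_ACTIVE, STOP_INV_TEST }

  /** Parses rule names (case-insensitive) into the set of enabled rules. */
  static Set<Rule> parseEnabledRules(List<String> names) {
    EnumSet<Rule> enabled = EnumSet.noneOf(Rule.class);
    for (String name : names) {
      // valueOf throws IllegalArgumentException for unknown names, failing fast.
      enabled.add(Rule.valueOf(name.trim().toUpperCase(Locale.ROOT)));
    }
    return enabled;
  }
}
```

With this shape, enabling or disabling a rule is a configuration change only, and the code checks enabled.contains(rule) before evaluating it.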

xxo1shine · 2024-07-04T07:50:25Z

framework/src/main/java/org/tron/core/net/service/adv/AdvService.java

+            //if peer is not active for too long, test if peer will broadcast block inventory to me
+            //after I stop broadcasting block inventory to it
+            peer.getMaliciousFeature().setStopBlockInvTime(System.currentTimeMillis());
+            invCheckExecutor.schedule(() -> peer.getMaliciousFeature().updateBadFeature4(),


Why use a scheduler to do this?

We check whether we receive any inventory from the peer within 10 seconds. It is a timer, and it is scheduled only once.
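
The one-shot timer described here can be sketched with the standard ScheduledExecutorService. This is a simplified illustration under stated assumptions, not the PR's implementation: the class InvProbeSketch and the methods startProbe, onBlockInv, and evaluate are hypothetical; the field names follow the diff and the later commit that mentions noInvBackTime, and -1 is assumed to mean "never set".

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "stop sending inventory, then check" probe.
final class InvProbeSketch {

  final AtomicLong lastRecBlockInvTime = new AtomicLong(-1);  // -1 = never received
  final AtomicLong stopBlockInvTime = new AtomicLong(-1);     // -1 = probe not started
  final AtomicLong noInvBackTime = new AtomicLong(-1);        // -1 = not flagged

  /** Records the probe start and schedules a single evaluation after delayMs. */
  ScheduledFuture<?> startProbe(ScheduledExecutorService executor, long delayMs) {
    stopBlockInvTime.set(System.currentTimeMillis());
    // schedule(...) runs the task exactly once, unlike scheduleAtFixedRate(...).
    return executor.schedule(this::evaluate, delayMs, TimeUnit.MILLISECONDS);
  }

  /** Called whenever block inventory arrives from the peer. */
  void onBlockInv() {
    lastRecBlockInvTime.set(System.currentTimeMillis());
  }

  /** Flags the peer if nothing has arrived since the probe started. */
  void evaluate() {
    if (noInvBackTime.get() < 0
        && lastRecBlockInvTime.get() < stopBlockInvTime.get()) {
      noInvBackTime.set(System.currentTimeMillis());
    }
  }
}
```

A caller would pass an executor such as Executors.newSingleThreadScheduledExecutor() and a delay of 10_000 ms. Because the task is scheduled once per probe, evaluate cannot run before the delay elapses unless it is invoked directly.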

xxo1shine · 2024-07-05T09:31:58Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+  public class MaliciousFeature {
+
+    @Setter
+    private long advStartTime = System.currentTimeMillis();


This does not make sense logically. It also does not affect the following function.

xxo1shine · 2024-07-05T09:35:20Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+    }
+
+    //it can only be set from -1 to positive
+    public void updateBadFeature1() {


The design is unreasonable. All features should be moved to the ResilienceService you defined for implementation. This class only needs to define fields and set field values.

jwrct · 2024-07-09T08:49:45Z

framework/src/main/java/org/tron/core/db/Manager.java

@@ -1391,6 +1391,7 @@ public void updateDynamicProperties(BlockCapsule block) {
        (chainBaseManager.getDynamicPropertiesStore().getLatestBlockHeaderNumber()
            - chainBaseManager.getDynamicPropertiesStore().getLatestSolidifiedBlockNum()
            + 1));
+    chainBaseManager.setLatestSaveBlockTime(System.currentTimeMillis());


This assignment statement should not be placed in this method, right?

We determine whether the node is isolated from the time the latest block was written to the database. If this time stays unchanged for some minutes, the node is isolated. Do you have a better solution or place?

I am not objecting to when this variable is assigned; I just feel the assignment statement does not belong in this method. I think you could put it near where this method is called. What do you think?

I can assign it in ResilienceService, where it's used, but it will be less accurate. Let's try it.

jwrct · 2024-07-09T10:38:42Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+      long tempTime = Math.max(channel.getLastActiveTime(), advStartTime);
+      if (!needSyncFromPeer && !needSyncFromUs && System.currentTimeMillis() - tempTime
+          > resilienceConfig.getPeerNotActiveThreshold() * 1000L) {
+        zombieBeginTime = tempTime;


If a peer has been set with zombieBeginTime, will it still be considered a malicious node even if it resumes block interaction before being selected to disconnect? I haven’t seen any other place that updates zombieBeginTime.

jwrct · 2024-07-09T11:03:14Z

framework/src/main/java/org/tron/core/net/peer/PeerConnection.java

+    public void updateBadFeature4() {
+      if (zombieBeginTime2 < 0
+          && maliciousFeature.lastRecBlockInvTime < maliciousFeature.stopBlockInvStartTime) {
+        zombieBeginTime2 = getLatestTime();


There are two questions:

Same question as zombieBeginTime, is recovery not allowed?

When the inventory is started to be sent but the test end time is not reached, will zombieBeginTime2 be assigned a value after this method is called? Is this assignment operation wrong at this time?

zombieBeginTime is not allowed to recover. zombieBeginTime2 will be assigned a value if no inventory is received within 10 seconds. Is there any problem?

Why aren't zombieBeginTime and zombieBeginTime2 allowed to recover? Even if the node triggered the rules you set, it may have quickly recovered, or we may even have mistakenly identified it as a malicious node.
What I mean is, will there be a scenario where updateBadFeature4 is called before 10 seconds? What consequences will occur if this scenario happens?

It seems reasonable to let zombieBeginTime recover, but zombieBeginTime2 cannot be recovered. updateBadFeature4 will not be called before 10 seconds have elapsed.

updateBadFeature4 will not be called before 10 seconds have elapsed.

Okay, I took another careful look, and what I'm saying aligns with what you said.
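
To make the recovery question concrete, a recoverable version of the zombieBeginTime rule might look like the sketch below. It is an assumption-laden illustration, not the code that was merged or proposed: the class ZombieTrackerSketch and the method update are hypothetical, and leaving the flag unchanged while the peer is syncing is an assumed choice.

```java
// Hypothetical sketch of a recoverable "zombie" flag; -1 means not flagged.
final class ZombieTrackerSketch {

  private long zombieBeginTime = -1;

  long getZombieBeginTime() {
    return zombieBeginTime;
  }

  /**
   * Re-evaluates the flag on every check instead of setting it only once.
   *
   * @param syncing true if either side still needs to sync (precondition not met)
   */
  void update(long lastActiveTimeMs, long advStartTimeMs, boolean syncing,
      long nowMs, long peerNotActiveThresholdSec) {
    if (syncing) {
      return;  // assumed behavior: leave the flag unchanged while syncing
    }
    long reference = Math.max(lastActiveTimeMs, advStartTimeMs);
    if (nowMs - reference > peerNotActiveThresholdSec * 1000L) {
      if (zombieBeginTime < 0) {
        zombieBeginTime = reference;  // flag at the start of the inactive period
      }
    } else {
      zombieBeginTime = -1;  // peer resumed block interaction: clear the flag
    }
  }
}
```

With this shape, a peer that resumes block interaction before it is selected for disconnection is no longer treated as malicious on the next check.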

317787106 added 21 commits June 26, 2024 17:32

initial submit of disconnect zombie node

7dbe7c6

format MaliciousFeature

53dab13

rename to peerNotActiveTime

1b6ca45

modify default peerNotActiveTime to 600

9e2b144

reduce log

1b198f8

set type of peerNoBlockTime to seconds

5a41c62

close resilienceService when TronNetService close

4af6102

update feature 2

d7d5aff

rearrange the close order

d7c0fef

simplfy PeerConnection

75f405d

don't disconnect with active peer if connection is full in case 3

22c54ed

add some log

7207537

add condition when disconnect

2642b9f

add feature if enables

edf6934

add testcase ResilienceServiceTest

81b4289

merge develop

6c87144

rename name of config value

f2a1732

schedule to test after 10 seconds

406c5c2

only one of feature3 and feature4 is used

c5c3add

use same peerNotActiveThreshold for block and inventory

90af8ef

add testcase testCondition1StopInv

828077b

317787106 changed the title ~~feat(net): disconnect with malicious nodes if necessary~~ feat(net): disconnect from malicious nodes if necessary Jul 3, 2024

317787106 added 3 commits July 3, 2024 19:36

fix checkstyle and sonar check

a57de7a

test pause send inventory

7836f4c

set TEST_PAUSE_INV_SECONDS to constant

24bcaa6

xxo1shine reviewed Jul 4, 2024

delete config item node.peerNoBlockTime

5bf9ea3

xxo1shine reviewed Jul 5, 2024

317787106 added 2 commits July 9, 2024 10:53

init latestSaveBlockTime in init method

15b3abc

use setNeedSyncFromPeer,setNeedSyncFromUs method

369cd92

jwrct reviewed Jul 9, 2024

317787106 added 3 commits July 9, 2024 22:08

update some default value of class Feature

3662502

check if addInv success

3a85f98

use ReasonCode.BAD_PROTOCOL; noInvBackTime is recoverable

eafda50

317787106 closed this Jul 17, 2024
