Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(net): disconnect from malicious nodes if necessary #5899

Closed
wants to merge 30 commits into from

Conversation

317787106
Copy link
Contributor

@317787106 317787106 commented Jul 3, 2024

What does this PR do?

To maintain the stability of the network node and prevent isolation, we need to disconnect from malicious nodes on a regular basis. Three strategies are adopted:

  1. If the node is within a LAN network and the number of peers is equal to or greater than minConnection, disconnect from the earliest malicious peer.
  2. If the node is isolated and the number of peers is equal to or greater than minConnection, disconnect from the earliest malicious peer. Active connection is preferred.
  3. If the number of peers is equal to or greater than maxConnection, disconnect from the earliest malicious peer.

We also optimize the strategy of random disconnection.

Why are these changes required?

This PR has been tested by:

  • Unit Tests
  • Manual Testing

Follow up

Extra details

@317787106 317787106 changed the title feat(net): disconnect with malicious nodes if necessary feat(net): disconnect from malicious nodes if necessary Jul 3, 2024
@Getter
@Setter
private long latestSaveBlockTime = System.currentTimeMillis();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define this variable? Can it be determined by the header block time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be determined by the header block time. The variable is used to determine how long ago I save the last block. Let's suppose that i am far from newest block, the header block time is useless in this scene.


@Getter
@Setter
private int checkInterval = 60;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be defined as a parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specification requires that the value of check interval is configurable.


@Getter
@Setter
private int blockNotChangeThreshold = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to use inactive instead, and merge the two parameters into one parameter inactiveThreshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have completely different definition. blockNotChangeThreshold is the config parameter that means if latest block has stay unchanged in blockNotChangeThreshold seconds, we think the node is isolated by peers.

@@ -178,6 +184,7 @@ private P2pConfig updateConfig(P2pConfig config) {
config.setPort(parameter.getNodeListenPort());
config.setNetworkId(parameter.getNodeP2pVersion());
config.setDisconnectionPolicyEnable(parameter.isOpenFullTcpDisconnect());
config.setNotActiveInterval(parameter.peerNoBlockTime * 1000L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does libp2p need this parameter for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy of Random disconnection use this parameter to filter no block peer. If we send or receive some blocks among peerNoBlockTime from this peer, it cannot be disconnect.

public class MaliciousFeature {

@Setter
private long advStartTime = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is advStartTime the current time? If the peer is always in synchronization, will it affect other judgments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a precondition that the peer must be in adv status before judgments:

    public void updateBadFeature3() {
      long tempTime = Math.max(channel.getLastActiveTime(), advStartTime);
      if (!needSyncFromPeer && !needSyncFromUs && System.currentTimeMillis() - tempTime
          > resilienceConfig.getPeerNotActiveThreshold() * 1000L) {
        zombieBeginTime = tempTime;
      }
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make sense logically. It also does not affect the following function.

@Setter
private long stopBlockInvTime = -1;
@Setter
private long lastRecBlockInvTime = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it defined as the current time? It is possible that the block has never been received.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use lastRecBlockInvTime = -1 instead.


@Getter
@Setter
private boolean testStopInv = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a field in a feature. The feature should be configurable. If the feature is not enabled, defining this field is useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used to specify whether we test if the peer is malicious. There is alternative method. Only one of them can be used. It can be configured in config.conf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a field in the rule. What I mean is that you should configure it according to the rules, not the fields in the rules. When you want to enable or disable rules, you can just change the configuration without changing the code.

//if peer is not active for too long, test if peer will broadcast block inventory to me
//after I stop broadcasting block inventory to it
peer.getMaliciousFeature().setStopBlockInvTime(System.currentTimeMillis());
invCheckExecutor.schedule(() -> peer.getMaliciousFeature().updateBadFeature4(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a scheduler to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check that if we receive some inventory from peer in 10 seconds, it is a timer. It is scheduled only once.

public class MaliciousFeature {

@Setter
private long advStartTime = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make sense logically. It also does not affect the following function.

}

//it can only be set from -1 to positive
public void updateBadFeature1() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is unreasonable. All features should be moved to the ResilienceService you defined for implementation. This class only needs to define fields and set field values.

@@ -1391,6 +1391,7 @@ public void updateDynamicProperties(BlockCapsule block) {
(chainBaseManager.getDynamicPropertiesStore().getLatestBlockHeaderNumber()
- chainBaseManager.getDynamicPropertiesStore().getLatestSolidifiedBlockNum()
+ 1));
chainBaseManager.setLatestSaveBlockTime(System.currentTimeMillis());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment statement should not be placed in this method, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We verify whether the node is isolated through the latest writing block time to database. If this time stay unchanged over some minutes, it's isolated. Do you have better solution or place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not deny the timing of assigning this variable, I just feel that the assignment statement is unreasonable in this method, I think you can put it near where this method is called, what do you think?

Copy link
Contributor Author

@317787106 317787106 Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can assign it in ResilienceService where it's used, but it's not accurate. Let's try it.

long tempTime = Math.max(channel.getLastActiveTime(), advStartTime);
if (!needSyncFromPeer && !needSyncFromUs && System.currentTimeMillis() - tempTime
> resilienceConfig.getPeerNotActiveThreshold() * 1000L) {
zombieBeginTime = tempTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a peer has been set with zombieBeginTime, will it still be considered a malicious node even if it resumes block interaction before being selected to disconnect? I haven’t seen any other place that updates zombieBeginTime.

public void updateBadFeature4() {
if (zombieBeginTime2 < 0
&& maliciousFeature.lastRecBlockInvTime < maliciousFeature.stopBlockInvStartTime) {
zombieBeginTime2 = getLatestTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two questions:

  1. Same question as zombieBeginTime, is recovery not allowed?
  2. When the inventory is started to be sent but the test end time is not reached, will zombieBeginTime2 be assigned a value after this method is called? Is this assignment operation wrong at this time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zombieBeginTime is not allowed to recover. zombieBeginTime2 will be assigned a value if no inventory received during 10 seconds. Is there any problem ?

Copy link
Contributor

@jwrct jwrct Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't zombieBeginTime and zombieBeginTime2 allowed to recover? Even though the node triggered the rules you set, it quickly recovered, or even mistakenly identified as a malicious node by us.
What I mean is, will there be a scenario where updateBadFeature4 is called before 10 seconds? What consequences will occur if this scenario happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems reasonable to recover zombieBeginTime, but zombieBeginTime2 cannot be recovered. updateBadFeature4 will be not called before 10 secondes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateBadFeature4 will be not called before 10 secondes.

Okay, I took another careful look, and what I'm saying aligns with what you said.

@317787106 317787106 closed this Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants