
Too many active HTLC, lost most of channels #CLN24.05 #bitcoind27.1 #RPi4 #715

Open · eltoo-cln opened this issue Jul 10, 2024 · 5 comments
eltoo-cln commented Jul 10, 2024

Hello, I am trying to find out what went wrong with my node and why I recently lost most of my channels.
This situation happened shortly after upgrading to 1.0.9 (CLN 24.05).

My HW setup:

  • RPi4 8GB RAM
  • 1TB SSD connected via USB3 enclosure

This node has been up for about 2 years without any previous issues.
NB is deployed via krops (I still haven't moved to flakes).

CLN:

lightning-cli getinfo
{
   "id": "redacted",
   "alias": "redacted",
   "color": "redacted",
   "num_peers": 37,
   "num_pending_channels": 0,
   "num_active_channels": 5,
   "num_inactive_channels": 5,
   "address": [
      {
         "type": "torv3",
         "address": "redacted.onion",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "redacted",
         "port": 9735
      }
   ],
   "version": "v24.05",
   "blockheight": 851703,
   "network": "bitcoin",
   "fees_collected_msat": redacted,
   "lightning-dir": "/var/lib/clightning/bitcoin",
   "our_features": {
      "init": "08a0800a8a59a1",
      "node": "88a0800a8a59a1",
      "channel": "",
      "invoice": "02000002024100"
   }
}

configuration.nix

{ config, pkgs, lib, ... }: {
  imports = [
    <nix-bitcoin/modules/presets/secure-node.nix>
    <nix-bitcoin/modules/presets/hardened.nix>
    ./hardware-configuration.nix
  ];
  nix-bitcoin.onionServices.bitcoind.public = true;

  services.clightning = {
    enable = true;

    plugins = {
      monitor.enable = true;
      rebalance.enable = true;
      currencyrate.enable = true;
      clboss = {
        enable = true;
        zerobasefee = "require";
        min-onchain = 500000000;
        min-channel = 5000000;
      };
    };

    extraConfig = ''
      alias=redacted
      rgb=redacted
      min-capacity-sat=8000000
      htlc-minimum-msat=1000000
      large-channels
      fee-base=0
      experimental-offers
      experimental-onion-messages
    '';
  };
  nix-bitcoin.onionServices.clightning.public = true;
  systemd.services.clightning.serviceConfig.TimeoutStartSec = "5m";

  services.rtl = {
    enable = true;
  };
  services.rtl.nodes.clightning = {
    enable = true;
    extraConfig = {
      Settings = {
        themeMode = "night";
        enableOffers = "true";
      };
    };
  };
  services.electrs.enable = true;

  services.backups.enable = true;
  programs.ssh.knownHosts."host" = {
    hostNames = [ "redacted" ];
    publicKey = "redacted";
  };

  services.borgbackup.jobs."reducted" = {
    paths = [
      "/var/lib/localBackups"
    ];
    repo = "redacted";
    encryption = {
      mode = "redacted";
      passCommand = "redacted";
    };
    environment.BORG_RSH = "redacted";
    compression = "auto,zstd";
    startAt = "daily";
  };

  nix-bitcoin.netns-isolation.enable = true;

  networking.hostName = "eltoo-cln";
  time.timeZone = "redacted";

  services.openssh = {
    enable = true;
    settings = {
      LoginGraceTime = 0; # RCE fix
    };
  };
  users.users.root = {
    openssh.authorizedKeys.keys = [
      "redacted"
    ];
  };

  environment.systemPackages = with pkgs; [
    redacted
  ];

  boot = {
    kernelPackages = pkgs.linuxPackages_rpi4;
    tmp.useTmpfs = true;
    kernelParams = [
      "8250.nr_uarts=1"
      "console=ttyAMA0,115200"
      "console=tty1"
      "cma=128M"
    ];
  };

  boot.loader.grub.enable = false;

  hardware.enableRedistributableFirmware = true;
  nixpkgs.config = {
    allowUnfree = true;
  };

  networking = {
    dhcpcd.enable = false;
    interfaces.eth0.ipv4.addresses = [{
      address = "redacted";
      prefixLength = 32;
    }];
    defaultGateway = {
      address = "redacted";
      interface = "eth0";
    };
    nameservers = [ "redacted" ];
  };

  i18n.defaultLocale = "redacted";
  console = {
    font = "Hack";
    keyMap = "us";
  };

  nix = {
    settings = {
      auto-optimise-store = true;
    };
    gc = {
      automatic = true;
      dates = "weekly";
      options = "--delete-older-than 14d";
    };
    extraOptions = ''
      min-free = ${toString (100 * 1024 * 1024)}
      max-free = ${toString (1024 * 1024 * 1024)}
    '';
  };

  system.stateVersion = "24.05"; 

  nix-bitcoin.configVersion = "0.0.109";
}

I would like to ask for support. I didn't find anything suspicious in the logs, but I'm also not sure what I should be looking for.

After reviewing the clightning.service log, there were a couple of strange lines:

lightningd[4601]: INFO    plugin-clboss: New block at 851508
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4601]: UNUSUAL plugin-clboss: NeedsConnectSolicitor: Ran out of 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4601]: INFO    plugin-clboss: Connector: connected to 0231eccc65
lightningd[4601]: INFO    plugin-clboss: NeedsConnectSolicitor: Connection 
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 

The **BROKEN** lines related to plugin-clboss were already reported here.

But I am not sure what this repetitive part is related to:

lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...

The other thing which may or may not be related is filesystem status:
/var/lib has less than 10% free space.

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        391M     0  391M   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           2.0G  3.4M  2.0G   1% /run
tmpfs           3.9G  2.2M  3.9G   1% /run/wrappers
/dev/dm-0        20G   12G  7.5G  60% /
/dev/dm-1       859G  758G   58G  93% /var/lib
tmpfs           3.9G     0  3.9G   0% /tmp
/dev/sda1      1022M  104M  919M  11% /boot
tmpfs           781M  4.0K  781M   1% /run/user/0
tmpfs           781M  4.0K  781M   1% /run/user/1001

I would like to find out what the problem is or report an issue, but first I need to narrow it down to the root cause.
Is anybody able to guide me on this journey, please?

jonasnick (Member) commented

Hi @eltoo-cln !

What does "Too many active HTLC" mean? This seems like an issue with core lightning or clboss.


eltoo-cln commented Jul 11, 2024

> Hi @eltoo-cln !
>
> What does "Too many active HTLC" mean? This seems like an issue with core lightning or clboss.

Hi @jonasnick and thank you for your time to read my issue.

Two days ago I could see over 130 active HTLCs in RTL.
I did have a few before, but not more than 10 IIRC.

Currently there are 56 with three different nodes, all of which are already Disconnected and in state Onchain.
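For tracking this without RTL, the in-flight HTLC count can be pulled from `lightning-cli listpeerchannels`. The sketch below counts HTLC entries in a sample JSON fragment (illustrative, not real node output) using only POSIX tools, since jq may not be on the node:

```shell
#!/bin/sh
# Hedged sketch: count in-flight HTLCs in listpeerchannels-style JSON.
# The sample is illustrative; on a live node with jq installed, use:
#   lightning-cli listpeerchannels | jq '[.channels[].htlcs[]?] | length'
sample='{"channels":[{"htlcs":[{"id":0},{"id":1},{"id":2}]}]}'

# Each HTLC object carries an "id" field; count occurrences of it.
count=$(printf '%s' "$sample" | grep -o '"id":' | wc -l)
echo "in-flight HTLCs: $count"
```

A count that keeps growing while peers are disconnected would match the symptom described here.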

@eltoo-cln eltoo-cln changed the title Too many active HTLC, lost most of CHs. #CLN #RPi4 Too many active HTLC, lost most of channels #CLN24.05 #bitcoind27.1 #RPi4 Jul 11, 2024

dazl commented Jul 13, 2024

You can set the max number of concurrent HTLCs (per channel, I believe) with something like

max-concurrent-htlcs=2

in services.clightning.extraConfig within configuration.nix
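In this setup that would be added to the existing extraConfig block, roughly as sketched below (the value 2 is just dazl's example above, and I believe the limit only takes effect for newly opened channels):

```nix
services.clightning = {
  extraConfig = ''
    # ...existing options...
    # Sketch: cap in-flight HTLCs per channel (example value from above).
    max-concurrent-htlcs=2
  '';
};
```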


urza commented Jul 15, 2024

Hi, I think it would be good to get to the bottom of this to understand what happened here, so others can avoid it.

My node had a channel with @eltoo-cln 's node. Their node lost most of its channels within a few days because their peers automatically force-closed channels with it.

My node was one of the many that closed channel with their node.

Here are channel updates of our channel as seen from my node:

"state_changes": [
            {
               "timestamp": "2022-07-08T21:23:57.561Z",
               "old_state": "CHANNELD_AWAITING_LOCKIN",
               "new_state": "CHANNELD_NORMAL",
               "cause": "user",
               "message": "Lockin complete"
            },
            {
               "timestamp": "2024-07-07T18:32:26.570Z",
               "old_state": "CHANNELD_NORMAL",
               "new_state": "AWAITING_UNILATERAL",
               "cause": "protocol",
               **"message": "Offered HTLC 681 SENT_ADD_ACK_REVOCATION cltv 851131 hit deadline"**
            },
            {
               "timestamp": "2024-07-07T18:43:26.772Z",
               "old_state": "AWAITING_UNILATERAL",
               "new_state": "FUNDING_SPEND_SEEN",
               "cause": "onchain",
               "message": "Onchain funding spend"
            },
            {
               "timestamp": "2024-07-07T18:43:27.151Z",
               "old_state": "FUNDING_SPEND_SEEN",
               "new_state": "ONCHAIN",
               "cause": "onchain",
               "message": "Onchain init reply"
            }
         ],

This is beyond my understanding of the inner workings of LN, but the important part seems to be:

"message": "Offered HTLC 681 SENT_ADD_ACK_REVOCATION cltv 851131 hit deadline"

Is it possible that @eltoo-cln 's node was in a "stuck" state, without the ability to write to its db or something like that, and could not cooperate on moving HTLCs to the proper state, and that is why all the peers force-closed the channels?

northben commented

Could be caused by clboss: ElementsProject/lightning#7172 (comment)
