What version of nebula are you using? (nebula -version)
1.9.4
What operating system are you using?
Linux
Describe the Bug
setup
I have a nebula client configured with 3 relays (which are also lighthouses but I don't think it matters) to connect to the mesh. Other clients don't need the relays:
client (relayed, in private subnet) -> relays (lighthouses) -> (MESH on public internet) <- other clients (non-relayed)
problem
When the relayed client first registers with the relays, everything works fine. When the client restarts and reconnects to the relays, some non-relayed clients can't connect back to it until I restart either those clients one by one or at least one of the relays (which fixes all the problematic non-relayed clients in one go).
The relayed client is in a private subnet.
The lighthouses/relays are exposed to the internet with 1:1 NAT.
We don't have full control over the network infra of the non-relayed clients: they are on-prem at various customers' locations. We asked them to open and forward the nebula port, but I think some may still have problematic NAT (that's why we have punchy true).
Some non-relayed clients are not affected by this issue (they reconnect immediately to the restarted relayed client), but since we don't have any control over their network infra, it's hard to tell what distinguishes them from the affected clients.
Note: traffic between all the non-relayed clients is never affected.
fix (workaround)
Either:
restart the non-relayed clients (only fixes the connection to the relayed client for each restarted client)
or restart at least one lighthouse (fixes all clients' connections to the relayed client)
Logs from affected hosts
Some anonymized logs from when the error happens (the timestamps don't match exactly, but the same messages loop forever anyway):
lighthouses (relays):
logs:
time="2024-10-09T13:55:22Z" level=info msg="Failed to find target host info by ip" certName=otherclient1.mesh error="unable to find host with relay" localIndex=80854473 relayTo=100.96.2.13 remoteIndex=3176915269 vpnIp=100.99.63.1
...
(this message is quickly repeated for all the other clients and loops forever. 100.96.2.13 is the relayed client)
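For triage, it can help to count how often each client hits this relay error. The following is only a minimal sketch, assuming the lighthouse logs are in the key=value format shown above; the helper names `parse_logfmt` and `count_relay_failures` are hypothetical and not part of nebula.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split one key=value log line (values may be double-quoted) into a dict."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_relay_failures(lines, message="Failed to find target host info by ip"):
    """Count matching log lines per (certName, relayTo) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == message:
            counts[(fields.get("certName"), fields.get("relayTo"))] += 1
    return counts


# Sample line copied from the lighthouse log above.
sample = (
    'time="2024-10-09T13:55:22Z" level=info '
    'msg="Failed to find target host info by ip" certName=otherclient1.mesh '
    'error="unable to find host with relay" localIndex=80854473 '
    'relayTo=100.96.2.13 remoteIndex=3176915269 vpnIp=100.99.63.1'
)

if __name__ == "__main__":
    for (cert, relay_to), n in count_relay_failures([sample]).items():
        print(cert, relay_to, n)
```

Running this over the full lighthouse log would show which non-relayed clients keep failing to reach the relayed client, which might help separate the affected clients from the unaffected ones.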
client (relayed):
logs:
(loops forever)
other clients (not relayed):
logs:
(loops forever)
Config files from affected hosts
lighthouses (relays):
config:
client (relayed):
config:
other clients (not relayed):
config: