It is not intended that you would leave the service stopped for an extended period of time. Without the supervisor and kubelet running, pods running on the node are no longer synced to the larger cluster state. CNI functionality, and any iptables programming provided by the kubelet, may break on the affected node as they drift out of sync. Pods are left running to facilitate minimally disruptive upgrades, but in general you should not expect things to work right if the service is left stopped for an extended period of time.
Environmental Info:
RKE2 Version:
v1.25.12+rke2r1
Node(s) CPU architecture, OS, and Version:
Linux 4.18.0-425.19.2.el8_7.x86_64 #1 SMP Fri Mar 17 01:52:38 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
Describe the bug:
When the rke2-server service is stopped on a control plane node, the following has been observed (depending on the node):
NotReady
Note that shutting down the node or running the rke2-killall.sh script has tended to resolve the issues seen above.
Steps To Reproduce:
Stop the rke2-server service on a control plane node
Expected behavior:
The cluster is able to continue functioning with the loss of the rke2-server service on one of the control-plane nodes.
Actual behavior:
Depending on the node, the following has been observed:
NotReady
Note that shutting down the node or running the rke2-killall.sh script has tended to resolve the issues seen above.
Additionally, the application running is still accessible at the frontend webpage.
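For anyone trying to reproduce the NotReady observation, a minimal sketch like the one below may help list which nodes the API server currently considers not ready. It assumes `kubectl` is installed and configured against the cluster; the helper names `fetch_nodes` and `not_ready_nodes` are made up for illustration and are not part of RKE2 or kubectl.

```python
import json
import subprocess


def fetch_nodes() -> dict:
    """Return the parsed output of `kubectl get nodes -o json`.

    Assumption: kubectl is on PATH and points at the affected cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not "True".

    A node with no Ready condition at all is also reported.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names
```

Calling `not_ready_nodes(fetch_nodes())` before and after stopping the service would show whether the affected control plane node has flipped to NotReady (a Ready status of "False" or "Unknown").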
Additional context / logs:
The log below is from a fresh worker pod that is unable to connect to its database while rke2-server is stopped. After rke2-server is restarted, the pod connects successfully.
[12:26:45 INF] Starting ASK Web API
2023-08-16T21:26:49.037248513+09:00 [12:26:48 WRN] Storing keys in a directory '/root/.aspnet/DataProtection-Keys' that may not be persisted outside of the container. Protected data will be unavailable when container is destroyed.
[12:26:50 WRN] No XML encryptor configured. Key {key} may be persisted to storage in unencrypted form.
2023-08-16T21:26:50.538069802+09:00 [12:26:50 INF] Worker running at: 08/16/2023 12:26:50 +00:00
[12:27:39 ERR] An error occurred using the connection to database 'ASK' on server 'DB-ASK.domain'.
[12:27:40 ERR] An exception occurred while iterating over the results of a query for context type 'DatabaseModels.Models.AskContext'.
2023-08-16T21:27:40.335924943+09:00 Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
2023-08-16T21:27:40.335935579+09:00 ---> System.AggregateException: One or more errors occurred. (Resource temporarily unavailable)
2023-08-16T21:27:40.335941685+09:00 ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000001, 11): Resource temporarily unavailable
2023-08-16T21:27:40.335947481+09:00 at System.Net.Dns.GetHostEntryOrAddressesCore(String hostName, Boolean justAddresses, AddressFamily addressFamily, ValueStopwatch stopwatch)
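The last frame of the trace is in `System.Net.Dns.GetHostEntryOrAddressesCore`, which suggests the failure happens at name resolution rather than at the SQL Server itself. A rough way to check this from inside an affected pod, assuming a Python interpreter is available there, is a small lookup helper like the sketch below. The function name `can_resolve` is hypothetical, and port 1433 is only the SQL Server default, not something confirmed in this report.

```python
import socket


def can_resolve(hostname: str, port: int = 1433) -> tuple[bool, str]:
    """Try to resolve hostname and return (ok, detail).

    On success, detail is a comma-separated list of resolved addresses.
    On failure, detail is the resolver error text.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # A temporary resolver failure here would be consistent with the
        # "Resource temporarily unavailable" error in the log above.
        return False, str(exc)
    # Each entry's sockaddr starts with the address string.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)
```

Running `can_resolve("DB-ASK.domain")` in the pod while rke2-server is stopped, and again after it is restarted, would help confirm whether DNS is the part that breaks.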