RKE2 cluster high disk IO and causing disk thrashing #4747
Replies: 3 comments 3 replies
-
That is not what's happening. What you're seeing is a normal level of constant writing by etcd. Even without a workload deployed, core Kubernetes components are constantly health-checking and renewing leases, and each of those writes must be synced to disk by etcd. It is normally recommended that you deploy etcd on an SSD, and if you are running multiple VMs on a single host, avoid using the same backing disk for all the nodes. If you are trying to reduce overhead to the point where just running etcd is too much to bear, you might look at using a single-server k3s cluster with SQLite.
-
I was also surprised by what I see on an almost empty cluster: Harvester plus a small RKE2 cluster writes about 3.3 MB/s. I'm new to K8s, and it was a shock that just running Kubernetes can wear out an SSD in 2-3 years; once other apps add real workload, the lifetime will be even shorter.
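A quick back-of-the-envelope check on that 2-3 year figure (the 300 TBW endurance rating below is an assumption for a typical consumer SSD, not a number from this thread; check your drive's datasheet):

```python
# Rough SSD-lifetime estimate from a sustained etcd write rate.
# The 300 TBW endurance rating is an assumed figure for a typical
# consumer SSD, used here only to sanity-check the 2-3 year claim.

write_rate_mb_s = 3.3                # observed sustained write rate
seconds_per_year = 365 * 24 * 3600   # 31,536,000 s

tb_written_per_year = write_rate_mb_s * seconds_per_year / 1e6  # MB -> TB
assumed_endurance_tbw = 300          # hypothetical drive rating

years_to_wear_out = assumed_endurance_tbw / tb_written_per_year

print(f"~{tb_written_per_year:.0f} TB/year -> ~{years_to_wear_out:.1f} years")
# ≈ 104 TB/year, so roughly 2.9 years -- consistent with the 2-3 year estimate
```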
-
What is your snapshot count? 10,000?
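For context, etcd writes an on-disk snapshot every `snapshot-count` committed transactions, and RKE2 lets you pass flags through to etcd via `etcd-arg` in its config file. A minimal sketch, assuming you want to inspect or tune this (the value shown is illustrative, not a recommendation):

```yaml
# /etc/rancher/rke2/config.yaml
# Pass a flag through to etcd via RKE2's etcd-arg option.
# snapshot-count sets how many applied transactions etcd batches
# before writing a snapshot to disk; the value below is illustrative.
etcd-arg:
  - "snapshot-count=100000"
```

Note that the steady fsync traffic from etcd's write-ahead log happens on every write regardless of snapshot settings, so tuning this mainly smooths out periodic snapshot bursts rather than eliminating the baseline IO.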
-
Hello everyone,
Maybe someone has had this issue. RKE2 is constantly saving the state of the cluster under /var/lib/rancher/rke2/server/db/etcd/config, and this is causing disk thrashing on my underlying disks. Is there any way to reduce how often RKE2 writes to disk, or to lower the logging level of the whole cluster? For now I only have the cluster running and nothing more, yet RKE2 keeps writing to the disks at varying intervals. I would appreciate it if someone could help me with tuning this. I have attached pictures of disk IO activity from the virtual machine where my RKE2 cluster is running and also from the underlying TrueNAS server. As for disks, the underlying drives are HDDs.
Thanks in advance