
[BUG] - Kubernetes node pool node_quantity state drift when auto-scaler enabled #472

Open
adamjacobmuller opened this issue Mar 13, 2024 · 2 comments
Labels
wontfix This will not be worked on


Hi,

Describe the bug
If I create a vke cluster like:

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.28.2+1"

    node_pools {
        node_quantity = 1
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 2
    }
} 

Every time terraform runs, if my cluster has scaled up from 1 node to 2, terraform sees this and "fixes" node_quantity so that the cluster scales back down to 1 node. The autoscaler then sees that 1 node is not enough and scales my cluster back up to 2 nodes.

This is very disruptive for workflows and workloads that depend on the autoscaler.
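
To illustrate, after the autoscaler has grown the pool to 2 nodes, the next terraform plan proposes something roughly like this (approximate output, the exact rendering depends on the terraform/provider version):

  # vultr_kubernetes.k8 will be updated in-place
  ~ resource "vultr_kubernetes" "k8" {
        # (other attributes unchanged)

      ~ node_pools {
          ~ node_quantity = 2 -> 1
        }
    }

Applying that plan is what triggers the scale-down/scale-up loop.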

To Reproduce
Steps to reproduce the behavior:

  1. create a cluster with terraform with the autoscaler enabled
  2. deploy enough workload to require the cluster to scale up to more than node_quantity nodes
  3. run terraform plan/apply again
  4. watch the cluster scale down to node_quantity, then back up to max_nodes (or whatever satisfies your workload)

Expected behavior

if auto_scaler == true:
    set min_nodes and max_nodes only
else:
    set node_quantity

Additional context

Thank you kindly.

@optik-aper optik-aper self-assigned this Apr 15, 2024
@optik-aper optik-aper changed the title [BUG] - vke - autoscaler and node_quantity conflict [BUG] - Kubernetes node pool node_quantity state drift when auto-scaler enabled Apr 15, 2024
@optik-aper optik-aper added wontfix This will not be worked on and removed bug labels Apr 19, 2024
optik-aper (Member) commented Apr 19, 2024

My feeling is that removing the value updates would create a workflow expectation that is too opinionated for a provider. Not only would you have to silence/ignore the quantity updates, but you'd have to ignore the new node_pools[...].nodes elements as well. That goes against the spirit of the provider.

Have you tried using lifecycle rules to ignore_changes automatically? Here's an example of the two forms that a node pool resource takes in our provider:

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.29.2+1"

    node_pools {
        node_quantity = 3
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 3
    }

    lifecycle {
      ignore_changes = [node_pools]
    }
} 

resource "vultr_kubernetes_node_pools" "k8-np" {
  cluster_id = vultr_kubernetes.k8.id
  node_quantity = 3
  plan          = "vc2-1c-2gb"
  label         = "vke-nodepool-2"
  # auto_scaler   = true
  # min_nodes     = 1
  # max_nodes     = 3
 
  lifecycle {
    ignore_changes = [node_quantity]  
  }
}

With these settings, any out-of-band updates to node_pools in the vultr_kubernetes resource are still recorded in the terraform state file, but terraform no longer tries to revert them. The same goes for the vultr_kubernetes_node_pools value of node_quantity.

adamjacobmuller (Author) commented

Hi @optik-aper,

Thanks for the lifecycle tip, I didn't know you could do that.

Specifically, doing ignore_changes = [node_pools[0].node_quantity] is great and solves my immediate issue.
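
For anyone else who hits this, the targeted version looks roughly like the following (same cluster definition as in my original report, only the lifecycle block is new):

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.28.2+1"

    node_pools {
        node_quantity = 1
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 2
    }

    # Only ignore the attribute the autoscaler changes; other edits to
    # node_pools (plan, label, min/max) still get applied normally.
    lifecycle {
        ignore_changes = [node_pools[0].node_quantity]
    }
}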

With regard to the original issue, I still think the way this provider handles things is wrong, though (and if you look at other providers for Kubernetes clusters, they seem to agree).

In my mind there are two modes (sketched below):

A) you're using auto_scaler = true, in which case you should specify min_nodes and max_nodes (and the provider should refuse node_quantity)
B) you're using auto_scaler = false, in which case you should specify node_quantity (and the provider should refuse min_nodes and max_nodes)
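
Concretely, the two shapes I'd expect the provider to accept would look something like this (proposed behaviour only, not valid against the current schema, since node_quantity is, as far as I can tell, required today):

# A) autoscaled pool: the autoscaler owns the node count
node_pools {
    plan        = "vc2-1c-2gb"
    label       = "vke-nodepool"
    auto_scaler = true
    min_nodes   = 1
    max_nodes   = 2
}

# B) fixed-size pool: terraform owns the node count
node_pools {
    plan          = "vc2-1c-2gb"
    label         = "vke-nodepool"
    auto_scaler   = false
    node_quantity = 1
}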

This behaviour mirrors how GCP (just the provider I'm most familiar with) works, for example.
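
For comparison, a GKE node pool in the google provider looks roughly like this (from memory, so double-check the exact attribute names; "my-cluster" is just a placeholder):

resource "google_container_node_pool" "example" {
    name    = "autoscaled-pool"
    cluster = "my-cluster"

    # With autoscaling on, you configure the bounds...
    autoscaling {
        min_node_count = 1
        max_node_count = 2
    }

    # ...and leave node_count out, because the autoscaler owns it.
    # Setting it anyway causes the same tug-of-war described above.
    # node_count = 1
}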

Also, keep in mind that this is exactly how your web UI works right now: if I pick autoscale, I specify min/max; if I pick manual, I specify node quantity.
