
coschedule isolate-pipelinerun doesn't work #2318

Open
ruialves7 opened this issue Sep 12, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

ruialves7 commented Sep 12, 2024

Expected Behavior

I have configured my tekton operator with:

  pipeline:
    disable-affinity-assistant: true
    coschedule: isolate-pipelinerun
    enable-api-fields: "alpha"

My node group on AWS EKS uses the cluster autoscaler with a multi-AZ ASG. My trigger template has this configuration:

(...)
      podTemplate:
        securityContext:
          fsGroup: 65532
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                  - key: pipelines
                    operator: In
                    values:
                      - "pipelines"
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app.kubernetes.io/component
                      operator: In
                      values:
                         - "affinity-assistant"
                topologyKey: "kubernetes.io/hostname"  # Add the topologyKey here
    
        nodeSelector:
          pipelines: tom-pipelines
        tolerations:
          - key: dedicated
            operator: Equal
            value: pipelines
            effect: NoSchedule
(...)

Actual Behavior

My PipelineRun uses a PVC. If I understand the documentation correctly, with coschedule: isolate-pipelinerun each PipelineRun should run on its own physical node, but that doesn't happen; instead it returns this error:

pod status "PodScheduled":"False"; message: "0/7 nodes are available: 1 node(s) didn''t match pod anti-affinity rules, 1 node(s) had untolerated taint {app: permanentpod}, 2 node(s) had untolerated taint {app: 24h}, 3 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/7 nodes are available: 1 No preemption victims found for incoming pod, 6 Preemption is not helpful for scheduling."
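A note on how I understand the scheduling (my reading of the docs, not confirmed by a maintainer): even with disable-affinity-assistant: true, the coschedule: isolate-pipelinerun mode still creates one affinity-assistant placeholder pod per PipelineRun and injects inter-pod affinity to it into every TaskRun pod, roughly like this (the instance hash below is illustrative):

```yaml
# Sketch (assumption): affinity that Tekton injects into each TaskRun pod
# in isolate-pipelinerun mode, forcing all pods of a run onto one node.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: affinity-assistant
            app.kubernetes.io/instance: affinity-assistant-0fca248a  # hypothetical per-run hash
        topologyKey: kubernetes.io/hostname
```

If that is right, the podAntiAffinity in my podTemplate above, which repels pods labeled app.kubernetes.io/component=affinity-assistant, directly contradicts the injected podAffinity and would explain the "didn't match pod anti-affinity rules" scheduler message.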

Steps to Reproduce the Problem

Additional Info

  • Kubernetes version: 1.30

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.4-eks-8ccc7ba", GitCommit:"892db4a4e439987d7addade5f9595cadfa06db2e", GitTreeState:"clean", BuildDate:"2023-08-15T16:06:56Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"30+", GitVersion:"v1.30.3-eks-2f46c53", GitCommit:"69ba22bf73c1112e7933fc61b220c00b554a7f66", GitTreeState:"clean", BuildDate:"2024-07-25T04:23:44Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.30) exceeds the supported minor version skew of +/-1
  • Tekton Pipeline version: latest-v

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

@ruialves7 ruialves7 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 12, 2024
khrm (Contributor) commented Sep 26, 2024

Can you share the PipelineRun that is being generated by the Trigger? Is the feature-flags ConfigMap generated correctly?
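For reference, the operator writes these flags into the feature-flags ConfigMap in the tekton-pipelines namespace (visible via kubectl get configmap feature-flags -n tekton-pipelines -o yaml). With the operator settings above I would expect it to contain, sketching only the relevant keys:

```yaml
# Sketch of the expected generated ConfigMap (only the relevant data keys
# shown; values taken from the TektonConfig in this issue).
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  coschedule: "isolate-pipelinerun"
  disable-affinity-assistant: "true"
  enable-api-fields: "alpha"
```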

ruialves7 (Author) commented

Hi,
This is the message from my Kubernetes cluster autoscaler:
I0927 09:31:51.685584 1 orchestrator.go:565] Pod pipelines/dev-pull-request-runfdpxn-curl-in-progress-pod can't be scheduled on eks-pipelines-node-group-b0c8f03f-1e3a-7266-dd3c-d6b07096b6c3, predicate checking error: node(s) didn't match pod affinity rules; predicateName=InterPodAffinity; reasons: node(s) didn't match pod affinity rules; debugInfo=
The error message on the PipelineRun pod:
'pod status "PodScheduled":"False"; message: "0/8 nodes are available: 1 node(s) didn''t match pod anti-affinity rules, 2 node(s) had untolerated taint {app: 24h}, 2 node(s) had untolerated taint {app: permanentpod}, 3 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling."'

My TektonConfig:

    apiVersion: operator.tekton.dev/v1alpha1
    kind: TektonConfig
    metadata:
      name: config
    spec:
      profile: basic
      config:
        nodeSelector:
          app: 24h
        tolerations:
          - key: app
            operator: Equal
            value: 24h
            effect: NoSchedule
      targetNamespace: tekton-pipelines
      pruner:
        resources:
          - pipelinerun
          - taskrun
        keep: 100
        schedule: "0 8 * * *"
      pipeline:
        coschedule: isolate-pipelinerun
        disable-affinity-assistant: true
        enable-api-fields: alpha
      trigger:
        options:
          deployments:
            tekton-triggers-controller:
              spec:
                template:
                  spec:
                    affinity:
                      nodeAffinity:
                        requiredDuringSchedulingIgnoredDuringExecution:
                          nodeSelectorTerms:
                            - matchExpressions:
                                - key: app
                                  operator: In
                                  values:
                                    - "permanentpod"
                    nodeSelector:
                      app: permanentpod
                    tolerations:
                      - key: app
                        operator: Equal
                        value: permanentpod
                        effect: NoSchedule
            tekton-triggers-webhook:
              spec:
                template:
                  spec:
                    affinity:
                      nodeAffinity:
                        requiredDuringSchedulingIgnoredDuringExecution:
                          nodeSelectorTerms:
                            - matchExpressions:
                                - key: app
                                  operator: In
                                  values:
                                    - "permanentpod"
                    nodeSelector:
                      app: permanentpod
                    tolerations:
                      - key: app
                        operator: Equal
                        value: permanentpod
                        effect: NoSchedule
            tekton-triggers-core-interceptors:
              spec:
                template:
                  spec:
                    affinity:
                      nodeAffinity:
                        requiredDuringSchedulingIgnoredDuringExecution:
                          nodeSelectorTerms:
                            - matchExpressions:
                                - key: app
                                  operator: In
                                  values:
                                    - "permanentpod"
                    nodeSelector:
                      app: permanentpod
                    tolerations:
                      - key: app
                        operator: Equal
                        value: permanentpod
                        effect: NoSchedule
      chain:
        options:
          deployments:
            tekton-chains-controller:
              spec:
                template:
                  spec:
                    affinity:
                      nodeAffinity:
                        requiredDuringSchedulingIgnoredDuringExecution:
                          nodeSelectorTerms:
                            - matchExpressions:
                                - key: app
                                  operator: In
                                  values:
                                    - "permanentpod"
                    nodeSelector:
                      app: permanentpod
                    tolerations:
                      - key: app
                        operator: Equal
                        value: permanentpod
                        effect: NoSchedule
My template:

    apiVersion: triggers.tekton.dev/v1beta1
    kind: TriggerTemplate
    metadata:
      name: dev-pull-request
      namespace: tekton-pipelines
    spec:
      params:
        - name: gitrevision
          description: "Revision to checkout. (branch, tag, sha, ref, etc...)"
        - name: linkbuild
          description: "The url to update build status"
      resourcetemplates:
        - apiVersion: tekton.dev/v1beta1
          kind: PipelineRun
          metadata:
            generateName: dev-pull-request-run
            namespace: pipelines
          spec:
            serviceAccountName: pipelines-sa
            pipelineRef:
              name: dev-pull-request
            taskRunSpecs:
              - pipelineTaskName: curl-in-progress
                computeResources:
                  requests:
                    cpu: "200m"
                    memory: "256Mi"
                  limits:
                    cpu: "200m"
                    memory: "256Mi"
              - pipelineTaskName: git-clone
                computeResources:
                  requests:
                    cpu: "200m"
                    memory: "256Mi"
                  limits:
                    cpu: "200m"
                    memory: "256Mi"
              - pipelineTaskName: maven-run
                computeResources:
                  requests:
                    cpu: "1700m"
                    memory: "6.5Gi"
                  limits:
                    cpu: "1700m"
                    memory: "6.5Gi"
              - pipelineTaskName: curl-in-successful
                computeResources:
                  requests:
                    cpu: "200m"
                    memory: "256Mi"
                  limits:
                    cpu: "200m"
                    memory: "256Mi"
              - pipelineTaskName: curl-in-failed
                computeResources:
                  requests:
                    cpu: "200m"
                    memory: "256Mi"
                  limits:
                    cpu: "200m"
                    memory: "256Mi"
              - pipelineTaskName: curl-in-canceled
                computeResources:
                  requests:
                    cpu: "200m"
                    memory: "256Mi"
                  limits:
                    cpu: "200m"
                    memory: "256Mi"
            podTemplate:
              securityContext:
                fsGroup: 65532
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                          - key: pipelines
                            operator: In
                            values:
                              - "x-pipelines"
              nodeSelector:
                pipelines: x-pipelines
              tolerations:
                - key: dedicated
                  operator: Equal
                  value: x-pipelines
                  effect: NoSchedule
            workspaces:
              - name: shared-data
                volumeClaimTemplate:
                  spec:
                    accessModes:
                      - ReadWriteMany
                    resources:
                      requests:
                        storage: 10Gi
                    storageClassName: efs-sc
              - name: m2-cache
                persistentVolumeClaim:
                  claimName: m2-pvc
              - name: ssh-creds
                secret:
                  secretName: (confidential secrets)
            params:
              - name: url
                value: (confidential url)
              - name: gitrevision
                value: $(tt.params.gitrevision)
              - name: linkbuild
                value: $(tt.params.linkbuild)
This works if I use AWS EFS for my PVC and disable the affinity rules.
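In case it helps anyone debugging the same thing, a hedged workaround sketch: drop the custom affinity from the podTemplate entirely and keep only nodeSelector/tolerations, so that the affinity the assistant injects is the only pod affinity in play (values reused from the template above):

```yaml
podTemplate:
  securityContext:
    fsGroup: 65532
  # No custom (anti-)affinity here: let Tekton's affinity assistant inject
  # its own pod affinity. nodeSelector and tolerations still pin the pods
  # to the dedicated pipeline node group.
  nodeSelector:
    pipelines: x-pipelines
  tolerations:
    - key: dedicated
      operator: Equal
      value: x-pipelines
      effect: NoSchedule
```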
