Failed to execute method NodeOps.repair #1370

Open
JBOClara opened this issue Jul 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@JBOClara
Contributor

JBOClara commented Jul 16, 2024

What happened?

Cassandra container shows the following error in the logs:

com.datastax.oss.driver.api.core.servererrors.ServerError: Failed to execute method NodeOps.repair

Did you expect to see something different?

Calls to /api/v2/repairs should return 200 OK instead of 500 Internal Server Error.

How to reproduce it (as minimally and precisely as possible):

The error is visible in the Cassandra logs.

Environment

  • K8ssandra Operator version:

This error is visible with:

helm ls -A -a -d | grep k8ss
k8ssandra-operator       	k8ssandra-operator	1       	2024-05-22 17:13:25.033002 +0200 CEST   	deployed	k8ssandra-operator-1.16.0                 	1.16.0

and

k8ssandra-operator    	k8ssandra-operator	29      	2024-07-13 17:55:28.039314 +0200 CEST  	deployed	k8ssandra-operator-1.17.0          	1.17.0

k describe po -n k8ssandra-operator | grep "Image:" | sort -u
    Image:          cr.k8ssandra.io/k8ssandra/cass-management-api:4.1.4
    Image:          cr.k8ssandra.io/k8ssandra/system-logger:v1.21.0
    Image:          docker.io/k8ssandra/medusa:0.19.1
    Image:          docker.io/k8ssandra/medusa:0.21.0
    Image:          docker.io/thelastpickle/cassandra-reaper:3.5.0
    Image:          timberio/vector:0.26.0-alpine
    Image:         bitnami/kubectl:1.29.3
    Image:         busybox:1.28
    Image:         cr.k8ssandra.io/k8ssandra/cass-management-api:4.1.4
    Image:         cr.k8ssandra.io/k8ssandra/cass-operator:v1.21.0
    Image:         cr.k8ssandra.io/k8ssandra/k8ssandra-client:v0.4.0
    Image:         cr.k8ssandra.io/k8ssandra/k8ssandra-operator:v1.17.0
    Image:         docker.io/thelastpickle/cassandra-reaper:3.5.0
Image hash
k describe po -n k8ssandra-operator | grep "Image ID:" | sort -u
    Image ID:       cr.k8ssandra.io/k8ssandra/cass-management-api@sha256:e606bae0bd49e794dffdb508bd461e6734e8bba415ac30f2f58742f647fab38c
    Image ID:       cr.k8ssandra.io/k8ssandra/system-logger@sha256:a25251eb74ca08dc87d5ceb3d22bfcb7ac93c1ec7b673c3ce2f8c7bc32769c1f
    Image ID:       docker.io/k8ssandra/medusa@sha256:1a8e63b9dd49744cf13678584f9558c6452ed1b160de17c149174d6035e053d7
    Image ID:       docker.io/k8ssandra/medusa@sha256:4f2991f88c92441bd6ed5034c4a0cdab94b52e37590183753b2b5786eb25abd9
    Image ID:       docker.io/thelastpickle/cassandra-reaper@sha256:9e84f87108994d63bc76cec25b2cdd2e1f02072585f825fd2ca493b09371fc38
    Image ID:       docker.io/timberio/vector@sha256:13779856a8afe8240a1549208040dec12a50cd9b9d98b577d9327d2c212499d8
    Image ID:      cr.k8ssandra.io/k8ssandra/cass-management-api@sha256:e606bae0bd49e794dffdb508bd461e6734e8bba415ac30f2f58742f647fab38c
    Image ID:      cr.k8ssandra.io/k8ssandra/cass-operator@sha256:d851410079654d6f0acd55d220f647f042d7691dd28a6b3866efcc120c34aeae
    Image ID:      cr.k8ssandra.io/k8ssandra/k8ssandra-client@sha256:4cd4f97e74ea4ce256cb55aa166039471b977c5c4f75e92971d012579146b050
    Image ID:      cr.k8ssandra.io/k8ssandra/k8ssandra-operator@sha256:00cd1e0bab61aba16df7edcfbcdab5aa5c9d6c29d3656d1e467aca312090890d
    Image ID:      docker.io/bitnami/kubectl@sha256:f5fc0d561d9ef931f9ecb2e8b65d93eb92767c57f64897c56a100bfe28102c74
    Image ID:      docker.io/library/busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47
    Image ID:      docker.io/thelastpickle/cassandra-reaper@sha256:9e84f87108994d63bc76cec25b2cdd2e1f02072585f825fd2ca493b09371fc38
  • Kubernetes version information:

kubectl version
    Client Version: v1.30.2
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.30.2-eks-db838b0


And:

kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b


* Kubernetes cluster kind:

EKS

* Manifests:

<details>
  <summary>Manifests</summary>

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  annotations:
    eks.amazonaws.com/skip-containers: cassandra,server-system-logger,server-config-init
  finalizers:
    - finalizer.cassandra.datastax.com
  generation: 1
  labels:
    app.kubernetes.io/component: cassandra
    app.kubernetes.io/name: k8ssandra-operator
    app.kubernetes.io/part-of: k8ssandra
    k8ssandra.io/cleaned-up-by: k8ssandracluster-controller
    k8ssandra.io/cluster-name: cassandra
    k8ssandra.io/cluster-namespace: k8ssandra-operator
  name: us-east
  namespace: k8ssandra-operator
spec:
  additionalServiceConfig:
    additionalSeedService: {}
    allpodsService: {}
    dcService: {}
    nodePortService: {}
    seedService: {}
  clusterName: cassandra
  config:
    cassandra-env-sh:
      additional-jvm-opts:
        - -Dcassandra.allow_alter_rf_during_range_movement=true
        - -Dcassandra.system_distributed_replication=us-east:3
        - -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy
        - -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config
        - -Dcassandra.jmx.remote.login.config=CassandraLogin
        - -Dcom.sun.management.jmxremote.authenticate=true
        - -Djavax.net.ssl.trustStore=/mnt/client-truststore/truststore
        - -Djavax.net.ssl.keyStore=/mnt/client-keystore/keystore
        - -Djavax.net.debug=ssl
        - -Dcom.sun.management.jmxremote.registry.ssl=true
        - -Dcassandra.consistent.rangemovement=false
        - -Dcom.sun.management.jmxremote.ssl.need.client.auth=true
        - -Dcom.sun.management.jmxremote.registry.ssl=true
        - -Dcom.sun.management.jmxremote.ssl=true
        - -Dcassandra.allow_new_old_config_keys=true
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      auto_bootstrap: true
      auto_snapshot: true
      batch_size_fail_threshold: 1500KiB
      batch_size_warn_threshold: 10KiB
      client_encryption_options:
        enabled: true
        keystore: /mnt/client-keystore/keystore
        keystore_password: READACTED
        optional: false
        require_client_auth: false
        truststore: /mnt/client-truststore/truststore
        truststore_password: READACTED
      concurrent_counter_writes: 64
      concurrent_materialized_view_writes: 64
      concurrent_reads: 64
      concurrent_writes: 64
      counter_cache_size: 50MiB
      materialized_views_enabled: true
      native_transport_port: 9042
      num_tokens: 256
      range_request_timeout: 10000ms
      read_request_timeout: 15000ms
      request_timeout: 20000ms
      role_manager: CassandraRoleManager
      server_encryption_options:
        internode_encryption: all
        keystore: /mnt/server-keystore/keystore
        keystore_password: READACTED
        require_client_auth: false
        truststore: /mnt/server-truststore/truststore
        truststore_password: READACTED
      write_request_timeout: 2000ms
    jvm-server-options:
      initial_heap_size: 4294967296
      jmx-connection-type: local-no-auth
      jmx-port: 7199
      jmx-remote-ssl: true
      max_heap_size: 4294967296
    jvm11-server-options:
      garbage_collector: G1GC
  configBuilderResources: {}
  managementApiAuth: {}
  networking: {}
  podTemplateSpec:
    metadata: {}
    spec:
      containers:
        - env:
            - name: LOCAL_JMX
              value: "no"
            - name: MANAGEMENT_API_HEAP_SIZE
              value: "128000000"
            - name: MGMT_API_DISABLE_MCAC
              value: "true"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/v0/probes/liveness
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 230
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 10
          name: cassandra
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/v0/probes/readiness
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 270
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          resources: {}
          volumeMounts:
            - mountPath: /crypto
              name: certs
            - mountPath: /home/cassandra/.cassandra/cqlshrc
              name: cqlsh-config
              subPath: cqlshrc
            - mountPath: /home/cassandra/.cassandra/nodetool-ssl.properties
              name: nodetool-config
              subPath: nodetool-ssl.properties
            - mountPath: /mnt/client-keystore
              name: client-keystore
            - mountPath: /mnt/client-truststore
              name: client-truststore
            - mountPath: /mnt/server-keystore
              name: server-keystore
            - mountPath: /mnt/server-truststore
              name: server-truststore
        - name: server-system-logger
          resources: {}
        - env:
            - name: MEDUSA_MODE
              value: GRPC
            - name: MEDUSA_TMP_DIR
              value: /var/lib/cassandra
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CQL_USERNAME
              valueFrom:
                secretKeyRef:
                  key: username
                  name: cassandra-medusa
            - name: CQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: cassandra-medusa
          image: docker.io/k8ssandra/medusa:0.21.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - /bin/grpc_health_probe
                - --addr=:50051
            failureThreshold: 10
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: medusa
          ports:
            - containerPort: 50051
              name: grpc
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - /bin/grpc_health_probe
                - --addr=:50051
            failureThreshold: 10
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 10m
              memory: 116Mi
          volumeMounts:
            - mountPath: /etc/cassandra
              name: server-config
            - mountPath: /var/lib/cassandra
              name: server-data
            - mountPath: /etc/medusa
              name: cassandra-medusa
            - mountPath: /etc/podinfo
              name: podinfo
            - mountPath: /etc/certificates
              name: certificates
      initContainers:
        - command:
            - sysctl
            - -w
            - vm.max_map_count=1048575
          image: busybox:1.28
          name: sysctl
          resources: {}
          securityContext:
            privileged: true
        - name: server-config-init
          resources: {}
        - env:
            - name: MEDUSA_MODE
              value: RESTORE
            - name: MEDUSA_TMP_DIR
              value: /var/lib/cassandra
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CQL_USERNAME
              valueFrom:
                secretKeyRef:
                  key: username
                  name: cassandra-medusa
            - name: CQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: cassandra-medusa
          image: docker.io/k8ssandra/medusa:0.21.0
          imagePullPolicy: IfNotPresent
          name: medusa-restore
          resources:
            limits:
              memory: 8Gi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - mountPath: /etc/cassandra
              name: server-config
            - mountPath: /var/lib/cassandra
              name: server-data
            - mountPath: /etc/medusa
              name: cassandra-medusa
            - mountPath: /etc/podinfo
              name: podinfo
            - mountPath: /etc/certificates
              name: certificates
      volumes:
        - name: certs
          secret:
            secretName: cassandra-jks-keystore
        - configMap:
            name: cqlsh-config
          name: cqlsh-config
        - configMap:
            name: nodetool-config
          name: nodetool-config
        - name: client-keystore
          secret:
            items:
              - key: keystore.jks
                path: keystore
            secretName: cassandra-jks-keystore
        - name: client-truststore
          secret:
            items:
              - key: truststore.jks
                path: truststore
            secretName: cassandra-jks-keystore
        - name: server-keystore
          secret:
            items:
              - key: keystore.jks
                path: keystore
            secretName: cassandra-jks-keystore
        - name: server-truststore
          secret:
            items:
              - key: truststore.jks
                path: truststore
            secretName: cassandra-jks-keystore
        - configMap:
            name: cassandra-medusa
          name: cassandra-medusa
        - downwardAPI:
            items:
              - fieldRef:
                  fieldPath: metadata.labels
                path: labels
          name: podinfo
        - name: certificates
          secret:
            secretName: medusa-certificates
  racks:
    - name: 1a
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east-1a
    - name: 1d
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east-1b
    - name: 1c
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east-1c
  resources:
    limits:
      memory: 9Gi
    requests:
      cpu: "1"
      memory: 9Gi
  serverType: cassandra
  serverVersion: 4.1.4
  size: 3
  storageConfig:
    additionalVolumes:
      - mountPath: /etc/vector
        name: vector-config
        volumeSource:
          configMap:
            name: cassandra-us-east-cass-vector
      - mountPath: /opt/management-api/configs
        name: metrics-agent-config
        volumeSource:
          configMap:
            items:
              - key: metrics-collector.yaml
                path: metrics-collector.yaml
            name: cassandra-us-east-metrics-agent-config
    cassandraDataVolumeClaimSpec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 300Gi
      storageClassName: ebs-xfs-sc
  superuserSecretName: cassandra-superuser
  systemLoggerResources:
    limits:
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 128Mi
  users:
    - secretName: cassandra-reaper
      superuser: true
    - secretName: cassandra-medusa
      superuser: true

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  annotations:
    config.kubernetes.io/origin: |
      path: ../../base/k8ssandra-encrypted.yaml
    k8ssandra.io/initial-system-replication: '{"us-east":3}'
  finalizers:
    - k8ssandracluster.k8ssandra.io/finalizer
  generation: 5
  name: cassandra
  namespace: k8ssandra-operator
spec:
  auth: true
  cassandra:
    clientEncryptionStores:
      keystorePasswordSecretRef:
        name: jks-password
      keystoreSecretRef:
        key: keystore.jks
        name: cassandra-jks-keystore
      truststorePasswordSecretRef:
        name: jks-password
      truststoreSecretRef:
        key: truststore.jks
        name: cassandra-jks-keystore
    config:
      cassandraYaml:
        authenticator: PasswordAuthenticator
        authorizer: CassandraAuthorizer
        auto_bootstrap: true
        auto_snapshot: true
        batch_size_fail_threshold: 1500KiB
        batch_size_warn_threshold: 10KiB
        client_encryption_options:
          enabled: true
          optional: false
          require_client_auth: false
        concurrent_counter_writes: 64
        concurrent_materialized_view_writes: 64
        concurrent_reads: 64
        concurrent_writes: 64
        counter_cache_size: 50MiB
        materialized_views_enabled: true
        native_transport_port: 9042
        num_tokens: 256
        range_request_timeout: 10000ms
        read_request_timeout: 15000ms
        request_timeout: 20000ms
        server_encryption_options:
          internode_encryption: all
          require_client_auth: false
        write_request_timeout: 2000ms
      jvmOptions:
        additionalOptions:
          - -Djavax.net.debug=ssl
          - -Dcom.sun.management.jmxremote.registry.ssl=true
          - -Dcassandra.consistent.rangemovement=false
          - -Dcom.sun.management.jmxremote.ssl.need.client.auth=true
          - -Dcom.sun.management.jmxremote.registry.ssl=true
          - -Dcom.sun.management.jmxremote.ssl=true
          - -Dcassandra.allow_new_old_config_keys=true
        gc: G1GC
        heap_initial_size: 4Gi
        heap_max_size: 4Gi
        jmx_connection_type: local-no-auth
        jmx_port: 7199
        jmx_remote_ssl: true
    containers:
      - livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v0/probes/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 230
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: cassandra
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v0/probes/readiness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 270
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        volumeMounts:
          - mountPath: /crypto
            name: certs
          - mountPath: /home/cassandra/.cassandra/cqlshrc
            name: cqlsh-config
            subPath: cqlshrc
          - mountPath: /home/cassandra/.cassandra/nodetool-ssl.properties
            name: nodetool-config
            subPath: nodetool-ssl.properties
    datacenters:
      - initContainers:
          - command:
              - sysctl
              - -w
              - vm.max_map_count=1048575
            image: busybox:1.28
            name: sysctl
            securityContext:
              privileged: true
        metadata:
          name: us-east
        perNodeConfigInitContainerImage: mikefarah/yq:4
        racks:
          - name: 1a
            nodeAffinityLabels:
              topology.kubernetes.io/zone: us-east-1a
          - name: 1d
            nodeAffinityLabels:
              topology.kubernetes.io/zone: us-east-1b
          - name: 1c
            nodeAffinityLabels:
              topology.kubernetes.io/zone: us-east-1c
        resources:
          limits:
            memory: 9Gi
          requests:
            cpu: 1
            memory: 9Gi
        size: 3
        stopped: false
    extraVolumes:
      volumes:
        - name: certs
          secret:
            secretName: cassandra-jks-keystore
        - configMap:
            name: cqlsh-config
          name: cqlsh-config
        - configMap:
            name: nodetool-config
          name: nodetool-config
    metadata:
      annotations:
        eks.amazonaws.com/skip-containers: cassandra,server-system-logger,server-config-init
    mgmtAPIHeap: 128M
    networking:
      hostNetwork: false
    perNodeConfigInitContainerImage: mikefarah/yq:4
    serverEncryptionStores:
      keystorePasswordSecretRef:
        name: jks-password
      keystoreSecretRef:
        key: keystore.jks
        name: cassandra-jks-keystore
      truststorePasswordSecretRef:
        name: jks-password
      truststoreSecretRef:
        key: truststore.jks
        name: cassandra-jks-keystore
    serverType: cassandra
    serverVersion: 4.1.4
    softPodAntiAffinity: false
    storageConfig:
      cassandraDataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 300Gi
        storageClassName: ebs-xfs-sc
    telemetry:
      mcac:
        enabled: false
      prometheus:
        enabled: true
      vector:
        components:
          sinks:
            - config: |
                target = "stdout"
                [sinks.console_output.encoding]
                codec = "json"
              inputs:
                - cassandra_metrics
              name: console_output
              type: console
        enabled: true
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 128Mi
        scrapeInterval: 30s
  medusa:
    certificatesSecretRef:
      name: medusa-certificates
    containerImage:
      name: medusa
      registry: docker.io
      repository: k8ssandra
      tag: 0.21.0
    containerResources:
      limits:
        memory: 512Mi
      requests:
        cpu: 10m
        memory: 116Mi
    storageProperties:
      bucketName: dow-backups
      concurrentTransfers: 10
      credentialsType: role-based
      maxBackupAge: 0
      maxBackupCount: 0
      multiPartUploadThreshold: 104857600
      prefix: cassandra-tests
      region: us-east-1
      secure: true
      storageProvider: s3
      storageSecretRef:
        name: ""
      transferMaxBandwidth: 90MB/s
  reaper:
    ServiceAccountName: default
    autoScheduling:
      enabled: true
      initialDelayPeriod: PT15S
      percentUnrepairedThreshold: 10
      periodBetweenPolls: PT10M
      repairType: AUTO
      scheduleSpreadPeriod: PT6H
      timeBeforeFirstSchedule: PT5M
    containerImage:
      name: cassandra-reaper
      repository: thelastpickle
      tag: 3.6.0
    deploymentMode: SINGLE
    heapSize: 2Gi
    httpManagement:
      enabled: true
    keyspace: reaper_db
    secretsProvider: internal
    telemetry:
      cassandra:
        endpoint:
          address: 0.0.0.0
      mcac:
        enabled: false
      prometheus:
        enabled: true
      vector:
        enabled: true
        resources:
          limits:
            cpu: 100m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 128Mi
  secretsProvider: internal

</details>

* K8ssandra Operator Logs:

INFO [nioEventLoopGroup-2-2] 2024-07-16 12:31:35,347 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v2/repairs status=500 Internal Server Error
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:31:38,541 Cli.java:663 - address=/10.210.20.219:56784 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:31:43,538 Cli.java:663 - address=/10.210.20.219:51656 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:31:48,540 Cli.java:663 - address=/10.210.20.219:51666 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:31:58,539 Cli.java:663 - address=/10.210.20.219:48066 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:31:58,540 Cli.java:663 - address=/10.210.20.219:48068 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:02,818 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:02,820 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:02,909 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:05,371 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:05,373 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:05,466 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:08,541 Cli.java:663 - address=/10.210.20.219:55514 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:32:13,538 Cli.java:663 - address=/10.210.20.219:58392 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:18,540 Cli.java:663 - address=/10.210.20.219:58402 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:32:28,539 Cli.java:663 - address=/10.210.20.219:52776 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:28,540 Cli.java:663 - address=/10.210.20.219:52790 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:32:38,541 Cli.java:663 - address=/10.210.20.219:39932 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:40,870 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:40,873 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:40,989 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:41,561 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:41,564 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:41,657 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:43,539 Cli.java:663 - address=/10.210.20.219:44100 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:32:48,540 Cli.java:663 - address=/10.210.20.219:44112 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:32:58,538 Cli.java:663 - address=/10.210.20.219:36508 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:32:58,541 Cli.java:663 - address=/10.210.20.219:36520 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:08,541 Cli.java:663 - address=/10.210.20.219:52446 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2024-07-16 12:33:13,538 Cli.java:663 - address=/10.210.20.219:52002 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,148 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,150 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,152 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,161 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,162 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:17,254 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:18,537 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:18,539 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:18,540 Cli.java:663 - address=/10.210.20.219:52018 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:18,643 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v1/ops/tables/compactions status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:26,184 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
INFO [nioEventLoopGroup-2-2] 2024-07-16 12:33:26,186 Cli.java:663 - address=/10.210.18.172:49500 url=/api/v0/metadata/endpoints status=200 OK
com.datastax.oss.driver.api.core.servererrors.ServerError: Failed to execute method NodeOps.repair
at com.datastax.oss.driver.api.core.servererrors.ServerError.copy(ServerError.java:54)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
at com.datastax.mgmtapi.CqlService.executePreparedStatement(CqlService.java:57)
at com.datastax.mgmtapi.resources.v2.RepairResourcesV2.lambda$repair$0(RepairResourcesV2.java:80)
at com.datastax.mgmtapi.resources.common.BaseResources.handle(BaseResources.java:67)
at com.datastax.mgmtapi.resources.v2.RepairResourcesV2.repair(RepairResourcesV2.java:71)
at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:170)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:130)
at org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:643)
at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:507)
at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(ResourceMethodInvoker.java:457)
at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:459)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:419)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:393)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:68)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:492)
at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:261)
at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:161)
at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:164)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:247)
at org.jboss.resteasy.plugins.server.netty.RequestDispatcher.service(RequestDispatcher.java:86)
at org.jboss.resteasy.plugins.server.netty.RequestHandler.channelRead0(RequestHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61)
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)


**Anything else we need to know?**:

No



┆Issue is synchronized with this [Jira Story](https://datastax.jira.com/browse/K8OP-9) by [Unito](https://www.unito.io)
┆Issue Number: K8OP-9
JBOClara added the bug label on Jul 16, 2024
@iAlex97

iAlex97 commented Jul 31, 2024

We have also encountered this issue when enabling autoScheduling for Reaper. After further checking the mgmt-api logs, I think this is caused by Reaper using an invalid combination of default parameters (which only happens for Cassandra 4.x) when setting up automatic schedules. The error that led me to this conclusion:

INFO  [epollEventLoopGroup-5-3] 2024-07-31 08:44:16,274 RpcMethod41x.java:138 - Failed to execute method NodeOps.repair
java.lang.reflect.InvocationTargetException: null
	at jdk.internal.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at com.datastax.mgmtapi.rpc.RpcMethod41x.execute(RpcMethod41x.java:130)
	at com.datastax.mgmtapi.rpc.RpcMethod41x.execute(RpcMethod41x.java:33)
	at com.datastax.mgmtapi.interceptors.QueryHandlerInterceptor.lambda$handle$1(QueryHandlerInterceptor.java:120)
	at com.datastax.mgmtapi.shims.CassandraAPI.handleRpcResult(CassandraAPI.java:73)
	at com.datastax.mgmtapi.interceptors.QueryHandlerInterceptor.handle(QueryHandlerInterceptor.java:120)
	at com.datastax.mgmtapi.interceptors.QueryHandlerInterceptor.intercept(QueryHandlerInterceptor.java:80)
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java)
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
	at org.apache.cassandra.transport.Message$Request.execute(Message.java:255)
	<redacted>
Caused by: java.io.IOException: Invalid repair combination. Incremental repair if Parallelism is not set
	at com.datastax.mgmtapi.NodeOpsProvider.repair(NodeOpsProvider.java:824)
	... 43 common frames omitted

The K8ssandraCluster CRD has autoScheduling.repairType set to AUTO, which for Cassandra 4.x behaves as INCREMENTAL and sets up the schedules accordingly.
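
For reference, a quick way to confirm what is configured on the cluster (a sketch assuming the cluster name cassandra and namespace k8ssandra-operator from the manifests above):

# Print the Reaper auto-scheduling settings from the K8ssandraCluster spec
kubectl get k8ssandracluster cassandra -n k8ssandra-operator \
  -o jsonpath='{.spec.reaper.autoScheduling}'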

From the Reaper docs, we understand that for an incremental repair the only allowed value for repairParallelism is PARALLEL:

Sets the default repair type unless specifically defined for each run. Note that this is only supported with the PARALLEL repairParallelism setting. For more details in incremental repair, please refer to the following article: http://www.datastax.com/dev/blog/more-efficient-repairs

This is checked by the management-api (in NodeOpsProvider.repair, per the stack trace above), which indeed throws the error that I'm seeing.

Exec-ing into a Reaper pod to check its configuration, we see that /etc/cassandra-reaper/config/cassandra-reaper.yml sets repairParallelism to the value of an environment variable called REAPER_REPAIR_PARALELLISM. The value of that variable is:

REAPER_REPAIR_PARALELLISM=DATACENTER_AWARE
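
A minimal sketch of that check (the deployment name cassandra-us-east-reaper is an assumption based on the cluster and datacenter names above; adjust to your actual Reaper deployment):

# Print the parallelism env variable and the rendered setting inside the Reaper pod
kubectl exec -n k8ssandra-operator deploy/cassandra-us-east-reaper -- \
  sh -c 'echo "$REAPER_REPAIR_PARALELLISM"; grep repairParallelism /etc/cassandra-reaper/config/cassandra-reaper.yml'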

We can further confirm this by looking at the Reaper tables inside Cassandra:

prod-superuser@cqlsh> use reaper_db;
prod-superuser@cqlsh:reaper_db> select * from repair_schedule_v1;

 id                                   | adaptive | creation_time                   | days_between | intensity | last_run                             | next_activation                 | owner           | pause_time                      | percent_unrepaired_threshold | repair_parallelism | repair_unit_id                       | run_history | segment_count | segment_count_per_node | state
--------------------------------------+----------+---------------------------------+--------------+-----------+--------------------------------------+---------------------------------+-----------------+---------------------------------+------------------------------+--------------------+--------------------------------------+-------------+---------------+------------------------+--------
 b3cb2180-4e7c-11ef-9f1c-4d0488525d6c |    False | 2024-07-30 14:04:57.112000+0000 |            7 |       0.9 | 2db1ccf0-4e97-11ef-92a3-c328b392dd6d | 2024-08-06 17:08:38.033000+0000 | auto-scheduling | 2024-07-30 14:10:43.136000+0000 |                           10 |        dc_parallel | b3c9e900-4e7c-11ef-9f1c-4d0488525d6c |        null |          null |                     64 | ACTIVE
 b3d533a0-4e7c-11ef-9f1c-4d0488525d6c |    False | 2024-07-30 14:04:57.178000+0000 |            7 |       0.9 |                                 null | 2024-07-30 20:09:57.150000+0000 | auto-scheduling |                            null |                           10 |        dc_parallel | b3d337d0-4e7c-11ef-9f1c-4d0488525d6c |        null |          null |                     64 | ACTIVE

which confirms that the default parallelism was set to dc_parallel, i.e. DATACENTER_AWARE.

My confusion comes from where this variable is set. From my limited research, it is not specified in the Reaper deployment, it is not set in the Dockerfile, and it cannot be configured from the CRD.

For possible workarounds, I see the following (sketches after this list):

  • set autoScheduling.repairType in the CRD to REGULAR, because ADAPTIVE is only recommended for Cassandra 3.x
  • manually edit the entries in the schedules table and set repair_parallelism to parallel
  • manually edit the Reaper deployment like so:
    1. Initial deployment of Reaper with the "wrong" config
    2. Scale the deployment down to 0
    3. Edit the deployment and set the env variable REAPER_REPAIR_PARALELLISM=PARALLEL
    4. Delete the reaper_db keyspace
    5. Scale the Reaper deployment back up, which should re-run the migrations and populate the schedules table with the proper parallelism value
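
Hedged sketches of the first two workarounds (cluster, namespace and keyspace names come from the manifests above; <schedule-id> is a placeholder for the ids shown in repair_schedule_v1):

# Workaround 1: switch auto-scheduling to REGULAR repairs on the K8ssandraCluster
kubectl patch k8ssandracluster cassandra -n k8ssandra-operator --type merge \
  -p '{"spec":{"reaper":{"autoScheduling":{"repairType":"REGULAR"}}}}'

# Workaround 2: set the parallelism on the existing schedules directly,
# from any cqlsh session with access to the cluster
cqlsh -e "UPDATE reaper_db.repair_schedule_v1 SET repair_parallelism = 'parallel' WHERE id = <schedule-id>;"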

@adejanovski what do you think?

@JBOClara
Contributor Author

Hello @adejanovski

Can you tell us if there is enough information?
