E2E migration cases don't support k8s cluster switch correctly #8292

Open
blackpiglet opened this issue Oct 14, 2024 · 1 comment
Labels: Bug, E2E Tests


blackpiglet commented Oct 14, 2024

What steps did you take and what happened:

1. Prepare two k8s clusters.
2. Run the Velero E2E test cases (including the migration cases) on those clusters.

Take this CLI as an example:

CLOUD_PROVIDER=azure \
VELERO_SERVER_DEBUG_MODE=true \
DEFAULT_CLUSTER=nightly-test-1728875035788-azure-default-6-default \
STANDBY_CLUSTER=nightly-test-1728875035788-azure-standby-6-standby \
DEFAULT_CLUSTER_NAME=nightly-test-1728875035788-azure-default-6 \
STANDBY_CLUSTER_NAME=nightly-test-1728875035788-azure-standby-6 \
PLUGINS=gcr.io/velero-gcp/velero-plugin-for-microsoft-azure:main \
CREDS_FILE=/velero/workspace/E2E-debug/azure-credential BSL_CONFIG=resourceGroup=velero-nightly,storageAccount=veleronightly,subscriptionId=2261f3e7-d159-48fe-95a3-0e6a96e11159 \
BSL_BUCKET=velero-e2e-testing-1728875035788 \
ADDITIONAL_BSL_PLUGINS=gcr.io/velero-gcp/velero-plugin-for-aws:main \
ADDITIONAL_OBJECT_STORE_PROVIDER=aws ADDITIONAL_BSL_CONFIG=region=minio,s3ForcePathStyle=true,s3Url=http://minio.minio.svc:9000/ \
ADDITIONAL_BSL_BUCKET=velero-e2e-testing ADDITIONAL_BSL_PREFIX=additional \
ADDITIONAL_CREDS_FILE=/velero/workspace/E2E-debug/minio-credential-additional \
VELERO_IMAGE=gcr.io/velero-gcp/velero:main \
RESTORE_HELPER_IMAGE=gcr.io/velero-gcp/velero-restore-helper:main VERSION=main \
STANDBY_CLUSTER_CLOUD_PROVIDER=azure \
STANDBY_CLUSTER_OBJECT_STORE_PROVIDER=aws \
STANDBY_CLUSTER_PLUGINS=gcr.io/velero-gcp/velero-plugin-for-microsoft-azure:main \
DISABLE_INFORMER_CACHE=true \
VERSION=main \
REGISTRY_CREDENTIAL_FILE=/root/.docker/config.json \
GINKGO_LABELS='!LongTime' \
KIBISHII_DIRECTORY=/velero/workspace/E2E-debug/e2e/distributed-data-generator/kubernetes/yaml/ \
make test-e2e

The E2E tests failed randomly. The error always happened after running a migration case.

What did you expect to happen:
The E2E tests should run successfully.

  [FAILED] in [It] - /velero/workspace/E2E-debug/e2e/velero/test/e2e/backups/deletion.go:76 @ 10/14/24 03:36:10.91

Test case failed and fail fast is enabled. Skip resource clean up.

• [FAILED] [21.079 seconds]
Velero tests of snapshot backup deletion when kibishii is the sample workload [It] Deleted backups are deleted from object storage and backups deleted from object storage can be deleted locally [Backups, Deletion, Snapshot, SkipVanillaZfs]
/velero/workspace/E2E-debug/e2e/velero/test/e2e/backups/deletion.go:75

  [FAILED] Failed to run backup deletion test
  Expected success, but got an error:
      <*errors.withStack | 0xc0008302b8>:
      Failed to install and prepare data for kibishii backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0: Failed to install Kibishii workload: failed to install kibishii, stderr=# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
      Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
      (the NotFound line above repeats nine times in the output)
      : exit status 1

      {
          error: <*errors.withMessage | 0xc00090a380>{
              cause: <*errors.withStack | 0xc000830258>{
                  error: <*errors.withMessage | 0xc00090a360>{
                      cause: <*errors.withStack | 0xc000830228>{
                          error: <*errors.withMessage | 0xc00090a340>{
                              cause: <*exec.ExitError | 0xc00090a320>{
                                  ProcessState: {
                                      pid: 23290,
                                      status: 256,
                                      rusage: {
                                          Utime: {Sec: ..., Usec: ...},
                                          Stime: {Sec: ..., Usec: ...},
                                          Maxrss: 176904,
                                          Ixrss: 0,
                                          Idrss: 0,
                                          Isrss: 0,
                                          Minflt: 41783,
                                          Majflt: 0,
                                          Nswap: 0,
                                          Inblock: 0,
                                          Oublock: 133816,
                                          Msgsnd: 0,
                                          Msgrcv: 0,
                                          Nsignals: 0,
                                          Nvcsw: 11355,
                                          Nivcsw: 5676,
                                      },
                                  },
                                  Stderr: nil,
                              },

                              msg: "failed to install kibishii, stderr=# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\n",

                          },
                          stack: [0x1e935dd, 0x1e949e5, 0x1e98650, 0x1e97aa5, 0x89a393, 0x8ae54d, 0x47b261],
                      },
                      msg: "Failed to install Kibishii worklo...

  Gomega truncated this representation as it exceeds 'format.MaxLength'.
  Consider having the object provide a custom 'GomegaStringer' representation
  or adjust the parameters in Gomega's 'format' package.

The following information will help us better understand what's going on:
This error happens because the current E2E test cases have multiple ways of communicating with the Kubernetes API server.

  • The cases use the kubectl CLI to switch cluster contexts, and the velero CLI to create and delete the backup and restore resources.
  • The cases also use client-go to talk to the Kubernetes API server, for example to create k8s resources.

The migration cases use the kubectl CLI to switch between the k8s clusters. That switch modifies the kubeconfig, so every CLI command that reads ~/.kube/config picks up the change.
But client-go cannot pick up the same cluster-switch result: its clients resolve the kubeconfig's current context only once, when they are constructed.
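
For illustration, the switch boils down to something like this sketch (the helper name switchCluster is hypothetical; the actual E2E code may differ), which mutates the kubeconfig file on disk:

package e2esketch

import (
    "context"
    "os/exec"
)

// switchCluster shells out to kubectl, which rewrites current-context in
// ~/.kube/config; every later kubectl or velero invocation that reads that
// file will target kubeContext.
func switchCluster(ctx context.Context, kubeContext string) error {
    cmd := exec.CommandContext(ctx, "kubectl", "config", "use-context", kubeContext)
    return cmd.Run()
}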

The test case failure happened because kubectl had switched to the standby cluster to install Velero, while client-go created the backup target namespaces on the active cluster. As a result, the subsequent steps, which ran against the standby cluster, failed to find the created namespaces.
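
A minimal sketch of the mismatch, assuming the suite builds its clientset from the default kubeconfig via client-go's clientcmd (the names newClient and switchCluster are illustrative, not the actual E2E helpers):

package e2esketch

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// newClient snapshots the kubeconfig's current-context at construction time;
// a later "kubectl config use-context" has no effect on this clientset.
func newClient(kubeconfigPath string) (*kubernetes.Clientset, error) {
    cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
    if err != nil {
        return nil, err
    }
    return kubernetes.NewForConfig(cfg)
}

// client, _ := newClient(path)             // current-context = active cluster
// switchCluster(ctx, standbyContext)       // kubectl rewrites ~/.kube/config
// client.CoreV1().Namespaces().Create(...) // still hits the active cluster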

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
blackpiglet (author) commented:

PR #8293 is a temporary workaround for the error.
For the long term, we need a solution that aligns all the methods of communicating with the Kubernetes API server, so that they all target the same connected k8s cluster.
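
One possible shape for that alignment (my assumption, not an agreed design) is to stop depending on the kubeconfig's mutable current-context and pin every client-go client to an explicit context name, so that kubectl, the velero CLI, and client-go all resolve the same cluster:

package e2esketch

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// newClientForContext builds a clientset pinned to an explicit kubeconfig
// context, independent of whatever current-context the file happens to hold.
func newClientForContext(kubeconfigPath, kubeContext string) (*kubernetes.Clientset, error) {
    cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
        &clientcmd.ClientConfigLoadingRules{ExplicitPath: kubeconfigPath},
        &clientcmd.ConfigOverrides{CurrentContext: kubeContext},
    ).ClientConfig()
    if err != nil {
        return nil, err
    }
    return kubernetes.NewForConfig(cfg)
}

Rebuilding (or re-selecting) the client right after each context switch would keep the client-go path in step with the CLI paths.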
