mayastor upgrade 2.5.0 to 2.7.0 ERROR operator_diskpool: Failed to create CRD, error #1709
Hi @innotecsol, it seems the latest CRD is installed on the cluster, but the diskpool operator that is starting up is an older version. Can you share the output of the following commands?
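The exact commands asked for were not preserved in this copy of the thread; a hedged sketch of checks along these lines (the CRD name and namespace are assumptions) would be:

```shell
# Which versions does the DiskPool CRD currently serve? (CRD name assumed)
kubectl get crd diskpools.openebs.io -o jsonpath='{.spec.versions[*].name}{"\n"}'

# Which image is each deployment in the mayastor namespace running?
kubectl get deploy -n mayastor \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'
```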
Hi abhilashshetty04,
Sorry, there does exist a deployment with the mayastor prefix
Hi @innotecsol, as suspected: v1beta2 (the latest CRD spec) already exists, but the diskpool operator is running an older build. cc: @niladrih
@innotecsol -- How did you upgrade from 2.5.0 to 2.7.0? What were the steps that you followed?
Hi, I downloaded the mayastor kubectl plugin from https://github.com/openebs/mayastor/releases/download/v2.7.0/kubectl-mayastor-x86_64-linux-musl.tar.gz and executed
Initially (on version 2.5.0) I came from
@innotecsol -- For versions 2.2.0-2.5.0 (both included), you'd have to add the set flag
Ref: https://openebs.io/docs/user-guides/upgrade#replicated-storage (these instructions are for the openebs/openebs helm chart; they have to be adapted for the mayastor/mayastor chart to some degree). I'd like to check whether your helm release is in a healthy state so that you can try again. Could you share the output of
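For reference, a minimal sketch of what that set flag looks like, assuming the mayastor/mayastor chart and the value key documented in the link above (the key is an assumption; prefix it for the openebs umbrella chart):

```shell
# Disable partial rebuild for the duration of the upgrade (value key assumed from the
# linked docs; use "mayastor.agents.core.rebuild.partial.enabled" on the umbrella chart)
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false'
```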
Hi niladrih, here is the required output:
The status of the package:
The pods:
Should I execute the command
Thanks for your support!
We ran into the same issue on a 2.5.1 to 2.7.0 upgrade using: This leaves the diskpool pod running version 2.5.1, with the same error message.
For me it looks like all the components are still on 2.5.0 except etcd, which uses the docker.io/bitnami/etcd:3.5.6-debian-11-r10 image. Here I am not sure, but the statefulset was definitely changed, as some pod affinity I had added was gone. kubectl describe pod -n mayastor | grep 2.5.0
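A generic kubectl one-liner (not mayastor-specific) that lists every pod's images is a quicker way to confirm which components are still on 2.5.0:

```shell
kubectl get pods -n mayastor \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```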
Let's try these steps:
If the STATUS says 'deployed', then proceed with the rest; otherwise share the output of, and any failure logs from, the above commands.
Proceed only if the upgrade has succeeded so far.
The CRD issue should resolve itself by this point.
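As an illustration of that first check (the release name and namespace are assumptions; adjust to your install):

```shell
# The helm release should be healthy before retrying the upgrade
helm list -n mayastor               # STATUS should read "deployed"
helm history mayastor -n mayastor   # reveals a stuck revision, e.g. "pending-upgrade"
```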
However, the upgrade runs into an error
It seems the pods were upgraded: except for the io-engine, all others run 2.7.0 images.
The upgrade job does not run anymore.
Hi,
I have redone the upgrade. This time it went through successfully.
I have attached the upgrade log. However, not all of my replicas come back up.
All nodes are up.
All pools are online.
I have not enabled partial rebuild yet. How do I get the volumes into a consistent state again? Thanks & BR
To re-enable partial rebuild: As for the volumes becoming consistent, please mount them and they should rebuild back to the specified number of replicas. What is the current state of your volumes?
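A minimal sketch of re-enabling partial rebuild, assuming the mayastor/mayastor chart and the same helm value used to disable it during the upgrade (the chart reference and key are assumptions):

```shell
# Flip partial rebuild back on via helm, keeping all other values as they are
helm upgrade mayastor mayastor/mayastor -n mayastor --reuse-values \
  --set 'agents.core.rebuild.partial.enabled=true'
```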
Hi, I am not sure I understand your answer. All the volumes are mounted, as displayed in my previous entry.
E.g. 1107276f-ce8e-4dfd-b2aa-feeaaed7843b reports 3 replicas and status degraded,
but it only shows two replicas for the volume. This is the status after a few days. Are you saying that it will resolve after enabling partial rebuild? Thanks & BR
Can you attach a support bundle?
mayastor-2024-08-12--18-54-41-UTC-partaa-of-tar.gz I have uploaded the files. As the tar.gz is too big (48 MB), I split it; you need to join the parts back together. Thanks
Hmm
But it doesn't seem likely that we have actually run out of metadata pages on the pool, given how few volumes you have. I suspect you have hit a variation of another bug; I can't find the ticket now, but it was related to a race condition on the pool. However, I see a lot of EIO errors on the device, which might mean the pool disk /dev/sda is not working properly. Otherwise we should reset the pool disk and re-create the pool anew, for example:
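The original example was not preserved in this copy of the thread; a hedged reconstruction, using the pool, disk, and yaml file names that appear elsewhere in the thread, would be along these lines (the dd step is destructive, so double-check the device first):

```shell
# 1. Delete the DiskPool resource (this destroys the pool and any replicas on it)
kubectl delete diskpool pool-adm-cp0 -n mayastor

# 2. On the node, wipe the start of the disk so no stale pool metadata survives
#    (destructive: be certain /dev/sda is the right device)
dd if=/dev/zero of=/dev/sda bs=1M count=100 conv=fsync

# 3. Re-create the pool from the original yaml
kubectl apply -f adm-cp0-mayastor.yaml
```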
For the record, I had the same problem upgrading from 2.5.0 to 2.7.0 after forgetting to disable partial rebuild, and got stuck with a partially upgraded install that helm didn't like. Following the instructions here fixed it.
It would be nice if the documentation were a bit clearer about the relationship between helm and mayastor upgrade, since the first time through it wasn't clear that kubectl mayastor upgrade effectively upgrades the chart, and that you should not run helm upgrade in the usual way.
I changed all the volumes' replica count to 2,
relabeled the cp0 node,
so the last step does not work,
and rescaling the volume to 3 returns an error:
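For context, the rescale referred to here is presumably done with the plugin's scale subcommand; a hedged sketch follows (the subcommand syntax is an assumption, and the actual error output has not been preserved):

```shell
# Ask the control plane for 3 replicas of the degraded volume
kubectl mayastor scale volume 1107276f-ce8e-4dfd-b2aa-feeaaed7843b 3
```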
I removed the diskpool via the yaml file (kubectl delete -f ...). However, the diskpool is now stuck in Terminating state. How do I get the finalizer cleaned up?
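For reference, a stuck finalizer can usually be cleared with a metadata patch. A generic kubectl sketch (not something that was done in this thread), only safe if nothing still needs the pool:

```shell
# Force-remove the finalizers from the stuck DiskPool object
kubectl patch diskpool pool-adm-cp0 -n mayastor \
  --type merge -p '{"metadata":{"finalizers":null}}'
```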
I got the diskpool removed; there was still a single-replica PV hanging on a node, and rebooting the node let the diskpool be removed. I recreated the diskpool by applying the yaml. All seems to be in a consistent state.
@innotecsol, can you please send us the latest support bundle, preferably after retrying the same operation?
mayastor-2024-08-22--06-51-37-UTC.tar.gz
@innotecsol, the scale-up operation involves creating a new replica. The pool "pool-adm-cp0" on the admin-cp0 node was picked, and the CreateReplicaRequest failed due to a CRC metadata mismatch. Logs:
All the scale operations that failed were attempted on this node only; hence the CRC mismatch error is seen only on the pool-adm-cp0 pool. Let's try to manually create a replica on the affected pool and on a non-affected one, just to see if it's a device issue. Exec into the io-engine pod running on the admin-cp0 node and do the following:
This creates a replica on pool-adm-cp0. Let's verify that the replica was created successfully using
Let's do the same operation on the admin-cp1 node: exec into the io-engine pod running on admin-cp1.
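A sketch of locating and exec'ing into the io-engine pod on a given node (the app=io-engine label is an assumption; check your pod labels first):

```shell
NODE=admin-cp1   # or admin-cp0 for the affected pool
POD=$(kubectl get pods -n mayastor -l app=io-engine \
  --field-selector spec.nodeName="$NODE" -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it -n mayastor "$POD" -c io-engine -- sh
```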
Thoughts: @dsharma-dc, @dsavitskiy, @tiagolobocastro
adm-cp0 fails:
adm-cp1 works:
replica list:
@innotecsol, it seems the replica_create issue is specific to this node/pool. You have hit this before.
Do you have any replicas on the pool?
If not, can we delete the diskpool and recreate it using the same spec? It seems we skipped the pool deletion before.
where adm-cp0-mayastor.yaml is:
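The file contents were not preserved here; a hypothetical reconstruction, based on the v1beta2 DiskPool spec and the node/disk named earlier in the thread, might look like:

```shell
# Hypothetical reconstruction of adm-cp0-mayastor.yaml (node and disk taken from this thread)
cat > adm-cp0-mayastor.yaml <<'EOF'
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
  name: pool-adm-cp0
  namespace: mayastor
spec:
  node: admin-cp0
  disks: ["/dev/sda"]
EOF
```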
after deletion:
The replicas are back again. Am I doing the deletion the wrong way? It is still failing.
I think your dd command did not work somehow; otherwise the pool should not come up as Online when you relabel your io-engine node.
@innotecsol did you manage to resolve this?
Describe the bug
During the upgrade of mayastor from 2.5.0 to 2.7.0, the following error is displayed in the log of mayastor-operator-diskpool-5cd48746c-46zwb:
To Reproduce
Steps to reproduce the behavior:
kubectl mayastor upgrade --skip-single-replica-volume-validation
OS info:
Talos, version 1.6.7
How can I fix the CRD?