Microk8s, NFS CSI and snapshots

11 Posts
4 Users
0 Likes
529 Views
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

Hi,

I've installed NFS and the NFS CSI driver with the help of https://microk8s.io/docs/how-to-nfs

It works fine. Then I installed K10 and wanted to take snapshots.

But it refused to work.

I found externalSnapshotter in the Helm chart values, so I enabled it. I also created a VolumeSnapshotClass.

Do I have to install https://github.com/kubernetes-csi/external-snapshotter ? That tutorial seems very complex to me, so I'm not exactly sure what to do.
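For reference, the class I created looks roughly like this. This is a minimal sketch for the nfs.csi.k8s.io driver; the class name is my own, and I believe K10 also wants its annotation on the class so it knows to use it:

```yaml
# Minimal VolumeSnapshotClass for csi-driver-nfs (sketch; name is arbitrary)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-nfs-snapclass
  annotations:
    k10.kasten.io/is-snapshot-class: "true"   # lets K10 discover this class
driver: nfs.csi.k8s.io
deletionPolicy: Delete
```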

 

The latest error from K10 is:

cause: '{"cause":{"cause":{"cause":{"cause":{"message":"Failed to create
      snapshot: failed to take snapshot of the volume
      10.0.0.6#nfsserver/pv#pvc-c5cec9f6-1555-442c-bd66-c984cad71d16##: \"rpc
      error: code = Internal desc = failed to create archive for snapshot: exit
      status 1:
 
Posted : 22/02/2024 2:01 am
Brandon Lee
(@brandon-lee)
Posts: 543
Member Admin
 

@mscrocdile This looks really interesting. I haven't played around with an external snapshotter like this before, but I'm definitely going to put it on my list of things to try. @t3hbeowulf, any insights on this or the error?

 
Posted : 22/02/2024 11:32 am
(@t3hbeowulf)
Posts: 27
Eminent Member
 

That is interesting. I haven't used external-snapshotter yet either, but based on a first glance, I think you will need to install it to get that feature to work. The snapshot operation in K10 likely expects external-snapshotter to be present and configured when it is called.

Do the container logs for K10 or cluster events reveal anything interesting? 

I'll have to give this a try when I get home. Maybe external-snapshotter just needs a more succinct "getting started" guide.

 
Posted : 22/02/2024 11:48 am
(@termv)
Posts: 17
Eminent Member
 

Hi @mscrocdile 

The csi-driver-nfs project / chart contains everything you need to create snapshots.

I have the project installed on k3s via the manifests rather than the Helm chart, but the Helm chart should be fine, as per the MicroK8s tutorial. I had never tried the snapshot feature, but I just followed this example tutorial and it worked fine. I was left with this on my NFS server:

pvc-b82e7bc7-c10a-424a-ba75-156a4ba317d6/outfile

snapshot-5d3b1d3d-eda7-4490-8da8-8d2d866f1833/pvc-b82e7bc7-c10a-424a-ba75-156a4ba317d6.tar.gz

Make sure you're checking your NFS server logs for permission errors. Also check to see if the PVC directory was actually created on your NFS server before snapshotting it.

Since you're running on microk8s, make sure that you have the kubeletDir set correctly in your values file:

kubeletDir: /var/snap/microk8s/common/var/lib/kubelet
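Putting that together with the snapshot support, the relevant part of a values file for the csi-driver-nfs chart might look like this (a sketch; verify the key names against your chart version):

```yaml
# values for the csi-driver-nfs Helm chart on MicroK8s (sketch)
externalSnapshotter:
  enabled: true   # deploys the snapshot controller alongside the driver
kubeletDir: /var/snap/microk8s/common/var/lib/kubelet
```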
 
Posted : 22/02/2024 5:47 pm
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

@termv Hello, you are right. I noticed that yesterday while checking other sources. It was important for me to know that csi-driver-nfs contains everything I need.

Since I reinstalled the Helm chart many times, at one point I forgot to add the kubeletDir parameter and spent a lot of time on that. So that was also one of the problems.

Honestly, I'm not sure what the next problem was, because K10 did not work even after I added kubeletDir.

But once I created a snapshot via kubectl, K10 suddenly started making snapshots without problems.

What I just noticed is that if I create a VolumeSnapshot whose source is a PVC from another namespace, it is never ready to use. Maybe my YAML is wrong.
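As far as I understand, that part is expected: a VolumeSnapshot is a namespaced object and its source PVC must live in the same namespace. A working same-namespace example might look like this (names are hypothetical):

```yaml
# VolumeSnapshot referencing a PVC in its own namespace (sketch)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snap
  namespace: myapp                 # must match the PVC's namespace
spec:
  volumeSnapshotClassName: csi-nfs-snapclass
  source:
    persistentVolumeClaimName: my-data-pvc
```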

Thank you all for your help. I can keep going.

 

 
Posted : 23/02/2024 1:40 am
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

I'll just mention another sub-problem related to this.

In K10, if I restore a snapshot into a new namespace, it works.

But if I delete the namespace and restore from K10 > Applications > Removed, I always get an error.

 

k logs csi-nfs-controller-d96ccb59c-b7cxx -n kube-system

I0223 07:49:22.750254       1 controller.go:1366] provision "rumburak-novy/data-my-postgresql-0" class "nfs-csi": started
I0223 07:49:22.759026       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"rumburak-novy", Name:"data-my-postgresql-0", UID:"c36787bf-9cef-4c50-b199-5cd9b2aeb215", APIVersion:"v1", ResourceVersion:"6262566", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "rumburak-novy/data-my-postgresql-0"
W0223 07:49:22.775566       1 controller.go:1202] requested volume size 8589934592 is greater than the size 0 for the source snapshot k10-csi-snap-xvsqjrc55fx6qmdt. Volume plugin needs to handle volume expansion.
I0223 07:49:23.015094       1 controller.go:1075] Final error received, removing PVC c36787bf-9cef-4c50-b199-5cd9b2aeb215 from claims in progress
W0223 07:49:23.017682       1 controller.go:934] Retrying syncing claim "c36787bf-9cef-4c50-b199-5cd9b2aeb215", failure 6
E0223 07:49:23.017769       1 controller.go:957] error syncing claim "c36787bf-9cef-4c50-b199-5cd9b2aeb215": failed to provision volume with StorageClass "nfs-csi": rpc error: code = Internal desc = failed to copy volume for snapshot: exit status 2: tar (child): /tmp/snapshot-29e6025c-b9f0-431a-8b82-76814cf3ccb5/snapshot-29e6025c-b9f0-431a-8b82-76814cf3ccb5/pvc-f2967ba0-5664-4342-bab2-fbd3243e5011.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
I0223 07:49:23.015448       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"rumburak-novy", Name:"data-my-postgresql-0", UID:"c36787bf-9cef-4c50-b199-5cd9b2aeb215", APIVersion:"v1", ResourceVersion:"6262566", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "nfs-csi": rpc error: code = Internal desc = failed to copy volume for snapshot: exit status 2: tar (child): /tmp/snapshot-29e6025c-b9f0-431a-8b82-76814cf3ccb5/snapshot-29e6025c-b9f0-431a-8b82-76814cf3ccb5/pvc-f2967ba0-5664-4342-bab2-fbd3243e5011.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now

If I find something, I will post it here.

 

 
Posted : 23/02/2024 3:04 am
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

I've added "allowVolumeExpansion: true" to the StorageClass and did everything again (with a different namespace), but got a different error.

 

cause:
    cause:
      cause:
        cause:
          cause:
            message: "Specified 1 replicas and only 0 are ready: could not get
              StatefulSet{Namespace: bramborak, Name: my-postgresql}: client
              rate limiter Wait returned an error: context deadline exceeded"
          fields:
            - name: statefulset
              value: my-postgresql
          file: kasten.io/k10/kio/kube/workload/workload.go:47
          function: kasten.io/k10/kio/kube/workload.WaitForWorkloadReady
          linenumber: 47
          message: Statefulset not in ready state
        fields:
          - name: namespace
            value: bramborak
          - name: name
            value: my-postgresql
        file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:773
        function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).waitForWorkload
        linenumber: 773
        message: Error waiting for workload to be ready
      file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:373
      function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
      linenumber: 373
      message: Failed to restore workloads
    file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
    function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
    linenumber: 144
    message: Failure in planned phase
  message: Job failed to be executed

 

I0223 10:20:09.201267       1 controller.go:1366] provision "bramborak/data-my-postgresql-0" class "nfs-csi": started
I0223 10:20:09.203737       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"bramborak", Name:"data-my-postgresql-0", UID:"88c5b80a-4b5e-4196-936e-090169088370", APIVersion:"v1", ResourceVersion:"6280012", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "bramborak/data-my-postgresql-0"
W0223 10:20:09.228956       1 controller.go:934] Retrying syncing claim "88c5b80a-4b5e-4196-936e-090169088370", failure 25
I0223 10:20:09.229011       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"bramborak", Name:"data-my-postgresql-0", UID:"88c5b80a-4b5e-4196-936e-090169088370", APIVersion:"v1", ResourceVersion:"6280012", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "nfs-csi": error getting handle for DataSource Type VolumeSnapshot by Name k10-csi-snap-6mlwfckcpfk2hh6k: error getting snapshot k10-csi-snap-6mlwfckcpfk2hh6k from api server: volumesnapshots.snapshot.storage.k8s.io "k10-csi-snap-6mlwfckcpfk2hh6k" not found
E0223 10:20:09.229071       1 controller.go:957] error syncing claim "88c5b80a-4b5e-4196-936e-090169088370": failed to provision volume with StorageClass "nfs-csi": error getting handle for DataSource Type VolumeSnapshot by Name k10-csi-snap-6mlwfckcpfk2hh6k: error getting snapshot k10-csi-snap-6mlwfckcpfk2hh6k from api server: volumesnapshots.snapshot.storage.k8s.io "k10-csi-snap-6mlwfckcpfk2hh6k" not found
 
Posted : 23/02/2024 5:29 am
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

Just to give you the latest info. I think the problem is still the same: RESTORESIZE is 0.

Does it mean that the restore would like to restore but has no snapshot data?

Is it caused by removing the namespace?

 

k get volumesnapshotcontent

NAME                                                                         READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER           VOLUMESNAPSHOTCLASS           VOLUMESNAPSHOT                  VOLUMESNAPSHOTNAMESPACE   AGE
k10-csi-snap-x7jfghpwrpq4gdrm-content-42f8b418-4ee1-4b61-a60b-4cdad19d6dff   true         0             Retain           nfs.csi.k8s.io   k10-clone-csi-nfs-snapclass   k10-csi-snap-x7jfghpwrpq4gdrm   rumburak                  122m
k10-csi-snap-jk8mmb22skg8ws4b-content-c1524607-33fd-40a7-b8c5-c153c4ca6280   true         0             Retain           nfs.csi.k8s.io   k10-clone-csi-nfs-snapclass   k10-csi-snap-jk8mmb22skg8ws4b   dyne                      8m20s
 
Posted : 23/02/2024 6:22 am
(@mscrocdile)
Posts: 33
Eminent Member
Topic starter
 

There is progress. If I create a policy using "Enable Backups via Snapshot Exports" and export to any S3 bucket, then I'm able to restore the whole namespace from that.

So I really think that if I delete the whole namespace, something important is deleted too.

It makes sense to me: no backup, no restore point for a removed namespace. But K10 should not offer that point for restoring.

I'm probably still missing something.

 
Posted : 23/02/2024 9:36 am
Brandon Lee
(@brandon-lee)
Posts: 543
Member Admin
 

@mscrocdile Great information. It sounds like you are still chipping away at the issue. Keep us posted. I've got to try this myself; it's very good for learning.

 
Posted : 24/02/2024 10:18 am
(@termv)
Posts: 17
Eminent Member
 

@mscrocdile I reproduced your problem. The volume snapshot is stored in your application's namespace, so if you delete the namespace, the snapshots are deleted along with it. You need to have K10 export your backups.

If you simply restore a backup in place, it works fine.

 
Posted : 24/02/2024 10:06 pm