Kubernetes NFS migration on the same cluster

I needed to perform a Kubernetes NFS migration from one back-end server to another. An application suite called Cloud Pak for Data was already installed and in use, and we did not want to reinstall it. The cluster is OpenShift Container Platform version 4.3, running on a POWER system (ppc64le). The procedure should work for any kind of application and any Kubernetes version above v1.16, as long as you make sure no pods are using the PVCs during the process; I show how to do that below. I am using the dynamic provisioner that is published here:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs

The migration was from an old HDD server to a new NVMe server, which means the IP address changed. Unfortunately, I had not defined the provisioner using a hostname, so lesson learned: do not use an IP address when defining the provisioner. The mount point was also different, and I had to copy all the persistent volume content to the new server. I kept the storage class name and the RBAC authorizations I already had.

This procedure was done on OpenShift Container Platform 4.3. I use “kubectl” wherever possible; the “oc” commands that remain in the output are interchangeable with it.

Kubernetes NFS migration – Collecting Metadata

[root@ocp1-master01 migration]# kubectl get nodes
NAME                   STATUS   ROLES    AGE   VERSION
master1.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
master2.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
master3.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
worker1.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
worker2.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
worker3.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
[root@ocp1-master01 migration]#

The only application running on this cluster is Cloud Pak for Data, and it uses a lot of persistent volumes. It lives in the zen namespace. In my case the reclaim policy of the volumes was set to Delete:

[root@ocp1-master01 migration]# oc get pv |grep zen |head -5
pvc-0d5da1e4-c294-4a15-9472-95b361967928   10Gi       RWO            Delete           Bound       zen/data-redis-ha-server-1                        managed-nfs-storage            40h
pvc-0eab709d-7160-45a3-923b-b28462375e96   10Gi       RWO            Delete           Bound       zen/datadir-zen-metastoredb-0                     managed-nfs-storage            40h
pvc-19388b7e-fdd8-4f2b-8b1a-22c9d0d7ce4d   5Gi        RWO            Delete           Bound       zen/data-aiopenscale-ibm-aios-zookeeper-1         managed-nfs-storage            40h
pvc-1a5d1206-b106-4ad6-b858-bbec11694938   5Gi        RWO            Delete           Bound       zen/data-aiopenscale-ibm-aios-zookeeper-2         managed-nfs-storage            40h
pvc-1aba494c-c548-46e7-93d9-0cf1e8638f30   30Gi       RWO            Delete           Bound       zen/database-storage-wdp-couchdb-2                managed-nfs-storage            40h

So first I changed the reclaim policy of all PVs to “Retain”, to make sure I would keep control over them.

[root@ocp1-master01 migration]# oc get pv |grep -v ^NAME |awk '{print $1}' |xargs -l -I{} oc patch pv {} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/imageregistry patched
persistentvolume/pvc-0d5da1e4-c294-4a15-9472-95b361967928 patched
persistentvolume/pvc-0eab709d-7160-45a3-923b-b28462375e96 patched
persistentvolume/pvc-19388b7e-fdd8-4f2b-8b1a-22c9d0d7ce4d patched
persistentvolume/pvc-1a5d1206-b106-4ad6-b858-bbec11694938 patched
.
.
.
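
To confirm the patch took effect everywhere, a quick check along these lines should come back empty (the column names are just labels I chose):

# any PV whose reclaim policy is still not Retain will be listed here
kubectl get pv --no-headers -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy | grep -v Retain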

I know the applications in this namespace are controlled by both deployments and statefulsets. In order to perform a safe Kubernetes NFS migration, make sure you have all the metadata settings. Let’s save the current state to files:

[root@ocp1-master01 migration]# oc get deployments -n zen > deployment_out
[root@ocp1-master01 migration]# oc get statefulsets -n zen > statefulset_out
[root@ocp1-master01 migration]#
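
If you want an extra safety net on top of these summaries, a full YAML dump of the workloads does not hurt either (the file name here is just one I picked):

kubectl get deployments,statefulsets -n zen -o yaml > zen_workloads_backup.yaml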

I also created a pvc directory and, inside it, saved the details of each existing PVC along with its bound PV.

[root@ocp1-master01 migration]# mkdir pvc
[root@ocp1-master01 migration]# cd pvc
[root@ocp1-master01 pvc]# oc get pvc -n zen |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> mkdir ${PVCNAME}
> oc get pvc $PVCNAME -n zen -o yaml > ${PVCNAME}/PVC_$PVCNAME.yaml
> oc get pv $PVNAME -o yaml > ${PVCNAME}/PV_$PVNAME.yaml
> done
[root@ocp1-master01 pvc]# oc get pvc -n zen >pvclist
[root@ocp1-master01 pvc]#

Kubernetes NFS migration – Scaling down the applications

Now that I have all the metadata configuration I need, I will scale all deployments and statefulsets down to zero (0) so that I can take a consistent copy to the other NFS server.

[root@ocp1-master01 pvc]# cd ..
[root@ocp1-master01 migration]# cat deployment_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> kubectl scale deployment $NAME --replicas=0 -n zen
> done
deployment.extensions/aiopenscale-ibm-aios-bias scaled
deployment.extensions/aiopenscale-ibm-aios-bkpicombined scaled
deployment.extensions/aiopenscale-ibm-aios-common-api scaled
deployment.extensions/aiopenscale-ibm-aios-configuration scaled
deployment.extensions/aiopenscale-ibm-aios-dashboard scaled
.
.
.
[root@ocp1-master01 migration]# cat statefulset_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> kubectl scale statefulset $NAME --replicas=0 -n zen
> done
statefulset.apps/aiopenscale-ibm-aios-etcd scaled
statefulset.apps/aiopenscale-ibm-aios-kafka scaled
statefulset.apps/aiopenscale-ibm-aios-redis scaled
statefulset.apps/aiopenscale-ibm-aios-zookeeper scaled
statefulset.apps/db2wh-1592425754623-db2u scaled
.
.
.

Attention!

You might have other types of implementation, so make sure you don’t have replicasets or pods that are not controlled by deployments or statefulsets. You might need to save the YAML of such pods and delete them to redeploy later, or scale down the replicasets. That was not my case here.
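
One way to spot such workloads is to look at the owner references. This is only a sketch; it lists each pod and replicaset together with the kind of object that owns it, so anything showing <none> or an unexpected owner deserves a closer look:

# pods: owner kind is ReplicaSet for deployments, StatefulSet for statefulsets, <none> for bare pods
kubectl get pods -n zen --no-headers -o custom-columns=NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind
# replicasets that are not owned by a deployment show <none> here
kubectl get replicasets -n zen --no-headers -o custom-columns=NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind | grep -v Deployment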

Copying files to the other NFS Server

After I made sure none of the pods using PVCs were still up, I moved on to copying all the data to the new NFS server. The old export was on the local server, as you can see below.

[root@ocp1-master01 pvc]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
devtmpfs                            243G     0  243G   0% /dev
tmpfs                               256G  128K  256G   1% /dev/shm
tmpfs                               256G  120M  256G   1% /run
tmpfs                               256G     0  256G   0% /sys/fs/cgroup
/dev/sdm2                           116G   63G   48G  58% /
/dev/mapper/ocp1_vg-ocp1_lv1        4.0T  575G  3.5T  15% /ocp1_xfs1
192.168.122.130:/ocp1_xfs1/nfsshare  4.0T  575G  3.5T  15% /nfsshare
nfs-server:/nfs                      12T  673G   11T   6% /nfs
tmpfs                                52G     0   52G   0% /run/user/1001
[root@ocp1-master01 pvc]# cp -prH /ocp1_xfs1/nfsshare/* /nfs/provisioner/
[root@ocp1-master01 pvc]#
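
Before moving on, it is worth double checking that the copy is complete. A couple of simple checks, assuming the same mount points as above:

# compare the overall size of the old and new locations
du -sh /ocp1_xfs1/nfsshare /nfs/provisioner
# rsync in dry-run mode itemizes anything that did not make it across (it changes nothing)
rsync -aHni /ocp1_xfs1/nfsshare/ /nfs/provisioner/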

Deleting the PVCs and PVs

Now it is time to delete the PVCs and then the PVs. I did have issues with some PVCs that would not terminate, even though I could not find any pod using them. If you also encounter this issue, please refer to my other post:

https://dancasali.com/kubernetes-pvc-stuck-in-terminating
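
In short, when a PVC hangs in Terminating the usual culprit is its protection finalizer. After confirming nothing still mounts the claim, clearing the finalizers lets the deletion finish. A hedged sketch, with <pvc-name> as a placeholder:

# check that no pod still mounts the claim (look at the "Mounted By" field)
kubectl describe pvc <pvc-name> -n zen
# last resort: remove the finalizers so the stuck deletion can complete
kubectl patch pvc <pvc-name> -n zen -p '{"metadata":{"finalizers":null}}'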

Now on to the deletion of the PVCs and their PVs:

[root@ocp1-master01 pvc]# cat pvclist |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> kubectl delete pvc $PVCNAME -n zen
> kubectl delete pv $PVNAME
> done
persistentvolumeclaim "backupvol-db2wh-1592335690184-db2u-0" deleted
persistentvolume "pvc-7355c4bf-f17b-4627-b4a4-63877c4cd778" deleted
persistentvolumeclaim "backupvol-db2wh-1592335690184-db2u-1" deleted
persistentvolume "pvc-d8aac7c2-81fa-4633-931c-554e98544771" deleted
persistentvolumeclaim "backupvol-db2wh-1592425754623-db2u-0" deleted
persistentvolume "pvc-615e0b56-ea3a-45a8-a2fc-dab0bff2870f" deleted
persistentvolumeclaim "backupvol-db2wh-1592500694875-db2u-0" deleted
persistentvolume "pvc-208453e5-74b9-4595-9555-53930fab6cc4" deleted
persistentvolumeclaim "cc-home-pvc" deleted
persistentvolume "pvc-da3feec1-d617-481d-b426-064af4d15d73" deleted

Change the NFS provisioner deployment

My provisioner in this case was in the default namespace. Since I am keeping the storage class and the RBAC authorizations I already had, I will just delete the deployment and change the YAML file to point to the new NFS server. The example below shows the important part to change.

Important:

Make sure you have the yaml of your deployment before deleting it.
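
If you do not have the original file around, you can export the live object first. The exported copy carries status and other cluster-generated fields, but in my experience it re-applies without trouble:

kubectl get deployment nfs-client-provisioner -n default -o yaml > deployment.yaml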

[root@ocp1-master01 yamls]# kubectl get deployments -n default
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
nfs-client-provisioner   1/1     1            1           47d
[root@ocp1-master01 yamls]# oc delete deployment nfs-client-provisioner -n default
deployment.extensions "nfs-client-provisioner" deleted
[root@ocp1-master01 yamls]# tail -12 deployment.yaml
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: nfsserver.ocp4.ibm.lab
            - name: NFS_PATH
              value: /nfs/provisioner
      volumes:
        - name: nfs-client-root
          nfs:
            server: nfsserver.ocp4.ibm.lab
            path: /nfs/provisioner
[root@ocp1-master01 yamls]# kubectl apply -f deployment.yaml
deployment.apps/nfs-client-provisioner created
[root@ocp1-master01 yamls]# kubectl get deployments -n default
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
nfs-client-provisioner   1/1     1            1           83s
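
It does not hurt to confirm the provisioner came back up cleanly before recreating the volumes; a quick look at its logs is enough (the deploy/ shorthand picks one of the deployment's pods for us):

kubectl logs deploy/nfs-client-provisioner -n default --tail=20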

Change PV metadata to match the new server

I will use sed to change the PV definitions to the new settings.

Attention

Make a backup of all the files we created before changing them.
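
Something as simple as a copy of the whole directory is enough, for example from the migration directory (pvc.bak is just an example name):

cp -pr pvc pvc.bak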

In the pvc directory I will change the old IP address to the new hostname (since that is really the better option). The path change is handled by the second sed command.

[root@ocp1-master01 pvc]# sed -i 's/server: 192.168.122.130/server: nfsserver.ocp4.ibm.lab/' */PV_*
[root@ocp1-master01 pvc]# sed -i 's/path: \/ocp1_xfs1\/nfsshare/path: \/nfs\/provisioner/' */PV_*
[root@ocp1-master01 pvc]# 

Recreating PVs and PVCs

Now we can recreate our PVs and PVCs. We do need to clear the existing claim reference (claimRef) from each PV; otherwise the PV will stay in Released status and the PVC will end up in Lost status.

[root@ocp1-master01 pvc]# cat pvclist |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> kubectl apply -f ${PVCNAME}/PV_$PVNAME.yaml
> kubectl apply -f ${PVCNAME}/PVC_$PVCNAME.yaml -n zen
> kubectl patch pv $PVNAME -p '{"spec":{"claimRef":null}}'
> done
[root@ocp1-master01 pvc]#

If the claim reference is not cleared, the PV stays Released and the PVC shows up as Lost, as in the example below:

[root@jinete ~]# oc get pv pvc-0d5da1e4-c294-4a15-9472-95b361967928
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                        STORAGECLASS          REASON   AGE
pvc-0d5da1e4-c294-4a15-9472-95b361967928   10Gi       RWO            Retain           Released   zen/data-redis-ha-server-1   managed-nfs-storage            2m13s
[root@jinete ~]# oc get pvc data-redis-ha-server-1 -n zen
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
data-redis-ha-server-1   Lost     pvc-0d5da1e4-c294-4a15-9472-95b361967928   0                         managed-nfs-storage   56s
[root@jinete ~]#
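
Once the claimRef patch from the loop above is applied, the PV and PVC bind to each other again within a few seconds. A quick way to catch anything that did not re-bind:

# both commands should come back empty once everything is bound again
kubectl get pv --no-headers | grep -v Bound
kubectl get pvc -n zen --no-headers | grep -v Bound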

Kubernetes NFS migration – Scaling up the application

We will use the state saved in the “deployment_out” and “statefulset_out” files to restore the previous replica counts.

[root@ocp1-master01 pvc]# cd ..
[root@ocp1-master01 migration]# cat deployment_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> REPLICA=$(echo $REPLICA |cut -d/ -f2)
> kubectl scale deployment $NAME --replicas=$REPLICA -n zen
> done
deployment.extensions/aiopenscale-ibm-aios-bias scaled
deployment.extensions/aiopenscale-ibm-aios-bkpicombined scaled
deployment.extensions/aiopenscale-ibm-aios-common-api scaled
deployment.extensions/aiopenscale-ibm-aios-configuration scaled
deployment.extensions/aiopenscale-ibm-aios-dashboard scaled
.
.
.
[root@ocp1-master01 migration]# cat statefulset_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> REPLICA=$(echo $REPLICA |cut -d/ -f2)
> kubectl scale statefulset $NAME --replicas=$REPLICA -n zen
> done
statefulset.apps/aiopenscale-ibm-aios-etcd scaled
statefulset.apps/aiopenscale-ibm-aios-kafka scaled
statefulset.apps/aiopenscale-ibm-aios-redis scaled
statefulset.apps/aiopenscale-ibm-aios-zookeeper scaled
statefulset.apps/db2wh-1592425754623-db2u scaled
.
.
.

Hint

If you had any pods that were not controlled by deployments or statefulsets, you can now recreate them from the YAML you saved earlier. Also remember to scale replicasets back up if that applies to you.

Kubernetes NFS Migration done

Your pods should now be coming back up using the new NFS server.
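
A last check never hurts; once everything has settled, both of these should come back empty:

# pods that are not Running or Completed
kubectl get pods -n zen --no-headers | grep -Ev 'Running|Completed'
# claims that did not bind
kubectl get pvc -n zen --no-headers | grep -v Bound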