I needed to perform a Kubernetes NFS migration from one back-end server to another. The cluster already had an application suite, Cloud Pak for Data, installed and in use, and we did not want to reset it. The cluster is OpenShift Container Platform version 4.3 running on a POWER system (ppc64le). The procedure should work for any kind of application on Kubernetes v1.16 or above, as long as you make sure no pods are using the PVCs during the process; I show how to do that below. I am using the dynamic provisioner that is published here:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs
The migration was from an old HDD server to a new NVMe server, which means the IP address changed. Unfortunately I had not defined the provisioner using a hostname, so, lesson learned: do not use an IP address when defining the provisioner. The mount point was also different, and I had to copy all the persistent volume content to the new server. I will keep the storage class name and the RBAC authorizations I already had.
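For context, the storage class that stays in place throughout this migration looks roughly like the sketch below. This is an assumption reconstructed from the names that appear later in this post (managed-nfs-storage and the provisioner name fuseim.pri/ifs), not an export from my cluster:

# Sketch of the existing storage class (reconstructed; only the names come from this post)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: fuseim.pri/ifs
reclaimPolicy: Delete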
This procedure was done on OpenShift Container Platform 4.3; I changed the “oc” commands here to “kubectl” where possible.
Kubernetes NFS migration – Collecting Metadata
[root@ocp1-master01 migration]# kubectl get nodes
NAME                   STATUS   ROLES    AGE   VERSION
master1.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
master2.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
master3.ocp4.ibm.lab   Ready    master   17d   v1.16.2+18cfcc9
worker1.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
worker2.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
worker3.ocp4.ibm.lab   Ready    worker   17d   v1.16.2+18cfcc9
[root@ocp1-master01 migration]#
The only application running on this cluster is Cloud Pak for Data, and it uses a lot of persistent volumes. It is located in the zen namespace. In my case the reclaim policy for the volumes was set to Delete:
[root@ocp1-master01 migration]# oc get pv |grep zen |head -5
pvc-0d5da1e4-c294-4a15-9472-95b361967928   10Gi   RWO   Delete   Bound   zen/data-redis-ha-server-1                  managed-nfs-storage   40h
pvc-0eab709d-7160-45a3-923b-b28462375e96   10Gi   RWO   Delete   Bound   zen/datadir-zen-metastoredb-0               managed-nfs-storage   40h
pvc-19388b7e-fdd8-4f2b-8b1a-22c9d0d7ce4d   5Gi    RWO   Delete   Bound   zen/data-aiopenscale-ibm-aios-zookeeper-1   managed-nfs-storage   40h
pvc-1a5d1206-b106-4ad6-b858-bbec11694938   5Gi    RWO   Delete   Bound   zen/data-aiopenscale-ibm-aios-zookeeper-2   managed-nfs-storage   40h
pvc-1aba494c-c548-46e7-93d9-0cf1e8638f30   30Gi   RWO   Delete   Bound   zen/database-storage-wdp-couchdb-2          managed-nfs-storage   40h
So first I changed the reclaim policy of all PVs to “Retain” to make sure I would keep control over them.
[root@ocp1-master01 migration]# oc get pv |grep -v ^NAME |awk '{print $1}' |xargs -l -I{} oc patch pv {} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/imageregistry patched
persistentvolume/pvc-0d5da1e4-c294-4a15-9472-95b361967928 patched
persistentvolume/pvc-0eab709d-7160-45a3-923b-b28462375e96 patched
persistentvolume/pvc-19388b7e-fdd8-4f2b-8b1a-22c9d0d7ce4d patched
persistentvolume/pvc-1a5d1206-b106-4ad6-b858-bbec11694938 patched
.
.
.
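A quick sanity check before moving on; this just lists each PV with its reclaim policy so you can confirm nothing was left on Delete:

# List every PV and its reclaim policy; every line should now say Retain
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy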
I know my applications in this namespace are controlled by both deployments and statefulsets. In order to perform a safe Kubernetes NFS migration, make sure you have all the metadata settings saved. Let’s save the current state to files:
[root@ocp1-master01 migration]# oc get deployments -n zen > deployment_out
[root@ocp1-master01 migration]# oc get statefulsets -n zen > statefulset_out
[root@ocp1-master01 migration]#
I also created a pvc directory and, inside it, saved the details of each existing PVC along with the YAML of the PVC and its bound PV.
[root@ocp1-master01 migration]# mkdir pvc
[root@ocp1-master01 migration]# cd pvc
[root@ocp1-master01 pvc]# oc get pvc -n zen |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> mkdir ${PVCNAME}
> oc get pvc $PVCNAME -n zen -o yaml > ${PVCNAME}/PVC_$PVCNAME.yaml
> oc get pv $PVNAME -o yaml > ${PVCNAME}/PV_$PVNAME.yaml
> done
[root@ocp1-master01 pvc]# oc get pvc -n zen >pvclist
[root@ocp1-master01 pvc]#
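As a small safety net, I like to confirm that the number of saved PVC directories matches the number of PVCs in the namespace (run from inside the pvc directory):

# Both numbers should match
ls -d */ | wc -l
oc get pvc -n zen --no-headers | wc -l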
Kubernetes NFS migration – Scaling down the applications
Now that I have all the metadata configuration I need, I will scale all deployments and statefulsets down to zero (0) so I can take a consistent copy to the other NFS server.
[root@ocp1-master01 pvc]# cd ..
[root@ocp1-master01 migration]# cat deployment_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> kubectl scale deployment $NAME --replicas=0 -n zen
> done
deployment.extensions/aiopenscale-ibm-aios-bias scaled
deployment.extensions/aiopenscale-ibm-aios-bkpicombined scaled
deployment.extensions/aiopenscale-ibm-aios-common-api scaled
deployment.extensions/aiopenscale-ibm-aios-configuration scaled
deployment.extensions/aiopenscale-ibm-aios-dashboard scaled
.
.
.
[root@ocp1-master01 migration]# cat statefulset_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> kubectl scale statefulset $NAME --replicas=0 -n zen
> done
statefulset.apps/aiopenscale-ibm-aios-etcd scaled
statefulset.apps/aiopenscale-ibm-aios-kafka scaled
statefulset.apps/aiopenscale-ibm-aios-redis scaled
statefulset.apps/aiopenscale-ibm-aios-zookeeper scaled
statefulset.apps/db2wh-1592425754623-db2u scaled
.
.
.
Attention!
You might have other kinds of workloads, so make sure you don’t have replicasets or pods that are not controlled by deployments or statefulsets. You might need to save the YAML of those pods so you can delete them and redeploy them later, or scale down the replicasets. That was not my case here.
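One way to double-check that nothing is still mounting a PVC is to list every pod in the namespace together with the claims it references; after scaling down, this should print nothing:

# Print "pod <tab> claim(s)" for every pod that still references a PVC
kubectl get pods -n zen -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | awk -F'\t' '$2 != ""'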
Copying files to the other NFS Server
After I made sure none of the pods using PVCs were still up, I copied all the data to the new NFS server. The old export was local to this server, as you can see below.
[root@ocp1-master01 pvc]# df -h
Filesystem                            Size  Used  Avail  Use%  Mounted on
devtmpfs                              243G     0   243G    0%  /dev
tmpfs                                 256G  128K   256G    1%  /dev/shm
tmpfs                                 256G  120M   256G    1%  /run
tmpfs                                 256G     0   256G    0%  /sys/fs/cgroup
/dev/sdm2                             116G   63G    48G   58%  /
/dev/mapper/ocp1_vg-ocp1_lv1          4.0T  575G   3.5T   15%  /ocp1_xfs1
192.168.122.130:/ocp1_xfs1/nfsshare   4.0T  575G   3.5T   15%  /nfsshare
nfs-server:/nfs                        12T  673G    11T    6%  /nfs
tmpfs                                  52G     0    52G    0%  /run/user/1001
[root@ocp1-master01 pvc]# cp -prH /ocp1_xfs1/nfsshare/* /nfs/provisioner/
[root@ocp1-master01 pvc]#
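The cp -prH worked fine for me; if you prefer, rsync is a reasonable alternative because it preserves hard links, ACLs, and extended attributes and can be re-run to pick up anything that was missed (adjust the flags to your needs):

# Alternative to cp: archive mode, hard links, ACLs, xattrs, keep numeric uid/gid
rsync -aHAX --numeric-ids /ocp1_xfs1/nfsshare/ /nfs/provisioner/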
Deleting the PVCs and PVs
Now it is time to delete the PVCs and then the PVs. I did have issues with some PVCs that would not terminate, even though I could not find any pod using them. If you also encounter this issue, please refer to my other post:
https://dancasali.com/kubernetes-pvc-stuck-in-terminating
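The short version, in case you hit it: a PVC stuck in Terminating usually still has the kubernetes.io/pvc-protection finalizer set, and, once you are absolutely sure no pod uses the claim, removing the finalizer lets the deletion complete. Treat this as the generic workaround rather than the exact steps from that post; the claim name below is a placeholder:

# Last resort only: remove the protection finalizer from a stuck PVC
# Replace <pvc-name> with the claim that refuses to terminate
kubectl patch pvc <pvc-name> -n zen -p '{"metadata":{"finalizers":null}}'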
Now, on to the deletion of the PVCs and Persistent Volumes:
[root@ocp1-master01 pvc]# cat pvclist |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> kubectl delete pvc $PVCNAME -n zen
> kubectl delete pv $PVNAME
> done
persistentvolumeclaim "backupvol-db2wh-1592335690184-db2u-0" deleted
persistentvolume "pvc-7355c4bf-f17b-4627-b4a4-63877c4cd778" deleted
persistentvolumeclaim "backupvol-db2wh-1592335690184-db2u-1" deleted
persistentvolume "pvc-d8aac7c2-81fa-4633-931c-554e98544771" deleted
persistentvolumeclaim "backupvol-db2wh-1592425754623-db2u-0" deleted
persistentvolume "pvc-615e0b56-ea3a-45a8-a2fc-dab0bff2870f" deleted
persistentvolumeclaim "backupvol-db2wh-1592500694875-db2u-0" deleted
persistentvolume "pvc-208453e5-74b9-4595-9555-53930fab6cc4" deleted
persistentvolumeclaim "cc-home-pvc" deleted
persistentvolume "pvc-da3feec1-d617-481d-b426-064af4d15d73" deleted
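Before recreating anything, it is worth confirming that no PVCs are left in the namespace and no PVs still reference it:

# Both commands should return nothing
kubectl get pvc -n zen --no-headers
kubectl get pv --no-headers | grep ' zen/'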
Changing the NFS provisioner deployment
My provisioner in this case lives in the default namespace. Since I am keeping the storage class and the RBAC authorizations I already had, I will just delete the deployment and change the YAML file to point to the new NFS server. The example below shows the important part to change.
Important:
Make sure you have the YAML of your deployment saved before deleting it.
[root@ocp1-master01 yamls]# kubectl get deployments -n default
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
nfs-client-provisioner   1/1     1            1           47d
[root@ocp1-master01 yamls]# oc delete deployment nfs-client-provisioner -n default
deployment.extensions "nfs-client-provisioner" deleted
[root@ocp1-master01 yamls]# tail -12 deployment.yaml
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: nfsserver.ocp4.ibm.lab
            - name: NFS_PATH
              value: /nfs/provisioner
      volumes:
        - name: nfs-client-root
          nfs:
            server: nfsserver.ocp4.ibm.lab
            path: /nfs/provisioner
[root@ocp1-master01 yamls]# kubectl apply -f deployment.yaml
deployment.apps/nfs-client-provisioner created
[root@ocp1-master01 yamls]# kubectl get deployments -n default
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
nfs-client-provisioner   1/1     1            1           83s
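To make sure the recreated provisioner actually works against the new server before restoring the application volumes, a throwaway PVC with the same storage class is a cheap test. The claim name below is just an example:

# Create a small test claim against the managed-nfs-storage class
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-migration-test
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-nfs-storage
  resources:
    requests:
      storage: 1Gi
EOF

# It should go to Bound within a few seconds; then clean it up
kubectl get pvc nfs-migration-test -n default
kubectl delete pvc nfs-migration-test -n default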
Changing the PV metadata to match the new server
I will use sed to change the PV definitions to the new settings.
Attention
Make a backup of all the files we created before changing them.
In the pvc directory I will change the old IP address to the new hostname (since that is really the better option). The path change is handled by the second sed command below.
[root@ocp1-master01 pvc]# sed -i 's/server: 192.168.122.130/server: nfsserver.ocp4.ibm.lab/' */PV_*
[root@ocp1-master01 pvc]# sed -i 's/path: \/ocp1_xfs1\/nfsshare/path: \/nfs\/provisioner/' */PV_*
[root@ocp1-master01 pvc]#
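A quick grep confirms the substitution caught every file; the old IP and the old path should no longer appear anywhere:

# Should list only the new server and path values
grep -h 'server:\|path:' */PV_* | sort | uniq -c
# Should print nothing
grep -l '192.168.122.130\|/ocp1_xfs1/nfsshare' */PV_*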
Recreating PVs and PVCs
Now we can recreate our PVs and PVCs. We do need to clear the existing claimRef from the PV, otherwise the PV will stay Released and the PVC will end up in Lost status.
[root@ocp1-master01 pvc]# cat pvclist |grep -v ^NAME |while read PVCNAME STATUS PVNAME OTHERDATA
> do
> kubectl apply -f ${PVCNAME}/PV_$PVNAME.yaml
> kubectl apply -f ${PVCNAME}/PVC_$PVCNAME.yaml -n zen
> kubectl patch pv $PVNAME -p '{"spec":{"claimRef":null}}'
> done
[root@ocp1-master01 pvc]#
When you recreate them, the PV will initially show as Released and the PVC as Lost until the claimRef is cleared:
[root@jinete ~]# oc get pv pvc-0d5da1e4-c294-4a15-9472-95b361967928
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                        STORAGECLASS          REASON   AGE
pvc-0d5da1e4-c294-4a15-9472-95b361967928   10Gi       RWO            Retain           Released   zen/data-redis-ha-server-1   managed-nfs-storage            2m13s
[root@jinete ~]# oc get pvc data-redis-ha-server-1 -n zen
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
data-redis-ha-server-1   Lost     pvc-0d5da1e4-c294-4a15-9472-95b361967928   0                         managed-nfs-storage   56s
[root@jinete ~]#
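Once the claimRef patch has gone through, everything should settle back to Bound. These two checks should come back empty (apart from the header on the first one):

# PVCs not yet Bound
kubectl get pvc -n zen | grep -v Bound
# PVs still Released
kubectl get pv | grep Released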
Kubernetes NFS migration – Scaling up the application
We will use the state saved in the “deployment_out” and “statefulset_out” files to restore the previous replica counts.
[root@ocp1-master01 pvc]# cd ..
[root@ocp1-master01 migration]# cat deployment_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> REPLICA=$(echo $REPLICA |cut -d/ -f2)
> kubectl scale deployment $NAME --replicas=$REPLICA -n zen
> done
deployment.extensions/aiopenscale-ibm-aios-bias scaled
deployment.extensions/aiopenscale-ibm-aios-bkpicombined scaled
deployment.extensions/aiopenscale-ibm-aios-common-api scaled
deployment.extensions/aiopenscale-ibm-aios-configuration scaled
deployment.extensions/aiopenscale-ibm-aios-dashboard scaled
.
.
.
[root@ocp1-master01 migration]# cat statefulset_out |grep -v ^NAME |while read NAME REPLICA AGE
> do
> REPLICA=$(echo $REPLICA |cut -d/ -f2)
> kubectl scale statefulset $NAME --replicas=$REPLICA -n zen
> done
statefulset.apps/aiopenscale-ibm-aios-etcd scaled
statefulset.apps/aiopenscale-ibm-aios-kafka scaled
statefulset.apps/aiopenscale-ibm-aios-redis scaled
statefulset.apps/aiopenscale-ibm-aios-zookeeper scaled
statefulset.apps/db2wh-1592425754623-db2u scaled
.
.
.
Hint
If you had any pods that were not controlled by deployments or statefulsets, you can now recreate them from the YAML you saved earlier. Also remember to scale back up any replicasets if that applies to your case.
Kubernetes NFS Migration done
Your pods should now come back up using your new NFS server.
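A couple of final checks I find useful: make sure no pod is stuck in a non-running state, and confirm that no PV still points at the old server:

# Pods that are not Running or Completed yet
kubectl get pods -n zen --field-selector=status.phase!=Running,status.phase!=Succeeded
# Should print nothing: no PV should reference the old IP anymore
kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nfs.server}{"\n"}{end}' | grep 192.168.122.130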