I recently upgraded the memory of all nodes in a Nutanix cluster. The process was pretty painless, but I thought I’d document the commands I used anyway in case I need to refer back to them in the future.
Preparation – Check the cluster
The first thing I needed to do was make sure the cluster was healthy and could sustain a node failure. The Data Resiliency Status in Prism was OK, so I was good to proceed.
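I did this check in Prism, but the same information should also be available from a CVM with ncli cluster get-domain-fault-tolerance-status type=node. The sketch below is not something I ran during the upgrade: it assumes the output contains lines of the form "Current Fault Tolerance : <n>" (one per component), and the function name can_lose_a_node is my own.

```shell
# Succeed only if every component reports a current fault tolerance of at least 1.
# Reads the ncli output on stdin; the line format is an assumption, so check it
# against your own cluster's output before relying on this.
can_lose_a_node() {
    awk -F':' '
        /Current Fault Tolerance/ { seen = 1; if ($2 + 0 < 1) low = 1 }
        END { if (seen && !low) exit 0; exit 1 }
    '
}

# Usage on a CVM (not run here):
#   ncli cluster get-domain-fault-tolerance-status type=node | can_lose_a_node \
#       && echo "OK to proceed"
```

If no matching line is found at all, the function fails rather than assuming the cluster is healthy.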
Step One – Maintenance Mode
To make this go as smoothly as possible, I needed to enable maintenance mode on both the CVM (so the CVM wasn’t servicing IO) and the AHV host (so the VMs would migrate off the host).
First, I needed to get the host ID, so I used
ncli host list to give me a list of all the hosts (and their CVMs) in the cluster. You can see the host ID is the first line in the output. The bit I needed is the number after the ‘::’, in my case it was 23. You can also see from the output that “Under Maintenance Mode” is null, indicating that this CVM is not in maintenance mode yet.
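If you would rather not pick the ID out by eye, a small filter can do it. This is only a sketch: it assumes each host block has an "Id" line that ends in "::<number>", as described above, and host_ids is a name I made up.

```shell
# Print the numeric host ID (the part after '::') for every host found in
# `ncli host list` output on stdin, one ID per line.
# Assumes an "Id : <something>::<number>" line per host.
host_ids() {
    sed -n 's/^[[:space:]]*Id[[:space:]]*:.*::\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Usage on a CVM (not run here):
#   ncli host list | host_ids
```

It prints the IDs of all hosts, so you still need to match the right one to the node you are working on.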
Now that I have the host ID and have confirmed that the CVM isn’t in maintenance mode yet, I can issue the following command to enable maintenance mode on the CVM.
ncli host edit id=23 enable-maintenance-mode=true
We can see the CVM is now in maintenance mode by the “Under Maintenance Mode” line reading true.
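The same check can be scripted by pulling out just that line. A minimal sketch, assuming the output contains a line of the form "Under Maintenance Mode : <value>"; the function name maintenance_flags is hypothetical.

```shell
# Print the value of every "Under Maintenance Mode" line found on stdin,
# with surrounding whitespace trimmed (for example "null" or "true").
maintenance_flags() {
    awk -F':' '/Under Maintenance Mode/ {
        val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim leading and trailing blanks
        print val
    }'
}

# Usage on a CVM (not run here):
#   ncli host list | maintenance_flags
```

Run against the full host list, it prints one value per host in the order the hosts are listed.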
Now that the CVM is in maintenance mode, it’s time to put the AHV host into maintenance mode. The command I need to put the AHV host into maintenance mode is
acli host.enter_maintenance_mode <host_IP>
Once the command finishes, the host will be in maintenance mode and it is safe to shut the host down. I like to confirm the host is in maintenance mode by running
acli host.list and checking that the node’s schedulable state is false.
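For a scripted version of that check, something like the following could work. It leans on two assumptions I have not verified on every AOS version: that the first line of the acli host.list output is a header containing a column named "Schedulable", and that columns are separated by two or more spaces with no empty cells. The function name schedulable_state is my own.

```shell
# Print the Schedulable value for one host from `acli host.list` output on stdin.
# $1 is the host IP exactly as it appears in the listing.
# Assumes a header row with a "Schedulable" column and 2+ spaces between columns.
schedulable_state() {
    awk -F'  +' -v host="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Schedulable") col = i; next }
        col     { for (i = 1; i <= NF; i++) if ($i == host) { print $col; break } }
    '
}

# Usage on a CVM (not run here):
#   acli host.list | schedulable_state <host_IP>
```

If the header has no "Schedulable" column, the function prints nothing, which is a cue to fall back to reading the output by eye.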
Step Two – Shutdown
So, now that I have both the CVM and the AHV host in maintenance mode, I SSH to the CVM that is in maintenance mode and run a quick check to ensure the CVM services are stopped before I shut down the CVM. To do this, I execute
genesis status. This gives me a list of the CVM services and their PIDs (if the process is running).
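On a CVM with a lot of services the list is long, so it can help to filter it down to anything that still has a PID. This is a sketch that assumes each service appears on its own line as "name: [pid, pid, ...]", with "[]" when nothing is running; running_services is a hypothetical name.

```shell
# Print the name of every service that still reports at least one PID in
# `genesis status` output on stdin. Lines with an empty list "[]" are skipped.
running_services() {
    sed -n 's/^[[:space:]]*\([A-Za-z0-9_]*\):[[:space:]]*\[[0-9][^]]*\].*$/\1/p'
}

# Usage on the CVM (not run here):
#   genesis status | running_services
```

An empty result means no service line showed a PID.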
All the services are stopped, so now I execute
cvm_shutdown -P now and wait for the CVM to shut down.
Before shutting down the host, I like to do one final check to ensure the CVM is in fact off, so I SSH to the AHV host and execute
virsh list --all to ensure the CVM is indeed powered off before I go any further.
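This check is also easy to script. The sketch below assumes the CVM's domain name contains the string "CVM" (adjust the pattern if yours is named differently) and relies on virsh printing the state as the last column; cvm_is_off is a name I made up.

```shell
# Succeed only if at least one domain with "CVM" in its name is listed and
# every such domain is in the "shut off" state.
# Reads `virsh list --all` output on stdin.
cvm_is_off() {
    awk '
        /CVM/ { seen = 1; if ($0 !~ /shut off[ \t]*$/) running = 1 }
        END   { if (seen && !running) exit 0; exit 1 }
    '
}

# Usage on the AHV host (not run here):
#   virsh list --all | cvm_is_off && echo "CVM is off"
```

It fails if no CVM domain is found at all, so a naming mismatch shows up as a failure rather than a false "off".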
The CVM is listed as “shut off” so I can now go ahead and execute
poweroff to shut down the AHV host.
I can now go ahead and perform any maintenance I need to on the host, like adding extra memory/disk, etc.
Step Three – Bringing the CVM and Host out of Maintenance Mode
Once the node has been physically powered on and is contactable on the network, I need to reverse the earlier steps to take the host and CVM out of maintenance mode.
So first of all, I take the AHV host out of maintenance mode by SSH-ing to a CVM in the cluster and running
acli host.exit_maintenance_mode <host_IP>
Now I execute
ncli host edit id=23 enable-maintenance-mode=false to bring the CVM out of maintenance mode.
I can see now that the CVM is out of maintenance mode. At this stage I like to SSH into the CVM and watch the services start up so I know the CVM is okay. Once I SSH into the CVM I execute
watch -d 'genesis status'. This will poll the
genesis status command every two seconds and show what has changed so I can see the services starting.
Once all services are started, I wait for the Data Resiliency Status in Prism to return to OK.
Step Four – Verify Cluster Health
Cluster is healthy, job done.