I recently upgraded the memory of all nodes in a Nutanix cluster. The process was pretty painless, but I thought I’d document the commands I used anyway in case I need to refer back to them in the future.

Preparation – Check the Cluster

The first thing I needed to do was make sure the cluster was healthy and could sustain a node failure. The Data Resiliency Status in Prism was OK, so I was good to proceed.

Data Resiliency Status of the cluster

Step One – Maintenance Mode

To ensure this went as smoothly as possible, I needed to enable maintenance mode on both the CVM (so the CVM wasn’t servicing IO) and the AHV host (so the VMs would migrate off the host).

First, I needed to get the host ID, so I used ncli host list to give me a list of all the hosts (and their CVMs) in the cluster. The host ID is the first line of each host’s entry in the output. The bit I needed is the number after the ‘::’, which in my case was 23. You can also see from the output that “Under Maintenance Mode” is null, indicating that this CVM is not in maintenance mode yet.

ncli host list
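If you want to pull that number out without reading the output by eye, something like the following might do. It is only a sketch: the function name extract_host_id is my own, and the sample line is a made-up stand-in for the ID line, so check it against what ncli host list actually prints on your cluster.

```shell
# Print the part of a host ID that follows the "::" separator.
# Reads lines on stdin and uses the first one that contains "::".
extract_host_id() {
    awk -F'::' '/::/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Hypothetical sample line; the real "ncli host list" output may differ.
printf 'Id : 00000000-aaaa-bbbb::23\n' | extract_host_id
```

On a CVM this would be used as ncli host list | extract_host_id. Note that it only returns the first host in the list, so it would need filtering if you are after a specific node.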

Now that I had the host ID and had confirmed that the CVM wasn’t in maintenance mode yet, I could issue the following command to enable maintenance mode on the CVM.
ncli host edit id=23 enable-maintenance-mode=true

We can see the CVM is now in maintenance mode by the “Under Maintenance Mode” line reading true.

CVM in maintenance mode

Now that the CVM is in maintenance mode, it’s time to do the same for the AHV host. The command for that is acli host.enter_maintenance_mode <host_IP>

AHV host in maintenance mode

Once the command finishes, the host is in maintenance mode and it is safe to shut it down. I like to confirm this by running acli host.list and checking that the node’s schedulable state is false.

AHV host listed as not schedulable

Step Two – Shutdown

Now that both the CVM and the AHV host are in maintenance mode, I SSH to the CVM that is in maintenance mode and run a quick check to ensure the CVM services are stopped before I shut it down. To do this, I execute genesis status. This gives me a list of the CVM services and their PIDs (if the process is running).

All the services are stopped, so now I execute cvm_shutdown -P now and wait for the CVM to shut down.

Before shutting down the host, I like to do one final check that the CVM is in fact powered off, so I SSH to the AHV host and execute virsh list --all before I go any further.

The CVM is listed as “shut off”, so I can now go ahead and execute poweroff to shut down the AHV host.
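If you would rather script that check than eyeball it, a rough sketch could look like this. The function name domain_is_shut_off and the CVM name NTNX-EXAMPLE-CVM are placeholders of mine, and the sample text only imitates the usual Id / Name / State layout of virsh list --all, so treat the column positions as an assumption.

```shell
# Succeed only if the named domain appears with state "shut off".
# Expects "virsh list --all" style output (Id, Name, State) on stdin.
domain_is_shut_off() {
    awk -v name="$1" '
        $2 == name && $3 == "shut" && $4 == "off" { found = 1 }
        END { exit !found }
    '
}

# Hypothetical sample output; the CVM name is a placeholder.
sample=' Id   Name               State
----------------------------------
 -    NTNX-EXAMPLE-CVM   shut off'

if printf '%s\n' "$sample" | domain_is_shut_off NTNX-EXAMPLE-CVM; then
    echo "CVM is shut off"
fi
```

On the AHV host the input would come from virsh list --all | domain_is_shut_off <CVM_name>, and poweroff would only follow if the check succeeds.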

I can now go ahead and perform any maintenance I need to on the host, like adding extra memory/disk, etc.

Step Three – Bringing the CVM and Host out of Maintenance Mode

Once the node has been physically powered on and is reachable on the network, I need to reverse the earlier steps to take the host and CVM out of maintenance mode.

First of all, I take the AHV host out of maintenance mode by SSH-ing to the cluster and running acli host.exit_maintenance_mode <host_IP>

Now I execute ncli host edit id=23 enable-maintenance-mode=false to bring the CVM out of maintenance mode.

I can see now that the CVM is out of maintenance mode. At this stage I like to SSH into the CVM and watch the services start up so I know the CVM is okay. Once I SSH into the CVM I execute watch -d 'genesis status'. This reruns the genesis status command every two seconds (the watch default) and highlights what has changed, so I can see the services starting.

genesis status
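watch is great when I’m sitting at the terminal, but for an unattended run a small retry loop is one alternative. The poll_until function below is just a sketch of that idea and not anything Nutanix ships. It reruns whatever check command you give it until that command succeeds or the attempts run out.

```shell
# Re-run a check command until it succeeds or the attempts run out.
# usage: poll_until <attempts> <interval_seconds> <command> [args...]
poll_until() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the check command exits successfully.
        if "$@"; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    return 1
}
```

For example, poll_until 150 2 services_ready would try every two seconds for about five minutes. Here services_ready is a hypothetical function you would have to write yourself to decide, from the genesis status output, whether everything is up. I haven’t defined it because that depends on what the output looks like on your version.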

Once all services are started, I wait for the Data Resiliency Status in Prism to return to OK.

Step Four – Verify Cluster Health

Cluster is healthy, job done.

Data Resiliency Status
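For quick reference, here is the whole sequence from the steps above collected into one place. This is a sketch that only prints the commands in order and does not run anything. The function name print_plan is my own, and the ID 23 and address 192.0.2.10 are example values to swap for your own node.

```shell
# Print the command sequence from this post for one node, without running it.
# usage: print_plan <host_id> <host_ip>
print_plan() {
    host_id=$1
    host_ip=$2
    cat <<EOF
# Enter maintenance mode (from the cluster)
ncli host edit id=${host_id} enable-maintenance-mode=true
acli host.enter_maintenance_mode ${host_ip}
acli host.list
# Shut down (on the CVM in maintenance mode)
genesis status
cvm_shutdown -P now
# Shut down (on the AHV host)
virsh list --all
poweroff
# After the hardware work (from the cluster, then on the CVM)
acli host.exit_maintenance_mode ${host_ip}
ncli host edit id=${host_id} enable-maintenance-mode=false
watch -d 'genesis status'
EOF
}

print_plan 23 192.0.2.10
```

Printing rather than executing is deliberate: the commands run on three different machines, and I still want to check the output of each one before moving on to the next.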
