Nutanix released AOS 5.10.4 on the 21st of May 2019. This is a bug fix release rather than a feature release, so no new features are included in this update.

AOS 5.10.4 also brings a new AHV version (AHV-20170830.270) which is also a fix release.

This AOS release brings the version number in line with Prism Central (5.10.4), which was released a week earlier (14th of May, 2019).

Resolved Issues in AOS 5.10.4

AHV-Management

  • ENG-73536 – Improved traceability of VMs affected by a High Availability (HA) event by associating a VM name to the HA restart task.
  • ENG-190469 – Improved debuggability of AHV by periodically recording runtime traces, which are now collected by NCC.
  • ENG-214823 – You might repeatedly see the task message "update vm db state" on Prism after an AOS upgrade. This task is now hidden and processed in the background.
  • ENG-215649 – The manage_ovs update_uplinks command to update uplinks might not work successfully prior to cluster creation or in manual installation mode.
  • ENG-220169 – The host stalls at Entering Maintenance Mode in a cluster with a Never Schedulable node. Upgrade operations may hang on AOS 5.10.1.
  • ENG-220937 – All the AHV hosts might not be schedulable after upgrading AOS due to a rare race condition that causes all tasks to be stuck.
  • ENG-221101 – The Acropolis Dynamic Scheduler might become unresponsive and prevent VMs from powering on automatically. When powered on manually, the VMs take longer than usual to start up.

Cassandra

  • ENG-194262 – The Cassandra cluster service will add sub-shards as needed on the metadata disks.

Citrix Integration

  • ENG-214809 – You might not be able to see Nutanix VM statistics on Citrix Director on clusters running AOS 5.10.x.

Data Protection

  • ENG-171024 – When you perform a Disaster Recovery failover, the VM disks might display size of 0 GB after the failover. This was observed when datastore containers contained special characters, such as a parenthesis. If you have such datastore naming and disaster recovery replication configured, you must upgrade immediately to mitigate future failover issues.
  • ENG-211423 – After upgrading a Hyper-V cluster to either AOS 5.9.x or 5.10.x, VMs might be randomly missing from Protection Domain snapshots, and the alert message "Unable to locate VM(s)" is displayed. If you have Hyper-V with Protection Domains configured, you must upgrade to this release to prevent these failures.

Infrastructure / Services

  • ENG-115506 ENG-121658 – Added built-in IPv4 support to Phoenix, which is useful for Phoenix workflows (such as SATADOM replacements) in networks that do not support IPv6.
  • ENG-145764 – Curator scans might be unresponsive and not complete if the chronos task management service is also unresponsive. The underlying chronos unresponsiveness has been fixed in this release.
  • ENG-149414 – The upgrade from Hyper-V 2012 R2 to Hyper-V 2016 now ensures that non-default vSwitches are preserved upon upgrade.
  • ENG-170722 – After expanding an AHV cluster with two new nodes, one of the two nodes might not have the correct version of NCC due to a race condition. Multi-node expansions will now have the correct NCC on every new node.
  • ENG-171332 – A 2-node cluster failover does not work during a node failure event. The cluster state goes down and the witness VM is not able to communicate with the nodes as expected.
  • ENG-178459 – Fixed an issue where both expanding a cluster and foundation validate_network_configuration stall because the IP address of the IPMI interface of the new node is in a different network than the existing nodes.
  • ENG-183195 – Upgrading all hosts might not be successful with AHV 1-click upgrade.
  • ENG-193316 – After replacing an SSD, the Controller VM might stall with the message "Memory available on satadom is less than required" even when enough space is available on the SATADOM.
  • ENG-193351 – Enhanced message clarity when the pre-checks fail for 1-click hypervisor upgrade in ESXi.
  • ENG-194445 – Enhanced curator background task distribution. This is a general improvement, but is especially helpful for clusters with small amounts of very large vDisks.
  • ENG-203519 – AOS upgrade fails to start due to a failure of the test_nos_signature_validation pre-check. This pre-check now correctly verifies tarball signatures.
  • ENG-204589 – An AOS upgrade from 5.10.1 on an AHV cluster may cause one or more VMs to restart unexpectedly. This bug occurs more frequently with GPU-enabled VMs and was specific to clusters running 5.10.1 through 5.10.3.2.
  • ENG-210039 – Two nodes might stall in kStandAlone for a longer time than expected to move to kSwitchToTwoNode.
  • ENG-219075 – After adding bulk NFS whitelist, a separate entry is created for each NFS port.

LCM

  • ENG-195061 – The node fails to exit Maintenance mode after upgrading BIOS or BMC using Life Cycle Manager. In this case, the actual BMC/BIOS update usually succeeded, but a timing issue within AOS might prevent a node from coming back in service, thus hanging LCM progress. The timing issue has been fixed in this release.
  • ENG-203361 – The Remote Procedure Call (RPC) request to stop Foundation service gets stalled during an LCM update run and can cause the LCM to stall forward progress. This is due to a timing issue in which the first RPC does not properly retry. In these cases, stopping foundation does not successfully complete until the cluster service is restarted. With this release, RPC failures will be properly retried to eliminate timing related race conditions.
  • ENG-214486 – Upgrading LCM might not put the Controller VM into maintenance mode gracefully, which could cause Zookeeper to crash. In this release, Controllers will now turn down all services gracefully.
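
The retry behaviour described for ENG-203361 follows a common pattern: a failed call is attempted again after a growing delay instead of being left to hang. The sketch below only illustrates that general pattern and is not Nutanix's implementation; the function name call_with_retry, its parameters, and the choice of ConnectionError as the retryable failure are all assumptions made for the example.

```python
import time


def call_with_retry(rpc, attempts=5, initial_delay=0.5, backoff=2.0, sleep=time.sleep):
    """Call rpc() until it succeeds or the attempts run out.

    Hypothetical helper: rpc is any zero-argument callable that raises
    ConnectionError on a transient failure. The sleep function is
    injectable so the delays can be observed in tests.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except ConnectionError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

A caller would wrap the single request, for example call_with_retry(stop_service), where stop_service is whatever function issues the request.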

Licensing

  • ENG-204949 – After upgrading AOS to 5.10 with a licensed cluster, the Licensing window displays Starter. When you click Show Licenses, the window shows that the Pro licenses are applied.

NCC

  • ENG-207799 – NCC might stall after upgrading from version 3.6.4 to 3.7.0 running AOS 5.10.1.1.

Networking

  • ENG-192407 – Incorrect Open vSwitch (OVS) datapath flow logic on AHV clusters might have caused MAC address duplication. Depending on overall system and physical network configuration, MAC address duplication can cause a myriad of issues, including perceived system lockups, upstream network issues, loss of connectivity to Nutanix Volumes (formerly Acropolis Block Services), and cluster level HA events. This issue has been fixed on AHV 20170830.265, which first appeared on AOS 5.10.3.2. AHV 20170830.270 has been included with this release. Given the severity of this issue, customers running AHV versions earlier than 20170830.265 must immediately upgrade AHV after upgrading AOS to 5.10.3.2 or later.
  • ENG-197819 – Managing uplinks with manage_ovs might lead to unexpected behavior of the network causing the network to become unavailable. This was most commonly seen where there is no bond but just an uplink port associated with a single nic. In this case, it was possible to create a network loop (spanning-tree loop).
  • ENG-220154 – The AHV host might become unresponsive and Controller VMs might go down, due to an edge case deadlock in Open vSwitch (OVS). This issue has been fixed on AHV 20170830.265, which first appeared on AOS 5.10.3.2. AHV 20170830.270 has been included with this release.
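
To decide whether a host falls under the "earlier than 20170830.265" guidance above, the installed AHV version can be compared with the first fixed version. This is a minimal sketch that assumes AHV version strings always have the form base.build (optionally prefixed with "AHV-") and that both parts compare numerically; the function names are hypothetical and how you obtain the installed version is left out.

```python
from typing import Tuple

# First AHV version containing the OVS fixes, per the notes above.
FIRST_FIXED_AHV = "20170830.265"


def parse_ahv_version(version: str) -> Tuple[int, int]:
    """Split a string such as 'AHV-20170830.270' into (20170830, 270)."""
    text = version.strip()
    if text.upper().startswith("AHV-"):
        text = text[4:]
    base, build = text.split(".")
    return int(base), int(build)


def needs_ahv_upgrade(installed: str, first_fixed: str = FIRST_FIXED_AHV) -> bool:
    """Return True when the installed version is older than the fixed one."""
    return parse_ahv_version(installed) < parse_ahv_version(first_fixed)
```

With these assumptions, the version shipped with this release (AHV-20170830.270) is reported as not needing an upgrade, while any build number below 265 on the same base is flagged.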

Nutanix Guest Tools

  • ENG-215659 – Installing Nutanix Guest Tools fails on Windows 2008, Windows 2008 R2, or Windows 7 running AOS 5.10.2. The following error message displays: The system cannot find the file specified. cmd.exe /c net start ‘Nutanix Guest Agent’. Refer to KB 7136 for a workaround.

Prism Gateway

  • ENG-192496 – The Prism Web Console UI might be unresponsive when you choose to stop using a Certificate Authority (CA) certificate. This has been observed when a CA is unconfigured as well as a corner case during some AOS upgrades.
  • ENG-194345 – You might not be able to add a Protection Domain Schedule with Cmdlet or REST API after upgrading AOS to version 5.5.6 and later.
  • ENG-203298 – The Get VM v1 API call takes an inordinate amount of time due to a delay in fetching vDisk configuration. The time taken is proportionate to the number of VMs in the cluster and relates to an internal call that repeats for every single VM, rather than just running once. This manifested in issues such as slow power-ons using Citrix AHV XenApp / XenDesktop plugin, Citrix AppLayer vDisk attachment failures, HYCU backup failures, sluggish Prism Element and Prism Central performance, and anything else using this API call.
  • ENG-206673 – On the Tasks dashboard, clicking the entity VM might fail with error message Unknown VM error.
  • ENG-214809 – You might not be able to see Nutanix VM statistics on Citrix Director on clusters running AOS 5.10.x.
  • ENG-216371 – NCC version displays unknown on 5.10.3 Prism Element after you upgrade to NCC version 3.7.1 from the Prism user interface.
  • ENG-223007 – Data Unavailability for clustered applications can occur in the following multi-part scenarios.
    • Your cluster was, at any time, running AOS 5.9.x.
    • Your cluster is then upgraded to 5.10.x, prior to 5.10.4.
    • You have one or more vDisks serving a workload that uses persistent SCSI-3 reservations (SCSI-3 PR) through iSCSI, such as Microsoft Failover Clusters via Nutanix Volumes (formerly Acropolis Block Services) or Hyper-V shared VHDX disks.
    • A cluster property, such as cluster name, cluster VIP, or common criteria mode, is edited using Prism or NCLI edit-params.

    In this specific sequence of events, the property update triggers a reset of the internal SCSI-3 PR state, which causes SCSI-3 PR enabled disks to go into a failed state. If this happens, immediately contact Nutanix Support for a workaround to enable data access. Note: This issue does not affect clusters you upgrade to 5.10.x directly from releases earlier than 5.9.x or on fresh installs of 5.10.x.
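
The ENG-223007 conditions can be read as a checklist: the first three describe the cluster, and the fourth (editing a cluster property) is the trigger. The sketch below encodes the first three so you can tell whether such an edit would be risky. It is only an illustration; the ClusterHistory record and its fields are hypothetical, and the values would have to come from your own knowledge of the cluster's upgrade path.

```python
from dataclasses import dataclass


@dataclass
class ClusterHistory:
    """Hypothetical record of the facts ENG-223007 depends on."""
    ever_ran_5_9: bool   # cluster ran AOS 5.9.x at any time
    current_aos: str     # for example "5.10.3.2"
    uses_scsi3_pr: bool  # some vDisk serves a SCSI-3 PR workload over iSCSI


def parse_version(version: str) -> tuple:
    """Turn '5.10.3.2' into (5, 10, 3, 2) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def property_edit_is_risky(history: ClusterHistory) -> bool:
    """True when editing a cluster property could reset SCSI-3 PR state."""
    current = parse_version(history.current_aos)
    # Affected range: any 5.10.x release earlier than 5.10.4.
    on_affected_5_10 = current[:2] == (5, 10) and current < (5, 10, 4)
    return history.ever_ran_5_9 and on_affected_5_10 and history.uses_scsi3_pr
```

For a cluster that passed through 5.9.x, now runs 5.10.3.2, and hosts a Microsoft Failover Cluster on Nutanix Volumes, the function returns True, which suggests upgrading to 5.10.4 before changing the cluster name or VIP.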

Prism UI

  • ENG-73536 – Improved traceability of VMs affected by a High Availability (HA) event by associating a VM name to the HA restart task.
  • ENG-125150 – Improved the user experience in Cluster Health where Check pass/fail history for selected cluster field might show inverted colored lines.
  • ENG-182973 – You might not have been able to access the License panel in Prism running on Internet Explorer browser.
  • ENG-202377 – Enabling or disabling flash mode from Prism is not successful.
  • ENG-210030 – Fixed an issue where Prism user interface frequently reloads.
  • ENG-212206 – Fixed an issue of space unit mismatch in Prism Element, where some views had KiB and GiB flipped.
  • ENG-220835 – The VMs table page in the Prism web console triggers a large number of API calls for v1/utils/entities which made the Prism UI sluggish. This was especially noticeable with multiple concurrent users and was the result of incorrect polling logic. Polling logic has been updated to eliminate redundant internal calls.
  • ENG-221193 – The Pause/Resume button on the Prism user interface has been removed as this functionality is no longer officially supported.

Security

  • ENG-158073 – The LDAP authentication might be unsuccessful when the Active Directory user belongs to the primary group.
  • ENG-166243 – Enhanced debugging mechanism added to the backend LDAP authentication service. Fixed an existing debugging error that was not allowing LDAP users to log in.
  • ENG-168044 – Fixed OpenJDK issue with insufficient index validation in PatternSyntaxException getMessage() (Concurrency, 8199547) (CVE-2018-2952).
  • ENG-209818 – You might experience a Prism accessibility issue while upgrading AOS from version 5.5.x to 5.10.x.
  • ENG-210334 – The controls for zone transfer were not properly applied to Dynamically Loadable Zones (DLZs). An attacker acting as a DNS client could use this flaw to request and receive a zone transfer of a DLZ even when not permitted to do so by the allow-transfer ACL. (CVE-2019-6465).
  • ENG-212941 – polkit: Temporary auth hijacking via PID reuse and non-atomic fork (CVE-2019-6133).

Serviceability

  • ENG-194498 – You might not have been able to send a cluster alert email successfully.
  • ENG-195831 – The Controller VM might become unresponsive because of excessive spawning of the remote tunnel service, which consumed more memory than designed.

Services

  • ENG-171024 – When you perform a Disaster Recovery failover, the VM disks might display size of 0 GB after the failover. This was observed when datastore containers contained special characters, such as a parenthesis. If you have such datastore naming and disaster recovery replication configured, you must upgrade immediately to mitigate future failover issues.

Storage/Tools

  • ENG-212236 – vDisk manipulator fails to run on vDisks that are on storage containers with software encryption set to ON.

Zookeeper

  • ENG-158575 – Improved logic used in handling a degraded node for the cluster services.
  • ENG-160764 – Fixed an issue where the cluster service might have become unresponsive.

You can read the full release notes here.
