Xi Frame – End User Compute for the Hybrid Cloud

End User Computing has been a staple for businesses for a long time. With the current shift in infrastructure, where we are moving from purely on-premises Data Centres to more hybrid and even fully cloud-based infrastructure, is End User Computing keeping up with change?

Nutanix is. With its latest acquisition of Frame, Inc., Nutanix gives you an End User Computing (EUC) platform that can deploy to both your public and private clouds. With Nutanix Xi Frame (Frame), Nutanix is making End User Computing, something that is typically complex, simple.

Xi Frame

Xi Frame is a desktop-as-a-service platform built with the simplicity and flexibility of Cloud and Nutanix. Frame is unique in that it runs entirely in your browser. There are no client plugins to install, which means that as long as you have an HTML5-compatible browser, you can use Frame.

How Frame Works

As with other cloud-based EUC solutions (Citrix Cloud, for example), Frame has a Control Plane/Back Plane, where the service is managed, as well as Workload VMs that run the desktops and applications that are to be made available to users. Workload VMs can be Windows or Linux operating systems. The back end of Frame is a Software-as-a-Service (SaaS) offering hosted by Nutanix while the Workload VMs are deployed to your choice of cloud (Azure, AWS, Google Cloud) or even your on-premises Nutanix Clusters.

This gives you the flexibility to deploy your Workload VMs in one or many clouds (including on-premises) which enables Frame to be a truly multi-cloud hybrid solution.

Why would I choose Frame?

There are other End User Compute (EUC) solutions out there that offer a similar service, so what does Frame do to set itself apart from the competition?

Nutanix AHV

Frame can run the worker VMs in the public cloud of your choice; however, it can now also run worker VMs on your on-premises Nutanix AHV clusters.

Cloud Native

Xi Frame was born in the cloud but it wasn’t created for just one cloud. Frame is truly multi-cloud enabled which means you can run your Workload VMs in the cloud of your choosing (Xi, Google, Azure, AWS); Frame was built to be secure, scalable and cost effective. Your Frame subscription will be paid monthly starting with as little as five (5) users and you can base your subscription on Named or Concurrent users.
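
To make the Named versus Concurrent choice a little more concrete, here is a minimal Python sketch that counts the seats each model would need from a list of sessions. The session data, the function names and the way the five-user minimum is applied are assumptions for illustration only; this is not how Frame itself meters usage.

```python
from typing import List, Tuple

# (user, start_hour, end_hour) -- hypothetical session log for one day
Session = Tuple[str, int, int]

MIN_USERS = 5  # the subscription starts at five users


def named_seats(sessions: List[Session]) -> int:
    """Named model: one seat per distinct user, however often they connect."""
    distinct = len({user for user, _, _ in sessions})
    return max(distinct, MIN_USERS)


def concurrent_seats(sessions: List[Session]) -> int:
    """Concurrent model: one seat per session open at the busiest moment."""
    events = []
    for _, start, end in sessions:
        events.append((start, 1))   # session opens
        events.append((end, -1))    # session closes
    # Tuples sort by hour, then closes (-1) before opens (+1),
    # so back-to-back sessions can share a seat.
    events.sort()
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return max(peak, MIN_USERS)
```

For shift-based teams where not everyone is online at once, the concurrent count can come out lower than the named count.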

Simple

The idea behind Frame is that it should be simple to use for both the end user and the administrator. With Frame, you can run any Windows or Linux based application in an HTML5-compatible browser. The admin interface is simple, easy to set up and configure, and is updated in the background so you don’t need to control the updates yourself. Settings are changed with simple switches and sliders; there’s no complex configuration needed.

Flexible

Because Frame was born in the cloud and has become part of the Nutanix family of cloud-based services, you get the flexibility and choice of cloud-based operations with the simplicity of Nutanix one-click operations for enterprise integrations such as:

  • Identity – (Google, AD/ADFS, Auth0, Okta, Ping, SAML/OAuth)
  • Storage – (Dropbox, Box, Google Drive, OneDrive, SMB, Shared Drives)
  • Networking – (Peering, Direct Connect, VPN, ExpressRoute)
  • APIs – (CI/CD, Headless access to apps, Embedded Applications, Frame Application, Frame Web Services)
  • Operating Systems – Frame supports both Windows and Linux guest OSs

Secure

Frame is designed to be secure with the following features as standout:

  • FedRAMP Ready
  • Role Based Access Control (RBAC)
  • End-to-end encryption built-in

Use cases

Education sectors could benefit from Frame: Instances can be scaled on demand, and there is no need for a client-side plugin, which makes deployment to students on a variety of different devices simple. Full RBAC support and multi-site management would remove the headache of managing multiple schools/sites.

Businesses employing a BYOD structure for their devices could benefit from Frame: You would not need to on-board devices to deploy a client or install any software. Frame is run entirely from a browser so regardless of the device, the employee will have access to their applications.

Design practices that use heavy graphic workloads can scale without the hefty investment in high-end hardware: Frame is built in the cloud so you can scale with ease and then trim back when you don’t need as much power.

Test Drive

You wouldn’t buy a car without taking it for a spin, so why not do the same with your EUC product? You can head on over to https://fra.me/test-drive and sign up for a test drive of Frame.

  • Fill out the signup form
  • Select your closest region
  • Verify email address

Once you have verified your email address, you’ll be able to jump right in to Frame and take it for a spin.

Launchpad and Session

Heading over to https://login.frame.nutanix.com you are presented with the Frame login.


Upon logging into Frame you are presented with a default “Launchpad”. The launchpad is the end user-facing part of the Frame platform interface where users launch and manipulate applications. The Frame platform allows you to have multiple launchpads that can be customised for different use cases and workflows.

Launchpads are attached to “Accounts” and, at their core, they are a representation of the applications that are available for streaming from a Sandbox that is managed by an Organisational or Customer administrator.

[Screenshot: an example of a simple application launchpad]

Clicking on the launchpad list icon in the upper middle portion of the page will show users all launchpads they have access to (including titles and thumbnails).

Clicking on an application within the launchpad will launch the app within your browser.

When the application launches it actually loads a ‘desktop’ session with the ability to launch additional apps within the same session.


Here, you can see the application that I chose to launch (Chrome), but if you look a bit closer, you’ll realise that you have actually launched a desktop session and then an app within that session. At the bottom of the screen you can see various stats for Bandwidth, Latency and Distance to server.

Clicking on the cog in the bottom left corner will give you a menu of published apps that you are able to launch, show/hide the stats, change the screen resolution, play with some settings and disconnect or close the session. It’s quite impressive when you remember that this is all running through a browser.

Administration

Switching over to the ‘Dashboard’ section will allow you to configure the backend of Frame. Here you can configure your Frame deployment through a number of sections:

Sandbox

From the dashboard you can see your applications that you have published as well as a Sandbox VM. This is the VM that you will install your Applications onto and test the functionality before publishing the VM. Think of this Sandbox VM as your ‘Gold’ or ‘Master’ image. This VM will power off automatically when you are finished with it.


Utility Server

A Utility server is a stand-alone, general purpose Windows server that can be helpful for a variety of use cases, including:

  • License server: Install a network licensing manager for your software on a Frame Utility server. Your production instances can then connect to this server to get the licenses.
  • Backend for a client-server application: Host a database or other backend function in the Utility server directly on your Frame account. Essentially, your entire system can be hosted in the cloud.
  • Shared file server: Store files that can be accessed by all of your users.

Similar to your Sandbox, the Utility server is accessed from the Frame dashboard, where you can power it on and connect to install applications. Unlike the Sandbox, however, the Utility server will NOT power off automatically. It was designed for use cases where you need the server to run 24×7.

Launchpads

The Launchpads section is where you can see your existing published launchpads along with the applications that are available within each launchpad. You can also create new launchpads and assign applications to them here.

Users

In this section you will configure the authentication methods your users will use to log in to Frame. The options available as of writing are:

  • Frame (built in users)
  • Google
  • SAML2
  • API

These services are enabled with the toggle of a switch.


Capacity

Capacity is where you configure the number of worker VMs that are allowed to run under your instance and the times they are available to your users.
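
As a rough illustration of the two things this page controls (how many worker VMs may run, and when), here is a minimal Python sketch. The `CapacityWindow` type, its field names and the hours used are hypothetical and are not Frame's actual settings or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CapacityWindow:
    """Hypothetical rule: up to max_vms worker VMs between two hours of the day."""
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24
    max_vms: int


def allowed_vms(windows: List[CapacityWindow], hour: int) -> int:
    """Return the highest VM limit of any window covering the hour, or 0 if none does."""
    limits = [w.max_vms for w in windows if w.start_hour <= hour < w.end_hour]
    return max(limits, default=0)
```

With a business-hours window of ten VMs and an evening window of two, a lookup at 9am would return ten and a lookup at 11pm would return zero.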


Analytics

Analytics will give you an insight into what your Frame environment is doing: how many sessions are running, hourly usage, disk usage and elasticity.

Activity

Activity is your audit trail. You can see when sessions were created, Utility servers provisioned or applications provisioned. Everything is logged here.

Settings

The settings page is where you will configure your Frame environment: General settings, Session settings, Networking, Availability Zones, Profiles and Personal drives.

Status

The status page shows you all of your VMs configured for Frame, whether they are online or not, the kind of instance they are (Sandbox, Production, Utility), IP addresses and the instance type.

The administration interface of Frame is simple. It’s easy to configure and manage applications as well as user sessions. It’s not as complex as other EUC products; however, it is quite full featured for a DaaS platform.

Xi Frame is for everyone

Whether you are a graphic designer in a small design firm or a large enterprise with 10,000 seats, Xi Frame could be for you. With the ability to run any App, on any device, in any browser, and on any cloud, Frame is truly a flexible, simple and cost-effective way to deploy an End User Compute solution in your organisation.

Take Xi Frame for a test drive at https://fra.me/test-drive to see how Xi Frame can fit into your place of business.

Nutanix AOS 5.10.4 (LTS) is now available

Nutanix released AOS 5.10.4 into the wild on the 21st of May 2019. This is a bug fix release rather than a feature release, so no new features are included in this update.

AOS 5.10.4 also brings a new AHV version (AHV-20170830.270) which is also a fix release.

This AOS release brings the version number up to the same as Prism Central (5.10.4) which was released about a week ago (14th of May, 2019).

Resolved Issues in AOS 5.10.4

AHV-Management

  • ENG-73536 – Improved traceability of VMs affected by a High Availability (HA) event by associating a VM name to the HA restart task.
  • ENG-190469 – Improved debugability of AHV by periodically recording runtime traces, which are now collected by NCC.
  • ENG-214823 – You might repeatedly see the following task message on Prism after an AOS upgrade: “update vm db state”. This task is now hidden and processed in the background.
  • ENG-215649 – The manage_ovs update_uplinks command to update uplinks might not work successfully prior to cluster creation or in manual installation mode.
  • ENG-220169 – The host stalls at Entering Maintenance Mode in a cluster with a Never Schedulable node. Upgrade operations may hang on AOS 5.10.1.
  • ENG-220937 – All the AHV hosts might not be schedulable after upgrading AOS due to a rare race condition that causes all tasks to be stuck.
  • ENG-221101 – The Acropolis Dynamic Scheduler might become unresponsive and prevent VMs from powering on automatically. When powered on manually, the VMs take longer than usual to start up.

Cassandra

  • ENG-194262 – The Cassandra cluster service will add sub-shards as needed on the metadata disks.

Citrix Integration

  • ENG-214809 – You might not be able to see Nutanix VM statistics on Citrix Director on clusters running AOS 5.10.x.

Data Protection

  • ENG-171024 – When you perform a Disaster Recovery failover, the VM disks might display size of 0 GB after the failover. This was observed when datastore containers contained special characters, such as a parenthesis. If you have such datastore naming and disaster recovery replication configured, you must upgrade immediately to mitigate future failover issues.
  • ENG-211423 – After upgrading a Hyper-V cluster to either AOS 5.9.x or 5.10.x, VMs might be randomly missing from Protection Domain snapshots. The following alert message displays: “Unable to locate VM(s)”. If you have Hyper-V with Protection Domains configured, you must upgrade to this release to prevent these failures.

Infrastructure / Services

  • ENG-115506 ENG-121658 – Added capability to utilize in-built IPv4 to Phoenix which is useful for Phoenix workflows (such as SATADOM replacements) in networks that do not support IPv6.
  • ENG-145764 – Curator scans might be unresponsive and not complete if the chronos task management service is also unresponsive. The underlying chronos unresponsiveness has been fixed in this release.
  • ENG-149414 – The upgrade from Hyper-V 2012 R2 to Hyper-V 2016 now ensures that non-default vSwitches are preserved upon upgrade.
  • ENG-170722 – After expanding an AHV cluster with two new nodes, one of the two nodes might not have the correct version of NCC due to a race condition. Multi-node expansions will now have the correct NCC on every new node.
  • ENG-171332 – A 2-node cluster failover is not working during a node failure event. The cluster state goes down and the witness VM is not able to successfully communicate with the nodes as expected.
  • ENG-178459 – Fixed an issue where both expanding a cluster and foundation validate_network_configuration stall because the IP address of the IPMI interface of new node is in a different network than existing nodes.
  • ENG-183195 – Upgrading all hosts might not be successful with AHV 1-click upgrade.
  • ENG-193316 – After replacing an SSD, the Controller VM might stall with the following message even when enough space is available on SATADOM: “Memory available on satadom is less than required”.
  • ENG-193351 – Enhanced message clarity when the pre-checks fail for 1-click hypervisor upgrade in ESXi.
  • ENG-194445 – Enhanced curator background task distribution. This is a general improvement, but is especially helpful for clusters with small amounts of very large vDisks.
  • ENG-203519 – AOS upgrade fails to start due to a failure of the pre-check test_nos_signature_validation. This pre-check now correctly verifies tarball signatures.
  • ENG-204589 – An AOS upgrade from 5.10.1 on an AHV cluster may cause one or more VMs to restart unexpectedly. This bug occurs more frequently with GPU enabled VMs and was specific to clusters running 5.10.1 through 5.10.3.2.
  • ENG-210039 – Two nodes might stall in kStandAlone for a longer time than expected to move to kSwitchToTwoNode.
  • ENG-219075 – After adding bulk NFS whitelist, a separate entry is created for each NFS port.

LCM

  • ENG-195061 – The node fails to exit Maintenance mode after upgrading BIOS or BMC using Life Cycle Manager. In this case, the actual BMC/BIOS update usually succeeded, but a timing issue within AOS might prevent a node from coming back in service, thus hanging LCM progress. The timing issue has been fixed in this release.
  • ENG-203361 – The Remote Procedure Call (RPC) request to stop Foundation service gets stalled during an LCM update run and can cause the LCM to stall forward progress. This is due to a timing issue in which the first RPC does not properly retry. In these cases, stopping foundation does not successfully complete until the cluster service is restarted. With this release, RPC failures will be properly retried to eliminate timing related race conditions.
  • ENG-214486 – Upgrading LCM might not put the Controller VM into maintenance mode gracefully, which could cause Zookeeper to crash. In this release, Controllers will now turn down all services gracefully.

Licensing

  • ENG-204949 – After upgrading AOS to 5.10 with a licensed cluster, the Licensing window displays Starter. When you click Show Licenses, the window shows that the Pro licenses are applied.

NCC

  • ENG-207799 – NCC might stall after upgrading from version 3.6.4 to 3.7.0 running AOS 5.10.1.1.

Networking

  • ENG-192407 – Incorrect Open vSwitch (OVS) datapath flow logic on AHV clusters might have caused MAC address duplication in AHV clusters. Depending on overall system and physical network configuration, MAC address duplication can cause a myriad of issues, including perceived system lockups, upstream network issues, connectivity to Nutanix Volumes (formerly Acropolis Block Services), and cluster level HA events. This issue has been fixed on AHV 20170830.265, which first appeared on AOS 5.10.3.2. AHV 20170830.270 has been included with this release. Given the severity of this issue, customers running AHV versions earlier than 20170830.265 must immediately upgrade AHV after upgrading AOS to 5.10.3.2 or later.
  • ENG-197819 – Managing uplinks with manage_ovs might lead to unexpected behavior of the network causing the network to become unavailable. This was most commonly seen where there is no bond but just an uplink port associated with a single nic. In this case, it was possible to create a network loop (spanning-tree loop).
  • ENG-220154 – The AHV host might become unresponsive and Controller VMs might go down, due to an edge case deadlock in Open vSwitch (OVS). This issue has been fixed on AHV 20170830.265, which first appeared on AOS 5.10.3.2. AHV 20170830.270 has been included with this release.
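
A quick way to check a host against the 20170830.265 threshold mentioned in the networking fixes is to compare the build numbers numerically rather than as strings. The following Python sketch assumes AHV version strings of the form `20170830.NNN`, optionally prefixed with `AHV-` as written earlier in this post; the function names are made up for illustration.

```python
from typing import Tuple

FIXED_AHV = "20170830.265"  # first AHV build carrying the OVS fixes


def parse_ahv(version: str) -> Tuple[int, ...]:
    """Turn 'AHV-20170830.270' or '20170830.270' into (20170830, 270)."""
    if version.upper().startswith("AHV-"):
        version = version[4:]
    return tuple(int(part) for part in version.split("."))


def needs_ahv_upgrade(version: str) -> bool:
    """True when the given AHV build is older than the fixed build."""
    return parse_ahv(version) < parse_ahv(FIXED_AHV)
```

Comparing integers matters here: as plain strings, a build ending in `.99` would sort after `.265` and be wrongly treated as newer.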

Nutanix Guest Tools

  • ENG-215659 – Installing Nutanix Guest Tools fails on Windows 2008, Windows 2008 R2, or Windows 7 running AOS 5.10.2. The following error message displays: “The system cannot find the file specified. cmd.exe /c net start ‘Nutanix Guest Agent’”. As a workaround, refer to KB 7136 for more details.

Prism Gateway

  • ENG-192496 – The Prism Web Console UI might be unresponsive when you choose to stop using a Certificate Authority (CA) certificate. This has been observed when a CA is unconfigured as well as a corner case during some AOS upgrades.
  • ENG-194345 – You might not be able to add a Protection Domain Schedule with Cmdlet or REST API after upgrading AOS to version 5.5.6 and later.
  • ENG-203298 – The Get VM v1 API call takes an inordinate amount of time due to a delay in fetching vDisk configuration. The time taken is proportionate to the number of VMs in the cluster and relates to an internal call that repeats for every single VM, rather than just running once. This manifested in issues such as slow power-ons using Citrix AHV XenApp / XenDesktop plugin, Citrix AppLayer vDisk attachment failures, HYCU backup failures, sluggish Prism Element and Prism Central performance, and anything else using this API call.
  • ENG-206673 – On the Tasks dashboard, clicking the entity VM might fail with error message Unknown VM error.
  • ENG-214809 – You might not be able to see Nutanix VM statistics on Citrix Director on clusters running AOS 5.10.x.
  • ENG-216371 – NCC version displays unknown on 5.10.3 Prism Element after you upgrade to NCC version 3.7.1 from the Prism user interface.
  • ENG-223007 – Data Unavailability for clustered applications can occur in the following multi-part scenarios.
    • Your cluster was, at any time, running AOS 5.9.x.
    • Your cluster is then upgraded to 5.10.x, prior to 5.10.4.
    • You have one or more vDisks serving a workload that uses persistent SCSI-3 reservations (SCSI-3 PR) through iSCSI, such as Microsoft Failover Clusters via Nutanix Volumes (formerly Acropolis Block Services) or Hyper-V shared VHDX disks.
    • A cluster property is edited, such as cluster name, cluster VIP or common criteria mode, using Prism or NCLI edit-params.

    In this specific sequence of events, the property update triggers a reset of the internal SCSI-3 PR state. This causes SCSI-3 PR enabled disks to go into a failed state. If this happens, immediately contact Nutanix Support for a workaround to enable data access. Note: This issue does not affect clusters you upgrade to 5.10.x directly from releases earlier than 5.9.x or on fresh installs of 5.10.x.
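
The conditions for ENG-223007 only cause trouble in combination, which can be expressed as a simple check. This Python sketch is only an illustration of the scenario as described in the release note; the `ClusterHistory` fields are hypothetical names, and it is no substitute for checking with Nutanix Support.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'5.10.3.2' -> (5, 10, 3, 2) so versions compare numerically."""
    return tuple(int(p) for p in version.split("."))


@dataclass
class ClusterHistory:
    aos_versions: List[str]       # every AOS version the cluster has run, oldest first
    uses_scsi3_pr: bool           # e.g. Microsoft Failover Clusters on Nutanix Volumes
    edits_cluster_property: bool  # cluster name, VIP, etc. changed or about to be


def at_risk(cluster: ClusterHistory) -> bool:
    """True when every condition listed for ENG-223007 holds."""
    ran_5_9 = any(parse(v)[:2] == (5, 9) for v in cluster.aos_versions)
    current = parse(cluster.aos_versions[-1])
    on_affected_5_10 = current[:2] == (5, 10) and current < (5, 10, 4)
    return (ran_5_9 and on_affected_5_10
            and cluster.uses_scsi3_pr and cluster.edits_cluster_property)
```

A cluster that went from 5.5.x straight to 5.10.x never satisfies the first condition, which matches the note that direct upgrades are unaffected.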

Prism UI

  • ENG-73536 – Improved traceability of VMs affected by a High Availability (HA) event by associating a VM name to the HA restart task.
  • ENG-125150 – Improved the user experience in Cluster Health where Check pass/fail history for selected cluster field might show inverted colored lines.
  • ENG-182973 – You might not have been able to access the License panel in Prism running on Internet Explorer browser.
  • ENG-202377 – Enabling or disabling flash mode from Prism is not successful.
  • ENG-210030 – Fixed an issue where Prism user interface frequently reloads.
  • ENG-212206 – Fixed an issue of space unit mismatch in Prism Element, where some views had KiB and GiB flipped.
  • ENG-220835 – The VMs table page in the Prism web console triggers a large number of API calls for v1/utils/entities which made the Prism UI sluggish. This was especially noticeable with multiple concurrent users and was the result of incorrect polling logic. Polling logic has been updated to eliminate redundant internal calls.
  • ENG-221193 – The Pause/Resume button on the Prism user interface has been removed as this functionality is no longer officially supported.

Security

  • ENG-158073 – The LDAP authentication might be unsuccessful when the Active Directory user belongs to the primary group.
  • ENG-166243 – Enhanced debugging mechanism added to the backend LDAP authentication service. Fixed existing debugging error that was not allowing LDAP users to login.
  • ENG-168044 – Fixed OpenJDK issue with insufficient index validation in PatternSyntaxException getMessage() (Concurrency, 8199547) (CVE-2018-2952).
  • ENG-209818 – You might experience Prism accessibility issue while upgrading AOS from version 5.5.x to 5.10.x.
  • ENG-210334 – The controls for zone transfer were not properly applied to Dynamically Loadable Zones (DLZs). An attacker acting as a DNS client could use this flaw to request and receive a zone transfer of a DLZ even when not permitted to do so by the allow-transfer ACL. (CVE-2019-6465).
  • ENG-212941 – polkit: Temporary auth hijacking via PID reuse and non-atomic fork (CVE-2019-6133).

Serviceability

  • ENG-194498 – You might not have been able to send a cluster alert email successfully.
  • ENG-195831 – The Controller VM might become unresponsive because of excessive spawning of the remote tunnel service, which consumed more memory than designed.

Services

  • ENG-171024 – When you perform a Disaster Recovery failover, the VM disks might display size of 0 GB after the failover. This was observed when datastore containers contained special characters, such as a parenthesis. If you have such datastore naming and disaster recovery replication configured, you must upgrade immediately to mitigate future failover issues.

Storage/Tools

  • ENG-212236 – vDisk manipulator fails to run on vDisks that are on storage containers with software encryption set to ON.

Zookeeper

  • ENG-158575 – Improved logic used in handling a degraded node for the cluster services.
  • ENG-160764 – Fixed an issue where the cluster service might have become unresponsive.

You can read the full release notes here.

Nutanix Technology Champion 2019 – Second Time Around

Each year the team at Nutanix release a list of names of people from around the globe who have been chosen to be part of the Nutanix Technology Champions.

This group of IT professionals are from every cloud, application and technology group. Their diverse backgrounds, experiences, and expertise help their organizations and the virtualization community to challenge the status quo.

For the second year running I am honored to have been chosen to be part of this group of IT Pros who advocate Nutanix and the technology that is changing the IT game for 2019.

This year there were a few fellow Aussies and Kiwis chosen to represent Nutanix as NTCs: Dan Morris, Matt Day, Hugh Devaux and Guy Defryn. It’s amazing to see the number of Aussies/Kiwis included in the NTC group growing each year.

2019 is going to be an incredible year as an NTC and I am so thankful to be part of it all.

Congratulations to all who made the NTC list this year.

You can read the official Nutanix NTC announcement article here.

Nutanix AHV with Citrix MCS and Citrix Cloud – VMs not starting from Citrix Studio

I found out the hard way that there are some things that should remain untouched when it comes to Citrix MCS and Nutanix AHV. I’m hoping this post helps others who may have come across similar issues.

TL;DR: Do not delete XDSNAP snapshots on AHV if you are using Citrix MCS for your Virtual Apps and Desktops with the hosting on Nutanix.

Background

I recently deployed a Nutanix cluster to utilise Citrix Cloud with Nutanix AHV (AOS 5.8.2) as the hosting. This solution uses Citrix MCS with Citrix App Layering for the purpose of image management.

The Virtual Apps and Desktops portion of this cluster is utilising Citrix Cloud in a hybrid setup – Citrix Delivery Controllers are on Citrix Cloud whereas the VDAs and StoreFront servers are on-prem. All on-prem servers are Windows Server 2016 based VMs.

Once all the catalogs, delivery groups and VDAs were set up and tested, I decided to do some house cleaning in AHV because I had noticed a lot of orphaned snapshots when building Machine Catalogs. So I logged into one of the CVMs and took a look at the snapshots that were on the cluster.

I noticed that a lot of snapshots existed that started with XDSNAPxxxxxpreparationxxx. I had seen that when I create a Machine Catalog, a preparation VM is created, booted and then deleted, so I figured the process just doesn’t remove the leftover snapshot. Being the clean freak that I am, I decided to remove these seemingly orphaned and unused snapshots.

So off I went and ran acli snapshot.delete XDSNAP*. All went well and the overall running of the cluster and VMs was unchanged. Happy with my house cleaning, I proceeded with the project and started towards a Pilot phase.

Here is where things started to get a little weird.

The Issue

Just before going into pilot I wanted to test image updates and, more specifically, scheduled restarts from Citrix Studio (Citrix Cloud). So I created a new image in Citrix App Layering, published it to the cluster and then went through the process of updating the Machine Catalog with the latest image, ready to update the VDAs during the scheduled restart window that I had configured on the Delivery Groups. This restart was scheduled to restart the VDAs at 1:30am. I finished up for the day and went home expecting that by the time I got back to work the next day the VDAs would be updated with the new image.

When I got to work the next day, however, all the VDAs were powered off. I figured that there must have been some sort of glitch so I tried to power on the VDAs from the Citrix Studio console. Nothing. I tried again. Still nothing. Thinking that it might be an issue with that VM, I tried to power on another VDA from Citrix Studio in another machine catalog. Nada. Manually powering on the VM from the Nutanix side obviously worked and the power state was reflected in Citrix Studio, but the VM did not get the updated image as the power on command needs to come from Citrix Studio. Given that the VM was now on, I decided to see if the shutdown command worked from Studio – it did.

So now I was extremely confused. The shutdown command worked but not the start command. How was this possible?

I started troubleshooting.

  • Refreshed my Citrix Cloud session
  • Deleted the VDAs and recreated
  • Checked that the HostedMachineID in Citrix Cloud matched the VM ID in Nutanix
  • Found the PowerShell command to start the VDA from Citrix Cloud – the PowerShell command was accepted but still nothing happened.

At this point I figured there was an issue between the Citrix Cloud Connector (on-prem VM), where the Nutanix AHV plugin was installed, and the Nutanix API to start the VM. So I grabbed the Citrix CDFTrace tool and ran it on the Citrix Cloud Connector VM, started a trace and tried to power on the VM. There was nothing in the logs from the CDFTrace tool that told me that a command was ever received to start the VM. There were, however, logs to show me that the shutdown command was sent to Nutanix and that it worked.

There was definitely something going on with the communication from Citrix Cloud to Nutanix. By now I had a case open with both Nutanix and Citrix, neither of which could tell me what was going on or why this was happening. I was confused.

The Resolution

The solution to this problem was super simple. So simple it is almost comedic.

Given that this issue had been going on far too long and I knew this had worked in the past, I decided to go back to square one. I deleted the VMs, and also deleted the Machine Catalogs. I decided to leave the Delivery Groups in place as I would be re-creating the Machine Catalogs anyway.

I created a catalog, watched the preparation VM start up and shut down, and then the catalog was created successfully. I added the VMs created with that new catalog to an existing Delivery Group and what do you know, the VMs started. I wasn’t entirely surprised because I knew this had worked in the past.

Once the VMs had started and the VDAs were registered I did some testing. I shut down the VM from Citrix Studio and it worked instantly. The real test was powering it back on with Citrix Studio. I crossed my fingers and clicked the start button. I waited a few seconds and up came the VM! It had worked! 🙃 I waited for it to register and repeated the process. It had worked a second time.

Now that one of the catalogs was working I set about re-creating the remaining catalogs and performed the same tests. Worked. Every. Time. 🤗

The one thing I did differently this time? I didn’t delete the XDSNAP snapshot that is created when the Machine Catalog is created.

The End

Multiple days have passed now and the VMs are restarting as they should according to their schedules without any issue.

The XDSNAP snapshot is the master vDisk of the Machine Catalog. When you remove it, everything will be OK until a reboot. The master vDisk is the source of all reads at first; later, data is moved locally for performance and scalability, which is why things keep working while the VMs are still running. Remove it and the initial reads will fail. The only point at which these snapshots can be deleted is when you have pushed an updated image and all VDAs have rebooted and are now running off the new image.
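
One way to avoid repeating this mistake is to filter MCS snapshots out of any clean-up list before deleting anything. The Python sketch below works on a plain list of snapshot names; how you obtain that list, and the example names themselves, are assumptions for illustration. The only detail taken from this post is that the MCS snapshot names start with XDSNAP.

```python
from typing import Iterable, List, Tuple

PROTECTED_PREFIX = "XDSNAP"  # prefix of the snapshots created by Citrix MCS


def split_cleanup_candidates(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate snapshot names into (candidates to review for deletion, keep for MCS)."""
    deletable: List[str] = []
    protected: List[str] = []
    for name in names:
        if name.upper().startswith(PROTECTED_PREFIX):
            protected.append(name)
        else:
            deletable.append(name)
    return deletable, protected
```

Anything that lands in the protected list should be left alone until every VDA in the catalog has rebooted onto a newer image.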

Bottom line is – Don’t delete your snapshots too quickly.

Thank you to Kees Baggerman (Nutanix) for the explanation of how the XDSNAP is used.

Nutanix AOS 5.9 (STS) is here!

AOS 5.9 (STS) is now available to download from the Nutanix Portal.

NOTE: AOS 5.9 is a Short Term Service (STS) branch of AOS. This means it is covered by support for only 6 months. If this is too short a support window for you, it is recommended you stay on the Long Term Service (LTS) branch of AOS, which is currently AOS 5.5.6.

NOTE: After upgrading to AOS 5.9 from AOS 5.5.6, the web console Upgrade Software AOS tab incorrectly shows the current AOS 5.9 versions as an LTS (long term support) release. AOS 5.9 is a Short Term Support (STS) Release.

Here is a breakdown of what is included in this release.

New and Updated Features

  • New Settings Menu in the Web Console – The settings menu has been redesigned which is a welcome change. Settings are now easier to find and broken up into groups.
     
  • RDMA for Nutanix NX G6 Platforms – Remote Direct Memory Access (RDMA) provides a node with direct access to the memory subsystems of other nodes in the cluster, without needing the CPU-bound network stack of the operating system. RDMA allows low-latency data transfer between memory subsystems, and so helps to improve network latency and lower CPU use.
  • NGT Management in Prism Central – With the Nutanix Guest Tools (NGT) bulk operations feature, you will be able to select multiple VMs in the Prism Central VM entity browser and install, manage, and upgrade NGT on these VMs.
  • Rack Fault Tolerance – This is a big one for me. You can now configure your nodes to be rack aware meaning that redundant copies of data are made and placed on the nodes that are not in the same rack.
  • Background Encryption – Another huge one for me. Previously, software-based encryption could be enabled only if the container was empty. This meant that you had to know ahead of time whether you wanted to encrypt a container or not. Now with AOS 5.9 you can turn on encryption with existing data on the container – Brilliant!
  • Linux Guest Clustering for Nutanix Volumes
  • Support for NVIDIA Tesla V100 16 GB GPU
  • Support for VMware ESXi 6.7
  • Metro Support for Hyper-V
  • Interoperability Between Asynchronous DR and NearSync DR in a Protection Domain
  • Application-Consistent Snapshot Support for NearSync DR
  • One-Time Snapshot Support for NearSync DR
  • No longer able to delete Storage Pools
  • Recursive Directory Search Option
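To make the Rack Fault Tolerance item in the list above a bit more concrete, here is a toy placement function. It is not how AOS places data; it only illustrates the constraint that no two copies of the same data land in the same rack. The rack and node names are made up.

```python
from typing import Dict, List, Tuple


def place_replicas(nodes_by_rack: Dict[str, List[str]], copies: int) -> List[Tuple[str, str]]:
    """Pick one node per rack so that no two copies share a rack."""
    racks_with_nodes = sorted(r for r, nodes in nodes_by_rack.items() if nodes)
    if copies > len(racks_with_nodes):
        raise ValueError("not enough racks to keep every copy in a separate rack")
    # Take the first node of each of the first `copies` racks.
    return [(rack, nodes_by_rack[rack][0]) for rack in racks_with_nodes[:copies]]


cluster = {"rack-a": ["node-1", "node-2"], "rack-b": ["node-3"], "rack-c": ["node-4"]}
print(place_replicas(cluster, 2))
```

With a single rack the function refuses to place two copies, which is the situation rack awareness is meant to protect you from.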

Tech Preview

  • Nutanix Karbon 0.8 – The ability to deploy and manage Kubernetes clusters using Linux containers.

Upgrade Restrictions

  • AOS 5.9 supports upgrading your cluster from the AOS 5.5, AOS 5.6, and AOS 5.8 family versions. You cannot upgrade to AOS 5.9 from the AOS 5.1 family versions.
  • Upgrading software through the web console (1-click upgrade) does not support configurations where a cluster is mapped to two vCenters, includes host-affinity VMs, and is mapped to two VMware clusters in the same vCenter.
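If you look after several clusters, the first upgrade restriction is easy to check up front. The sketch below only encodes the source families listed above (5.5, 5.6 and 5.8); the function names are my own, and anything not in that list is simply treated as "not listed as supported".

```python
# AOS families the release notes list as valid starting points for 5.9.
SUPPORTED_SOURCE_FAMILIES = {"5.5", "5.6", "5.8"}


def family(version: str) -> str:
    """Reduce a version such as '5.5.6' to its family, '5.5'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def can_upgrade_to_5_9(current_version: str) -> bool:
    """True if the current version's family is listed as a supported source."""
    return family(current_version) in SUPPORTED_SOURCE_FAMILIES


for v in ["5.5.6", "5.8", "5.1.3"]:
    print(v, can_upgrade_to_5_9(v))
```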

Overall AOS 5.9 is a pretty big release with quite a few goodies in there that I am personally excited about.

If you want to upgrade to AOS 5.9 here is the link to the portal page.

Enjoy!

Prism Central – Configure SAML Authentication using ADFS

The latest version of Prism Central (v5.8) brought a bunch of new features. One of these features is support for using an external Identity Provider (IDP) instead of or alongside LDAP (Active Directory or OpenLDAP).

For this post I’ll be configuring ADFS for SSO to Prism Central.

Preliminary work

To make this whole process easier, grab the Federation XML from your ADFS site.
Head on over to https://federationURLHere/federationmetadata/2007-06/federationmetadata.xml and download the XML. You will need to substitute in your Federation URL (eg. sts.corp.com). This will download an XML document which contains the settings Prism Central needs to setup the connection.
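If you want to sanity-check the file before uploading it, a few lines of Python will do. This is a rough sketch using only the standard library: metadata_url just builds the URL described above, and summarize pulls the entityID and sign-on locations out of standard SAML 2.0 metadata. The sample XML is a cut-down, made-up document for sts.corp.com; a real ADFS file contains far more.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"  # SAML 2.0 metadata namespace


def metadata_url(federation_host: str) -> str:
    """Build the federation metadata URL for a given ADFS host name."""
    return f"https://{federation_host}/federationmetadata/2007-06/federationmetadata.xml"


def summarize(xml_text: str) -> dict:
    """Return the entityID and SingleSignOnService locations from the metadata."""
    root = ET.fromstring(xml_text)
    sso = [el.get("Location") for el in root.iter(f"{{{MD_NS}}}SingleSignOnService")]
    return {"entity_id": root.get("entityID"), "sso_locations": sso}


# Made-up, minimal sample for illustration only.
SAMPLE = """<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="http://sts.corp.com/adfs/services/trust">
  <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <SingleSignOnService
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
        Location="https://sts.corp.com/adfs/ls/"/>
  </IDPSSODescriptor>
</EntityDescriptor>"""

print(metadata_url("sts.corp.com"))
print(summarize(SAMPLE))
```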

Create an A record for your Prism Central VM (eg. prism.domain.local).

Prism Central Configuration

Before you go any further, click on the link at the bottom of the Authentication window that says “Download Metadata”. This will download another XML for use later when we need to configure ADFS so keep this safe for the time being.

Once you have the XML, in Prism Central, head to the gear icon and select Authentication.


Now go ahead and click the New IDP button so we can configure the Prism Central side of things.


Now we can give our configuration a name (this name will be shown on the Prism Central login page) – I went with something super original, ADFS, and then click on the radio button for “Upload Metadata”. Once you click the radio button you’ll see an Import Metadata button. You do have the option to configure this manually if you can’t get the metadata for whatever reason.


Click the Import Metadata button and select the FederationMetadata.xml you downloaded earlier from your federation URL.


Once the XML has been uploaded, click Save.
You’ll be taken back to the Authentication Configuration page where you can see your configured IDP Authentication, in this case ADFS.


You can now go ahead and set up your Role Mappings for this new Authentication type. Note: When using an IDP as opposed to LDAP, you cannot map roles to groups. Role mapping is done to an individual user, not a group. For ADFS, this needs to be the user’s UPN.


Now that the Prism Central config is done, we can switch over to our ADFS server and configure the connector to Prism Central.

ADFS Configuration

Adding a Relying Party Trust

The connection between ADFS and Prism Central is defined using a Relying Party Trust (RPT).

Select the Relying Party Trusts folder from AD FS Management, and add a new Standard Relying Party Trust from the Actions sidebar. This starts the configuration wizard for a new trust.


In the Select Data Source screen, select Import data about the relying party from a file.


Choose the metadata file that was downloaded from Prism Central.


On the next screen, specify a display name.


You may configure multi-factor authentication on this next screen, but this is beyond the scope of this guide.


On the next screen, select the Permit all users to access this relying party radio button.


On the next two screens, the wizard will display an overview of your settings. On the final screen use the Close button to exit and open the Claim Rules editor.


Creating Claim Rules

Once the relying party trust has been created, you can create the claim rules and update the RPT with minor changes that aren’t set by the wizard. By default the claim rule editor opens once you have created the trust.

To create a new rule, click on Add Rule. Create a Send LDAP Attributes as Claims rule.


On the next screen, using Active Directory as your attribute store, do the following:

  1. From the LDAP Attribute column, select User-Principal-Name.
  2. From the Outgoing Claim Type, select Name ID.

Click on OK to save the new rule.

For the most part, the ADFS config is now complete and should work when you go to Prism Central and select Login with ADFS. However, I did notice that when logging in, it was sending the IP address of Prism Central as the address to log in to (which is fine on the same network). I wanted to adjust the trust so that the correct URL was sent and so I wouldn’t get SSL cert errors.

Adjusting the Trust Settings

With the ADFS Management screen still open, highlight your Prism Central trust and click properties in the action pane.


Switch to the Identifiers tab and add a new relying party identifier (this will be your DNS record for Prism Central).


Now switch to the Endpoints tab. You’ll notice that the SAML Assertion Consumer Endpoint is set to the IP address of the Prism Central VM. Highlight the SAML Assertion Consumer Endpoint and click edit.


Now, under the Trusted URL, enter your Prism Central DNS address instead of the IP address.
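The change itself is just a host swap in the endpoint URL, keeping the scheme, port and path as they were. A tiny sketch of that, using the standard library: the IP address and the /example/acs path are placeholders I made up, so copy the real path from what the Endpoints tab shows you.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(url: str, new_host: str) -> str:
    """Swap the host in a URL while keeping scheme, port, path and query."""
    parts = urlsplit(url)
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit(parts._replace(netloc=f"{new_host}{port}"))


# Hypothetical endpoint as imported from the metadata (IP address based).
old_endpoint = "https://10.0.0.50:9440/example/acs"
print(replace_host(old_endpoint, "prism.domain.local"))
```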


Click OK to close the trust editor and give it a few minutes for Federation Sync to occur.

Head on over to Prism Central in your web browser and you should now see a Login with ADFS (or whatever you called it in Prism Central) button above the username/password fields.


And there you have it. You can now login to Prism Central with ADFS.

How To Remove Old Snapshots on Nutanix AHV

Sometimes when deleting a VM you can forget to check whether it has existing snapshots, which leaves orphaned snapshots hanging around on the cluster. This can also happen when VMs are being deleted by API (when using something like Citrix App Layering) and the API call doesn’t delete the existing snapshots.

Luckily we can use acli to remove these old snapshots and restore order to your cluster once more.

acli

To clean up your old or orphaned snapshots, SSH to your cluster IP address (or directly to a CVM). Now you’ll want to switch over to the acli prompt, where you’ll be able to make use of handy tab completion. To do that, run: acli
This will give you the acropolis prompt.


Now you can list the existing snapshots on your cluster with: snapshot.list


As you can see, I have a lot of snapshots on the cluster. I want to remove all the snapshots that Citrix App Layering has left behind. These are the snapshots starting with XDSNAP_. To achieve this I use the command below: snapshot.delete XDSNAP*

Before you do this, ensure you actually want to remove ALL snapshots that begin with XDSNAP.

This command will ask me if I want to go ahead and delete all snapshots that start with XDSNAP. Once confirming, I will be able to see the result against each snapshot name.
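If you would rather see exactly what a wildcard will catch before you run the delete, you can paste the names from snapshot.list into a quick script. This is a sketch that assumes the acli wildcard behaves like an ordinary shell-style glob, which I have not verified beyond the simple prefix case used here. The snapshot names are invented.

```python
from fnmatch import fnmatchcase


def preview_matches(snapshot_names, pattern):
    """Return the snapshot names a shell-style pattern would select."""
    return [name for name in snapshot_names if fnmatchcase(name, pattern)]


# Invented names standing in for the output of snapshot.list.
names = ["XDSNAP_catalog_a", "XDSNAP_catalog_b", "pre-upgrade", "weekly-backup"]

to_delete = preview_matches(names, "XDSNAP*")
to_keep = [n for n in names if n not in to_delete]
print("would delete:", to_delete)
print("would keep:  ", to_keep)
```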


To confirm my snapshots have been deleted I can run snapshot.list once again to view the snapshots on the cluster.


All my old or orphaned snapshots are now gone and my cluster makes sense again.

Nutanix Beam – Multi-Cloud Optimisation to Reduce Costs & Enhance Security

One of the biggest and most exciting announcements to come out of the annual Nutanix .NEXT conference this year is Nutanix Beam. Beam is a new Software as a Service (SaaS) offering from Nutanix as part of the Nutanix Enterprise Cloud OS which is designed to help customers optimise and better manage cost and security when it comes to their public clouds. Currently Amazon AWS and Microsoft Azure are the two supported public clouds however this could expand in the future to include Google Cloud Platform.

I’m going to make this a two-part blog series as there’s just too much to cover in a single post. Part 1 (this post) will give you an overview of Nutanix Beam and an introduction to the Beam UI and what you can expect to find there. Part 2 will continue through the Beam UI and talk a bit about the Cost Optimisation and automation features that Beam is best known for. So let’s get into Part 1!

Beam Overview

Nutanix Beam gives you deep visibility into your multi-cloud environment and the ability to optimize your clouds with one click. Beam identifies underutilized and unused cloud services and then provides convenient single-click remediation suggestions, empowering cloud operators to realize cost savings immediately.

Beam tracks cost consumption across all cloud resources at both aggregate and granular levels – per application workload, team and business unit. The visibility that Beam provides will help you identify the cost of your applications across multiple clouds as well as give you projections for future costs across your services.

Pair the simplicity of Nutanix One-Click operations with the deep visibility of Beam and you have a multi-cloud optimisation platform that is sure to disrupt the industry yet again.

If you want a more in depth overview of Beam head on over to the Nutanix blog here to read up.

I wanted to take Beam for a test drive to get an idea of how this new service works and how easy it is to use. I started this blog with the intention of it being an overview of the service, however it has quickly evolved into a deep dive of sorts. Come with me on this journey as I take Beam for a spin and have a look at the different dashboards and screens Beam provides.

Getting Started

Beam Signup

Beam is a SaaS offering so no on-premises hardware is required. Nutanix is currently offering a 14-day trial so you can get a feel for Beam and whether it will work for you. Head on over to https://beam.nutanix.com to login or https://www.nutanix.com/products/beam/signup/ to sign up for a free trial.

Adding your Clouds


Once you are signed up and logged in to the Beam service you are greeted with a screen to select which public cloud you would like to add to Beam.


After you select either AWS or Azure you are then given the option to take a tour of Beam with demo data or add your live AWS or Azure tenancy to Beam to play with live figures that actually mean something to you. In order to add your tenancy (in my case Azure) you’ll need your Enrollment ID and Access Key (API). The Beam setup wizard helps you with where to find these pieces of information.

Dashboard

The first thing you are presented with after adding your AWS or Azure tenancy is the Dashboard, and wow is this thing beautiful. One thing Nutanix has always done well is their UI/UX. The UI is easy to understand, not overcrowded, and gives you enough information without being like drinking from a firehose.


The Cost Governance Dashboard is broken up into four sections:

  • Spend Overview – Your spend across your various subscriptions (if you have multiple subscriptions) otherwise it will show your services if you only have the one subscription (Virtual Machines, Data Services, Business Analytics, Cloud Services, Others)
  • Spend Analysis – Spend trend over time along with your Average Daily Cost, Last 7 Day Spend, and your Projected Yearly Spend
  • Top Services – Breakdown of your most used Services
  • Top Resource Groups – Breakdown of your most used Resource Groups

Clicking on any of the values in the graphs will take you to the Analyze page (more on this page below) for that service so you can dig in a bit further to see where your spend is going.

This dashboard is perfect for getting a quick insight into how your costing is looking across your entire tenancy. What’s nice is that you are also given a future projection based on the current usage which will help to tailor your usage over time.
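To get a feel for what the Spend Analysis figures represent, here is a back-of-the-envelope version. Beam does not document its projection method here, so this sketch assumes the simplest possible one: the projected yearly spend is the average daily cost multiplied by 365. The daily costs are made-up numbers.

```python
from statistics import mean


def spend_summary(daily_costs):
    """Compute dashboard-style figures from a list of daily costs (oldest first)."""
    average_daily = mean(daily_costs)
    return {
        "average_daily_cost": average_daily,
        "last_7_day_spend": sum(daily_costs[-7:]),
        # Naive assumption: future days cost the same as the average so far.
        "projected_yearly_spend": average_daily * 365,
    }


costs = [10.0] * 14  # two weeks at a flat 10 per day
print(spend_summary(costs))
```

A real projection would account for trends and seasonality, so treat this purely as a way to read the dashboard numbers.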

Analyze

If you click on any of the graphs or figures on the Dashboard you’ll come through to the Analyze page. This is the spot where you can drill further down into your spending to see what each service is costing you.

The Analyze section is split into 5 subsections:

  • Current Spend
  • Projected Spend
  • Virtual Machine
  • Storage
  • Data Services

Current Spend

As the name of this screen suggests, the Current Spend section is all about analyzing your current spend (duh). This is where you can see your current costs across your services within a specific time frame. The default view is for 7 days worth of data however there is a filter off to the right that will allow you to change the time frame.

Overview

Current Spend Overview

The first page you land on in the Current Spend section is the Overview screen. This screen gives you a look at what you are spending on a day-to-day basis. There are also options to change the graph view from Bar to Line, Pie or Table view as well as a drop-down to quickly change from Day to Month view.

The subsequent screens following on from Overview (Subscriptions, Service Categories, Service Type, Region, Tag, Cost Center, Department, Resource Group) will give you a similar view of the cost breakdown per day (in the default view) across those additional Azure sections. Hovering over each element on the graph will pop out some meaningful tool tips showing you the cost breakdown for that particular day. In order to keep this post from turning into a small novel, I’ve included screen shots of those screens below.

Current Spend Subscriptions
Current Spend Service Categories
Current Spend Service Type
Current Spend Region
Current Spend Tag
Current Spend Cost Center
Current Spend Department
Current Spend Resource Groups

Projected Spend

The Projected Spend section is all about analyzing your potential future spend (calculated by how you are using your resources now). This is where you can see your predicted costs across your services within a specific time frame. The default view is for 3 months + the current month of data however there is a filter off to the right that will allow you to change the time frame.

Overview

Projected Spend Overview

The first page you land on in the Projected Spend section is the Overview screen. This screen gives you a look at what you are spending on a day-to-day basis. There are also options to change the graph view from Bar to Line, Pie or Table view as well as a drop-down to quickly change from Day to Month view.

Again, subsequent screens following on from Overview (Subscriptions, Services) will give you a similar view of the cost breakdown across those additional Azure sections. Hovering over each element on the graph will pop out some meaningful tool tips showing you the cost breakdown for that particular day. I’ve included screen shots of those screens below.

Current Spend Department
Projected Spend Services

Virtual Machine, Storage & Data Services

The final three sections on the Analyze page are very similar to one another (which is why I have grouped them together). They each have four sub screens (Overview, Sub Service, Region, Service ID) and the same default filter (7 days) on the right. These pages will again show you a break down of cost per day across the different sections of each service giving you the granularity to see exactly what is costing you money.
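Under the hood, every one of these screens is the same idea: take the billing line items and total the cost per day for one dimension (sub service, region, and so on). A rough sketch of that grouping, with invented records and field names rather than Beam's actual data model:

```python
from collections import defaultdict


def cost_per_day_by(records, dimension):
    """Total the cost for each (date, dimension value) pair."""
    totals = defaultdict(float)
    for record in records:
        totals[(record["date"], record[dimension])] += record["cost"]
    return dict(totals)


# Invented line items for illustration only.
records = [
    {"date": "2018-06-01", "region": "australiaeast", "sub_service": "D2 v3", "cost": 1.5},
    {"date": "2018-06-01", "region": "australiaeast", "sub_service": "B2s", "cost": 2.5},
    {"date": "2018-06-01", "region": "westus", "sub_service": "D2 v3", "cost": 3.0},
]

print(cost_per_day_by(records, "region"))
print(cost_per_day_by(records, "sub_service"))
```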

I think I will leave it here for today as this has become quite a long post.
Stay tuned for the next post on Nutanix Beam where I will dive into the Cost Optimisation and automation features that are generating all the buzz around Nutanix Beam.

Nutanix AHV Best Practices Guide Updated – v4.1 Now Available

There are new versions of AOS and AHV in the wild and so the AHV Best Practices Guide (BPG) has been updated to cover any new features or enhancements that may have come out since the last BPG was released. The latest AOS version number depends on what release track you have decided to be on (AOS 5.6 for Short Term Support or AOS 5.5.2 for Long Term Support).

The AHV Best Practices Guide covers everything from the Nutanix AHV Architecture Overview through to Live Migration, Resource Oversubscription and VM Data Protection.

Here’s a quick snapshot of the table of contents:

  • Networking
  • Virtual Machine High Availability
  • Acropolis Dynamic Scheduler
  • VM Deployment
  • Nutanix Guest Tools
  • VM Data Protection
  • Hypervisor Mobility and Conversion
  • Live Migration
  • CPU Configuration
  • Memory
  • Hotplug CPU and Memory
  • AHV Turbo Technology
  • Disks
  • Resource Oversubscription
  • Hugepages

There is a lot of great content in there so go grab yourself a copy of the updated AHV BPG and get reading!

You can grab the updated AHV Best Practices Guide from the Nutanix public website here or if you have access to the Nutanix Portal you can grab it from here

Nutanix Era: It’s all about Databases

At this year’s .NEXT conference, Nutanix pulled back the covers on a few new services they have been working on. I’m going to detail a few of them here on my blog. First up is Era.

An Intro to Nutanix Era

Nutanix Era is a new suite of software from Nutanix which is set to bring the famous ‘One-Click’ simplicity to Database provisioning and Lifecycle Management. The first piece of Era will be Copy Data Management (CDM), which gives DB Administrators the ability to provision, clone, refresh and restore their databases to any point in time through a Prism like UI.

Era

Copy Data Management (CDM)

When Era becomes available later this year there will be two functions/actions within CDM that will be available to users: Time Machine and Database Clone/Refresh.

One-Click time machine uses Nutanix based snapshots and application-centric APIs to create space efficient snapshots which enables databases running on Nutanix to be cloned or restored to any specific point in time – even up to the last database transaction.

One-Click database clone/refresh allows admins to create database clones/refreshes to any point in time in just a few minutes thanks to the Time Machine like snapshots.
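Restoring "to any point in time" generally means starting from the most recent snapshot taken at or before the target and then replaying the transaction logs recorded after it, up to the target. The sketch below shows that generic idea only; it is not Era's implementation, and the integer timestamps are invented.

```python
from bisect import bisect_right


def restore_plan(snapshot_times, log_times, target):
    """Pick the base snapshot and the logs to replay for a point-in-time restore.

    Both lists must be sorted in ascending order.
    """
    index = bisect_right(snapshot_times, target)
    if index == 0:
        raise ValueError("no snapshot exists at or before the target time")
    base = snapshot_times[index - 1]
    # Replay only the logs written after the base snapshot, up to the target.
    logs = [t for t in log_times if base < t <= target]
    return base, logs


snapshots = [100, 200, 300]
logs = [110, 150, 210, 250, 290, 310]
print(restore_plan(snapshots, logs, 260))
```

The fewer logs that need replaying, the faster the restore, which is one reason frequent space-efficient snapshots matter.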


As CDM is just the first piece to Era, these capabilities will expand over time to include more features.

Thoughts on Era

Nutanix Era is a perfect evolutionary step for Nutanix following on from X-Tract for DBs. Imagine being able to bring your DBs across from your legacy platforms (while applying best practices) and then use a familiar Prism-like interface to manage and interact with your databases on a daily basis. Seems like a winner to me. Even just being able to interact with your database without having to use the vendor tools is a massive win. I am definitely looking forward to playing with Era and I know a few DBAs who will love these features too.

Nutanix Era is expected to be available in the second half of this year (2018) with support for Oracle and Postgres databases initially. This will expand over time to include an increasing number of popular databases, starting with Microsoft SQL Server and MySQL.

Head on over to the Nutanix Blog to read the official Nutanix post on Era and the other new services that were recently announced at .NEXT.