Upgrading your vSphere environment to 5.0 – Part 1 – Licensing

So the time has come to upgrade your existing infrastructure to vSphere 5. Unless you are new to the IT field, you will not just pop in the CD and run an upgrade.  The first thing you need to do is look at your licensing to make sure that your upgraded licenses will cover your existing infrastructure. With vSphere 5, licensing has moved from a model based purely on physical CPUs to one that also counts vRAM usage.  For example, say you scaled up your virtual environment with bigger, beefier servers, such as a quad-processor, quad-core host with 1 terabyte of RAM. Under the old licensing scheme you needed to purchase four single-CPU licenses.  At the same licensing level in vSphere 5 (I will use Enterprise Plus), you are allotted only 96 GB of vRAM per single-CPU license, so those four licenses would let you use a maximum of 384 GB for your virtual machines.  To cover the full 1 TB of RAM for your virtual machines, you would need a total of 11 CPU licenses (1,024 GB / 96 GB = 10.67, rounded up).  If you are running extremely large virtual machines with a lot of RAM, the maximum amount of memory counted for any one virtual machine is that of one Enterprise Plus license, or 96 GB.  Anything above that is not counted for licensing purposes.
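
The arithmetic above can be sketched in a few lines of Python. This is only a rough planning aid, assuming one license per physical CPU as a floor and the 96 GB Enterprise Plus entitlement quoted above; the function name and parameters are made up for illustration and are not part of any VMware tool.

```python
import math

def licenses_needed(vram_gb, entitlement_gb, physical_cpus):
    """Estimate CPU licenses: enough to cover the vRAM, but never fewer than the CPU count."""
    by_vram = math.ceil(vram_gb / entitlement_gb)  # licenses needed to pool enough vRAM
    return max(by_vram, physical_cpus)             # assumption: every physical CPU still needs a license

# The example host from the text: 4 CPUs, 1 TB (1024 GB) of RAM, 96 GB per license.
print(licenses_needed(1024, 96, 4))  # 11
print(4 * 96)                        # 384 GB pooled by four licenses
```

Swapping in the entitlement for a different edition only changes the second argument.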

Another issue you should plan for is that, typically, when you undergo a major version upgrade, your Enterprise Plus licensing is downgraded to the Enterprise level.  You will need to talk to your VMware representative about your options. Updated: According to the entitlement map on VMware’s website, a current Enterprise Plus license will map to the new Enterprise Plus licensing, and so forth for the other editions.  This is great news for companies that want the enhanced capabilities and don’t want to pay for upgrade licensing like we did going from version 3 to version 4.

[Image: vsphere5license1]

There are also a number of new features in vSphere 5 that are only available if you have Enterprise Plus licensing.  These include the new Storage DRS and Profile-Driven Storage.  All these items need to be taken into account, planned for, and budgeted before doing your upgrade.

[Image: vSphere5Features]


Signs that your disk usage is too high

Whether you are converting physical machines to virtual, creating new machines, or watching existing virtual machines use more resources than originally specified two years ago, at some point your environment will run out of resources. Most of the time you will run out of disk or memory.

When it comes to running out of disk, you could run out of actual space or of IOPS (input/output operations per second). Most SANs, and even VMware, are very good at showing space utilization, but showing that you are suffering from maxed-out I/O is not as easily done.

First, let's look at the GUI. Go to a specific virtual machine and click on the Performance tab.  Click the Advanced button.  Under the Switch To drop-down box, select Disk.  Then click on Chart Options, select Command Aborts, and click OK.  As you can see in my example, we are at zero, which is good.  If this virtual machine were registering anything other than zero, then the I/O pipe would be bottlenecked somewhere and SCSI commands would be getting aborted.

[Image: Command Aborts chart from the example above]

I prefer to do it by virtual machine, but you can also do it by selecting the host and going through the same steps.  The difference is that the virtual machine view gives you only the LUNs that are connected to that virtual machine, whereas the host view gives you all the LUNs connected to the host.

The next method is the one I prefer, as I think it is not only faster but also more accurate: using the command line and ESXTOP.  From either the local Tech Support Mode or the remote Tech Support Mode command line, enter esxtop. I prefer remoting in via SSH so that I can expand the screen.  Once in ESXTOP, press the d key on the keyboard; this brings you to the disk view and lists all the disk controllers.  The counters you want to look at specifically are:

  • DAVG/cmd
  • KAVG/cmd

[Image: ESXTOP1]

DAVG/cmd should not be over 25 ms; a higher value usually means that there is an issue on the array causing the latency.  KAVG/cmd should never be over 2 ms; if it is, there is an issue with the host that is causing the latency.  It could be that there is a virtual machine with a high I/O load (SQL or Exchange) causing the issue.  To find out, hit the v key on the keyboard to break out the disk usage by virtual machine.  To break it out by LUN, press the u key.  This way you can figure out the culprit or culprits and take steps to fix the issue and bring the environment back to normal.
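
As a quick illustration of how these two thresholds split the problem between array and host, here is a minimal Python sketch. The function name and the sample readings are hypothetical; in practice the numbers would be read off the ESXTOP screen or a batch-mode capture.

```python
# Thresholds quoted in this post, in milliseconds per command.
DAVG_LIMIT_MS = 25.0
KAVG_LIMIT_MS = 2.0

def diagnose_latency(davg_ms, kavg_ms):
    """Return the likely problem areas for one adapter, LUN, or VM row."""
    findings = []
    if davg_ms > DAVG_LIMIT_MS:
        findings.append("array")  # device latency too high: look at the storage array
    if kavg_ms > KAVG_LIMIT_MS:
        findings.append("host")   # kernel latency too high: look at the host
    return findings

# Hypothetical readings: (DAVG/cmd, KAVG/cmd) per adapter.
samples = {"vmhba1": (31.4, 0.3), "vmhba2": (4.8, 3.1), "vmhba3": (6.2, 0.1)}
for name, (davg, kavg) in samples.items():
    problems = diagnose_latency(davg, kavg)
    print(name, "OK" if not problems else "check " + ", ".join(problems))
```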


ESXTOP resources

In my continuing study and expansion of my knowledge of VMware I wanted to go more in-depth on the tools under the covers.  The most important tool for performance tuning and troubleshooting is ESXTOP, which is similar to the TOP command in Linux but is geared toward ESX and ESXi installations.  Instead of regurgitating and paraphrasing what I have found, I will supply the links to the appropriate pages.

First off, I find Duncan Epping’s page on ESXTOP outstanding.  Not only does he go through and sum up the counters from the ESXTOP bible, but he also gives you recommended thresholds.  This way you have a point of reference to help you spot issues right away.  He also explains how to run ESXTOP in batch mode and then how to interpret the data using Excel, ESXPlot, and PerfMon.  This is my go-to page for immediate reference.

Next up is the ESXTOP bible.  I found two versions of this: one for vSphere 4.0 and one for vSphere 4.1.  I have NOT compared the two; I have focused mainly on the 4.1 version, as this is the environment I am currently supporting.  This page does a deep dive into the counters, explaining what they are and how they are derived.

Then I found a handy little reference card that gives a short summary of the most important counters to know.

Finally, I found a reference to a PowerShell cmdlet that allows you to access this tool via a script.  When I looked for more information, I found some articles by LucD going in-depth on how to use the cmdlet.

Look at these great references and let me know if I missed any others.

Minimum permissions needed to deploy from a template

So the other day, I got a call from an admin: a customer had opened a ticket because the system administrators for one of the virtual environments were no longer able to deploy from a template.  After looking through the environment I found that, interestingly enough, ALL the roles were gone.  Long story short, after doing some research I worked out the minimal permissions that an admin needs to create a virtual machine from a template, including using the predefined customizations.

At the Datacenter Level

  • Datastore > Allocate Space
  • Resource > Assign virtual machine resource pool

At the specific folder level (production & template folders)

  • Host > Local operations > Create virtual machine
  • Virtual Machine > ALL settings

Virtual Center Settings

  • Virtual Machine > Provisioning > Read customization specifications

Understand and apply LUN masking using PSA-related commands

Per knowledge base article 1009449.

  1. Look at the Multipath Plug-ins currently installed on your ESX host with the command:

    # esxcfg-mpath -G

    The output indicates that there are, at a minimum, 2 plug-ins: the VMware Native Multipath Plug-in (NMP) and the MASK_PATH plug-in, which is used for masking LUNs. There may be other plug-ins if third-party software (such as EMC PowerPath) is installed.

  2. List all the claimrules currently on the ESX host with the command:

    # esxcli corestorage claimrule list

    There are two MASK_PATH entries: one of class runtime and the other of class file.

    The runtime class is the rules currently running in the PSA. The file class is a reference to the rules defined in /etc/vmware/esx.conf. These are identical, but they could be different if you are in the process of modifying /etc/vmware/esx.conf.

  3. Add a rule to hide the LUN with the command:

    # esxcli corestorage claimrule add --rule <number> -t location -A <hba_adapter> -C <channel> -T <target> -L <lun> -P MASK_PATH

    The parameters -A <hba_adapter> -C <channel> -T <target> -L <lun> define a unique path. You can leave some of them unspecified if the LUN is uniquely defined. The value for parameter --rule can be any number between 101 and 200 that does not conflict with a pre-existing rule number from step 2.


  4. Verify that the rule has taken effect with the command:

    # esxcli corestorage claimrule list

    The output indicates our new rule. It is only of class file. You must then load it into the PSA.


  5. Reload your claimrules with the command:

    # esxcli corestorage claimrule load


  6. Re-examine your claim rules and verify that you can see both the file and runtime class. Run the command:

    # esxcli corestorage claimrule list

  7. Unclaim all paths to a device and then run the loaded claimrules on each of the paths to reclaim them. Run the command:

    # esxcli corestorage claiming reclaim -d <naa.id>

    where <naa.id> is the NAA ID of the LUN masked in step 3. This device is the LUN being unpresented. This command attempts to unclaim all paths to a device and runs the loaded claimrules on each of the unclaimed paths to attempt to reclaim them.

  8. Verify that the masked device is no longer used by the ESX host.

    If you are masking a datastore, perform one of these options:

    • Connect the vSphere Client to the host and click Host > Configuration > Storage, then click Refresh. The masked datastore does not appear in the list.
    • Rescan the host by navigating to Host > Configuration > Storage Adapters > Rescan All.
    • Run the command:

      # esxcfg-scsidevs -m

      The masked datastore does not appear in the list.

      To verify that a masked LUN is no longer an active device, run the command:

      # esxcfg-mpath -L | grep <naa.id>
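
Step 3 asks for a rule number between 101 and 200 that is not already in use. A minimal Python sketch of that selection follows; it assumes you have already copied the existing rule numbers out of the claimrule list output from step 2 by hand. The function name and the sample numbers are hypothetical.

```python
def next_free_rule(existing_rules, low=101, high=200):
    """Return the lowest rule number in [low, high] that is not already taken."""
    used = set(existing_rules)
    for number in range(low, high + 1):
        if number not in used:
            return number
    raise ValueError("no free rule number between %d and %d" % (low, high))

# Hypothetical rule numbers read from the step 2 output.
existing = [0, 1, 2, 3, 101, 102, 65535]
print(next_free_rule(existing))  # 103
```

The number it returns is what you would pass to --rule in step 3.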