ESXTOP resources

In my continuing study and expansion of my knowledge of VMware I wanted to go more in-depth on the tools under the covers.  The most important tool for performance tuning and troubleshooting is ESXTOP, which is similar to the TOP command in Linux but is geared toward ESX and ESXi installations.  Instead of regurgitating and paraphrasing what I have found, I will supply the links to the appropriate pages.

First off, I find Duncan Epping’s page on ESXTOP outstanding.  Not only does he sum up the counters from the ESXTOP bible, but he also gives you recommended thresholds.  This gives you a point of reference to help you spot issues right away.  He also explains how to run ESXTOP in batch mode and then how to interpret the data using Excel, ESXPlot, and PerfMon.  This is my go-to page for immediate reference.
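Batch mode itself is close to a one-liner.  Here is a minimal sketch, assuming the standard esxtop switches (-b for batch mode, -d for the delay in seconds between samples, -n for the number of samples).  The function name, the default values, and the output file name are my own placeholders, not anything taken from the linked page.

```shell
# Capture esxtop counters in batch mode and write them to a CSV file.
# Usage: capture_esxtop [delay_seconds] [samples] [output_file]
# The defaults below are illustrative only.
capture_esxtop() {
    delay="${1:-5}"
    samples="${2:-60}"
    outfile="${3:-esxtop-capture.csv}"

    # -b = batch mode, -d = delay between samples, -n = number of samples
    esxtop -b -d "$delay" -n "$samples" > "$outfile"
}
```

For example, `capture_esxtop 2 100 /tmp/esxtop.csv` would take 100 samples two seconds apart.  The resulting CSV is the file you would then open in Excel, ESXPlot, or PerfMon.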

Next up is the ESXTOP bible.  I found two versions of this: one for vSphere 4.0 and one for vSphere 4.1.  I have NOT compared the two; I have focused mainly on the 4.1 version as this is the environment I am currently supporting.  This page does a deep dive into the counters, explaining what they are and how they are derived.

Then I found a handy little reference card that gives a short summary of the most important counters to know.

Finally, I found a reference to a PowerShell cmdlet that allows you to access this tool via a script.  When I looked for more information I found some articles by LucD going in-depth on how to use the cmdlet.

Look at these great references and let me know if I missed any others.

Understand and apply LUN masking using PSA-related commands

The following steps are per VMware Knowledge Base article 1009449.

  1. Look at the Multipath Plug-ins currently installed on your ESX host with the command:

    # esxcfg-mpath -G

    The output indicates that there are, at a minimum, 2 plug-ins: the VMware Native Multipath Plug-in (NMP) and the MASK_PATH plug-in, which is used for masking LUNs. There may be other plug-ins if third-party software (such as EMC PowerPath) is installed.

  2. List all the claimrules currently on the ESX host with the command:

    # esxcli corestorage claimrule list

    There are two MASK_PATH entries: one of class runtime and the other of class file.

    The runtime class shows the rules currently loaded in the PSA. The file class refers to the rules defined in /etc/vmware/esx.conf. The two are normally identical, but they can differ while you are in the process of modifying /etc/vmware/esx.conf.

  3. Add a rule to hide the LUN with the command:

    # esxcli corestorage claimrule add --rule <number> -t location -A <hba_adapter> -C <channel> -T <target> -L <lun> -P MASK_PATH

    The parameters -A <hba_adapter> -C <channel> -T <target> -L <lun> define a unique path. You can leave some of them unspecified as long as the remaining ones still identify the LUN uniquely. The value for parameter --rule can be any number between 101 and 200 that does not conflict with a pre-existing rule number from step 2.


  4. Verify that the rule has been added with the command:

    # esxcli corestorage claimrule list

    The output shows the new rule, but only with class file. You must then load it into the PSA.


  5. Reload your claimrules with the command:

    # esxcli corestorage claimrule load


  6. Re-examine your claim rules and verify that the new rule now appears with both the file and runtime classes. Run the command:

    # esxcli corestorage claimrule list

  7. Unclaim all paths to a device and then run the loaded claimrules on each of the paths to reclaim them. Run the command:

    # esxcli corestorage claiming reclaim -d <naa.id>

    where <naa.id> is the NAA ID of the LUN whose paths you masked in step 3. This device is the LUN being unpresented. The command attempts to unclaim all paths to the device and then runs the loaded claimrules on each unclaimed path to reclaim it.

  8. Verify that the masked device is no longer used by the ESX host.

    If you are masking a datastore, perform one of these options:

    • Connect the vSphere Client to the host and click Host > Configuration > Storage, then click Refresh. The masked datastore does not appear in the list.
    • Rescan the host by navigating to Host > Configuration > Storage Adapters > Rescan All.
    • Run the command:

      # esxcfg-scsidevs -m

      The masked datastore does not appear in the list.

      To verify that a masked LUN is no longer an active device, run the command:

      # esxcfg-mpath -L | grep <naa.id>
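To tie the steps together, here is a rough sketch of steps 2 through 8 as POSIX shell functions.  The function names and argument order are my own invention, not part of the KB article or any VMware tooling.  I am also assuming that the rule number is the first all-numeric column in each line of the claimrule list output, so check that against your own host before relying on it.

```shell
# Print the lowest rule number in 101-200 that does not already appear
# in the `claimrule list` output supplied on stdin.
next_free_rule() {
    awk '
        {
            # Assumption: the first all-digit field on a line is the rule number.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+$/) { used[$i] = 1; break }
            }
        }
        END {
            for (n = 101; n <= 200; n++) {
                if (!(n in used)) { print n; exit }
            }
            exit 1   # no free number left in the range
        }
    '
}

# Mask one path and reclaim the device (steps 2, 3, 5 and 7).
# Usage: mask_path <hba_adapter> <channel> <target> <lun> <naa.id>
mask_path() {
    adapter="$1"; channel="$2"; target="$3"; lun="$4"; device="$5"

    # Step 2: pick a rule number that does not collide with existing rules.
    rule=$(esxcli corestorage claimrule list | next_free_rule) || return 1

    # Step 3: add the MASK_PATH rule (class "file" only at this point).
    esxcli corestorage claimrule add --rule "$rule" -t location \
        -A "$adapter" -C "$channel" -T "$target" -L "$lun" -P MASK_PATH || return 1

    # Step 5: load the rule into the PSA so it also shows up as class "runtime".
    esxcli corestorage claimrule load || return 1

    # Step 7: unclaim the paths to the device and re-run the loaded claimrules.
    esxcli corestorage claiming reclaim -d "$device"
}

# Step 8: succeed only if no path to the device is listed any more.
# Usage: is_masked <naa.id>
is_masked() {
    ! esxcfg-mpath -L | grep -qF -- "$1"
}
```

A call would look like `mask_path <hba_adapter> <channel> <target> <lun> <naa.id> && is_masked <naa.id>`, using the same placeholders as in the steps above.  I left steps 4 and 6 (eyeballing the claimrule list) out of the script on purpose, since those are worth doing by hand.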

 

Determine appropriate RAID level for various Virtual Machine workloads

When you determine the volume layout, evaluate the type of data to be stored and the number of volumes that you want to create.  Place each logical drive on a separate volume for better performance and easier future expansion if needed.

[Image: RAID levels]

Typically, for operating system and application data, you would use a RAID 5 volume.  For things like transaction logs or other volumes with a high rate of changes, you should use a RAID 1 or RAID 1+0 volume.
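One way to see why write-heavy volumes favor RAID 1 or RAID 1+0 is the commonly cited write-penalty rule of thumb: each host write costs roughly 2 back-end I/Os on RAID 1 / 1+0 and roughly 4 on RAID 5.  The sketch below is only a back-of-the-envelope estimate under that assumption; the function name and the sample numbers are mine, and real arrays with cache will behave differently.

```shell
# Estimate host-visible IOPS for a RAID group from its raw back-end IOPS.
# Usage: effective_iops <raw_iops> <write_percent> <write_penalty>
# Assumed penalties (rule of thumb): RAID 1 / 1+0 = 2, RAID 5 = 4.
effective_iops() {
    raw="$1"; write_pct="$2"; penalty="$3"
    read_pct=$((100 - write_pct))

    # Each host read costs one back-end I/O; each host write costs <penalty>.
    # Integer arithmetic, so the result is rounded down.
    echo $(( raw * 100 / (read_pct + write_pct * penalty) ))
}
```

With 1200 raw IOPS and a 70% write workload, `effective_iops 1200 70 4` (RAID 5) prints 387 while `effective_iops 1200 70 2` (RAID 1+0) prints 705.  At 10% writes the gap narrows to 923 versus 1090, which is why RAID 5 is usually acceptable for read-mostly operating system and application data.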

Physical Design Diagram

The third design diagram is the physical design diagram.  This diagram lists specific hardware and vendors, along with CPU, memory, storage, PCI cards, configurations, and LUN assignments.  The physical design is typically not a single diagram but a series of diagrams.  Below is an example of a physical design.

[Image: example physical design diagram]

Logical Design

After you create a conceptual design, you next need to create a logical design diagram.  This diagram is a lower-level design showing the relationship of the hosts, storage, and networks, and how they interact with each other.  All devices are shown as generic containers; specific hardware, CPUs, memory, LUNs, and so on are NOT listed in the logical design.  Below is an example of a logical design diagram.

[Image: example logical design diagram]