Gathering real-time vSphere performance metrics in parallel using PowerCLI

Overview

In a previous post I described how to list all vCenter performance metrics and introduced a project I was working on. To recap, I was asked to gather all real-time performance data from every virtual machine in vCenter and publish the results to an API. As in the previous post, I’m going to use PowerCLI. In upcoming posts I’ll show how to get the same data with Perl via the vSphere CLI (I haven’t kept up with the VMware Perl Toolkit in years, and it looks like it was merged into the vSphere CLI) and with Ruby via VMware’s rbvmomi. I’d like to integrate the latter two with Docker (I’ve created a Docker image that uses the vSphere CLI to gather the data, but it still needs some finishing touches).

Processes vs Threads

In PowerShell there are at least two ways of parallelizing your scripts: jobs and runspaces. Jobs create separate processes, whereas runspaces use threads within a single process. You can read more about processes vs threads here. If you use jobs, you will see a powershell.exe process for each job you create. If you use runspaces, you’ll see a single powershell.exe. I tried both approaches and found that, for my use case, jobs were faster than runspaces, so I’m going to show the jobs-based approach here. Creating a process takes longer than creating a thread, so jobs take a little longer to get going, but once they were running they finished their work more quickly than runspaces. Since each job’s work took upwards of a minute and a half, the slower startup time of jobs was worth the trade-off. I imagine that as the time each job or thread spends on its work decreases, the balance would start to favor runspaces.
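A quick way to see the process-versus-thread difference for yourself is to compare the process ID ($PID) reported from inside a job and from inside a runspace. This is a minimal sketch that needs neither PowerCLI nor vCenter; on PowerShell 7 the job’s process is pwsh rather than powershell.exe, but the behavior is the same.

```powershell
# A background job runs in a separate PowerShell process, so its $PID differs from ours.
$job = Start-Job -ScriptBlock { $PID }
$jobPid = Receive-Job -Job $job -Wait -AutoRemoveJob

# A runspace runs on a thread inside the current process, so its $PID matches ours.
$ps = [PowerShell]::Create()
[void]$ps.AddScript('$PID')
$runspacePid = $ps.Invoke()[0]
$ps.Dispose()

"Current process: $PID, job: $jobPid, runspace: $runspacePid"
```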

Getting real-time VM performance data

We will be using the Get-StatType cmdlet to retrieve the stat types available for a VM. In my case I’m only concerned with real-time data, which is kept for an hour (on the hosts and in vCenter Server’s memory rather than in the database) before it’s rolled up into 5-minute intervals in the vCenter database. Here is an example of retrieving the available real-time stat types for a VM:

$vm = get-vm MyVM

Get-StatType -entity $vm -realtime

If a VM has been powered off for an hour (or if vCenter hasn’t been receiving performance data for the VM from the host for any reason), you’ll get no results back when you run Get-StatType. If you look at the VM’s real-time performance in vCenter’s performance charts, you’ll see “Performance data is currently not available for this entity.” Since real-time data is collected every 20 seconds, if you power the VM on and wait about 20 seconds, you’ll be able to retrieve the available stat types and see data in the vCenter performance charts. After doing this, here are the first five available stat types, sorted alphabetically:

Get-StatType -entity $vm -realtime | sort | select -first 5

cpu.costop.summation
cpu.demand.average
cpu.entitlement.latest
cpu.idle.summation
cpu.latency.average

Now we need to retrieve the performance metrics for those stat types. My requirements are pretty simple, and there is quite a bit more to stats gathering than what I’m going to show: all I need is the real-time data, collected every hour. Again, I’d suggest reading LucD’s awesome PowerCLI & vSphere statistics series for more info. Skimming through it makes me realize I should read it again.
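Because real-time samples are 20 seconds apart and only kept for an hour, an hourly collection returns at most about 180 samples per counter and instance. Here is a back-of-the-envelope sketch of how many lines each VM’s output file could hold. The counter count of 100 is an assumption loosely based on the 100+ stat types per VM mentioned in this post; the real number depends on the VM and host.

```powershell
# Real-time samples are taken every 20 seconds and kept for one hour.
$intervalSeconds   = 20
$windowSeconds     = [TimeSpan]::FromHours(1).TotalSeconds
$samplesPerCounter = $windowSeconds / $intervalSeconds

# Assumed counter count; the actual number varies by VM and host.
$counterCount = 100
$rowsPerVm    = $samplesPerCounter * $counterCount

"$samplesPerCounter samples per counter, roughly $rowsPerVm lines per VM file"
```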

Let’s get the first 5 data points for the ‘cpu.costop.summation’ stat type:

Get-Stat -entity $vm -realtime -stat 'cpu.costop.summation' | ? { $_.instance -eq "" } | select -first 5 | format-table -auto

MetricId             Timestamp            Value Unit        Instance
--------             ---------            ----- ----        --------
cpu.costop.summation 6/18/2015 8:17:20 PM     0 millisecond
cpu.costop.summation 6/18/2015 8:17:00 PM     0 millisecond
cpu.costop.summation 6/18/2015 8:16:40 PM     0 millisecond
cpu.costop.summation 6/18/2015 8:16:20 PM     0 millisecond
cpu.costop.summation 6/18/2015 8:16:00 PM     0 millisecond

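This command should be pretty self-explanatory. The Instance filter keeps only the aggregate values: an empty Instance denotes the aggregate for the whole VM (for CPU counters, across all of its vCPUs), while values such as 0 and 1 identify individual instances. Please read LucD’s explanation of the Instance column for more detail. Let’s run the help command against the Get-Stat cmdlet to see what options are available: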

help Get-Stat

Get-Stat [-Entity] <VIObject[]> [-Common] [-Memory] [-Cpu] [-Disk] [-Network] [-Stat <String[]>] [-Start <DateTime>] [-Finish <DateTime>] [-MaxSamples <Int32>]
[-IntervalMins <Int32[]>] [-IntervalSecs <Int32[]>] [-Instance <String[]>] [-Realtime] [-Server <VIServer[]>] [<CommonParameters>]

Notice how the -Stat parameter is formatted: [-Stat <String[]>]. The [] after String means that this parameter accepts an array of values, so we can define an array with the stat types we wish to retrieve and pass it to the Get-Stat cmdlet:

$statTypes = @('cpu.costop.summation', 'cpu.demand.average')

Get-Stat -entity $vm -realtime -stat $statTypes | ? { $_.instance -eq "" }

I actually missed this on my first pass at the script. It wasn’t until I implemented the script in Perl that I saw you could pass in an array of values. I figured surely you could do the same in PowerCLI, so I checked the syntax and, sure enough, you can. Before that, I was iterating through each available stat type and calling Get-Stat once per type (100+ times for each VM). Doh.
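To see the array parameter and the Instance filter together without a vCenter connection, here is a small sketch. Get-MockStat is a hypothetical stand-in for Get-Stat that mimics only two things: a [string[]] -Stat parameter, and per-instance rows emitted alongside an aggregate row whose Instance is empty. The values it emits are placeholders, not real data.

```powershell
# Hypothetical stand-in for Get-Stat so the pattern runs without vCenter.
# [string[]] lets a single call accept one or many counter names.
function Get-MockStat {
    param(
        [Parameter(Mandatory = $true)]
        [string[]] $Stat
    )
    foreach ($metricId in $Stat) {
        # Per-instance rows (e.g. one per vCPU) plus the aggregate row, whose Instance is ''.
        foreach ($instance in '0', '1', '') {
            [pscustomobject]@{ MetricId = $metricId; Instance = $instance; Value = 0 }
        }
    }
}

$statTypes = @('cpu.costop.summation', 'cpu.demand.average')

# One call for all counters, keeping only the aggregate rows.
$aggregate = Get-MockStat -Stat $statTypes | Where-Object { $_.Instance -eq '' }
$aggregate | Format-Table -AutoSize
```

Passing a single string (for example -Stat 'cpu.demand.average') also works, because PowerShell wraps it into a one-element array when binding to a [string[]] parameter.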

PowerShell Jobs

For a high-level overview of PowerShell jobs, check out Geek School: Learn How to Use Jobs in PowerShell.

Let’s check out which job cmdlets are available to us:

gcm *job* -module Microsoft.PowerShell.Core

CommandType Name        Version Source
----------- ----        ------- ------
Cmdlet      Debug-Job   3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Get-Job     3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Receive-Job 3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Remove-Job  3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Resume-Job  3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Start-Job   3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Stop-Job    3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Suspend-Job 3.0.0.0 Microsoft.PowerShell.Core
Cmdlet      Wait-Job    3.0.0.0 Microsoft.PowerShell.Core
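Most scripts only need four of these: Start-Job, Wait-Job, Receive-Job and Remove-Job. A minimal sketch of that lifecycle, using a trivial placeholder job rather than anything PowerCLI-specific:

```powershell
# Start a background job; Start-Job returns immediately with a job object.
$job = Start-Job -Name 'lifecycle-demo' -ScriptBlock { Start-Sleep -Seconds 1; 'done' }

# Block until the job reaches a finished state.
Wait-Job -Job $job | Out-Null
$stateAfterWait = $job.State
"State after waiting: $stateAfterWait"

# Collect the job's output, then remove the job so it doesn't linger in Get-Job.
$output = Receive-Job -Job $job
Remove-Job -Job $job
$output
```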

We will mostly be working with Start-Job, so let’s see what the help command has to say about it:

help Start-Job

NAME
Start-Job

SYNTAX
Start-Job [-ScriptBlock] <scriptblock> [[-InitializationScript] <scriptblock>] [-Name <string>] [-Credential <pscredential>]
[-Authentication <AuthenticationMechanism> {Default | Basic | Negotiate | NegotiateWithImplicitCredential | Credssp | Digest | Kerberos}] [-RunAs32]
[-PSVersion <version>] [-InputObject <psobject>] [-ArgumentList <Object[]>] [<CommonParameters>]

Let’s cover the options used in this script:

-ScriptBlock: A chunk of code that our job will execute.
-InitializationScript: When your job runs the ScriptBlock, that code doesn’t have access to any of the variables, functions, etc. defined in the main script unless you explicitly make them available to the newly created job. Passing an initialization script to Start-Job is one way of doing exactly this: it runs in the job’s session before the ScriptBlock does. You can also pass arguments to the ScriptBlock. I’m not sure when you should use one over the other; so far I’ve been placing static info into the initialization script and dynamic info into arguments. In my case I’m making functions defined in my main script available to the ScriptBlock by placing them into the initialization script.
-Name: The name of the job. I’d probably make this something unique, like the VM’s instance UUID. You could also use the VM’s name, but if you’re working with multiple vCenters, you could run into duplicates.
-ArgumentList: A list of values that are passed, in order, to the parameters in the ScriptBlock’s param() block.
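Here is a small sketch of how these options behave, with hypothetical names (Format-Label, $demoFunctions, $demoBlock) and no PowerCLI involved. It shows three things: a function from -InitializationScript is callable in the job, a variable from the parent session is not visible, and an object passed through -ArgumentList arrives as a deserialized copy rather than the original .NET object. That last point is presumably why the script below has to re-establish the vCenter connection and look the VM up again inside each job. (PowerShell 3.0 and later also let a job’s ScriptBlock read a parent variable through the $using: scope modifier.)

```powershell
# Functions defined here are created inside the job before the ScriptBlock runs.
$demoFunctions = {
    function Format-Label {
        param([string] $Name)
        "VM: $Name"
    }
}

$parentOnly = 'defined in the parent session'

$demoBlock = {
    param($vmName, $builder)
    [pscustomobject]@{
        # The function from -InitializationScript is available.
        Label         = Format-Label $vmName
        # Variables from the parent session are not, so this is $false.
        ParentVisible = $null -ne $parentOnly
        # Arguments arrive as deserialized copies, not live .NET objects.
        IsLiveObject  = $builder -is [System.Text.StringBuilder]
    }
}

$jobArgs = @{
    InitializationScript = $demoFunctions
    ScriptBlock          = $demoBlock
    ArgumentList         = 'MyVM', (New-Object System.Text.StringBuilder 'x')
}
$job = Start-Job @jobArgs
$result = Receive-Job -Job $job -Wait -AutoRemoveJob
$result
```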

Here is what it looks like when I create a job in my script:


$job = Start-Job -name $vm.name -InitializationScript $jobFunctions -ScriptBlock $scriptBlock -ArgumentList $vm.name, $vcConnection, $statTypes, $outputDirectory

Since Start-Job returns the created job, I store it for later use so I can query the state of the job.
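The stored job object is what makes throttling possible. Below is a condensed sketch of the queue pattern the full script uses, with hypothetical work ($items and $work) standing in for the VMs and the per-VM stats collection. It deliberately differs from the script in two ways: jobs that end in the Failed or Stopped state are treated as finished, so one failed job can’t keep the queue full forever, and finished jobs are received and removed.

```powershell
# Hypothetical stand-ins: $items plays the role of $vms and $work the role of $scriptBlock.
$items       = 1..6
$maxJobCount = 2
$sleepTimer  = 1
$work        = { param($n) Start-Sleep -Milliseconds 200; $n * $n }

$running = New-Object System.Collections.ArrayList
$results = @()

foreach ($item in $items) {
    # Wait for a free slot. Failed and Stopped count as finished, not just Completed.
    while ($running.Count -ge $maxJobCount) {
        foreach ($job in $running.ToArray()) {
            if ($job.State -in 'Completed', 'Failed', 'Stopped') {
                $results += Receive-Job -Job $job
                Remove-Job -Job $job
                $running.Remove($job)
            }
        }
        if ($running.Count -ge $maxJobCount) { Start-Sleep -Seconds $sleepTimer }
    }
    [void]$running.Add((Start-Job -ScriptBlock $work -ArgumentList $item))
}

# Wait for the remaining jobs, then collect their output and clean up.
foreach ($job in $running) {
    $results += Receive-Job -Job $job -Wait
    Remove-Job -Job $job
}

$results | Sort-Object
```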

The Script

Since this script is for a pretty specific use case, I doubt it will be of much use to anyone else as is. My hope is that someone will be able to learn something useful from the post itself and/or by looking through the script. Since I had no idea what I was doing at first, I hope this will at least get someone up and running more quickly than I was. I have the urge to go back and rewrite it to be more general, so the results could be sent somewhere else (an HTML report, for example), but I have other work to do and don’t want to spend time on something that may never be needed. I’m also not going to go through the script line by line, but I did try to comment it heavily.

There is a function named publishResults() defined inside a script block stored in the variable $jobFunctions. Because $jobFunctions is passed to Start-Job as the -InitializationScript, any function defined in it is available to the code in the job’s ScriptBlock. publishResults() is what uploads the performance data stored in the CSV files to our API. You could probably rewrite publishResults() to do something else, such as create an HTML report.
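As a sketch of that idea, here is a hypothetical publishHtmlReport function that turns one results file into an HTML table with ConvertTo-Html. The column names are assumptions (the script writes no header row), and the usage at the bottom writes a throwaway file with a single placeholder line in the script’s format. To make it available to jobs, it would go inside $jobFunctions next to publishResults().

```powershell
# Hypothetical alternative to publishResults: turn one results file into an HTML table.
function publishHtmlReport {
    param(
        [Parameter(Mandatory = $true)] [string] $inFile,
        [Parameter(Mandatory = $true)] [string] $outFile
    )
    # The results files have no header row, so column names are supplied here (assumed names).
    $header = 'VmName', 'PersistentId', 'PowerState', 'Cluster', 'MetricId', 'Timestamp', 'Value', 'Unit'
    $rows   = Import-Csv -Path $inFile -Header $header
    $title  = "Performance data for $(@($rows)[0].VmName)"
    $rows | ConvertTo-Html -Title $title -PreContent "<h1>$title</h1>" | Set-Content -Path $outFile
}

# Usage with a throwaway file containing one placeholder line in the script's format.
$tempDir = [System.IO.Path]::GetTempPath()
$csv  = Join-Path $tempDir 'perf-sample.txt'
$html = Join-Path $tempDir 'perf-sample.html'
Set-Content -Path $csv -Value '"MyVM","00000000-0000-0000-0000-000000000000","PoweredOn","Cluster01","cpu.costop.summation","6/18/2015 8:17:20 PM","0","millisecond"'
publishHtmlReport -inFile $csv -outFile $html
```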

The variable $scriptBlock contains all the code that each PowerShell job runs. Near the end of $scriptBlock, inside a try block, I’ve commented out the publishResults() call since it wouldn’t work in your environment. If you rewrite publishResults() to do something useful, you’ll need to uncomment that line.

The main part of the script begins right after $scriptBlock is defined, where the start time is recorded. I hope the comments will be enough to explain the rest. I’ll paste the code below, but it’s probably easier to read on GitHub. If I ever get around to it, I’ll upload my Perl & Ruby examples to that repo as well.

Here is what the script looks like when it’s running:

The output folder will contain one file per VM, each named after the VM’s vCenter UUID (its PersistentId) with a .txt extension.

Each file contains one line of comma-separated, double-quoted values per sample: the VM’s name, vCenter UUID, power state, cluster, metric ID (cpu.usagemhz.average, for example), timestamp, metric value and metric unit.
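Here is a short sketch of that line format using placeholder values: it builds one line the same way the script does and reads it back with ConvertFrom-Csv (Import-Csv does the same for a whole file). The column names are assumptions, since the script writes no header row. The second half shows a caveat of joining values by hand: a value containing a double quote would produce a malformed line, whereas ConvertTo-Csv escapes it by doubling the quote.

```powershell
# Placeholder values in the order the script writes them.
$temp = @('MyVM', '00000000-0000-0000-0000-000000000000', 'PoweredOn', 'Cluster01',
          'cpu.costop.summation', '6/18/2015 8:17:20 PM', '0', 'millisecond')

# The script's approach: wrap every value in quotes and join with ",".
$line = '"' + ($temp -join '","') + '"'

# Reading a line back, with assumed column names since the file has no header row.
$header = 'VmName', 'PersistentId', 'PowerState', 'Cluster', 'MetricId', 'Timestamp', 'Value', 'Unit'
$row = $line | ConvertFrom-Csv -Header $header
"$($row.VmName): $($row.MetricId) = $($row.Value) $($row.Unit)"

# ConvertTo-Csv escapes embedded quotes; element [0] is the header, [1] the data row.
$obj = [pscustomobject]@{ VmName = 'My"VM'; Value = '0' }
$safeLine = ($obj | ConvertTo-Csv -NoTypeInformation)[1]   # "My""VM","0"
```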

Here is the code, but as I mentioned earlier, it’s easier to read on GitHub.


$jobFunctions = {
function publishResults {
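<#
.SYNOPSIS
Uploads one CSV results file to the API.
.DESCRIPTION
POSTs the file to $uri and logs success or failure to $outputFile.
.EXAMPLE
publishResults -inFile 'C:\Users\chris\Desktop\perfdumps\<PersistentId>.txt'
#>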

param(
[Parameter(Mandatory=$true)]
[string] $inFile
)

# Most of what is in this function probably won't be too useful to anyone.  You'd probably need to re-write it.

$method = 'POST'
$uri = [System.Uri] "http://2.2.3.1:6543/api/v1/data_sets?gridname=vm&clustername=vcenter"
$outputFile = 'C:\Users\chris\Documents\powercli\stats\publish-result.txt'

$request = Invoke-RestMethod -uri $uri -method $method -InFile $inFile

if ($request.'Job Status' -eq 'Success') {
add-content $outputFile "Successfully published results for $inFile"
}
else {
add-content $outputFile "Failed to publish results for $inFile"
}
}
}

$scriptBlock = {
param($vmName, $vc, $statTypes, $outputDirectory)

# Even though I pass in the vCenter connection, I have to re-establish it. The job runs in a separate process and
# receives a deserialized copy of the connection object, so the session is reconnected using its session ID.
$vc = Connect-VIServer $vc.name -Session $vc.sessionId

# For the same reason, passing in the VM object (instead of just its name, as is currently done) wouldn't help; the VM has to be retrieved again.
try {
# -ErrorAction Stop turns a failed lookup into a terminating error so the catch block actually runs.
$vm = Get-VM $vmName -server $vc -ErrorAction Stop
}
catch {
$ErrorMessage = $_.Exception.Message
$FailedItem   = $_.Exception.ItemName
add-content "$($outputDirectory)\log.txt" "$errormessage, $faileditem"
# Without a VM there is nothing more to do.
return
}

$filteredStatTypes = @()

# Get all of the stat types for the time interval (realtime) specified.
# When querying for realtime stats, if no results are returned, this most likely means the VM has been powered off for an hour,
# the host the VM is on wasn't sending data to vCenter, the host the VM is on was disconnected, etc.
$availableStatTypes = get-stattype -entity $vm -realtime -Server $vc | sort

# There is no point continuing if there is no data for the VM so return to the calling context.
if ($availableStatTypes -eq $null) { return }

# We need to place all the stats we want to query into $filteredStatTypes.  If the user doesn't specify any stat types ($statTypes),
# then we just assign all available stats ($availableStatTypes) into $filteredStatTypes.
# If the user does specify stat types, we iterate through each one and make sure it's available in $availableStatTypes and add it
# to $filteredStatTypes.
if ($statTypes -eq $null) {
$filteredStatTypes = $availableStatTypes
}
else {
foreach ($statType in $statTypes) {
if ($statType -in $availableStatTypes) {
$filteredStatTypes += $statType
}
}
}

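# $filteredStatTypes starts as an empty array, and comparing an empty array to $null is not $true, so check Count instead.
if ($filteredStatTypes.Count -eq 0) { return }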

# Collect info to be used in building the report.
$vmName       = $vm.name
$persistentId = $vm.PersistentId
$vmPowerState = $vm.powerstate
$cluster      = ($vm | get-cluster).name
$outputFile = $outputDirectory + $persistentId + '.txt'

# If the output file already exists, go ahead and delete it.
# if (Test-Path $outputFile) { Add-Content "$($outputDirectory)\dups.txt" "$($vm.name)" ; Remove-Item $outputFile }
if (Test-Path $outputFile) { Remove-Item $outputFile }

# During testing you may want to limit the number of results you get back. Uncomment the following line to keep only the first X stat types.
#$filteredStatTypes = $filteredStatTypes | select -first 5

$finish = Get-Date

# Since realtime stats only go back an hour before they are rolled up, we only need to get an hour's worth of data.
$start = $finish.AddHours(-1)
$stats = Get-Stat -entity $vm -server $vc -realtime -stat $filteredStatTypes -start $start -finish $finish | ? { $_.instance -eq "" }

foreach ($stat in $stats) {
$temp = @()
# Build up a temp array that will contain each of the items to be used to build up a line in the report.
$temp += $vmName, $persistentId, $vmPowerState, $cluster, $stat.MetricId, $stat.Timestamp, $stat.Value, $stat.Unit
# Combine each of the items in the temp array to create a line of comma separated values.
$content = '"' + $($temp -join '","') + '"'
# Store each line into our output file.
Add-Content $outputFile $content
}

try {
#publishResults $outputFile
}
catch {
$ErrorMessage = $_.Exception.Message
$FailedItem   = $_.Exception.ItemName
add-content "$($outputDirectory)\log.txt" "$errormessage, $faileditem, $vmName, $persistentId"
}

# Disconnect this (and only this) instance of vCenter so we don't leave unused sessions lying around.
Disconnect-VIServer $vc -Confirm $false
}

# Record the time the script started.
$start = Get-Date

$vCenterName = 'vc5c.vmware.local'

$vcConnection = Connect-VIServer $vCenterName

# This is where we will store the CSV files with the performance data.
$outputDirectory = 'C:\Users\chris\Desktop\perfdumps\'

# Max amount of jobs (processes) we want running at any time.  You may need to tweak this depending on the resources of your machine.
$maxJobCount = 4

# sleep time in seconds between checking to see if it's okay to run another job.
$sleepTimer = 3

# Here we can define an array of the counters we want to retrieve.  If you have a large list of counters, it may be easier to store them in an external file. For example:
#$statTypes = @('mem.active.average', 'mem.granted.average')
# If you don't define this array, all performance counters will be pulled.

# Retrieve the VMs we want to retrieve stats from.
# When you're testing you may want to only grab a subset of all VMs.  Here are a few examples
# get all VMs in the vCenter(s) you're connected to: $vms = get-vm
# get all VMs in a specific cluster:                $vms = get-cluster 'resource cluster' | get-vm
# get the first 10 VMs that are powered on:         $vms = get-vm | ? { $_.powerstate -eq 'PoweredOn' } | select -First 10

$vms = get-vm | ? { $_.powerstate -eq 'PoweredOn' } | select -First 10

# Create our job queue.
$jobQueue = New-Object System.Collections.ArrayList

# Main loop of the script.
# Loop through each VM and start a new job if we have fewer than $maxJobCount outstanding jobs.
# If $maxJobCount has been reached, sleep $sleepTimer seconds and check again.
foreach ($vm in $vms) {
# Wait until job queue has a slot available.
while ($jobQueue.count -ge $maxJobCount) {
echo "jobQueue count is $($jobQueue.count): Waiting for jobs to finish before adding more."
foreach ($jobObject in $jobQueue.toArray()) {
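# Treat Failed and Stopped jobs as finished too; otherwise a single failed job could keep the queue full forever.
if ($jobObject.job.state -in 'Completed', 'Failed', 'Stopped') {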
echo "jobQueue count is $($jobQueue.count): Removing job: $($jobObject.vm.name)"
$jobQueue.remove($jobObject)
}
}
sleep $sleepTimer
}

echo "jobQueue count is $($jobQueue.count): Adding new job: $($vm.name)"
$job = Start-Job -name $vm.name -InitializationScript $jobFunctions -ScriptBlock $scriptBlock -ArgumentList $vm.name, $vcConnection, $statTypes, $outputDirectory
$jobObject     = "" | select vm, job
$jobObject.vm  = $vm
$jobObject.job = $job
$jobQueue.add($jobObject) | Out-Null
}

Get-Job | Wait-Job | Out-Null

#$regex = '([a-zA-Z0-9]+-){4}[a-zA-Z0-9]+.txt'
#gci $outputDirectory | ? { $_.name -match $regex } | % {
#  publishResults "$($outputDirectory)\$($_.name)"
#sleep 3
#}

# Record the time the script finished.
$end = Get-Date

echo "Start: $($start), End: $($end)"
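One thing the script doesn’t do is receive or remove its jobs after Get-Job | Wait-Job returns, so an error raised inside a job outside the try/catch blocks goes unnoticed and the finished jobs linger in the session. Here is a sketch of a hypothetical helper, Complete-FinishedJob, that reports each job’s final state (with the failure reason where one is recorded) and then removes it; in the script it could be called as Complete-FinishedJob -Job (Get-Job) after the wait. The usage below runs one placeholder job that succeeds and one that fails on purpose.

```powershell
# Hypothetical helper: summarize finished jobs, surface failures, and remove them.
function Complete-FinishedJob {
    param([Parameter(Mandatory = $true)] [System.Management.Automation.Job[]] $Job)
    foreach ($j in $Job) {
        $reason = $null
        if ($j.State -eq 'Failed' -and $j.ChildJobs.Count -gt 0) {
            # For background jobs, the error that failed the job is recorded on the child job.
            $reason = $j.ChildJobs[0].JobStateInfo.Reason.Message
        }
        [pscustomobject]@{ Name = $j.Name; State = [string]$j.State; Reason = $reason }
        Remove-Job -Job $j -Force
    }
}

# Usage: one job that succeeds and one that fails on purpose.
$jobs = @(
    Start-Job -Name 'ok'   -ScriptBlock { 'fine' }
    Start-Job -Name 'fail' -ScriptBlock { throw 'simulated failure' }
)
$jobs | Wait-Job | Out-Null
$summary = Complete-FinishedJob -Job $jobs
$summary | Format-Table -AutoSize
```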