SCOM: Unix/Linux Shell Command Monitoring – Unique Requirement

When I was working with a customer recently, I came across an requirement, to execute a Shell Command and based on the results, the monitor state needs to be set for Target Server. One of the best example solution available in web is this.

THE PROBLEM:

But the uniqueness however  in this requirement is that the computer where the Shell Command needs to be executed and the Target Server are not the same.

5-14-2014 17-56-03

As Illustrated above, The monitor is targeted to a “Class A” which has several instances including “Server A”. The Shell Command however needs to be executed on “Server B” and the results needs to be manipulated and the state of “Monitor A” needs to be set.

THE SOLUTION:

The “Microsoft.Unix.WSMan.Invoke.ProbeAction” probe based on which the Unix/Linux monitoring data sources are built in SCOM , has a parameter called “TargetSystem”.

In normal scenarios, the value would always be “$Target/Property[Type=”MicrosoftUnixLibrary!Microsoft.Unix.Computer”]/NetworkName$”. Thus the Shell Command would be executed in “Server A” and the “Monitor A” would have a state based on results.

But the value of “TargetSystem” certainly need not to be Target Server’s name. This can be changed to any server which has a SCOM Unix Agent installed with valid certificate for authentication. Then the Shell Command will be executed in the “Server B” rather than in Target Server.

Additionally you can pass the Target Server’s name along with Shell Command as parameter if you have used the Promote Option for Shell Command

Find the XML code in PDF file here — Example.Unix.ShellCommand.Monitoring

Have great SCOMing!!!

SCOM: What’s wrong with my Unix agents? Why they are grey out?

It is quite difficult to work with the Offline Unix agents especially in a large monitoring environment. Though SCOM offers native heartbeat monitor, it is hard to quickly determine whether the computer is actually down or something wrong with the agent configuration.

An Unix agent may be down due to various reasons like issue with SCX process not running or a run as account password got changed or certificate got reset or the computer might be down. SCOM has “UNIX/Linux Heartbeat Monitor”, “WS-Management Run As Account Health” and “WS-Management Certificate Health” monitors to monitor each of above mentioned criteria and alert for offline agents. But it would be tedious job for support guy to handle multiple alerts for same issue and correlating them to fix the agent which may cost considerable time.

Will it not be easy to have only one alert in case of heartbeat failure with the status of all other monitors in the summary?

But wait, should we also track down the ping status in the alert summary so that the support guy knows what he should do first?

Yes, that’s what we are going to do now using powershell. The below script which can be run in any management server logs event in “Operations Manager” event log.

You can create a rule to look for the events and create an alert. The alert will indicate the agent which is offline and details of other monitors.

 

Import-Module OperationsManager

New-SCOMManagementGroupConnection

$mc = get-scclass -name Microsoft.Unix.Computer

$agents = get-scommonitoringobject -class $mc | where {$_.isavailable -ne ‘True’}

foreach ($agent in $agents) {

$maintmode = $agent.InMaintenanceMode

# Ignore Servers in Maintenance

if ($maintmode -eq $false){

$agentname = $agent.displayname

$RespondsToPing = Test-Connection -ComputerName $agent.displayname -quiet

$sh = $agent.GetMonitoringStateHierarchy()

$avail_mon = $sh.childnodes | where {$_.item.MonitorDisplayName -eq ‘Availability’}

$hb_mon = $avail_mon.childnodes | where {$_.item.MonitorDisplayName -eq ‘Unix/Linux Heartbeat Monitor’}

$hb_mon_state = $hb_mon.item.healthstate

if ($hb_mon_state -ne “Success” -and $hb_mon_state -ne “Uninitialized”){

$config_mon = $sh.childnodes | where {$_.item.MonitorDisplayName -eq ‘Configuration’}

$cert_mon = $config_mon.childnodes | where {$_.item.MonitorDisplayName -eq ‘WS-Management Certificate Health’}

$runas_mon = $config_mon.childnodes | where {$_.item.MonitorDisplayName -eq ‘WS-Management Run As Account Health’}

$cert_mon_state = $cert_mon.item.healthstate

$runas_mon_state = $runas_mon.item.healthstate

if ($RespondsToPing){$pingable = “Pingable”}

else{$pingable = “Not Pingable”}

$status = “PING_STATUS: $pingable HEARTBEAT_STATUS: $hb_mon_state CERTIFICATE_STATUS:

$cert_mon_state, USER_ACCOUNT_STATUS: $runas_mon_state”

write-eventlog -LogName ‘Operations Manager’ -source ‘Health Service Script’ -id 1041 -entrytype Error -Category 0 -Message “UNIX SCOM agent on $agentname is not sending a heartbeat – $status”

}

}

}