We’ve got an application that regularly errors with a 500 HTTP status code. To fix the error we need to recycle the app pool (until the vendor extracts a digit and sorts the issue). I’ve implemented a recovery task on the ‘Web Application Availability Monitoring Test – Web Application Monitor’ to run some PowerShell:
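# Placeholder credentials; plain text here only while testing
$password = ConvertTo-SecureString "password goes here" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential ("domain\username here", $password)
# Enter-PSSession is interactive only; in a script, Invoke-Command runs the block on the remote server
Invoke-Command -ComputerName "servername here" -Credential $cred -ScriptBlock {
    Import-Module WebAdministration
    Restart-WebAppPool -Name "AppPool name here"
}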
The PS works when I run it on the server, or call it remotely with WinRM from a SCOM Management Server, but it isn’t triggering when the alert for the 500 error is raised. Any ideas what I’m doing wrong, and where I can find some logged information for this? I can’t find anything in the Operations Manager or PowerShell logs on either the SCOM Management servers or the server that has the issue.
Unfortunately, the 500 error doesn’t leave any evidence on the server that I could use to trigger a task from instead of using the web monitor alert. Also, I know the way I’m passing credentials isn’t ideal, but I just want to get this working before I look at securing it better!
Well, this is embarrassing: it seems to be working now, and I can see the service account logging on in the Security log, called by the SCOM agent.
When I originally created the monitor I disabled it for all objects and enabled it for the specific web monitor I required. It seems to have started working after I deleted and recreated the override to enable it.
Thanks for the help everyone. It’s only masking the issue on the webserver, but at least we don’t have to keep manually logging on to sort it out!
We set up something like this in the days before PowerShell. Credentials were definitely tricky; we ended up using a RunAs account and manually adding it to the recovery in the MP XML. Then we gave the RunAs account rights on the management server running the web monitor, as well as rights to remote to the web server. It took a few tries to get the workflow going. I would suggest starting with a PS script that just writes an event to the OpsMgr event log, so you can make sure the monitor is properly triggering the recovery with the appropriate values.
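For what it's worth, a smoke-test recovery script along those lines might look something like the sketch below. It assumes the SCOM agent's MOM.ScriptAPI COM object is registered on the machine running the recovery; the script name and event ID 9999 are made up for illustration.

```powershell
# Hypothetical smoke test: it does nothing except prove the recovery fired.
# Assumes the MOM.ScriptAPI COM object (installed with the SCOM agent) is available.
$api = New-Object -ComObject 'MOM.ScriptAPI'

# Record which account the recovery actually ran as
$who = [System.Security.Principal.WindowsIdentity]::GetCurrent().Name

# Arguments: script name, event ID, severity (0 = information), description.
# The script name and the event ID 9999 are arbitrary placeholders.
$message = "Recovery triggered as $who on $env:COMPUTERNAME at $(Get-Date -Format o)"
$api.LogScriptEvent('RecoverySmokeTest.ps1', 9999, 0, $message)
```

If the recovery is firing, the event should show up in the Operations Manager event log on whichever machine runs the web monitor, and the account name in the message tells you whether the RunAs profile was picked up.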
In my experience, you can find the 500 errors in the IIS logs. You could create a monitor that watches the log content and fires the recovery task you created.
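As a rough sketch of the log-watching idea, something like the following would pull the 500 responses out of a W3C-format IIS log. The function name Get-IisStatusHits is made up, and it assumes the site logs the sc-status field (it is in the default field set, but check yours).

```powershell
# Hypothetical helper: returns the log lines whose sc-status matches the given code.
# Assumes W3C extended format with a '#Fields:' header and space-separated values.
function Get-IisStatusHits {
    param(
        [string[]] $Line,
        [int] $Status = 500
    )
    $fields = @()
    foreach ($l in $Line) {
        if ($l.StartsWith('#Fields:')) {
            # The header names the columns for the rows that follow
            $fields = @($l.Substring(8).Trim() -split ' ')
            continue
        }
        # Skip other comment lines, blank lines, and rows seen before any header
        if ($l.StartsWith('#') -or -not $l.Trim() -or $fields.Count -eq 0) { continue }

        $values = $l -split ' '
        $idx = [array]::IndexOf($fields, 'sc-status')
        if ($idx -ge 0 -and $idx -lt $values.Count -and $values[$idx] -eq "$Status") {
            $l
        }
    }
}
```

You would feed it the current log, for example `Get-IisStatusHits -Line (Get-Content $logPath)`, where `$logPath` points at the site's log file (by default somewhere under C:\inetpub\logs\LogFiles, but that depends on your setup). Bear in mind IIS buffers log writes, so entries can lag behind the actual error.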
Alternatively, you could create a custom error page for the 500 error. That custom error page could be an ASP/ASPX page that writes to your server's event log, for instance, but you would need to take care over security in that case.