I have a weird problem that I can’t find a solution for.
We have several SCOM environments that we manage, all versions of SCOM.
A week ago, we started getting this alert from a handful of servers: “System Center Management Health Service Unloaded System Rule(s)”
Normally when I get this error, I restart the agent or flush the health cache; this has always fixed the problem in the past.
But now we have 5 servers in our primary Management Group (1800 agents) with this error. They all failed at about the same time, and the normal fix does not work.
After a deeper investigation, the OpsMgr event log reveals that the MonitoringHost is stuck in a loop: crashing, starting up, crashing again, and so on.
The error is:
A monitoring host is unresponsive or has crashed. The status code for the host failure was 2164195371.
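In case it helps anyone searching for this: the status code is easier to look up in hex. Below is a minimal Python sketch, assuming the value is an unsigned 32-bit code laid out like a standard HRESULT (that layout is an assumption on my part, and the function name `decode_status` is just for illustration).

```python
def decode_status(code: int) -> dict:
    """Split an unsigned 32-bit status code into HRESULT-style fields.

    Assumption: the value follows the standard HRESULT layout
    (bit 31 = severity, bits 16-26 = facility, bits 0-15 = code).
    """
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    return {
        "hex": f"0x{code:08X}",
        # Some tools log the same value as a signed 32-bit integer.
        "signed": code - (1 << 32) if code & 0x80000000 else code,
        "failure": bool(code >> 31),
        "facility": (code >> 16) & 0x7FF,
        "code": code & 0xFFFF,
    }


print(decode_status(2164195371)["hex"])  # 0x80FF002B
```

So 2164195371 is 0x80FF002B, which is the form to use when searching for this failure.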
The errors in the Application log reveal a problem with reading/writing memory (all servers are virtual):
Application: MonitoringHost.exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Stack:
at System.Environment.FailFast(System.String)
What we have tried so far is updating an agent from 2012 R2 to the latest 2019. No difference.
Removing the agent from the Management Group, uninstalling the agent software, then reinstalling it and adding the agent back to the Management Group. No difference.
Removing the agent from the Management Group (2012 R2) and adding it to another Management Group (2019). This resulted in the agent working as intended.
Returning the agent to its original Management Group resulted in the agent crashing again.
We looked at the number of handles; it never exceeds 25000 on the server.
The one thing we can see that all 5 servers have in common is that they are running Windows Server 2012 R2 (but we have about 200 of these monitored in this Management Group).
We suspected that a Windows update might have caused this, but all our servers are at the same patch level.
Now a customer is beginning to experience the same behavior on 1 agent (SCOM 1807 Management Group).
No management packs have been updated in recent weeks.
Do you guys have any idea what to try next? Otherwise the next step will be to contact Microsoft Support, and I am guessing their first comment will be to upgrade to SCOM 2019.