Health Service Private Bytes and Handle Count leak in SCOM Agent on Windows 2012 and Higher OS

Previously, when we had a Private Bytes and Handle Count leak in SCOM agents on Windows Server 2008/R2, we would install the following KBs: KB2685811 and KB2685813.

Now, with the new OS versions of Windows 2012 and higher, we don’t have any solution for this problem.
  • Our Health Service Private Bytes and Handle Count values have already been modified according to Kevin’s article
  • Our current version of SCOM is SCOM 2012 UR12 (I know it’s outdated, but we plan to update it soon)

What evidence do you have that this is happening? Is the SCOM agent constantly restarting on some machines?

We have thousands of 2012 servers and don’t have an issue except on a few, typically something like a SharePoint SQL server with 500+ DBs, and for those we just set the thresholds much higher.

We have a group with these ‘high use’ thresholds set against it, and we plop a server into it as required.

For starters, we are getting alerts in SCOM about this issue (Monitors: Monitoring Host Handle Count/Host Private Bytes Threshold).

And when we log in to those agents, we can actually see a high handle count used by the healthservice.exe process.
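One way to tell a genuinely busy agent (high but stable handle count) from a leak (count that keeps climbing) is to sample the handle count over time and look at the trend. The following is only a rough sketch: the sampling itself is not shown, the function names are made up for illustration, and the 50 handles/hour cut-off is an arbitrary assumption, not a SCOM default.

```python
from statistics import mean


def handle_growth_per_hour(samples):
    """Least-squares slope of handle count over time.

    samples: list of (hours_since_first_sample, handle_count) pairs,
    collected however you like (perf counter export, manual notes, ...).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [h for _, h in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom


def looks_like_leak(samples, min_growth_per_hour=50.0):
    """True if handles grow at least min_growth_per_hour (assumed cut-off)."""
    return handle_growth_per_hour(samples) >= min_growth_per_hour
```

A server that sits at a high but flat value is probably a candidate for the ‘high use’ threshold group; one with a steady upward slope is worth investigating as a leak.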

Regarding the SCOM agent constantly restarting, we’ve disabled this option (Auto-Resolve = False).

It might be related to management pack versions.

I ran into issues with the Server 2016 MP, version 10.0.17.0.

https://goo.gl/GXTVaj
If you’ve added any new ones relatively recently, it might be worth removing them to see if that helps.

All our MPs are relatively up to date, and we do have a variety of management packs that require a lot of resources (AD, SQL, Cluster, Exchange, etc.).

But because this issue appears on only 30 servers within our environment, it doesn’t seem to me like a problem with the MPs themselves, more like a resource leak…

Is there any particular way to investigate this?
I ask because I don’t see any common denominator between those servers…
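If it helps, the hunt for a common denominator can be made a bit more mechanical: list which management packs (or roles, or anything else) apply to each affected server and to a few healthy ones, then keep only what every affected server has and no healthy server has. This is just a sketch; the function name is hypothetical, and how you gather the per-server lists is up to you.

```python
def distinguishing_items(affected, healthy):
    """Items present on every affected server and on no healthy server.

    affected / healthy: dicts mapping server name -> iterable of items
    (for example management pack names) that apply to that server.
    """
    if not affected:
        return set()
    # What all affected servers share.
    on_all_affected = set.intersection(*(set(v) for v in affected.values()))
    # Anything a healthy server also has is unlikely to be the culprit.
    on_any_healthy = set().union(*(set(v) for v in healthy.values()))
    return on_all_affected - on_any_healthy
```

An empty result does not rule out an MP problem (the trigger may be a combination of factors), but a non-empty one gives you a short list to test first.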

Annoyingly I was trying to link to the question I asked here, but that seemed to fail (updated to shortened URL). You can probably get to it through my profile.

I was having the same sort of problems you are having: on certain servers in a certain management group, the handles were running away. For me it came down to a management pack having problems. I got rid of the offending MP and the resource leak went away.

If you think about it, the MPs run code on the agents. If one of the pieces of code has issues, then the agent running it has issues.

Using Process Explorer, I could see that the process was creating tens of thousands of authentication token handles and not disposing of them. I couldn’t work out why until I went to update the Server 2016 MP and saw a note in the latest version’s change log saying that they had addressed a handle leak. I updated the MP and the problems disappeared.
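To spot that kind of pattern quickly, it helps to tally the open handles by type and see whether one type dominates. The sketch below assumes you have a plain-text dump with one handle per line in the form `ID: Type Name` (hex ID first); that line format is an assumption for illustration, so adjust the pattern to whatever your tool actually exports.

```python
import re
from collections import Counter

# Assumed line shape: "<hex id>: <Type> <optional name>".
HANDLE_LINE = re.compile(r"^\s*[0-9A-Fa-f]+:\s+(\S+)")


def count_handle_types(lines):
    """Count handles per type; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = HANDLE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_handle_types(lines).most_common(3)` gives the three most frequent handle types; in my case a single type (tokens) would have stood out immediately.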

I keep a log of all the changes I make and when, so in hindsight I should have looked at my log for any changes made around the time the problem started and narrowed it down from there.
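As a rough illustration of that narrowing-down step, assuming the change log is a list of (date, description) entries and that the suspects are the changes made in the two weeks leading up to the first alert (both the structure and the 14-day window are my assumptions):

```python
from datetime import date, timedelta


def changes_before_onset(change_log, onset, window_days=14):
    """Changes made in the window_days up to and including the onset date.

    change_log: iterable of (date, description) tuples.
    Returns the matching entries, most recent first.
    """
    start = onset - timedelta(days=window_days)
    hits = [(d, desc) for d, desc in change_log if start <= d <= onset]
    return sorted(hits, key=lambda entry: entry[0], reverse=True)
```

Working through the result from the top (the change closest to the first alert) is usually the quickest way to find the culprit.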