
Hi All,

Got a weird one that I can’t find any solid information on. We’re a relatively new environment: one development management group with a few services monitored (20 or so agents so far), and a production management group that is just monitoring itself right now.

Having read Kevin Holman’s article on health service restarts back in the day, we’ve overridden the handle count threshold for the health service to 30K in both environments. In our development environment this works perfectly fine, and to this day there hasn’t been a single health service restart. Generally monitoringhost.exe sits at between 2K and 6K handles, with the occasional spike to about 10K once in a blue moon. So in general it’s pretty stable.

Production was also pretty stable until I tried to get development SCOM to monitor the production management group and vice versa, the theory being that if something went wrong with one, the other would be able to tell us about it. Handle counts on the monitoring hosts for the database servers and web servers went through the roof on both sides, and the health service was restarting multiple times a day. I stopped this pretty quickly. The development servers went back to normal, but the production ones are still facing this issue (though it has slowed down).

Looking at the process handles using Process Explorer and Handle (Sysinternals), there are a number of unnamed handles of various types that simply don’t exist on the development side, plus tens of thousands of auth tokens. Development seems to have SYSTEM tokens and nothing else, as I would expect. Production has all sorts: mostly SYSTEM, but also things like IIS pool accounts, my own domain account, service accounts, Desktop Window Manager accounts, etc. The process just keeps hoarding them, with the count ever increasing until the monitor trips and restarts the service.
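In case it helps anyone compare the two sides, here is a rough sketch of how a saved handle dump could be tallied by handle type and by token account. It assumes the dump comes from something like `handle.exe -a -p <pid>` and that each handle line looks like `<hex id>: <Type> <optional name>`, with token names in the form `DOMAIN\account:logon-id`. That layout is an assumption based on what I see on my machines, and `tally_handles` is just a name made up for this sketch.

```python
import re
from collections import Counter

# Assumed layout of one handle line: "<hex id>: <Type> <optional name>".
# Header and banner lines do not match this pattern and are skipped.
HANDLE_LINE = re.compile(r"^\s*([0-9A-Fa-f]+):\s+(\S+)\s*(.*)$")


def tally_handles(dump_text):
    """Count handles by type, tokens by account, and unnamed handles by type."""
    by_type = Counter()
    token_accounts = Counter()
    unnamed = Counter()
    for line in dump_text.splitlines():
        match = HANDLE_LINE.match(line)
        if not match:
            continue
        _, handle_type, name = match.groups()
        name = name.strip()
        by_type[handle_type] += 1
        if not name:
            unnamed[handle_type] += 1
        elif handle_type == "Token":
            # Assumed token name form "DOMAIN\account:logon-id";
            # drop the logon id so tokens group by account.
            token_accounts[name.rsplit(":", 1)[0]] += 1
    return by_type, token_accounts, unnamed


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally_handles.py dump.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        types, accounts, no_name = tally_handles(fh.read())
    print("By type:", types.most_common(10))
    print("Token accounts:", accounts.most_common(10))
    print("Unnamed by type:", no_name.most_common(10))
```

Running this against dumps taken a few minutes apart should show which handle types and which accounts are actually growing, rather than just the total.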

I’ve tried flushing the cache, repairing the agent, pulling it out completely and starting fresh, and defragging the healthstore database. None of this seems to help, and I’m at a bit of a loss really. Other than starting from scratch, is there anything else that would be worth trying?

Cheers,

Dave Lewis answered