Ran into an odd issue today. We had a server hit CPU usage issues during the night, and this affected collection of perfmon counters. One minute after the collection stopped, the health state of the server in SCOM went green, see below:
Why would this occur? The server was still experiencing severe CPU performance degradation, so the alert was missed. As you can see from the pic, the “State” went to GOOD, but the metrics shown are breaching the set thresholds (overrides applied to alert at 85%, queue length at 15).
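To make the expectation concrete, here is a rough Python sketch of how I assumed the state would be evaluated. It is not SCOM's actual monitor logic. The function name, the state strings other than GOOD, the reading of 85% as CPU utilization, and the idea that both thresholds must be breached together are all my own assumptions for illustration. Only the 85 and 15 values come from the overrides above.

```python
from typing import Optional

# Override values from the post; treating 85% as CPU utilization is an assumption.
CPU_THRESHOLD_PERCENT = 85
QUEUE_LENGTH_THRESHOLD = 15


def expected_state(cpu_percent: Optional[float], queue_length: Optional[float]) -> str:
    """Hypothetical helper: the state I would expect for a single sample.

    None stands for a sample that was never collected.
    """
    # Assumption: a missing sample should not be reported as healthy.
    if cpu_percent is None or queue_length is None:
        return "UNKNOWN"
    # Assumption: both counters must breach their thresholds to go critical.
    if cpu_percent > CPU_THRESHOLD_PERCENT and queue_length > QUEUE_LENGTH_THRESHOLD:
        return "CRITICAL"
    return "GOOD"
```

With this reading, a sample above both thresholds should never come back as GOOD, and a gap in collection should not turn the state green either, which is the opposite of what I saw.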
I'm trying to troubleshoot this further to prevent it from happening again.