Recently I faced a critical issue where one SCOM agent did not alert when the C: drive filled up, or when the server went down.
The SCOM agent is not in maintenance mode (MM).
The SCOM agent is healthy.
Monitors are configured for disk space (no override issues).
I am able to see data in the disk space report.
It's a critical server and the issue has been escalated, which has given me sleepless nights.
I am still not able to identify the root cause.
Disk space can be a bit tricky sometimes. There are actually two conditions (free space below a certain MB threshold AND below a certain % threshold), and both have to be met at the same time for the alert to fire. It's a great concept, as it tries to accommodate disks of different sizes, but it also results in a missed alert sometimes, which is probably what happened here. If you would rather have only one condition to satisfy, disable the default monitor and enable only the monitor for the criterion you want (MB or %).
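To make the AND behaviour concrete, here is a rough Python sketch of the logic as I understand it. This is only a simplified model, not SCOM's actual implementation: the function name and the threshold values (300 MB and 5%) are made up for illustration, so check the real values on your monitor.

```python
def should_alert(free_mb, total_mb, mb_threshold, pct_threshold, require_both=True):
    """Simplified model of a two-condition free-space check (not SCOM code)."""
    free_pct = 100.0 * free_mb / total_mb
    below_mb = free_mb < mb_threshold      # absolute free space condition
    below_pct = free_pct < pct_threshold   # percentage free space condition
    if require_both:
        # Default-style behaviour: both conditions must be true together.
        return below_mb and below_pct
    # Single-criterion behaviour: either condition is enough.
    return below_mb or below_pct


# Hypothetical 100 GB drive with 2 GB free: 2% free, but still well above 300 MB.
print(should_alert(2048, 102400, mb_threshold=300, pct_threshold=5))                      # False
print(should_alert(2048, 102400, mb_threshold=300, pct_threshold=5, require_both=False))  # True
```

With these example numbers the drive is at 2% free, yet no alert fires under the combined check because the MB condition is not met. That is the kind of gap a large drive can fall into.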
As for the server down alert, do you know if the server was back up within a few minutes? That can cause SCOM not to fire an alert, as a missing heartbeat is only taken into account after 180 seconds (default) of downtime. If the server was down for longer than that, check whether any override was in place.
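If you know roughly when the server went down and came back, you can sanity-check it against that window. A tiny sketch, assuming the 180 second default mentioned above and treating the check as a plain comparison (the real heartbeat logic is more involved, and the function name is just for illustration):

```python
def heartbeat_alert_expected(downtime_seconds, window_seconds=180):
    """Rough check: was the outage long enough for a missed-heartbeat alert?

    window_seconds is the 180 s default from the post above; adjust it if
    your management group uses different heartbeat settings.
    """
    return downtime_seconds >= window_seconds


# A 2 minute reboot would slip under the window; a 10 minute outage would not.
print(heartbeat_alert_expected(120))  # False
print(heartbeat_alert_expected(600))  # True
```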
Hmm, beats me in that case. I can't think of anything other than the overrides. Or maybe the alert was auto-closed for some reason? Can you find that alert anywhere in the console? Have you seen any disk alerts fire correctly before this?
Maybe you could do some testing? Override the monitor threshold to something very high, like 90%, and then fill up the drive with large files (I think there is also software that will fill a drive with junk data). See if it triggers then.
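If you don't have a tool handy for the filler data, something like this Python sketch would do. It uses only the standard library; the function names and the example path are hypothetical, and I would only run it on a test volume or with a size you have worked out in advance. It writes random bytes rather than zeros so that compression on the volume doesn't shrink the file.

```python
import os
import shutil


def free_percent(path):
    """Return the free space on the volume containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def write_filler_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_size, size_bytes - written)
            f.write(os.urandom(n))  # random data, so it will not compress away
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disk
    return written


# Example (hypothetical path and size, uncomment to use):
# print(free_percent("C:\\"))
# write_filler_file("C:\\Temp\\filler.bin", 5 * 1024**3)  # 5 GB of junk
# print(free_percent("C:\\"))
```

Delete the filler file once the test is done, and give the monitor at least one full interval to pick up the change before deciding it didn't trigger.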
Have you definitely enabled the “Windows Server 2012 Logical Disk Free Space (%) Low” monitor for this server? It is usually not enabled by default.