I work at a hospital and we use Epic. We have a custom monitor in place that watches a performance monitor for the number of failures in the Epic print queue. Some failures are normal and there is a process to clean them up. What I am looking to do is only trigger an alert if I get x number of failed jobs over a short period of time. For example, 5 failures in 10 minutes is something I would like to trigger an email so someone can investigate. I'm not quite sure how to go about this though.
Performance monitors overview:
You want to create a Unit Monitor based on a Windows Performance counter. This blog should give you a rough idea (skip past the rule creation):
Ensure that the monitor target (on the next page) is the same as the rule target (right click the rule > properties, it will have the target on the first page):
You can then configure the sample rate and thresholds further along in the wizard.
That’s pretty much what I did, and I set a threshold of 20 on it. The problem is that we routinely cross that level before the process to clean up failed jobs runs. We need to hold on to failed jobs for a few days in case a print device was down and we just need to resubmit the failed jobs. I know it's weird how Epic handles printing, but it is what it is. So with all that in mind, I am trying to raise the alert only if we get 5 failed jobs in, say, a 10 minute time frame. Since that is about the number we normally get over a few hours, it would be a great indicator of potential issues. This came up because one of the servers failed earlier this week and it had a few hundred failed jobs in a very short time frame.
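To make the "5 failures in 10 minutes" idea concrete, here is a rough sketch of the logic in Python. It is not tied to any monitoring product's API. The class name `FailureRateDetector` and its parameters are made up for illustration, and it assumes each sample is the current total of failed jobs read from the counter. Instead of comparing the total to a fixed threshold, it compares the latest total to the lowest total seen in the last 10 minutes, so a steady backlog of 20 or so does not alert but a sudden jump does.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureRateDetector:
    """Hypothetical helper: flags a burst of new failures inside a time window."""

    def __init__(self, max_new_failures=5, window=timedelta(minutes=10)):
        self.max_new_failures = max_new_failures
        self.window = window
        self.samples = deque()  # (timestamp, total failed jobs) pairs, oldest first

    def add_sample(self, timestamp, failed_count):
        """Record one counter reading; return True if an alert should fire."""
        self.samples.append((timestamp, failed_count))

        # Drop readings that are older than the window.
        cutoff = timestamp - self.window
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

        # Use the lowest total in the window as the baseline, so a cleanup run
        # that lowers the total partway through does not hide a later burst.
        baseline = min(count for _, count in self.samples)
        return failed_count - baseline >= self.max_new_failures


# Example readings (made-up numbers): slow growth, then a burst.
start = datetime(2024, 1, 1, 8, 0)
detector = FailureRateDetector()
readings = [(0, 18), (2, 19), (4, 19), (12, 21), (14, 27)]
alerts = [
    detector.add_sample(start + timedelta(minutes=offset), total)
    for offset, total in readings
]
```

In this run only the last reading alerts, because the total rose from 19 to 27 within 10 minutes, while the earlier readings grew by at most 2 in any window. The sample interval needs to be shorter than the window (every 1 to 2 minutes, say), otherwise there is nothing to compare against.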