Application Roll-up Dashboard Objects - How To

I'm building a dashboard that will have 3 columns, each looking at an application from a different perspective, as shown below.

  1. Application Health (Processes running/TCP Ports listening)

  2. System Health (Memory/CPU/Disk/etc.)

  3. Database Health (DB Management Pack metrics)

System and Database are complete, but I need help figuring out how to instrument the application health in SCOM/SquaredUp. Notice that I have 2 data centers that run the same apps on different servers. In each DC, I’d like to monitor the health of the application on all servers hosting that application. For example, let’s say the “BOS” application has 2 servers behind a load balancer and they both run the same processes (a.exe and b.exe). I’d like this BOS-App icon to show green if both servers have both processes running. If a process is not running on one of the 2 servers in this data center, I’d like the icon to go yellow. If a process is down on both servers in this data center, I’d like the icon to go red. The same goes for Datacenter 2: two different servers running the same 2 processes. See the diagram below. The circles next to the application are the ones I’m trying to focus on. Note that I already used VADA to build out the DAs for the “Sys” column of icons; they are linked to SCOM IDs and are working perfectly.
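To make the intended green/yellow/red rule concrete, here is a minimal Python sketch of the logic only. It is not SCOM or SquaredUp code; the function name `app_health` and the server names are made up for illustration. It assumes "red" means the same process is down on every server in the data center, and "yellow" means at least one process is down on at least one server while every process is still running somewhere.

```python
from typing import Dict


def app_health(status: Dict[str, Dict[str, bool]]) -> str:
    """Roll up process state for one data center.

    status maps server name -> {process name -> True if running}.
    A process missing from a server's dict is treated as not running.
    """
    processes = sorted({p for procs in status.values() for p in procs})
    # For each process, list the servers where it is NOT running.
    down = {
        proc: [srv for srv, procs in status.items() if not procs.get(proc, False)]
        for proc in processes
    }
    # Red: some process is down on every server (assumed meaning of "down on both").
    if any(len(servers) == len(status) for servers in down.values()):
        return "red"
    # Yellow: something is down somewhere, but each process still runs on a server.
    if any(down.values()):
        return "yellow"
    return "green"


# Hypothetical server names; a.exe and b.exe are the processes from the post.
dc1 = {
    "dc1-srv1": {"a.exe": True, "b.exe": False},
    "dc1-srv2": {"a.exe": True, "b.exe": True},
}
print(app_health(dc1))
```

Under this reading, a.exe down on one server and b.exe down on the other still counts as yellow, because each process is still available behind the load balancer.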

Looking for recommendations on how to build the objects in SCOM or VADA/SCOM to monitor only the application .EXEs for the 2 servers and roll up the status as described. Do I start with building out a DA with the 2 servers in VADA and then create another DA in SCOM to roll the status up to this “parent” DA? In that case, can I disable all system level monitors on these 2 servers in SCOM for these DAs and not affect the other DAs used for the “System” icons? Or do I build groups or DAs in SCOM somehow to monitor the applications? I don’t know how to make this work properly in SCOM, without affecting the systems that are already part of other DAs.

The goal is to be able to quickly see from a single pane of glass if there is an application problem, system problem, or a database problem affecting the application.

Marnix Wolf has a pretty decent series on DAs and how they should be used:

Why should one consider using DAs? I mean, you’re already monitoring your servers operating systems, Virtual Infrastructure, network, SQL, IIS, DNS, AD and the lot. So monitoring is already in place. Why add more like DAs for instance? Because it looks sharp? Has a nice ring to it?

No way. Because until now you’re monitoring components. A whole bunch of them. But many times business critical processes, applications and ICT environments are groups of those very same components. Wouldn’t it be nice to know when SQL server A bites the dust that Business Critical Application XYZ is still functional but when SQL Server B bites the dust it’s time for some action in order to prevent a real outage of the same application?


I think this series will cover what you need and give you an idea of how to structure your DAs and what you can achieve if done right.

The layout shown in part 3 of the series seems to be the sort of thing you are trying to achieve (although your application has far fewer components):

The links provided help to some degree. But, not being a SCOM expert, I’m not sure if I can create DAs with the same list of servers and only enable process monitoring within one DA and OS monitoring (disk/cpu/memory) in the other DA.

If so, then that seems like the best approach… Create 2 sets of DAs (one for System monitoring and the other for process or service monitoring).

Focusing on the Application DA, I would then create a single component group with the 2 servers in it. This could be done using VADA.

After importing that DA into SCOM, this is where it gets fuzzy for me. I assume that I can disable all disk/cpu/memory related monitors in this “App” DA and enable the application/process monitors in this DA.

From there, we have the correct monitors set up in the DA and can focus on the rollup behavior: set the component group health rollup policy to “Worst state of a percentage” so that if more than 0% but fewer than 100% of components fail we get a warning status, and if 100% fail we get a critical status.
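As a sanity check on the thresholds, here is a small Python sketch of the outcome I'm after. It models the desired result only, not how SCOM actually evaluates its percentage rollup policy; `percentage_rollup` is a hypothetical name, and each list entry stands for one component (server) in the group.

```python
from typing import List


def percentage_rollup(component_healthy: List[bool]) -> str:
    """Map the share of failed components to a rollup state.

    0% failed            -> "healthy"
    >0% and <100% failed -> "warning"
    100% failed          -> "critical"
    """
    total = len(component_healthy)
    if total == 0:
        raise ValueError("component group is empty")
    failed_pct = 100 * component_healthy.count(False) / total
    if failed_pct == 0:
        return "healthy"
    if failed_pct < 100:
        return "warning"
    return "critical"


# Two servers in the component group, one of them unhealthy.
print(percentage_rollup([True, False]))
```

With only 2 servers the percentages collapse to 0%, 50%, and 100%, so any threshold strictly between 0 and 100 gives the same three states.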

Sound about right or way off base?

You could also put the services in a group and then use Tao Yang’s Group Health Rollup MP to configure the health state of the group. You can mix different types of objects in the group and have their respective health roll up to the top level.

 

Nick

Rich - Looks like you may have your prayers answered (caveat - slide does say "guidance only")

Looks like VADA vNext will automatically use the components of an application within the DA

Taken from the coffee break (2017 roundup)


This blogpost could be helpful. The tricky part is getting the rollup to be a warning when one server is down and critical when everything is down.

http://blogs.catapultsystems.com/cfuller/archive/2010/09/13/using-distributed-applications-to-generate-actionable-alerting/

One way to do it would be via SquaredUp’s PowerShell MP: check the status on both servers and post back a warning if you find the process running on only one of them, and critical if you find zero processes running.

We have done a similar setup, but we don’t set them up as warnings. We have a critical alert for each service, but that does not show on the public dashboard. On the public dashboard we only show it as red if both nodes are down.
On the operations dashboard we can see that we have a critical alert for the service on server A, for example.
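The split between the two dashboards can be sketched roughly like this in Python. This is an illustration of the idea only, not our actual SCOM/SquaredUp configuration; the function names, the service name, and the node names are all hypothetical.

```python
from typing import Dict, List


def public_status(node_up: Dict[str, bool]) -> str:
    """Public dashboard: red only when every node is down."""
    if node_up and not any(node_up.values()):
        return "red"
    return "green"


def operations_alerts(service: str, node_up: Dict[str, bool]) -> List[str]:
    """Operations dashboard: one critical alert per node where the service is down."""
    return [
        f"CRITICAL: {service} down on {node}"
        for node, up in sorted(node_up.items())
        if not up
    ]


# Hypothetical example: the service is down on server A only.
nodes = {"serverA": False, "serverB": True}
print(public_status(nodes))
print(operations_alerts("ExampleService", nodes))
```

The public view stays green while one node is still serving, and operations still gets an actionable alert for the failed node.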

https://www.youtube.com/watch?v=Fz-YlQXqCSs - towards the end, they give a what’s coming in 2018 bit

I have created an “Application” DA in SCOM that checks the process (.exe) instance count, as seen in Task Manager. The problem I am having is that, because this monitor targets a server, it also affects my “System” DA where this server also lives. With this approach, I need some way to disable the process monitor within my “System” DA, while still having this process monitor enabled in my “Application” DA.

It seems that if I disable this monitor with an override within one DA, it also affects the other DA. I think this is because the monitor targets the Windows computer object, so disabling it on the computer disables it for all DAs or groups where this server resides.

Hoping to find a way to disconnect the process monitors from the server itself, so that when a process fails on the server, I can show the application DOWN/RED on the dashboard, but have a different status icon showing GREEN/UP for all System monitors.

Any other thoughts on how this can be accomplished? Or could I use PowerShell, or something else that is not tied to that server, to capture the status of the process remotely?
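For the "TCP ports listening" half of application health, one kind of check that is not tied to the monitored server is a plain TCP connect from another machine. Below is a minimal Python sketch using only the standard `socket` module; the function name `port_is_listening` and the host and port in the usage comment are hypothetical. It only shows that something accepts connections on the port, not that a particular .exe is running.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Usage from a watcher machine (hypothetical host and port):
#   port_is_listening("dc1-srv1.example.local", 8080)
```

The per-server results from a check like this could then feed whatever rollup rule is chosen for the application icon.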