I’m building a dashboard that will have 3 columns, each looking at an application from a different perspective, as shown below.
1) Application Health (Processes running/TCP Ports listening)
2) System Health (Memory/CPU/Disk/etc.)
3) Database Health (DB Management Pack metrics)
System and Database are complete, but I need help figuring out how to instrument the application health in SCOM/SquaredUp. Notice that I have 2 data centers that have the same apps running on different servers. In each DC, I’d like to monitor the health of the application on all servers hosting that application. For example, let’s say the “BOS” application has 2 servers behind a load balancer and they both run the same apps/processes. I’d like this BOS-App icon to show green if both servers have both processes running (a.exe and b.exe). If any process is not running on one of the 2 servers in this data center, I’d like that icon to go yellow. If any process is down on both servers in this data center, I’d like the icon to go red. The same goes for Datacenter 2: two different servers running the same 2 processes. See diagram below. The circles next to the application are the ones I’m trying to focus on. Note that I already used VADA to build out the DAs for the “Sys” column of icons, which are linked to SCOM IDs and are working perfectly.
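To make the desired icon behavior concrete, here is a minimal Python sketch of the rollup rule described above. It is only an illustration of the logic, not anything SCOM or VADA runs; the function name, the server names, and the dictionary layout are hypothetical.

```python
# Illustration only: the green/yellow/red rule for one data center.
# REQUIRED comes from the example in the post; everything else is hypothetical.
REQUIRED = {"a.exe", "b.exe"}

def app_icon_state(running):
    """running maps a server name to the set of process names running on it."""
    if not running:
        raise ValueError("need at least one server")
    servers = list(running.values())
    # Red: some required process is down on every server in the data center.
    for proc in REQUIRED:
        if all(proc not in procs for procs in servers):
            return "red"
    # Yellow: at least one server is missing at least one required process.
    for procs in servers:
        if not REQUIRED <= procs:
            return "yellow"
    # Green: every server runs every required process.
    return "green"

# Hypothetical server names, a.exe still up on both, b.exe down on one.
print(app_icon_state({"bos-app-01": {"a.exe"}, "bos-app-02": {"a.exe", "b.exe"}}))
```

Note that under this rule, a.exe down on one server and b.exe down on the other still counts as yellow, because each process is still running somewhere.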
Looking for recommendations on how to build the objects in SCOM or VADA/SCOM to monitor only the application .EXEs for the 2 servers and roll up the status as described. Do I start with building out a DA with the 2 servers in VADA and then create another DA in SCOM to roll the status up to this “parent” DA? In that case, can I disable all system level monitors on these 2 servers in SCOM for these DAs and not affect the other DAs used for the “System” icons? Or do I build groups or DAs in SCOM somehow to monitor the applications? I don’t know how to make this work properly in SCOM, without affecting the systems that are already part of other DAs.
The goal is to be able to quickly see from a single pane of glass if there is an application problem, system problem, or a database problem affecting the application.
Marnix Wolf has a pretty decent series on DAs and how they should be used:
Why should one consider using DAs?
I mean, you’re already monitoring your servers operating systems, Virtual Infrastructure, network, SQL, IIS, DNS, AD and the lot. So monitoring is already in place. Why add more like DAs for instance? Because it looks sharp? Has a nice ring to it?
No way. Because until now you’re monitoring components. A whole bunch of them. But many times business critical processes, applications and ICT environments are groups of those very same components. Wouldn’t it be nice to know when SQL server A bites the dust that Business Critical Application XYZ is still functional but when SQL Server B bites the dust it’s time for some action in order to prevent a real outage of the same application?
I think this series will cover what you need and give you an idea of how to structure your DAs and what you can achieve if done right.
The layout shown in part 3 of the series seems to be the sort of thing you are trying to achieve (although your application has far fewer components):
Rich – Looks like you may have your prayers answered (caveat – slide does say “guidance only”)
Looks like VADA vNext will automatically use the components of an application within the DA
Taken from the coffee break (2017 roundup)
The links provided help to some degree. But, not being a SCOM expert, I’m not sure if I can create DAs with the same list of servers and only enable process monitoring within one DA and OS monitoring (disk/cpu/memory) in the other DA.
If so, then that seems like the best approach… Create 2 sets of DAs (one for System monitoring and the other for process or service monitoring).
Focusing on the Application DA, I would then create a single component group with the 2 servers in it. This could be done using VADA.
After importing that DA into SCOM, this is where it gets fuzzy for me. I assume that I can disable all disk/cpu/memory related monitors in this “App” DA and enable the application/process monitors in this DA.
From there, we have the correct monitors set up in the DA and can focus on the rollup behavior: set the component group health rollup policy to “Worst state of a percentage” and configure it so that if more than 0% but fewer than 100% of components fail we get a warning status, and if 100% fail we get a critical status.
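As a sanity check on those thresholds, here is a small Python sketch of the percentage rule. It only models the arithmetic and is not how SCOM evaluates the policy internally; the function name and the state labels are made up for illustration.

```python
# Illustration only: percentage-based rollup with the thresholds described above.
def rollup_state(component_failed):
    """component_failed is a list of booleans, True where a component is unhealthy."""
    if not component_failed:
        raise ValueError("need at least one component")
    failed_pct = 100 * sum(component_failed) / len(component_failed)
    if failed_pct == 0:
        return "healthy"
    if failed_pct < 100:
        return "warning"   # more than 0% but fewer than 100% failed
    return "critical"      # 100% failed

# Two servers in the component group, one unhealthy.
print(rollup_state([True, False]))
```

One thing to watch, assuming each server counts as a single component: if a.exe is down on one server and b.exe is down on the other, both components are unhealthy and the rollup goes critical, even though each process is still running somewhere. If that case should only be a warning, the rollup probably needs to be done per process rather than per server.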
Sound about right or way off base?
You could also put the services in a group and then use Tao Yang’s Group Health Rollup MP to configure the health state of the group. You can mix different types of objects in the group and have their respective health roll up to the top level.