I am trying to build meaningful SLA and DA health dashboards for our production app stack. We have 7 applications, 5 of them with web apps. Every app depends on at least one other app somewhere, and the knowledge of how they fit together sits in the heads of a few individuals who are hard to pin down.
If I create a custom DA and leave out the dependencies on other apps, you still need internal expertise to know how the apps work together.
Ex: knowing that if DB1 on App1 is down, then the web app on App3 is broken.
If I create one full-on ERP DA in SCOM containing all components, I do not know of a way to create rollup dependencies based on relationships, so that I get the complete topology health picture and, in turn, a meaningful SLA.
Any ideas, advice?
Well, what they do is discover the machines, run VADA to map the dependencies between those machines, and then turn off DA monitoring for everything but the “client-facing” components. So you still get the back-end alerts, but the DA itself only alerts when it is down from the end user's perspective.
It seems I need one DA for availability, plus other DAs, as you mentioned, for other parts of the application. I know they will eventually expand the DA to map out the actual component dependencies (app pools, databases, etc.), so I wonder whether it's better to build it their way rather than manually. I'm not sure. For example, if we recreate things later, it will mess with the SLO.
Do you guys do it the Squared Up VADA way, where you create DAs based on whole computers, or do you create them from scratch in the designer?
I would recommend building smaller DAs and then incorporating those into a larger DA. This way you can set up an SLA for each DA or just for the large DA. The DAs will also be easier to maintain that way.
For example, we have a solution where one DA is called “Intranet backend” and contains all the SQL components and the disk layer for that application. The other DA is called “Intranet frontend” and holds the websites, load balancer and so on. If multiple servers or components are used to maintain availability, we set a rollup for that component so that x % of the objects can be critical while it still shows green, or we set it to “best state of any member”.
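To make the two rollup choices a bit more concrete, here is a rough Python sketch of the idea. This is not SCOM code and not SCOM's exact rollup algorithm, just a simplified model of "x % can be critical and still green" versus "best state of any member". The names `Health`, `best_of` and `percentage_rollup` are made up for the illustration.

```python
from enum import IntEnum


class Health(IntEnum):
    """Simplified health states; a higher value means worse health."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def best_of(states):
    """'Best state of any member': the group is as healthy as its healthiest member."""
    return min(states)


def percentage_rollup(states, tolerated_critical_pct):
    """Stay green while the share of critical members is within the tolerated percentage.

    Simplified model (assumption): once the tolerance is exceeded the group goes red.
    """
    if not states:
        raise ValueError("a rollup needs at least one member")
    critical = sum(1 for s in states if s == Health.RED)
    share = 100.0 * critical / len(states)
    return Health.GREEN if share <= tolerated_critical_pct else Health.RED


# Hypothetical web farm: four servers, one of them critical (25 % critical).
web_farm = [Health.GREEN, Health.RED, Health.GREEN, Health.GREEN]
tolerant = percentage_rollup(web_farm, 50)   # 25 % is within a 50 % tolerance
strict = percentage_rollup(web_farm, 20)     # 25 % exceeds a 20 % tolerance
best = best_of(web_farm)                     # at least one member is still green
```

With a redundant tier like this, one dead server does not turn the component red unless you set the tolerance low enough that it should.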
We then use those smaller DAs in another DA called “Intranet Service” and target the SLA on that DA.
And since the Intranet frontend is also used by another application, we can import that DA into another DA and reuse the config.
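The nesting and reuse described above can be sketched like this in Python. Again, this is only a model of the structure, not anything SCOM runs. The DA names “Intranet backend”, “Intranet frontend” and “Intranet Service” are from our setup; the individual components, the “web farm” group and “Other Service” are hypothetical, and the default rollup of worst-state-of-any-member is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List


class Health(IntEnum):
    """Simplified health states; a higher value means worse health."""
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class Component:
    """A leaf object such as a database, a website or a load balancer."""
    name: str
    state: Health = Health.GREEN

    def health(self) -> Health:
        return self.state


@dataclass
class DA:
    """A distributed application: a group of components and/or other DAs."""
    name: str
    members: List = field(default_factory=list)
    rollup: Callable = max  # assumption: worst state of any member by default

    def health(self) -> Health:
        # Health is computed recursively, so nested DAs roll up into their parent.
        return self.rollup(m.health() for m in self.members)


# Backend DA: SQL components and the disk layer.
sql, disk = Component("SQL"), Component("disk layer")
backend = DA("Intranet backend", [sql, disk])

# Frontend DA: a redundant web farm (best state of any member) plus a load balancer.
web1, web2 = Component("web1"), Component("web2")
lb = Component("load balancer")
web_farm = DA("web farm", [web1, web2], rollup=min)
frontend = DA("Intranet frontend", [web_farm, lb])

# The SLA would be targeted at this top-level DA.
intranet = DA("Intranet Service", [backend, frontend])

# The same frontend DA is reused by a second, hypothetical service.
other = DA("Other Service", [frontend, Component("other DB")])

web1.state = Health.RED               # one web server down: the farm stays green
after_web = intranet.health()
sql.state = Health.RED                # backend down: only the Intranet Service is hit
after_sql = (intranet.health(), other.health())
lb.state = Health.RED                 # shared frontend down: both services are hit
after_lb = (intranet.health(), other.health())
```

The point of building it this way is that a shared piece like the frontend is defined once, and every service that includes it picks up its state automatically.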