2 Management servers and when 1 gets patched half of my distributed apps are unmonitored

Does anyone know how to make 2 SCOM servers actually work in tandem when one has an outage? I’m tired of coming in every Thursday to find half my apps unmonitored. Rebooting the server that was patched the night before solves the issue.

Thanks,

Gary

Hi Gary!

In short, you need three members in the resource pool for full redundancy.

“Resource pools apply a logic similar to clustering “majority node set”, where (< number of nodes as members of the pool > /2) + 1. At a minimum, there must be three members in the pool to maintain quorum, which must be more than 50% of the quorum voting members in a pool to maintain availability of the pool. If you only have two members of the pool, and one is unavailable, you have lost quorum.”

If you have only two management servers, a third member is added: an “observer” that can vote but does not take on any workloads. Its only purpose is to vote when there is an even number of members in the resource pool (and you lose half of them). The observer tips the scale in those scenarios.

 

!! Make sure you have an “observer” enabled in your resource pool with 2 mgmt servers. !!

 
Single management server

  • The default observer is enabled by default and provides no benefit since there are only two members and quorum isn't reached.
  • There is no high availability, because the management server is a single point of failure.

Two management servers

  • The default observer is enabled by default.
  • There is high availability for the pool, because there are three voting members - two management servers and the default observer.
  • If you disable the default observer, you'll lose high availability for the pool.

Three management servers

  • The default observer is enabled by default.
  • There is high availability for the pool, because there are four voting members - three management servers and the default observer.
  • By default you can only have one management server unavailable to maintain quorum. If two management servers are unavailable, you have exactly 50% of voting members and the resource pool no longer functions to manage the monitoring workloads.
  • The default observer doesn't increase the number of management servers that can be down, therefore it doesn't increase pool availability.
  • You can consider removing the default observer in this scenario.
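
If it helps to see the arithmetic behind the three scenarios above, here is a minimal sketch in Python. It is not anything SCOM itself runs; the function names are made up for illustration, and it assumes the "(n / 2) + 1" rule from the quote rounds down to a whole number of votes and that the observer itself stays available.

```python
def quorum_needed(voting_members: int) -> int:
    """Votes required for quorum: strictly more than half of the voting members.

    Assumption: the quoted "(n / 2) + 1" uses integer (floor) division.
    """
    return voting_members // 2 + 1


def max_servers_down(management_servers: int, default_observer: bool = True) -> int:
    """How many management servers can be offline while the pool keeps quorum.

    Assumption: the default observer, when enabled, is always able to vote.
    """
    observer_votes = 1 if default_observer else 0
    voting_members = management_servers + observer_votes
    # The management servers must supply every vote the observer does not.
    servers_needed = max(quorum_needed(voting_members) - observer_votes, 0)
    return management_servers - servers_needed


if __name__ == "__main__":
    for servers in (1, 2, 3):
        for observer in (True, False):
            print(
                f"{servers} management server(s), observer={observer}: "
                f"{max_servers_down(servers, observer)} can be down"
            )
```

For the two-server case in the question, the difference between observer enabled and observer disabled is exactly the difference between surviving one patched server and losing the pool.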

Please read this:

https://docs.microsoft.com/en-us/system-center/scom/plan-resource-pool-design?view=sc-om-2019