Every month we have a 4-hour maintenance window for Windows updates, software updates and other maintenance tasks that require downtime.
To prevent alert storms and errors in reporting, I put all our DAs and servers (with contained objects) into maintenance mode.
It always takes a long time for the SCOM environment to recover after this window, and I’m usually left with some grey agents.
Is there a better way to do mass maintenance mode?
I use the following method, which works perfectly, and allows me to schedule regular maintenance windows.
- Create a group containing the servers you wish to put into maintenance mode (or more than one group if you have a number of schedules).
- Create a Windows scheduled task (or tasks) to run a PowerShell script.
- Download the PowerShell script in
And edit the variables as per the article to specify the group name and the number of minutes that you wish the group to be put into maintenance mode.
As you can see from the screenshot above, I have a lot of maintenance mode tasks that run on a daily basis. I also have five groups that can be used ad hoc to schedule maintenance mode for systems/servers that come out of our CAB meetings. So far this has worked flawlessly.
It also has the advantage that any issues with the servers during the outage window all alert at the same time, i.e. when the window ends.
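For anyone who wants a rough idea of what such a scheduled script does, here is a minimal sketch using the OperationsManager module. It is not the script from the linked article. The group name and duration are placeholder values, and it assumes the script runs on a management server (or that a management group connection has already been made) and that piping the group to Get-SCOMClassInstance returns its members.

```powershell
# Sketch only: group name and duration are placeholder values.
Import-Module OperationsManager

# If not running on a management server, connect first, for example:
# New-SCOMManagementGroupConnection -ComputerName 'scom-ms01.example.local'   # hypothetical server name

$groupName = 'Monthly Patching - Group A'   # hypothetical group display name
$minutes   = 240                            # length of the maintenance window in minutes
$endTime   = (Get-Date).AddMinutes($minutes)

$group = Get-SCOMGroup -DisplayName $groupName
if (-not $group) {
    throw "No SCOM group found with display name '$groupName'."
}

# Assumption: piping the group to Get-SCOMClassInstance returns the group members.
# Members already in maintenance mode are skipped so the call does not fail on them.
$members = $group | Get-SCOMClassInstance |
    Where-Object { -not $_.InMaintenanceMode }

if ($members) {
    Start-SCOMMaintenanceMode -Instance $members -EndTime $endTime `
        -Reason PlannedOther `
        -Comment "Scheduled maintenance window for $groupName"
}
```

A scheduled task would then call this with powershell.exe -File at the start of each window, one task per group.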
We also use Orchestrator, but slightly differently. This specific customer uses scripts to patch systems, and before they patch, they place a file with the server name in a specific folder. Orchestrator checks the files in this folder every 30 seconds, picks up the name and puts the specific server in maintenance mode. If the server is already in maintenance mode, we add 4 hours to that specific maintenance window.
If the server belongs to a cluster, the cluster will be put in maintenance mode as well.
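The same start-or-extend logic can be sketched in plain PowerShell for those without Orchestrator. This is not the runbook itself. The drop folder path is made up, the sketch assumes the file name (without extension) is the server's name as it appears in SCOM, and it assumes ScheduledEndTime on the maintenance entry is stored in UTC. The cluster handling is left out.

```powershell
# Sketch only: one pass over the drop folder. Run it from a scheduled task
# or a loop to get the 30-second polling described above.
Import-Module OperationsManager

$dropFolder = 'D:\MaintenanceDrop'   # hypothetical folder the patch scripts write to
$hours      = 4

# Index Windows Computer instances by display name (hashtable keys are case-insensitive).
$computerClass = Get-SCOMClass -Name 'Microsoft.Windows.Computer'
$computers = @{}
foreach ($c in Get-SCOMClassInstance -Class $computerClass) {
    $computers[$c.DisplayName] = $c
}

foreach ($file in Get-ChildItem -Path $dropFolder -File) {
    $serverName = $file.BaseName      # assumption: file name = server name
    $instance   = $computers[$serverName]

    if (-not $instance) {
        Write-Warning "No Windows Computer object found for '$serverName'."
        continue
    }

    if ($instance.InMaintenanceMode) {
        # Already in maintenance: push the existing end time out by 4 hours.
        $entry  = Get-SCOMMaintenanceMode -Instance $instance
        $newEnd = $entry.ScheduledEndTime.ToLocalTime().AddHours($hours)
        Set-SCOMMaintenanceMode -MaintenanceModeEntry $entry -EndTime $newEnd `
            -Comment 'Extended by patch drop file'
    }
    else {
        Start-SCOMMaintenanceMode -Instance $instance `
            -EndTime (Get-Date).AddHours($hours) `
            -Reason PlannedOther -Comment 'Started by patch drop file'
    }

    # Remove the file so the same request is not processed twice.
    Remove-Item -Path $file.FullName
}
```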
You could make a temporary SCOM group with all the objects you wish to place into maintenance mode. Then you can place all your objects into maintenance at once using the group drill down in Squared Up. I find that SCOM seems to recover quite quickly after doing this.
I have written a blog post on how to automate maintenance mode during patch windows.
SCOM 2012: AUTOMATIC MAINTENANCE MODE DURING PATCH WINDOWS
We use the SCOM PowerShell module, and devices appear to recover from maintenance without issue after this.
Look at Get-Help Start-SCOMMaintenanceMode -Full for information.
You’ll be able to iterate through a group of servers using a foreach loop and a list of servers in a .txt or .csv file quite easily.
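As a rough sketch of that loop, assuming a text file with one server name per line, written the way SCOM displays it (usually the FQDN). The file path and the 4-hour duration are placeholders:

```powershell
# Sketch only: path and duration are placeholder values.
Import-Module OperationsManager

$serverList = 'C:\Scripts\servers.txt'   # hypothetical path, one server name per line
$endTime    = (Get-Date).AddHours(4)

# Load all Windows Computer instances once and index them by display name,
# rather than querying SCOM separately for every server in the list.
$computerClass = Get-SCOMClass -Name 'Microsoft.Windows.Computer'
$computers = @{}
foreach ($c in Get-SCOMClassInstance -Class $computerClass) {
    $computers[$c.DisplayName] = $c
}

foreach ($line in Get-Content -Path $serverList) {
    $name = $line.Trim()
    if (-not $name) { continue }          # skip blank lines

    $instance = $computers[$name]
    if (-not $instance) {
        Write-Warning "'$name' was not found in SCOM."
        continue
    }
    if ($instance.InMaintenanceMode) {
        Write-Warning "'$name' is already in maintenance mode, skipping."
        continue
    }

    Start-SCOMMaintenanceMode -Instance $instance -EndTime $endTime `
        -Reason PlannedOther -Comment 'Patch window (from server list)'
}
```

For a .csv file, swapping Get-Content for Import-Csv and reading the relevant column should be the only change needed.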
Personally, I wrote an Orchestrator runbook that looks at my collections for my maintenance windows and places their members into maintenance mode on a schedule. It was a pretty simple runbook to write, and it means that as a machine gets added to a maintenance window it automatically gets picked up by my runbook.
SCOM 2016 is the best answer for this, since it has scheduled maintenance mode built in, but all the methods above will do what you require 🙂