I’m currently engaged in reducing the alert noise in a very large monitored environment and I’d love to be able to represent the declining number of active alerts over an extended period of time in some sort of (performance) graph format to illustrate the reduction in noise. Frustratingly this appears not to be as simple as it sounds, unless I’m missing something?
The same requirement applies to the number of new alerts per day. Any guidance greatly appreciated.
You can use a SQL tile and query the DWH database to get the data you need.
To track active alerts over time, write a query that counts all alerts that are not resolved or closed, grouped by their raised date.
For new alerts per day, write a query that counts alerts raised each day.
Then you can use these results to create performance graphs showing how alert numbers change over time.
Let me know if you need further help.
What version did the SQL tile become available? SQL was where I ended up as the solution but I don’t have the tile…I’ve inherited these systems so still working my way through the environments. We’re currently on version 6.0.2.10453
What I want to achieve is number of active alerts on a particular day, not by raised date. To give you an example, say I initially have 65k active alerts, many of which go back an age and have enormous repeat counts…I then embark on an alert hygiene process to clear and tune out as many as possible. Ideally, I’d like an Active Alerts total for each day, so a snapshot total not based on when raised but based on resolution state is ‘new’ and active on this date so I can then graph the downward trajectory of Active Alerts by day/date. Does that make sense?
I couldn’t get this working on the DWH because of how resolved alerts are stored there, but I’ve got it working on the OpsDB.
It supports a dynamic timeframe of up to 7 days (due to OpsDB grooming), so if you change the page time to anywhere between 1 hour and 7 days, it will adjust and show the snapshot count for that period.
Hope this helps . You can adjust this if needed.
DECLARE @StartDate DATETIME;
DECLARE @EndDate DATETIME = GETDATE();
-- Dynamic timeframe selector limit to 7 days because of Opsdb grooming
SET @StartDate = CASE {{timeframe.isoDuration}}
WHEN 'PT1H' THEN DATEADD(HOUR, -1, GETDATE())
WHEN 'PT12H' THEN DATEADD(HOUR, -12, GETDATE())
WHEN 'PT24H' THEN DATEADD(HOUR, -24, GETDATE())
WHEN 'P7D' THEN DATEADD(DAY, -7, GETDATE())
ELSE DATEADD(DAY, -7, GETDATE()) -- force max 7 days
END;
;WITH DateSeries AS
(
-- Create one row for each date in the range
SELECT CAST(@StartDate AS DATE) AS DateValue
UNION ALL
SELECT DATEADD(DAY, 1, DateValue)
FROM DateSeries
WHERE DateValue < CAST(@EndDate AS DATE)
)
SELECT
ds.DateValue AS [Date],
COUNT(DISTINCT a.AlertId) AS [Active Alerts]
FROM DateSeries ds
LEFT JOIN dbo.Alert a
ON a.TimeRaised <= DATEADD(DAY, 1, ds.DateValue) -- raised on or before end of day
AND (a.TimeResolved IS NULL OR a.TimeResolved > ds.DateValue) -- still active
AND a.ResolutionState = 0 -- state = "New"
GROUP BY ds.DateValue
ORDER BY ds.DateValue