Dashboard for graphing Total number of Active alerts on a daily basis

Hello hive mind,

I’m currently engaged in reducing the alert noise in a very large monitored environment and I’d love to be able to represent the declining number of active alerts over an extended period of time in some sort of (performance) graph format to illustrate the reduction in noise. Frustratingly this appears not to be as simple as it sounds, unless I’m missing something?
The same requirement applies to the number of new alerts per day. Any guidance greatly appreciated.

Hey,

You can use a SQL tile and query the DWH database to get the data you need.
To track active alerts over time, write a query that counts all alerts that are not resolved or closed, grouped by their raised date.
For new alerts per day, write a query that counts alerts raised each day.
Then you can use these results to create performance graphs showing how alert numbers change over time.
Let me know if you need further help.

Thanks Thanos,

What version did the SQL tile become available? SQL was where I ended up as the solution but I don’t have the tile…I’ve inherited these systems so still working my way through the environments. We’re currently on version 6.0.2.10453

Cheers

Found it!

What I want to achieve is number of active alerts on a particular day, not by raised date. To give you an example, say I initially have 65k active alerts, many of which go back an age and have enormous repeat counts…I then embark on an alert hygiene process to clear and tune out as many as possible. Ideally, I’d like an Active Alerts total for each day, so a snapshot total not based on when raised but based on resolution state is ‘new’ and active on this date so I can then graph the downward trajectory of Active Alerts by day/date. Does that make sense?

Hi Davy,

I couldn’t get this working on the DWH because of how resolved alerts are stored there, but I’ve got it working on the OpsDB.
It supports a dynamic timeframe of up to 7 days (due to OpsDB grooming), so if you change the page time to anywhere between 1 hour and 7 days, it will adjust and show the snapshot count for that period.

Hope this helps . You can adjust this if needed.

DECLARE @StartDate DATETIME;
DECLARE @EndDate DATETIME = GETDATE();

-- Dynamic timeframe selector limit to 7 days because of Opsdb grooming
SET @StartDate = CASE {{timeframe.isoDuration}}
         WHEN 'PT1H'  THEN DATEADD(HOUR, -1, GETDATE())
         WHEN 'PT12H' THEN DATEADD(HOUR, -12, GETDATE())
         WHEN 'PT24H' THEN DATEADD(HOUR, -24, GETDATE())
         WHEN 'P7D'   THEN DATEADD(DAY, -7, GETDATE())
         ELSE DATEADD(DAY, -7, GETDATE()) -- force max 7 days
      END;

;WITH DateSeries AS
(
    -- Create one row for each date in the range
    SELECT CAST(@StartDate AS DATE) AS DateValue
    UNION ALL
    SELECT DATEADD(DAY, 1, DateValue)
    FROM DateSeries
    WHERE DateValue < CAST(@EndDate AS DATE)
)
SELECT 
    ds.DateValue AS [Date],
    COUNT(DISTINCT a.AlertId) AS [Active Alerts]
FROM DateSeries ds
LEFT JOIN dbo.Alert a
    ON a.TimeRaised <= DATEADD(DAY, 1, ds.DateValue) -- raised on or before end of day
    AND (a.TimeResolved IS NULL OR a.TimeResolved > ds.DateValue) -- still active
    AND a.ResolutionState = 0 -- state = "New"
GROUP BY ds.DateValue
ORDER BY ds.DateValue

Thanks Thanos! Really appreciate your efforts! Will have a play and advise.