Whether it’s one analyst monitoring half a dozen KPIs or an automated system keeping track of a million, the end goal of anomaly detection is to discover, preferably quickly, the unusual blips or kinks in the time series data that can reveal either business opportunities or problems that must be dealt with. However, finding the unusual first requires knowing what counts as usual: detecting anomalies in a metric requires a model of what’s normal for that metric. For some metrics, that model might consist of a nominal value or linear trend, plus or minus a few sigma (standard deviations) to account for noise and other normal variations.
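As a rough illustration of the "linear trend, plus or minus a few sigma" model, here is a minimal Python sketch. The function names, the three-sigma default, and the sample data are all assumptions made up for this example, not part of any particular product. It fits a least-squares line and flags points whose residual falls outside the band.

```python
import statistics

def fit_linear_trend(values):
    """Least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_anomalies(values, n_sigma=3.0):
    """Indices whose distance from the trend line exceeds n_sigma residual sigmas."""
    slope, intercept = fit_linear_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    sigma = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > n_sigma * sigma]

# Made-up metric: steady growth, a small alternating wobble, one spike at index 20.
values = [10 + 0.5 * i + (0.2 if i % 2 == 0 else -0.2) for i in range(30)]
values[20] += 8
anomalies = flag_anomalies(values)
print(anomalies)  # [20]
```

One caveat of this simple version is that the spike itself inflates the sigma estimate, so a robust spread measure may be a better choice on real data.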
The normal model for a metric might also include what’s called seasonality. Seasonality refers to the presence of cyclical patterns in the time series data. The period of these cycles can range from hours to a full year or more. A retailer’s monthly sales over the course of a year, for example, would probably show large peaks during late summer (back-to-school shopping) as well as at the end of the year (holiday season), year after year.
These seasonal patterns are repeated and very much expected changes in the value of the metric, and they must be screened out from the unexpected changes in that data, the anomalies we’re looking for. Simply flagging any changes as anomalies would result in a torrent of false positives, drowning out any correctly identified anomalies. It would also be a mistake to assume that all metrics have seasonality, as some metrics, well…don’t. To add a further wrinkle, a metric may have more than one seasonal pattern present and those patterns may interact with one another.
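To make the screening idea concrete, the sketch below compares a single global sigma band with a band around a per-phase seasonal baseline. It assumes the period is already known (7, for a weekly cycle) and uses invented data. The helper names and the choice of a per-phase median are illustrative assumptions, not a description of any vendor's method.

```python
import statistics

def flag_global(values, n_sigma=3.0):
    """Flag points far from the overall mean, ignoring seasonality."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sigma * sigma]

def flag_seasonal(values, period, n_sigma=3.0):
    """Flag points far from the typical value for their position in the cycle."""
    # Baseline: median of all observations sharing the same phase of the cycle.
    baseline = [statistics.median(values[phase::period]) for phase in range(period)]
    residuals = [v - baseline[i % period] for i, v in enumerate(values)]
    sigma = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > n_sigma * sigma]

# Made-up daily metric: quiet weekdays, busy weekends, repeated for 8 weeks.
week = [100, 102, 98, 101, 99, 200, 198]
values = week * 8
values[10] = 150  # hypothetical weekday surge (a Thursday in week 2)

global_flags = flag_global(values)
seasonal_flags = flag_seasonal(values, period=7)
print(global_flags)    # []
print(seasonal_flags)  # [10]
```

In this toy series the weekend peaks widen the global band so much that the weekday surge hides inside it, while the seasonal baseline exposes it and leaves the expected weekend peaks alone.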
The challenge of accounting for seasonality
Seasonality presents three challenges for anomaly detection:
- Discovering any and all seasonal patterns.
- If there is seasonality, determining the period for each of the patterns.
- Assessing how much seasonality contributes to the value of the metric at any given point.
According to time series anomaly detection system vendor Anodot, there are two common methods to detect seasonality in time series data: the Fourier Transform and serial correlation (autocorrelation). The former is fast but can be inaccurate; the latter requires more computation but is more accurate. (Anodot itself uses a proprietary algorithm based on autocorrelation because its system is designed to operate in real time at the scale of millions of metrics.)
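The two textbook approaches can be sketched in a few lines of plain Python. This is not Anodot's proprietary algorithm; the function names, the synthetic hourly data, and the peak-picking rule are assumptions chosen for illustration. The Fourier version uses a naive discrete Fourier transform for readability, where a real system would use an FFT.

```python
import cmath
import math
import random
import statistics

def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    num = sum(centered[i] * centered[i + lag] for i in range(len(values) - lag))
    return num / denom

def period_by_autocorrelation(values, max_lag):
    """Lag of the highest local peak in the autocorrelation, or None."""
    acf = [autocorrelation(values, lag) for lag in range(max_lag + 2)]
    peaks = [lag for lag in range(2, max_lag + 1)
             if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None

def period_by_fourier(values):
    """Period (in samples) of the strongest frequency in a naive DFT."""
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Made-up hourly metric: 10 days with a daily (24-hour) cycle plus noise.
rng = random.Random(42)
values = [100 + 10 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 1)
          for t in range(240)]

acf_period = period_by_autocorrelation(values, max_lag=60)
fourier_period = period_by_fourier(values)
print(acf_period, fourier_period)
```

One source of inaccuracy in the Fourier approach is visible in the last line of `period_by_fourier`: it can only report periods of the form n / k, so a true period that does not divide evenly into the window length gets rounded to the nearest available frequency bin.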
The qualitative task of detecting seasonality is different from the quantitative assessment of measuring how much seasonal influence is present in a slice of real data. Even with advanced math and proprietary algorithms, it can be tricky to precisely quantify how much seasonality is present.
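One simple way to put a number on "how much seasonality" is a variance ratio: estimate the seasonal profile, subtract it, and see how much of the original variance disappears. The sketch below assumes the period is already known and that the series has no trend. The function name and sample data are made up for illustration, and this is only one of several possible measures.

```python
import statistics

def seasonal_strength(values, period):
    """Share of variance explained by the per-phase means (0 = none, 1 = all).

    Assumes a known period, no trend, and at least one full cycle of data.
    """
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    # Seasonal profile: average centered value at each phase of the cycle.
    profile = [statistics.fmean(centered[phase::period]) for phase in range(period)]
    remainder = [c - profile[i % period] for i, c in enumerate(centered)]
    total_var = statistics.pvariance(centered)
    if total_var == 0:
        return 0.0  # a constant series has no seasonality to measure
    return max(0.0, 1.0 - statistics.pvariance(remainder) / total_var)

# Made-up examples: a clean weekly pattern versus an alternating series
# that has no weekly structure at all.
strongly_seasonal = [5, 5, 5, 5, 5, 9, 9] * 8
not_seasonal = [1, -1] * 28

strong = seasonal_strength(strongly_seasonal, period=7)
weak = seasonal_strength(not_seasonal, period=7)
print(strong, weak)
```

A value near 1 means the repeating profile accounts for almost all of the variation, while a value near 0 means removing it changes almost nothing. Real data with trends, several overlapping cycles, or drifting patterns would need more careful decomposition than this.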
In fact, economists in the United States are currently debating whether the country’s official gross domestic product (GDP) statistics have residual seasonality, that is, a seasonal contribution still present in the data despite economists’ and statisticians’ attempts to correct for it. When plots can determine policy, separating seasonal variations from both long-term trends and significant anomalies is crucial. Public health officials have to respond differently to normal summertime high temperatures than they would to a heat wave on top of normal summertime high temperatures.
Your time series data are more than streams of digits; they are sources of insight into the health of your business and the effectiveness of your actions. The anomalies in that data are crucial signals, sources of actionable intelligence. Take the correct action by getting the correct anomalies. By accounting for seasonality, real anomalies can be distilled from both the ripples and tides present in your data. In other words, there’s really no need to take corrective action on something that doesn’t need correcting.