9.1 Seasonal differencing and SARIMA models

3 min read · July 22, 2024

Seasonal differencing and SARIMA models are powerful tools for handling time series data with recurring patterns. Seasonal differencing removes seasonal components, making data more stationary and easier to analyze. This technique is crucial for accurate forecasting and understanding underlying trends.

SARIMA models extend ARIMA by incorporating seasonal elements. They capture both short-term and long-term patterns in data, making them ideal for complex time series. Understanding these models helps in making better predictions and decisions based on seasonal data.

Seasonal Differencing and SARIMA Models

Seasonal differencing in time series

  • Technique to remove seasonal patterns from time series data by computing the difference between observations separated by a seasonal period (e.g., 12 for monthly data, 4 for quarterly data)
  • Denoted as $\nabla_s^D X_t = (1 - B^s)^D X_t$, where $s$ is the seasonal period, $B$ is the backshift operator, and $D$ is the order of seasonal differencing
  • Helps achieve stationarity, a key assumption for many time series models including SARIMA, by removing the seasonal component and leaving only trend and irregular components
  • The order of seasonal differencing ($D$) depends on the strength and persistence of the seasonal pattern, with higher values necessary for strong and persistent patterns
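The seasonal-differencing operation described above — subtracting the observation one seasonal period back, repeated $D$ times — can be sketched in a few lines of NumPy (the helper name `seasonal_difference` is ours, for illustration only):

```python
import numpy as np

def seasonal_difference(x, s, D=1):
    # Apply D rounds of seasonal differencing with period s,
    # i.e. (1 - B^s)^D x_t; each round shortens the series by s points.
    x = np.asarray(x, dtype=float)
    for _ in range(D):
        x = x[s:] - x[:-s]
    return x

# Monthly-style series: linear trend plus a 12-period seasonal cycle
t = np.arange(48)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

dy = seasonal_difference(y, s=12)  # D = 1
# The seasonal cycle cancels exactly; what remains is the constant
# trend contribution 0.5 * 12 = 6 at every point.
```

Note that the seasonally differenced series still contains the trend, which a regular (non-seasonal) difference — the $d$ term of a SARIMA model — would handle.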

Structure of SARIMA models

  • Extend the ARIMA framework to incorporate seasonal components, denoted as SARIMA$(p,d,q)(P,D,Q)_s$
    • $p$: Non-seasonal autoregressive order
    • $d$: Non-seasonal differencing order
    • $q$: Non-seasonal moving average order
    • $P$: Seasonal autoregressive order
    • $D$: Seasonal differencing order
    • $Q$: Seasonal moving average order
    • $s$: Seasonal period
  • Seasonal autoregressive (SAR) component captures dependency between observations separated by seasonal periods, denoted as $\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}$
  • Seasonal moving average (SMA) component captures dependency between error terms separated by seasonal periods, denoted as $\Theta_Q(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \dots - \Theta_Q B^{Qs}$

Order of seasonal differencing

  • Determined by examining the seasonal pattern in the time series
    • $D=0$ (no seasonal differencing) if the seasonal pattern is constant over time
    • $D=1$ (first-order seasonal differencing) if the seasonal pattern varies proportionally to the level of the series
    • Higher-order seasonal differencing ($D>1$) may be necessary in rare cases
  • Visual inspection of the time series plot helps identify the need for seasonal differencing
    • Constant seasonal patterns over time suggest $D=0$
    • Seasonal patterns that increase or decrease with the level of the series suggest $D=1$
  • Autocorrelation function (ACF) and partial autocorrelation function (PACF) can also help determine the order of seasonal differencing, with significant spikes at seasonal lags suggesting the need for seasonal differencing

Interpretation of seasonal ARIMA terms

  • Seasonal autoregressive (SAR) terms capture the relationship between observations separated by seasonal periods
    • The order of the SAR term ($P$) indicates the number of seasonal lags influencing the current observation
    • Coefficients of the SAR terms ($\Phi_1, \Phi_2, \dots, \Phi_P$) represent the strength and direction of the seasonal autocorrelation
  • Seasonal moving average (SMA) terms capture the relationship between error terms separated by seasonal periods
    • The order of the SMA term ($Q$) indicates the number of seasonal lags of error terms influencing the current observation
    • Coefficients of the SMA terms ($\Theta_1, \Theta_2, \dots, \Theta_Q$) represent the strength and direction of the seasonal correlation between error terms
  • Significance of SAR and SMA coefficients assessed using statistical tests (t-tests, confidence intervals)
    • Significant coefficients suggest presence of seasonal patterns in the time series
    • Non-significant coefficients may indicate the corresponding seasonal terms can be removed from the model

Key Terms to Review (18)

Augmented Dickey-Fuller test: The augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a time series has a unit root, indicating that it is non-stationary. This test is crucial in assessing the stationarity of data, which directly affects the modeling and forecasting processes in time series analysis, especially when dealing with seasonal differencing, cross-validation, integrated ARIMA models, and understanding the trend component.
D: In time series analysis, 'd' represents the degree of differencing required to achieve stationarity in a time series dataset. It is a key component in integrated models, indicating how many times the data needs to be differenced to remove trends or seasonal patterns and stabilize the mean, making it suitable for modeling. This concept is especially relevant in the context of ARIMA and SARIMA models, where determining the appropriate value of 'd' is crucial for effective forecasting.
First seasonal difference: The first seasonal difference is a technique used to remove seasonality from a time series by subtracting the value of a previous season from the current value. This process helps in stabilizing the mean of the time series and making it easier to model. By focusing on the differences rather than the actual values, it allows for more accurate forecasting and analysis using methods like SARIMA, which specifically accounts for seasonal patterns.
Forecasting accuracy: Forecasting accuracy refers to the measure of how closely predicted values align with actual observed values over a specific time period. High forecasting accuracy is essential for effective decision-making and resource allocation, especially in contexts where seasonal patterns or trends are prominent. It involves various techniques and models to assess and improve the precision of forecasts, such as evaluating errors and utilizing statistical measures.
Ljung-Box test: The Ljung-Box test is a statistical test used to determine whether any of a group of autocorrelations of a time series are different from zero, indicating that the time series is not white noise. This test plays a crucial role in assessing model adequacy, especially in regression contexts, and is also significant for time series forecasting and error analysis.
Mean Absolute Error (MAE): Mean Absolute Error (MAE) is a measure of forecast accuracy that calculates the average absolute differences between predicted values and actual values. This metric provides insight into the accuracy of different forecasting methods by quantifying how much the forecasts deviate from the real data, making it essential in evaluating time series models.
Model selection criteria: Model selection criteria are statistical tools used to evaluate and compare different models to find the one that best fits a given dataset while avoiding overfitting or underfitting. These criteria help in determining which model is most effective at capturing the underlying patterns in data, considering aspects such as accuracy, complexity, and predictive power. The chosen model should balance fitting the data well and generalizing to new, unseen observations.
Noise: Noise refers to the random variations or disturbances in a time series that cannot be attributed to the underlying pattern or signal. It is essentially the 'background chatter' that obscures the true trend, seasonality, and cyclical behaviors in data. Understanding noise is crucial when using techniques like seasonal differencing and SARIMA models, as it helps to refine the model and improve its predictive accuracy by isolating meaningful patterns from irrelevant fluctuations.
P: In time series analysis, 'p' refers to the order of the autoregressive part of an ARIMA or SARIMA model. It indicates the number of lagged values of the dependent variable that are included in the model. The value of 'p' helps to determine how many past observations should influence the current value, which is crucial for making accurate forecasts and understanding the underlying data patterns.
Q: In time series analysis, 'q' represents the order of the moving average component in ARIMA models, specifically indicating how many lagged forecast errors are included in the model. This parameter plays a crucial role in capturing the relationship between the current observation and past forecast errors, making it essential for accurately modeling and forecasting time series data. Understanding 'q' helps in defining the structure of both seasonal differencing and integrated models, as it directly influences how past information is utilized to improve predictions.
Root Mean Squared Error (RMSE): Root Mean Squared Error (RMSE) is a widely used metric for measuring the accuracy of a predictive model by calculating the square root of the average squared differences between predicted and actual values. This measure helps assess how well a model performs, particularly when evaluating forecasts in time series analysis. RMSE is sensitive to outliers and gives higher weight to larger errors, making it a crucial metric for fine-tuning models, especially in complex scenarios like seasonal differencing and SARIMA models, evaluating forecast accuracy, and analyzing stock price movements.
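MAE and RMSE differ only in how they aggregate forecast errors; a minimal sketch with made-up numbers showing RMSE's extra weight on large errors:

```python
import numpy as np

actual = np.array([102.0, 98.0, 105.0, 110.0])
forecast = np.array([100.0, 100.0, 104.0, 106.0])

errors = forecast - actual                # -2, 2, -1, -4
mae = np.mean(np.abs(errors))             # MAE: average absolute miss
rmse = np.sqrt(np.mean(errors ** 2))      # RMSE: squaring weights large misses more

print(mae, rmse)  # 2.25 and 2.5: RMSE exceeds MAE because of the -4 error
```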
S: In the context of time series analysis, 's' represents the seasonal period, which is the number of observations in one complete seasonal cycle. Understanding 's' is crucial for identifying and modeling seasonal patterns within a dataset, especially when applying seasonal differencing and SARIMA models. This parameter helps in capturing the cyclical nature of data, allowing analysts to make more accurate forecasts by accounting for regular fluctuations over time.
Seasonal Autoregressive: Seasonal autoregressive refers to a component in time series analysis that captures the relationship between a variable and its past values at seasonal lags. This concept is critical when modeling data that exhibits periodic fluctuations, as it helps in identifying patterns that repeat over specific intervals, like months or quarters. By incorporating seasonal autoregressive terms into models such as SARIMA, analysts can effectively account for seasonal trends and improve forecasting accuracy.
Seasonal differencing: Seasonal differencing is a technique used in time series analysis to remove seasonal patterns by subtracting the value from a previous season. This method helps to stabilize the mean of a seasonal time series, making it easier to model and forecast using methods like SARIMA. By applying seasonal differencing, one can focus on the underlying trends and cyclical behaviors in the data without the noise created by regular seasonal fluctuations.
Seasonal effect: The seasonal effect refers to predictable and regular patterns that occur at specific intervals within a time series, typically influenced by seasonal factors such as weather, holidays, or other cyclical events. Understanding seasonal effects is crucial for effectively analyzing time series data, as it allows for more accurate forecasting and modeling by capturing these recurring fluctuations. Seasonal effects can significantly impact trends and cycles in the data, highlighting the importance of adjusting models to account for these variations.
Seasonal Moving Average: A seasonal moving average is a statistical technique used to smooth out data in a time series by averaging values over a specified seasonal period, helping to identify trends and seasonal patterns. It plays a crucial role in forecasting models by reducing noise and making underlying trends more apparent, particularly in the context of seasonal data that can fluctuate due to cyclical factors. This method is often used alongside seasonal differencing and SARIMA models to enhance predictive accuracy.
Seasonal pattern: A seasonal pattern refers to predictable fluctuations that occur in a time series at regular intervals, often influenced by seasonal factors such as weather, holidays, or economic cycles. These patterns are crucial for identifying trends and making accurate forecasts since they can repeat annually, quarterly, or monthly. Recognizing seasonal patterns allows for better modeling of data and can help in adjusting predictions to account for these expected variations.
Seasonality: Seasonality refers to periodic fluctuations in time series data that occur at regular intervals, often influenced by seasonal factors like weather, holidays, or economic cycles. These patterns help in identifying trends and making predictions by accounting for variations that repeat over specific timeframes.
© 2024 Fiveable Inc. All rights reserved.