Time Series Forecasting of GDP using ARIMA

Sina_席納
5 min read · Oct 17, 2023


Time series with ARIMA (ENG)
Time series with ARIMA (CH)
Time series with LSTM (ENG)
Time series with LSTM (CH)
GitHub

When constructing a stock model, many are drawn to the cross-sectional model because of its effectiveness in dealing with multi-factor data at a specific point in time. However, before exploring the cross-sectional realm, it’s crucial to understand the intricacies of the time series model. This essay will introduce the application of the time series model, particularly in forecasting the future Gross Domestic Product (GDP).

Understanding Time Series Models:

Before diving into the techniques, it’s essential to establish a basic understanding of time series models. These models discern patterns, trends, and cyclic behaviors from historical data to make informed predictions about future values. Central to this modeling universe is ARIMA — the Autoregressive Integrated Moving Average model. It combines autoregressive components (derived from previous values) with moving average components (based on past errors), providing a powerful technique for various time series datasets.
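
For reference, the standard textbook form of the model (not spelled out in the article itself) is: the raw series is differenced d times, and the differenced series is then explained by p autoregressive terms and q moving-average terms.

```latex
% ARIMA(p, d, q): the d-times differenced series y'_t follows an ARMA(p, q) process
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p}
         + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t
```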

The primary objective of this project is to forecast the future Gross Domestic Product (GDP) using time series analysis techniques. While the temptation to jump directly into modeling is strong, data processing cannot be overlooked. The accuracy of predictions hinges on the quality and precision of the dataset. Though GDP is the focal point, the entire dataset must be prepped, addressing anomalies like missing values and outliers. All data have been sourced from the U.S. Bureau of Economic Analysis (Section 7 — Supplemental Tables).
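
As a rough sketch of that preparation step, something like the following could be used, assuming the BEA table has been exported to a CSV with `date` and `gdp` columns (the file name and column names are illustrative, not taken from the project):

```python
# Illustrative data-prep sketch; file name and column names are assumptions.
import pandas as pd

gdp = pd.read_csv("bea_gdp_quarterly.csv", parse_dates=["date"])
gdp = gdp.sort_values("date").set_index("date")

# Basic cleaning: drop duplicate timestamps and interpolate isolated gaps.
gdp = gdp[~gdp.index.duplicated(keep="first")]
gdp["gdp"] = gdp["gdp"].interpolate(limit=1)

# Flag extreme values for manual review rather than dropping them blindly.
z = (gdp["gdp"] - gdp["gdp"].mean()) / gdp["gdp"].std()
print(gdp[z.abs() > 3])
```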

Visualizing the GDP time series can offer profound insights. EDA (Exploratory Data Analysis) seeks to identify evident patterns, trends, or cyclic behaviors. This initial exploration sets the stage for subsequent analytical decisions, be it recognizing seasonality or detecting anomalies.
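
A minimal version of that first look, assuming the `gdp` DataFrame from the sketch above, might be:

```python
# Plot the raw GDP level to look for trend, seasonality, and anomalies.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(gdp.index, gdp["gdp"])
ax.set_title("U.S. GDP over time")
ax.set_xlabel("Date")
ax.set_ylabel("GDP")
plt.tight_layout()
plt.show()
```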

From an initial analysis, the GDP over time appears non-stationary. Stationarity, a fundamental assumption for many time series methodologies like ARIMA, implies consistent statistical properties of a process over time. Preliminary visual checks, such as observing stable means or variances, provide a hint. However, statistical tests offer conclusive evidence.
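
The article does not name the specific test; a common choice is the Augmented Dickey-Fuller test, sketched here on the level series:

```python
# Augmented Dickey-Fuller test as one possible stationarity check.
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(gdp["gdp"].dropna())
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
# A large p-value means the unit-root null cannot be rejected,
# i.e. the level series is likely non-stationary.
```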

Therefore, let’s see how the growth and the log growth behave:

Examining the growth and logarithmic growth reveals significant volatility due to COVID-19, but the data now appear stationary.
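
A minimal sketch of those two transformations (column names are assumptions):

```python
# Quarter-over-quarter growth and log growth of GDP.
import numpy as np
import matplotlib.pyplot as plt

gdp["growth"] = gdp["gdp"].pct_change()
gdp["log_growth"] = np.log(gdp["gdp"]).diff()

gdp[["growth", "log_growth"]].plot(subplots=True, figsize=(10, 5))
plt.show()
```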

Autocorrelation analysis further refines the model’s predictive capabilities. Utilizing ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots helps decipher the inherent nature of the autocorrelation, guiding the selection of AR and MA terms for the ARIMA model.

Recognizing the considerable impact of COVID-19 on GDP, I’ve analyzed both inclusive and exclusive periods of the pandemic. Both ACF and PACF indicate an autoregressive process in the data. The ACF and PACF plots further guide the order of the AR terms.
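
A sketch of those plots on the log growth series, with a pre-COVID cutoff chosen here purely for illustration:

```python
# ACF and PACF of log growth, for the full sample and excluding the COVID period.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

full = gdp["log_growth"].dropna()
pre_covid = full[full.index < "2020-01-01"]  # illustrative cutoff

for name, series in [("full sample", full), ("pre-COVID", pre_covid)]:
    fig, axes = plt.subplots(1, 2, figsize=(10, 3))
    plot_acf(series, ax=axes[0], lags=20, title=f"ACF ({name})")
    plot_pacf(series, ax=axes[1], lags=20, title=f"PACF ({name})")
    plt.tight_layout()
plt.show()
```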

While ARIMA is renowned for its adaptability, other models like SARIMA, Exponential Smoothing, Prophet, and LSTM have their unique advantages based on the dataset’s characteristics. For this endeavor focused on GDP, the AIC and BIC criteria primarily determined ARIMA’s parameters.

For my analysis, I considered orders = [(1,0,0), (2,0,0), (3,0,0), (1,0,1), (1,0,2), (1,0,3), (2,0,1), (2,0,2), (2,0,3)]. The ARIMA(2,0,0) model yielded the lowest AIC and BIC, so I opted for (2,0,0) in the subsequent model. Forecasting future GDP values also calls for a careful training-validation split, so the model is fitted on familiar data while its accuracy is tested on unseen data, bolstering its reliability. I divided the data into train = gdp[gdp['year'] <= 2022] and validation = gdp[gdp['year'] > 2022], which provided the framework for the GDP prediction, as sketched below.
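
A sketch of that selection and split, assuming the model is fit on the log growth series built above (the candidate orders and the year-based split come from the article; everything else is illustrative):

```python
# Compare candidate orders by AIC/BIC, then fit ARIMA(2,0,0) on the training set.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

gdp = gdp.reset_index()
gdp["year"] = gdp["date"].dt.year
train = gdp[gdp["year"] <= 2022]
validation = gdp[gdp["year"] > 2022]

orders = [(1,0,0), (2,0,0), (3,0,0), (1,0,1), (1,0,2),
          (1,0,3), (2,0,1), (2,0,2), (2,0,3)]
scores = []
for order in orders:
    fit = ARIMA(train["log_growth"].dropna(), order=order).fit()
    scores.append({"order": order, "AIC": fit.aic, "BIC": fit.bic})
print(pd.DataFrame(scores).sort_values("AIC"))

# Fit the selected model and forecast over the validation horizon.
model = ARIMA(train["log_growth"].dropna(), order=(2, 0, 0)).fit()
forecast = model.forecast(steps=len(validation))
```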

Subsequent diagnostic tests, such as the Ljung-Box and Jarque-Bera tests, pinpointed potential areas for improvement and adjustments.
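
Those diagnostics could be run on the fitted model’s residuals roughly as follows (continuing from the `model` object above):

```python
# Ljung-Box test for residual autocorrelation and Jarque-Bera test for normality.
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import jarque_bera

residuals = model.resid

print(acorr_ljungbox(residuals, lags=[10], return_df=True))
# A large Ljung-Box p-value suggests no significant residual autocorrelation.

jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(residuals)
print(f"Jarque-Bera p-value: {jb_pvalue:.4f}")
# A small Jarque-Bera p-value points to non-normal residuals.
```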

Given the Ljung-Box p-value suggests no significant autocorrelations in the residuals and the AR terms are statistically significant, the model seems reasonably adequate for forecasting. However, the non-normality and possible heteroskedasticity of residuals might be concerns, depending on the application.

We can also analyze the residuals.

Delving deeper, a thorough examination of the residuals and their density distribution became imperative. Notable prediction deviations, especially those centered around 2020, could signal external economic shocks or potential limitations within the model itself. Moreover, the decreasing trend observed in the logarithmic GDP growth rate post-2022 could point towards unforeseen economic factors or underscore the need for model adjustments.
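
A simple way to produce those residual views (again continuing from the fitted `model`):

```python
# Residuals over time and their density estimate.
import matplotlib.pyplot as plt

residuals = model.resid

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(residuals)
axes[0].set_title("Residuals over time")
residuals.plot(kind="kde", ax=axes[1], title="Residual density")
plt.tight_layout()
plt.show()
```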

In conclusion, while the cross-sectional model remains an influential tool in stock modeling, the depth, flexibility, and practicality of the time series model, especially in realms like GDP forecasting, cannot be overstated. As we steer through the ever-shifting economic milieu, these sturdy analytical instruments will be instrumental in shaping informed decisions and strategic planning.

You can find the full code at https://github.com/SinaChang/Time-series-GDP-forecasting-

I don’t know if anyone would be interested in an explanation of the formulas. If so, tell me and I’ll write an essay about it.
