In a previous Medium article, I delved into time series forecasting using ARIMA. In this analysis, the focus shifts to predicting GDP with an LSTM model, a specialized variant of recurrent neural networks (RNNs) that is exceptionally skilled at deciphering sequences.
1. Data processing
Data processing and visualization remain consistent with the previous article; kindly refer to that link for further details.
2. Feature Engineering:
— Time Series Transformation:
# Create sequences of data suitable for time-series forecasting
import numpy as np
import torch

def create_sequences(input_data, tw):
    """
    Transform a time series into a prediction dataset
    X  : feature sequences of length tw
    y  : target sequences (the same window shifted one step forward)
    tw : training window
    """
    X, y = [], []
    L = len(input_data)
    for i in range(L - tw):
        feature = input_data[i:i+tw]
        target = input_data[i+1:i+tw+1]
        X.append(feature)
        y.append(target)
    return torch.tensor(np.array(X), dtype=torch.float32), torch.tensor(np.array(y), dtype=torch.float32)
- The primary transformation converts the raw time series into overlapping sequences. Each sequence of length tw (in this analysis tw = 12, i.e., three years of quarterly data) serves as the input, and the same window shifted one step forward serves as the target, so the last element of each target is the value that immediately follows the input sequence.
- Time series forecasting with neural networks, especially LSTMs, requires input in the form of sequences. Rather than predicting a value from a single timestamp, the model predicts based on a series of previous observations, which lets it capture temporal patterns and dependencies over the given window size.
- Imagine the time series [10, 20, 30, 40, 50] with tw = 2. The transformation would yield (a short sanity-check sketch follows this list):
- Feature (X): [10, 20], [20, 30], [30, 40]
- Target (y): [20, 30], [30, 40], [40, 50], whose final elements 30, 40, 50 are the next values after each input window
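As a quick sanity check, here is a minimal sketch that applies create_sequences (as defined above) to the toy series with tw = 2; the shapes and values in the comments are what the function returns under that assumption.
import numpy as np

# Toy series reshaped to (timesteps, 1), matching the shape of the scaled GDP data
toy = np.array([10, 20, 30, 40, 50], dtype=np.float32).reshape(-1, 1)

X, y = create_sequences(toy, tw=2)
print(X.shape, y.shape)    # torch.Size([3, 2, 1]) torch.Size([3, 2, 1])
print(X.squeeze(-1))       # rows: [10., 20.], [20., 30.], [30., 40.]
print(y.squeeze(-1))       # rows: [20., 30.], [30., 40.], [40., 50.]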
— Normalization:
# Normalize the data to the [0, 1] range
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
gdp_normalized = scaler.fit_transform(gdp)
- Neural networks, including LSTMs, often perform better when input values are on a small scale: large input values can produce large gradients and unstable training, so normalizing improves the model's convergence speed and overall performance. It also ensures that all features (if more than one is used) contribute equally to training, irrespective of their original scales. Normalizing means adjusting the values to fit within a specified range; with MinMaxScaler the values are rescaled to lie between 0 and 1, as the small example below illustrates.
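To make the scaling concrete, here is an illustrative sketch (the values are made up, not actual GDP figures) of how MinMaxScaler maps values into [0, 1] and back:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[100.0], [150.0], [200.0]])      # toy GDP-like values
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)               # (x - min) / (max - min)
print(scaled.ravel())                               # [0.  0.5 1. ]
print(scaler.inverse_transform(scaled).ravel())     # [100. 150. 200.]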
— Train-Test Split:
# Split data into train and test sets (70% / 30%)
train_size = int(len(gdp_normalized) * 0.7)
train, test = gdp_normalized[:train_size], gdp_normalized[train_size:]
- 70% of the data is used for training, and the remaining 30% is for testing.
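The training loop in the next section expects sequence tensors (train_X, train_y, test_X, test_y) and a mini-batch loader. A minimal sketch of that glue code, assuming the create_sequences function and tw = 12 from this article (the batch size of 8 is an illustrative choice, not a value stated in the article):
from torch.utils.data import DataLoader, TensorDataset

tw = 12  # 12 quarters = 3 years, as set in the hyperparameters below

# Turn the normalized train/test splits into overlapping sequences
train_X, train_y = create_sequences(train, tw)
test_X, test_y = create_sequences(test, tw)

# Mini-batch loader consumed by the training loop
loader = DataLoader(TensorDataset(train_X, train_y), batch_size=8, shuffle=True)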
3. Model:
— LSTM Architecture:
# Define LSTM model
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_dim=1, hidden_dim=15, output_dim=1, layer_num=1):
        super().__init__()
        # LSTM layer: processes the input sequence step by step
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_num, batch_first=True)
        # Output layer: maps each hidden state to a prediction
        self.linear = nn.Linear(hidden_dim, output_dim)

    # Forward pass through the network
    def forward(self, input_seq):
        lstm_out, _ = self.lstm(input_seq)
        predictions = self.linear(lstm_out)
        return predictions
- Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) optimized for time series and sequences. The key advantage of LSTMs over standard RNNs is their ability to avoid long-term dependency issues, making them well-suited for our GDP time series which has patterns spread out over long durations.
- The model created has one LSTM layer followed by a linear layer. The LSTM layer captures the sequential dependencies of the data, while the linear layer maps the LSTM outputs to the desired output shape, providing the predictions.
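A quick shape check makes the data flow concrete. In this minimal sketch (assuming the LSTM class above), a dummy batch of 4 sequences of length 12 with 1 feature is passed through the network, and the output keeps the same sequence length, i.e., one prediction per time step:
import torch

model = LSTM()
dummy = torch.randn(4, 12, 1)   # (batch, sequence length, input features)
out = model(dummy)
print(out.shape)                # torch.Size([4, 12, 1]): one prediction per time step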
— Hyperparameters:
# Set hyperparameters
tw = 12          # training window: 12 quarters (3 years)
epochs_n = 1000  # maximum number of training epochs

# Initialize LSTM model, optimizer and loss function
model = LSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.MSELoss()
- Training Window (tw): This represents how many previous time steps (quarters, in our case) are considered when predicting the next time step. A value of 12 means the GDP of the next quarter is predicted from the GDP of the last 12 quarters (that is, 3 years). The choice of this window can impact the model's ability to capture long-term patterns.
- Hidden Dimension of the LSTM: This defines the number of LSTM cells or neurons in the hidden layer (15 here). More neurons give the model greater capacity to learn, but can also make it prone to overfitting if not regularized properly.
- Learning Rate: This parameter controls the step size during gradient descent optimization. A smaller learning rate ensures that the model doesn’t overshoot the optimal point but can be slower to converge, while a larger learning rate can speed up training but risks skipping over the optimum.
- Loss Function: Mean Squared Error (MSE) is standard for regression tasks because it penalizes larger errors more than smaller ones, pushing the model to make predictions as close as possible to the actual values. Reporting RMSE, the square root of MSE, makes the error interpretable in the original units of the data (the relation is shown in the small sketch after this list).
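As a small illustration of that relation (the tensors here are dummy values, not model outputs), the RMSE reported during training is simply the square root of the MSE loss:
import torch
import torch.nn as nn

loss_function = nn.MSELoss()
pred = torch.tensor([0.50, 0.70])
true = torch.tensor([0.40, 0.90])
mse = loss_function(pred, true)   # mean of (0.1**2 + 0.2**2) = 0.025
rmse = torch.sqrt(mse)            # about 0.1581, in the same (normalized) units as the data
print(mse.item(), rmse.item())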
— Training Strategy:
# Track RMSE over training and keep early-stopping state
train_rmse_list, test_rmse_list = [], []
prev_test_loss = float("inf")
increasing_count = 0

for epoch in range(epochs_n):
    model.train()
    for x_batch, y_batch in loader:
        model.zero_grad()
        pred_y = model(x_batch)
        loss = loss_function(pred_y, y_batch)
        loss.backward()
        optimizer.step()
    # Evaluate and print losses every 100 epochs
    if epoch % 100 == 0:
        with torch.no_grad():
            model.eval()
            pred_y = model(train_X)
            train_loss = torch.sqrt(loss_function(pred_y, train_y))
            train_rmse_list.append(train_loss.item())
            pred_test_y = model(test_X)
            test_loss = torch.sqrt(loss_function(pred_test_y, test_y))
            test_rmse_list.append(test_loss.item())
            print(f"Epoch {epoch}: Training RMSE: {train_loss:.4f}, Testing RMSE: {test_loss:.4f}")
        # Check for increasing test RMSE
        if test_loss > prev_test_loss:
            increasing_count += 1
        else:
            increasing_count = 0
        # Update the previous test loss
        prev_test_loss = test_loss
        # Stop if test RMSE increased at two consecutive checkpoints
        if increasing_count == 2:
            print("Stopping early due to increasing test RMSE.")
            break
- From the graph below, it’s evident that while the testing RMSE shows a decreasing trend, the training RMSE has periodic increments throughout the epochs.
- The model undergoes training for a maximum of 1000 epochs. An early stopping mechanism is in place: if the test data's Root Mean Square Error (RMSE) rises for two successive checkpoints (evaluated every 100 epochs), the training is halted.
In our final forecast, the predicted values align closely with the actual figures. Yet, there’s a caveat.
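One practical detail behind that forecast: the model predicts in the normalized [0, 1] space, so its outputs must be mapped back to the original GDP scale with the scaler fitted earlier. A minimal sketch of how this might look, assuming the prediction at the last time step of each window is taken as the one-step-ahead forecast:
import torch

model.eval()
with torch.no_grad():
    pred_test = model(test_X)                    # shape: (n_sequences, tw, 1)

# Take the prediction at the last time step of each window (the next-quarter forecast)
pred_next = pred_test[:, -1, :].numpy()          # shape: (n_sequences, 1)

# Map normalized predictions back to the original GDP scale
pred_gdp = scaler.inverse_transform(pred_next)
actual_gdp = scaler.inverse_transform(test_y[:, -1, :].numpy())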
When delving into time series analysis, especially in the realm of economics, one cannot overlook the potential impact of external shocks or unforeseen events on the dataset. Sudden oscillations in GDP, be it a steep rise or decline, could be attributed to factors like worldwide economic downturns, breakthroughs in technology, pivotal political decisions, and so on. While LSTMs excel in discerning patterns intrinsic to the data, they lack the innate capability to factor in these external influences. Hence, a fusion of time series models with supplementary data or the incorporation of intricate structures like hybrid models might pave the way for forecasts that are both more precise and resilient.