So You Think You Can Predict Stocks with AI? An LSTM Stock Predictor Reality Check.
- Tushar Prasad

- Sep 15
- 28 min read
Updated: Sep 15

Ah, the stock market - the ultimate casino where fortunes are made and lost faster than you can say "diversified portfolio". For decades, predicting its next move has been the holy grail for everyone from Wall Street wizards to your uncle who swears by his "gut feeling." But what if we could do better than just a gut feeling? What if we could use a little bit of math, a dash of data, and a whole lot of neural network magic?
Enter the Long Short-Term Memory (LSTM) network. It sounds like something out of a sci-fi movie, and honestly, it kind of is. LSTMs are a special type of recurrent neural network (RNN) with a knack for remembering things for a long time - a skill that's surprisingly useful when dealing with the chaotic, sequential nature of the stock market. Unlike us humans who forget where we put our keys, LSTMs can recall important patterns from the past to make educated guesses about the future.
In this blog post, we're going to roll up our sleeves and build an LSTM model from the ground up to predict stock prices. We'll be using the Dow Jones Index dataset, a classic playground for aspiring data scientists. We’ll walk through everything step-by-step, from wrangling the data into shape to tuning our model until it's a lean, mean, prediction machine. We'll even throw in some other models for a good old-fashioned AI showdown.
So, grab your coffee, fire up your Jupyter Notebook, and let's see if we can teach a machine to make sense of the market.
(Disclaimer: This is for educational purposes only. Please don't bet your life savings on our model. Seriously.)
Our Crystal Ball: The Dow Jones Dataset
Every data science adventure needs a map, and for us, that map is our dataset. We’re not just pulling numbers out of thin air; we're using the Dow Jones Index Dataset (https://archive.ics.uci.edu/dataset/312/dow+jones+index) from the UCI Machine Learning Repository. It’s a tidy little collection of weekly stock data for 30 of the big players in the Dow Jones Industrial Average, like Alcoa (AA), Bank of America (BAC), and Microsoft (MSFT).
Now, before you get too excited, this "crystal ball" only sees a small slice of time: January to June of 2011. It’s not exactly a decade of market history, but with about 750 weekly records across all 30 stocks, it's more than enough for us to teach our LSTM a thing or two.
What's in the Box? A Look at the Features
So, what kind of intel does this dataset give us? Let's unpack it. Each row represents one week of trading for a single company, packed with some juicy details.
Here’s the full manifest:
| Column Name | Description | Data Type |
|---|---|---|
| quarter | Fiscal quarter of the data (1–4) | Integer |
| stock | Stock ticker symbol (e.g., AA, AXP) | Categorical |
| date | Week-ending date of the record | Date |
| open | Stock price at market open (that week) | Numeric (float) |
| high | Highest stock price during the week | Numeric (float) |
| low | Lowest stock price during the week | Numeric (float) |
| close | Stock price at market close (that week) | Numeric (float) |
| volume | Total trading volume during the week | Numeric (float) |
| percent_change_price | Percent change from opening to closing price | Numeric (float) |
| percent_change_volume_over_last_wk | Percent change in volume relative to the previous week | Numeric (float) |
| previous_weeks_volume | Trading volume of the previous week | Numeric (float) |
| next_weeks_open | Stock price at market open (next week) | Numeric (float) |
| next_weeks_close | Stock price at market close (next week). **Target variable.** | Numeric (float) |
| percent_change_next_weeks_price | Percent change in price from next week's open to close | Numeric (float) |
| days_to_next_dividend | Number of days until the next dividend is paid | Integer |
| percent_return_next_dividend | Percent return from the next dividend relative to stock price | Numeric (float) |
Okay, that's a lot to take in. We've got the basics that any stock chart enthusiast will recognize: open, high, low, close, and volume. Standard stuff.
But look closer. You see next_weeks_open and next_weeks_close? That’s literally data from the future. Using these as input features to predict the future would be like getting a copy of next week's exam. It’s cheating, plain and simple, and it’s a classic machine learning pitfall called data leakage. We're trying to build a genuine prediction model here, not a time machine with an obvious flaw. So, we’ll be making a mental note to ignore those columns when we build our model.
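To make that concrete, here's a minimal sketch of how we might separate honest features from leaky ones once the data is loaded (assuming df holds the dataset; leaky_cols is our own name for the list):
# Columns that contain information from the future (the week we're predicting).
# next_weeks_close sticks around only as the prediction target, never as an input.
leaky_cols = ["next_weeks_open", "next_weeks_close", "percent_change_next_weeks_price"]

features = df.drop(columns=leaky_cols)   # safe inputs for the model
target = df["next_weeks_close"]          # what we're trying to predict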
The Target: What Are We Trying to Predict?
Amidst all those features, one stands out. It’s our North Star, our pot of gold at the end of the rainbow.
The Target Variable: next_weeks_close (Numeric)
That's it. Our entire mission, should we choose to accept it, is to take the data from a given week and predict the closing price for the following week. We'll use historical patterns in price and volume to forecast that one single number.
Now that we know our data and our goal, it's time to get our hands dirty. Next up: we'll load this data and whip it into shape with some preprocessing.
Wrangling the Beast: Data Loading and Preprocessing
If data science were a cooking show, this would be the part where we wash our vegetables. It's not glamorous, it's not what the audience came to see, but if you skip it, your final dish will be a disaster. Real-world data is rarely as pristine as it is in textbooks; it comes with weird characters, wrong data types, and a general sense of chaos.
Our first task is to tame this dataset and get it ready for our sophisticated LSTM model.
First, let's load the data using pandas and take a look at the first few rows.
import pandas as pd
# Load file with first row as header
df = pd.read_csv("dow_jones_index.data")
df.head()
See that? The open, high, low, and close columns are littered with dollar signs. It looks nice for a report, but for our Python script, it's a nightmare. Pandas has dutifully loaded them as text (object) types, which means we can't do any math with them. Let’s run df.info() to confirm our suspicions.

Yep, just as we thought. Eight columns are object types when they should be numbers or dates. Also, notice the Non-Null Count for a couple of columns is less than 750, indicating missing values. We'll ignore those for now since we won't be using those features in our initial model.
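If you want to see exactly which columns are affected, a quick one-liner does the trick:
# Count missing values per column, showing only the offenders
df.isna().sum().loc[lambda s: s > 0]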
From Dirty Data to Sparkling Clean: The Preprocessing Function
To fix this mess, we'll create a handy function. This keeps our code clean and lets us re-use the steps if we ever get more, equally messy data.
Here’s our three-step plan to data nirvana:
Exorcise the Dollar Signs: Strip out the $ from all price columns and convert them to numbers.
Time is of the Essence: Convert the date column into a proper datetime format so we can treat it like, well, time.
Get in Line: Sort the data by stock and then date to ensure everything is in perfect chronological order. An LSTM is all about sequence, and feeding it jumbled-up data is a recipe for a very confused model.
Here’s the code that does all the heavy lifting:
import pandas as pd

def preprocess_dow_jones(path):
    # Load the raw CSV
    df = pd.read_csv(path)
    # Remove $ and convert the price columns to floats
    money_cols = ["open", "high", "low", "close", "next_weeks_open", "next_weeks_close"]
    for col in money_cols:
        df[col] = df[col].str.replace("$", "", regex=False).astype(float)
    # Convert the date column to datetime
    df["date"] = pd.to_datetime(df["date"])
    # Sort for time-series use (per stock, by date)
    df = df.sort_values(by=["stock", "date"]).reset_index(drop=True)
    return df

df_v1 = preprocess_dow_jones("dow_jones_index.data")

Are We There Yet? Checking the Cleaned Data
Now that our function has worked its magic, let's see the result.

Beautiful. The dollar signs are gone, and the date column is now in a standard YYYY-MM-DD format.

Success! All our numeric columns are now float64 or int64, and date is a datetime64[ns] type. Our data is now clean, sorted, and ready for the next stage: a quick exploratory analysis to see what we're working with.
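If you're the paranoid type (a healthy trait in data science), a few quick assertions can confirm the cleanup really worked. A minimal sketch on our df_v1 frame:
# Sanity checks on the cleaned frame
assert df_v1["close"].dtype == "float64", "price columns should now be floats"
assert pd.api.types.is_datetime64_any_dtype(df_v1["date"]), "date should be datetime"
# After sorting, every stock's rows should be in chronological order
assert df_v1.groupby("stock")["date"].apply(lambda s: s.is_monotonic_increasing).all()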
A Quick Peek Under the Hood: Exploratory Data Analysis (EDA)
Welcome to the EDA phase, which is basically the data science equivalent of a first date. Before we commit to building a complex model, we need to ask our data some questions, see how it behaves, and check for any red flags. It’s all about understanding its personality before things get serious.
Who’s Who: A Look at the Stocks
First things first, let's confirm how much data we have for each company. This is like checking if all our players have shown up for the game. We can do this with a quick value_counts() on the stock column.
df_v1["stock"].value_counts()
No surprises here. All 30 stocks are present and accounted for, each with a neat 25 weeks of data. This is good because it's consistent, but it also confirms our earlier suspicion: 25 data points per stock isn't a lot to go on. An LSTM, like a seasoned detective, needs a decent amount of evidence (data) to spot a pattern. This is why we'll later make a simplifying assumption and combine all 30 stocks into one series.
Price Check: Visualizing Stock Price Distributions
Now for the fun part. Let's create a visual to see how the closing prices of these companies stack up against each other. Numbers in a table are fine, but a good chart is worth a thousand data points. We'll use a boxplot, which is a fantastic way to see the range, median, and spread of prices for each stock all at once.
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(12,6))
sns.boxplot(x="stock", y="close", data=df_v1)
plt.xticks(rotation=90)
plt.title("Distribution of Closing Prices by Company")
plt.show()
Aha! Now we're getting somewhere. This chart tells a story:
The Stable Majority: Look at all those short, tidy boxes. Most of these corporate giants have weekly closing prices that don't jump around too much. Their prices stay within a fairly narrow and predictable range. Boring? Maybe. Good for our model? Absolutely. Predictability is our friend.
The Drama Queens: Then you have the outliers. Look at IBM and CAT over there on the right. Their boxes are not only higher up (meaning higher average prices), but they're also much taller. That height represents a wider range of prices, indicating greater volatility. These are the stocks that keep investors up at night, with bigger swings up and down. For a model, this volatility can be a challenge, but it’s also where the most interesting patterns might be hiding.
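If you'd rather put a number on that drama, a quick groupby ranks the stocks by how much their weekly closes swing. A small sketch:
# Standard deviation of weekly closing prices, most volatile stocks first
df_v1.groupby("stock")["close"].std().sort_values(ascending=False).head()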
With this quick check-up complete, we have a much better feel for our data. We know its structure, its quirks, and its potential challenges. Now, it's time to build our time machine.
Next up, we dive into the main event: setting up and training our LSTM.
The Main Event: Building Our LSTM for Stock Forecasting
Alright, the moment we've all been waiting for. We’ve prepped our data, scouted the terrain with EDA, and now it's time to assemble our weapon of choice: the Long Short-Term Memory (LSTM) network. This is the part where we move from data janitors to data architects, designing a model that can learn from the past to predict the future.
A Small Assumption for a Big Payoff
Before we write a single line of PyTorch code, let's address the elephant in the room: our data is a bit sparse. With only 25 weekly data points per stock, training a separate LSTM for each company would be like trying to learn a language from a single pamphlet. The model would barely learn anything before running out of data, likely leading to poor performance and overfitting.
So, we're going to make a pragmatic simplification:
We will treat the closing prices of all 30 stocks as a single, continuous time series.
What it means: Instead of feeding our LSTM 30 separate sequences of 25 data points each, we're taking all the closing prices from all 30 stocks and stitching them together into one long sequence of roughly 750 data points (30 stocks * 25 points/stock).
Example (simplified):
Original: [MSFT_week1, MSFT_week2, ..., MSFT_week25], [VZ_week1, VZ_week2, ..., VZ_week25], etc. (30 separate lists)
Simplified: [MSFT_week1, ..., MSFT_week25, VZ_week1, ..., VZ_week25, ..., XOM_week1, ..., XOM_week25] (One very long list)
Is this technically correct? No. In the real world, the price of Microsoft (MSFT) in Week 25 has absolutely no direct sequential relationship with the price of Verizon (VZ) in Week 1. These are distinct companies with independent price movements (even if they share market factors). You are artificially creating a continuous sequence where one doesn't naturally exist. But for the purpose of this tutorial, this approach gives us a much larger dataset (750 points) to work with. It allows us to demonstrate the mechanics of building, tuning, and evaluating an LSTM without getting bogged down by the complexities of multi-stock modeling. Think of it as a training simulation before we tackle the real thing.
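Conveniently, because we sorted df_v1 by stock and then date during preprocessing, extracting the close column already gives us exactly this stitched-together sequence:
# One long pseudo-series: 30 stocks x 25 weeks = 750 points
series = df_v1["close"].values
print(series.shape)  # (750,)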
Before we get our hands dirty with code, let's quickly demystify the star of our show: the Long Short-Term Memory (LSTM) network.
At its core, an LSTM is a type of Recurrent Neural Network (RNN). An RNN is a neural network designed to handle sequences of data - like sentences, audio clips, or, in our case, a series of stock prices over time. Unlike a standard neural network that treats every input as independent, an RNN has a form of memory. It processes data step-by-step, and at each step, it takes the current input and combines it with a "hidden state" that contains information from all the previous steps. Think of it as reading a sentence: you understand each word based on the words that came before it.
So, what makes an LSTM special?
A simple RNN has a bit of a memory problem. It's great at remembering things from the immediate past (short-term memory), but it struggles to hold onto information from long ago. This is known as the vanishing gradient problem, where the network's ability to learn from earlier time steps fades as the sequence gets longer. For stock prices, where a trend might have started weeks or even months ago, this is a deal-breaker.
LSTMs solve this with a clever architectural upgrade: gates. An LSTM cell contains three main gates that act like tiny, trainable valves controlling the flow of information:
Forget Gate: This gate decides what information to throw away from the cell's long-term memory (called the "cell state"). For example, it might learn that an old, irrelevant price spike is no longer important for predicting the current trend.
Input Gate: This gate decides what new information to store in the cell state. It looks at the current stock price and decides which parts of it are important enough to remember.
Output Gate: This gate determines what information from the cell state should be used to make the current prediction. It filters the memory to produce a relevant output.
By using these gates, an LSTM can selectively remember important patterns over very long sequences while discarding the noise. It’s this ability to maintain both a "long-term" and "short-term" memory that makes it so powerful for time-series forecasting.
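For the mathematically inclined, the standard LSTM cell update can be written compactly. Here σ is the sigmoid function, ⊙ is element-wise multiplication, x_t is the current input, and h_{t-1} is the previous hidden state:

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad &\text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad &\text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad &\text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad &\text{(cell state update)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad &\text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) \quad &\text{(new hidden state)}
\end{aligned}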
Now, with that little bit of theory under our belt, let's get back to the action.
The Art of Looking Back: Tuning the Sequence Length
The first and most critical hyperparameter we need to decide on is the sequence length (or "look-back period"). This determines how many past weeks of data our LSTM will consider before making a prediction for the next week.
If the sequence is too short (e.g., 2 weeks), the model might miss longer-term trends.
If it's too long (e.g., 20 weeks), it might get bogged down by irrelevant old data and take forever to train.
Finding the sweet spot is key. We'll approach this scientifically: by running an experiment.
We'll test a range of sequence lengths from 4 to 8 weeks and see which one gives our LSTM the best predictive power. For each length, we'll perform the following steps:
Create Sequences: We need to transform our flat list of closing prices into overlapping windows of data. For a sequence length of 5, the first window would be [week1, week2, week3, week4, week5] to predict week6. The next would be [week2, week3, week4, week5, week6] to predict week7, and so on.
Split the Data: We'll use the first 80% of our sequences for training and the remaining 20% for testing. Crucially, this is a chronological split, not a random one. We must train on the past to predict the future.
Scale the Data: Neural networks are sensitive to the scale of input data. We'll use a MinMaxScaler to scale all our closing prices to a range of [0, 1]. This helps the model train faster and more stably.
Train and Evaluate: For each sequence length, we'll train a basic LSTM model for 50 epochs and record its performance on the test set using four standard metrics: MSE, RMSE, MAE, and R².
Before we analyze the results, let's quickly define the terms we're using to judge our model's performance. Think of these as the judges' scorecards in our AI competition.
Epochs: An epoch is one complete pass through the entire training dataset. In our experiment, we're training each model for 50 epochs, meaning it will see and learn from the full training data 50 times. This repetition helps the model gradually adjust its internal weights to minimize errors.
And here are the scores we're looking at:
MSE (Mean Squared Error): This metric calculates the average of the squared differences between the predicted and actual values. By squaring the errors, it penalizes large mistakes much more heavily than small ones. It's a great way to see if our model is making any wild, unacceptable predictions. Lower is better.
RMSE (Root Mean Squared Error): This is simply the square root of the MSE. Its main advantage is that it brings the error metric back into the same units as the target variable. In our case, the RMSE will be in dollars, making it much more intuitive. An RMSE of 5.5 means our model is, on average, off by about $5.50. Lower is better.
MAE (Mean Absolute Error): This is the average of the absolute differences between the predicted and actual values. Unlike MSE, it treats all errors equally, regardless of their size. It gives a straightforward, average "how far off are we?" number. Lower is better.
R² (R-squared): This metric, also known as the coefficient of determination, tells us what proportion of the variance in the actual prices is explained by our model. An R² of 0.91 means our model can account for 91% of the price fluctuations. It's a great measure of how well our model fits the data. Higher is better (closer to 1).
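To make these concrete, here's a toy example with made-up numbers (not from our dataset) showing how the four scores are computed:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([10.0, 12.0, 11.0])   # actual prices (made up)
y_pred = np.array([10.5, 11.5, 11.0])   # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)    # ~0.167
rmse = np.sqrt(mse)                         # ~0.408 (same units as the price)
mae = mean_absolute_error(y_true, y_pred)   # ~0.333
r2 = r2_score(y_true, y_pred)               # 0.75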
Here's the full code for our tuning experiment. It's a bit of a beast, but we'll break it down.
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import matplotlib.pyplot as plt
# Prepare the data
data = df_v1["close"].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(data)
# Set a random seed for reproducibility (the value is our choice; any fixed integer works)
SEED = 42
torch.manual_seed(SEED)

# Function to create input/target sequences with a sliding window
def create_sequences(data, seq_len=5):
    X, y = [], []
    for i in range(len(data) - seq_len):
        X.append(data[i:i+seq_len])   # seq_len consecutive weeks as input
        y.append(data[i+seq_len])     # the following week as the target
    return np.array(X), np.array(y)
# LSTM Model Definition
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_size)
        # The LSTM returns the full output sequence plus the final hidden state (hn)
        # and cell state (cn); we only need the final hidden state
        _, (hn, cn) = self.lstm(x)
        # hn shape: (num_layers, batch_size, hidden_size)
        # Use the hidden state of the last layer for the prediction
        out = self.fc(hn[-1])
        return out
# Training and Evaluation Function
def train_and_evaluate(seq_len):
    X, y = create_sequences(data_scaled, seq_len)
    train_size = int(len(X) * 0.8)
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    X_train = torch.tensor(X_train, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_test = torch.tensor(y_test, dtype=torch.float32)

    model = LSTMModel()
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        output = model(X_train)
        loss = criterion(output, y_train)
        loss.backward()
        optimizer.step()

    # Evaluate on the held-out test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test).numpy()
        y_true = y_test.numpy()

    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    return mse, rmse, mae, r2
# Run experiments for different sequence lengths
seq_lengths = [4, 5, 6, 7, 8]
results = {"seq_len": [], "mse": [], "rmse": [], "mae": [], "r2": []}

for sl in seq_lengths:
    mse, rmse, mae, r2 = train_and_evaluate(sl)
    results["seq_len"].append(sl)
    results["mse"].append(mse)
    results["rmse"].append(rmse)
    results["mae"].append(mae)
    results["r2"].append(r2)
    print(f"Done: seq_len={sl}")
# Plotting function
def plot_results(results):
    fig, axs = plt.subplots(2, 2, figsize=(12, 8))
    axs[0,0].plot(results["seq_len"], results["mse"], marker="o")
    axs[0,0].set_title("MSE vs Seq Length")
    axs[0,1].plot(results["seq_len"], results["rmse"], marker="o")
    axs[0,1].set_title("RMSE vs Seq Length")
    axs[1,0].plot(results["seq_len"], results["mae"], marker="o")
    axs[1,0].set_title("MAE vs Seq Length")
    axs[1,1].plot(results["seq_len"], results["r2"], marker="o")
    axs[1,1].set_title("R² vs Seq Length")
    for ax in axs.flat:
        ax.set_xlabel("Sequence Length")
        ax.grid(True)
    plt.tight_layout()
    plt.show()

plot_results(results)

Code Walkthrough: What’s Actually Happening Here?
That was a lot of code, so let's quickly walk through the key parts to understand what each block is doing.
Data Preparation:
data = df_v1["close"].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(data)

First, we grab just the close column, which is our feature for this univariate experiment. We use a MinMaxScaler to squash all the price values into a range between 0 and 1. This normalization is crucial for helping the neural network learn effectively.
The create_sequences Function:
def create_sequences(data, seq_len=5):
    X, y = [], []
    for i in range(len(data) - seq_len):
        X.append(data[i:i+seq_len])
        y.append(data[i+seq_len])
    return np.array(X), np.array(y)

This function is the heart of our time-series setup. It slides a window of size seq_len across our data. The data inside the window becomes our input (X), and the very next data point becomes our target (y).
The LSTMModel Class:
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, (hn, cn) = self.lstm(x)
        out = self.fc(hn[-1])
        return out

This is our model architecture, defined using PyTorch's nn.Module.
nn.LSTM(...): This creates the LSTM layer. batch_first=True is a handy setting that tells PyTorch to expect our data in the shape (batch_size, sequence_length, features).
nn.Linear(...): This is a standard fully-connected layer that will take the final output from the LSTM and map it to our single prediction (the closing price).
forward(self, x): In the forward pass, we feed our input x through the LSTM. The LSTM returns its full output sequence and the final hidden state (hn) and cell state (cn). For a one-step prediction, we only care about the very last hidden state (hn[-1]), which we pass to our linear layer to get the final prediction.
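If the tensor shapes feel abstract, a quick dummy forward pass makes them tangible (a sketch, using the LSTMModel class defined above):
# Trace the shapes with a fake batch: 8 sequences, 5 weeks each, 1 feature per week
dummy = torch.randn(8, 5, 1)
out = LSTMModel()(dummy)
print(out.shape)  # torch.Size([8, 1]) -> one predicted price per sequence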
The train_and_evaluate Function:
def train_and_evaluate(seq_len):
    # ... data splitting ...
    # Training loop
    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        output = model(X_train)
        loss = criterion(output, y_train)
        loss.backward()
        optimizer.step()
    # ... evaluation ...
    return mse, rmse, mae, r2

This function orchestrates the entire experiment for a given seq_len. It splits the data, initializes the model, and then runs the training loop for 50 epochs. Inside the loop, it calculates the model's prediction (output), compares it to the true values (y_train) to get a loss, and then uses loss.backward() and optimizer.step() to adjust the model's weights to reduce that loss. Finally, it evaluates the trained model on the unseen test data and returns the performance metrics.
Plotting: Finally, plot_results draws each of the four metrics (MSE, RMSE, MAE, R²) against the sequence length in a 2x2 grid, so we can eyeball the winner at a glance. It's the same code you already saw at the end of the experiment above, so we won't repeat it here.

The results are in, and the winner is clear. Both the table and the charts point to sequence length 7 as the champion. It achieves the lowest error across all metrics (MSE, RMSE, MAE) and the highest R² score (a whopping 0.914).
This tells us that for our dataset, using the past seven weeks of closing prices provides the optimal balance of historical context for the LSTM to make its most accurate prediction for the next week. It’s not too little information, and it’s not too much noise. It’s just right.
Now that we've found our ideal look-back period, the next step is to tune the LSTM's internal architecture (its hidden size and number of layers) to see if we can squeeze even more performance out of our model.
Fine-Tuning Our Time Machine: Optimizing the LSTM
We've figured out the perfect look-back window for our model: 7 weeks. That's a great start, but it's only half the battle. Now, we need to tune the internal architecture of the LSTM itself. This is like moving from choosing the right textbook for a student to deciding how many hours they should study and how many subjects they should focus on at once.
We'll be tweaking two key hyperparameters:
Hidden Size: This is the number of neurons (or memory cells) inside each LSTM layer. A larger hidden size means the model has more "brainpower" to learn complex patterns. Think of it as the model's memory capacity. Too small, and it might not be smart enough to capture the market's nuances. Too large, and it might start "memorizing" the training data instead of learning the general trends, a classic case of overfitting.
Number of Layers: We can stack LSTM layers on top of each other. A single-layer LSTM is good, but a multi-layer (or "deep") LSTM can learn hierarchical patterns. The first layer might learn simple price movements, the second might learn to combine those into short-term trends, and a third might recognize longer-term market cycles. However, adding more layers makes the model more complex and slower to train.
Our mission is to find the Goldilocks combination: not too simple, not too complex, but just right.
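One way to appreciate why this matters: the parameter count grows quickly with both knobs. A quick sketch using our LSTMModel class from the previous section:
# Count trainable parameters for a few (hidden_size, num_layers) combos
for hs, nl in [(32, 1), (64, 2), (256, 3)]:
    m = LSTMModel(hidden_size=hs, num_layers=nl)
    n_params = sum(p.numel() for p in m.parameters())
    print(f"hidden={hs}, layers={nl}: {n_params:,} parameters")
More parameters means more capacity, but with only a few hundred training sequences it also means more rope for the model to hang itself with.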
With our sequence length locked in at 7, we'll run a grid search to test various combinations of hidden sizes and layers. This is a brute-force but effective way to find the best architecture.
Hidden Sizes to Test: [32, 64, 128, 256]
Number of Layers to Test: [1, 2, 3]
This gives us a total of 4 x 3 = 12 combinations to evaluate. For each combination, we'll train a new LSTM and measure its performance on our test set.
Here's the code for our grid search. It's very similar to our previous experiment, but now we're passing hidden_size and num_layers as arguments to our training function.
# Grid Search Setup
seq_len = 7 # The winner from our last experiment
hidden_sizes = [32, 64, 128, 256]
num_layers_list = [1, 2, 3]
# Store results
results = {"hidden_size": [], "num_layers": [], "mse": [], "rmse": [], "mae": [], "r2": []}
# Training and Evaluation Function (updated to accept hyperparameters)
def train_and_evaluate(seq_len, hidden_size, num_layers):
    X, y = create_sequences(data_scaled, seq_len)
    train_size = int(len(X) * 0.8)
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    X_train = torch.tensor(X_train, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_test = torch.tensor(y_test, dtype=torch.float32)

    # Model is now initialized with the specific hidden_size and num_layers
    model = LSTMModel(hidden_size=hidden_size, num_layers=num_layers)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        output = model(X_train)
        loss = criterion(output, y_train)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        y_pred = model(X_test).numpy()
        y_true = y_test.numpy()

    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    return mse, rmse, mae, r2
# The actual grid search loop
for hs in hidden_sizes:
    for nl in num_layers_list:
        mse, rmse, mae, r2 = train_and_evaluate(seq_len, hs, nl)
        results["hidden_size"].append(hs)
        results["num_layers"].append(nl)
        results["mse"].append(mse)
        results["rmse"].append(rmse)
        results["mae"].append(mae)
        results["r2"].append(r2)
        print(f"Done: hidden_size={hs}, num_layers={nl}")
# Convert results to a DataFrame for easy viewing
df_results = pd.DataFrame(results)
print(df_results)

Let's dissect these results:
The Sweet Spot: All four plots tell a similar story. The blue and orange lines (1 and 2 layers) consistently outperform the green line (3 layers). The best performance across the board—lowest errors and highest R²—occurs with 2 layers and a hidden size of 64. This model (the orange line's dip at hidden_size=64) hit an R² of nearly 0.92, a noticeable improvement over our initial model.
The Curse of Complexity: Look at what happens with 3 layers (green line). As the hidden size increases, performance gets dramatically worse. The model with 256 hidden units and 3 layers completely falls apart, with a negative R² score! This means it performed worse than just predicting the average price every time. This is a classic sign of an overly complex model that has completely failed to generalize. It's like a student who memorized the textbook but can't answer a single question that isn't a direct quote.
Conclusion: Our optimal architecture is a 2-layer LSTM with a hidden size of 64. A single layer performs reasonably well, but adding a second layer gives the model enough depth to capture more complex patterns. Adding a third layer, however, is overkill for this dataset and leads to disastrous results.
Now we have all the pieces of the puzzle: the right data, the optimal sequence length, and the best architecture. It's time to put it all together, train our champion model, and see how it performs in the final showdown.
The Final Showdown: Training and Evaluating Our Best Model
This is it. We've found the optimal configuration for our LSTM time machine. No more experiments, no more tuning. We're now taking everything we've learned and building our final, definitive model.
Our recipe for success is as follows:
Sequence Length: 7 (the best look-back period)
Hidden Size: 64 (the perfect brainpower balance)
Number of Layers: 2 (deep enough, but not too deep)
We'll train this model for 100 epochs, giving it ample time to learn the patterns in the training data. Then, we'll unleash it on the test data, the part of the dataset it has never seen before, to see how well it really performs. This is the final exam for our model.
Putting It All Together: The Final LSTM
Here is the complete code to train our final model. It combines all the steps: data preparation, model instantiation with our chosen hyperparameters, the training loop, and finally, the evaluation on both the training and test sets.
Pay close attention to the evaluate_and_print function. After making predictions (which are on the scaled 0-1 range), we use scaler.inverse_transform() to convert them back into actual dollar values. This is crucial for interpreting the results in a meaningful way.
# Final Model Training
torch.manual_seed(SEED)
# --- 1. Data Prep ---
# Get the 'close' price data and scale it
data = df_v1["close"].values.reshape(-1, 1)
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
# Use our optimal sequence length
SEQ_LEN = 7
X, y = create_sequences(data_scaled, SEQ_LEN)
# Chronological train-test split
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
# --- 2. Model Definition & Instantiation ---
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=2):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, (hn, cn) = self.lstm(x)
        out = self.fc(hn[-1])
        return out
# Instantiate with our optimal hyperparameters
model = LSTMModel(hidden_size=64, num_layers=2)
# --- 3. Training the Model ---
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
EPOCHS = 100

for epoch in range(EPOCHS):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f"Epoch {epoch+1}/{EPOCHS}, Loss: {loss.item():.6f}")
# --- 4. Evaluation ---
model.eval()
with torch.no_grad():
    train_pred = model(X_train).numpy()
    test_pred = model(X_test).numpy()

# Inverse transform to get actual prices back
train_pred = scaler.inverse_transform(train_pred)
y_train_real = scaler.inverse_transform(y_train.numpy())
test_pred = scaler.inverse_transform(test_pred)
y_test_real = scaler.inverse_transform(y_test.numpy())
# A helper function to print our metrics nicely
def evaluate_and_print(y_train_true, y_train_pred, y_test_true, y_test_pred):
    def _metrics(y_true, y_pred):
        mse = mean_squared_error(y_true, y_pred)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_true, y_pred)
        r2 = r2_score(y_true, y_pred)
        return mse, rmse, mae, r2

    train_mse, train_rmse, train_mae, train_r2 = _metrics(y_train_true, y_train_pred)
    print(f"Train MSE: {train_mse:.4f}, RMSE: {train_rmse:.4f}, MAE: {train_mae:.4f}, R2 score: {train_r2:.4f}")
    test_mse, test_rmse, test_mae, test_r2 = _metrics(y_test_true, y_test_pred)
    print(f"Test MSE: {test_mse:.4f}, RMSE: {test_rmse:.4f}, MAE: {test_mae:.4f}, R2 score: {test_r2:.4f}")

evaluate_and_print(y_train_real, train_pred, y_test_real, test_pred)
The results are fantastic. Our model achieves an R² of 0.927 on the test data. This means it can explain roughly 93% of the variance in the unseen stock prices. The RMSE of $5.57 tells us that, on average, our predictions are off by about five and a half dollars, which is quite impressive given the price range of some of these stocks.
The Moment of Truth: Actual vs. Predicted Prices
Metrics are one thing, but a picture is worth a thousand numbers. Let's plot our model's predictions against the actual closing prices on the test set. The blue line represents the real-world data, and the red dashed line is what our LSTM predicted.
plt.figure(figsize=(12,6))
plt.plot(y_test_real, label="Actual Close Price", color="blue")
plt.plot(test_pred, label="Predicted Close Price", color="red", linestyle="--")
plt.title("LSTM: Actual vs. Predicted Closing Prices on Test Set")
plt.legend()
plt.show()
Just look at that. The two lines are nearly on top of each other. This is the visual proof of our high R² score.
Observations:
Trend Tracking is Superb: The predicted curve (red) follows the actual trend (blue) with remarkable accuracy. It captures the overall upward and downward movements almost perfectly.
Handles Volatility: Even where there are sharp jumps or drops in the actual price, the LSTM's predictions are not far behind. The deviations are minimal, which shows that its memory of the past 7 weeks is giving it enough context to anticipate these shifts.
This is the power of LSTMs in action. By understanding the sequence, the model isn't just making a wild guess; it's making an informed forecast based on the temporal patterns it learned during training.
But the question remains: is the LSTM's complexity truly necessary? Could a simpler model have done just as well? Let's find out in our next section, where we pit our LSTM against some other contenders.
Is LSTM Really the MVP? A Comparison with Simpler Models
Our LSTM is looking pretty sharp. It’s tracking the stock prices like a seasoned analyst, and the R² score is something to be proud of. But in the world of data science, you should never fall in love with your first model. It's always a good idea to have a reality check. Is the complexity of an LSTM with all its fancy gates and memory cells truly necessary? Or could a simpler, dumber model have achieved the same results?
To answer this, we'll stage a little AI bake-off. We're putting our LSTM up against two other contenders:
The MLP (Multi-Layer Perceptron): This is your standard, bread-and-butter neural network. It's a feedforward network, meaning it has no concept of time or sequence. To make it work, we have to flatten our 7-week sequences into a simple list of 7 numbers. The MLP will treat these as 7 independent features, completely ignoring their chronological order. It's like trying to understand a movie by looking at 7 random frames. We'll keep the MLP deliberately modest, with a single hidden layer of 16 neurons.
The RNN (Recurrent Neural Network): This is the LSTM's older, simpler sibling. Like the LSTM, it processes data sequentially and has a hidden state to maintain memory. However, it lacks the sophisticated gating mechanism of an LSTM, which, as we discussed, makes it prone to forgetting long-term patterns (the vanishing gradient problem).
This comparison will tell us if the LSTM's special memory architecture is what's giving it the edge.
Here’s the code to train all three models—MLP, RNN, and our final LSTM—on the same data splits and then compare their performance on the test set.
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
# --- Data Prep (same as before) ---
torch.manual_seed(SEED)
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(df_v1["close"].values.reshape(-1, 1))
SEQ_LEN = 7
X, y = create_sequences(data_scaled, SEQ_LEN)
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
# --- 1. MLP Model ---
class FFN(nn.Module):
    def __init__(self, input_size, hidden_size=16):
        super(FFN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))
def train_ffn(X_train, y_train, X_test, y_test, epochs=100):
    model = FFN(input_size=X_train.shape[1], hidden_size=16)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    X_train_t = torch.tensor(X_train, dtype=torch.float32)
    y_train_t = torch.tensor(y_train, dtype=torch.float32)
    X_test_t = torch.tensor(X_test, dtype=torch.float32)

    for epoch in range(epochs):  # Simplified loop for brevity
        model.train()
        optimizer.zero_grad()
        output = model(X_train_t)
        loss = criterion(output, y_train_t)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        test_pred = model(X_test_t).numpy()
    return test_pred

# Flatten the sequences for the MLP (7 time steps become 7 independent features)
X_train_flat = X_train.reshape(X_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)
test_pred_ffn = train_ffn(X_train_flat, y_train, X_test_flat, y_test)
# --- 2. RNN Model ---
class RNNModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super(RNNModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, h = self.rnn(x)
        return self.fc(h[-1])
def train_rnn(X_train, y_train, X_test, y_test, epochs=100):
    model = RNNModel(input_size=1, hidden_size=32, num_layers=1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    X_train_t = torch.tensor(X_train, dtype=torch.float32)
    y_train_t = torch.tensor(y_train, dtype=torch.float32)
    X_test_t = torch.tensor(X_test, dtype=torch.float32)

    for epoch in range(epochs):  # Simplified loop
        model.train()
        optimizer.zero_grad()
        output = model(X_train_t)
        loss = criterion(output, y_train_t)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        test_pred = model(X_test_t).numpy()
    return test_pred

test_pred_rnn = train_rnn(X_train, y_train, X_test, y_test)
# --- 3. LSTM Model (reusing 'test_pred' from our final model in the previous section) ---
# Inverse transform the predictions to get actual prices back
y_test_real = scaler.inverse_transform(y_test)
test_pred_lstm = test_pred  # already inverse-transformed in the previous section
test_pred_ffn = scaler.inverse_transform(test_pred_ffn)
test_pred_rnn = scaler.inverse_transform(test_pred_rnn)
# --- 4. Consolidate and Compare Results ---
def get_metrics(y_true, y_pred, model_name):
    return {
        "Model": model_name,
        "Test MSE": mean_squared_error(y_true, y_pred),
        "Test RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "Test MAE": mean_absolute_error(y_true, y_pred),
        "Test R2": r2_score(y_true, y_pred)
    }
results_list = [
    get_metrics(y_test_real, test_pred_ffn, "MLP"),
    get_metrics(y_test_real, test_pred_rnn, "RNN"),
    get_metrics(y_test_real, test_pred_lstm, "LSTM")
]
results_df = pd.DataFrame(results_list)
print(results_df[['Model', 'Test MSE', 'Test RMSE', 'Test MAE', 'Test R2']])
Analysis:
MLP Takes Bronze: The MLP didn't do terribly: an R² of 0.92 is nothing to sneeze at. However, it clearly lags behind the other two. By flattening the data, it lost the crucial temporal information. It learned that certain price levels are related to future prices, but it couldn't learn how the sequence of those prices matters.
RNN is a Strong Contender: The basic RNN performs very well, nearly matching the LSTM. This confirms that for this dataset (which has relatively short sequences), even a simple form of memory is a huge advantage over no memory at all.
LSTM Claims the Gold: Our fine-tuned LSTM comes out on top, albeit by a small margin. It has the lowest error (MSE, RMSE, MAE) and the highest R² score. This slight edge is thanks to its gating mechanism, which allowed it to more effectively learn and apply the patterns from the 7-week sequences. For a dataset with longer and more complex dependencies, the gap between the RNN and LSTM would likely be much wider.
Conclusion: For time-series forecasting, sequence matters. Models that can process data chronologically (RNNs and LSTMs) will almost always outperform models that can't (like the MLP). And while a basic RNN is good, the LSTM, with its superior memory architecture, remains the king of the hill.
More Data, More Power? Multi-Feature Forecasting
Our final experiment asks a simple question: can we improve our single-step prediction by giving the model more information? So far, we've only used the close price. Now, we'll feed it a richer, multivariate time series containing the open, high, low, close, and volume (often abbreviated as OHLCV) for the past 7 weeks.
The hope is that features like volume (how many shares were traded) and the weekly price range (high - low) will provide extra context that helps the model make a better prediction.
The model architecture remains the same, but we now need to adjust the input_size to 5 to accommodate our new features. We also need to scale each feature independently.
# Define the features to use
feature_cols = ["open", "high", "low", "close", "volume"]
data_multi = df_v1[feature_cols].values
target = df_v1["close"].values.reshape(-1, 1)
# Scale features and target separately
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()
data_scaled = scaler_X.fit_transform(data_multi)
target_scaled = scaler_y.fit_transform(target)
# Create sequences with multiple features
def create_sequences_multi_feature(X, y, seq_len=7):
    Xs, ys = [], []
    for i in range(len(X) - seq_len):
        Xs.append(X[i:i+seq_len])
        ys.append(y[i+seq_len])
    return np.array(Xs), np.array(ys)

X_feat, y_feat = create_sequences_multi_feature(data_scaled, target_scaled, SEQ_LEN)
# Train-test split
train_size = int(len(X_feat) * 0.8)
X_train_f, X_test_f = X_feat[:train_size], X_feat[train_size:]
y_train_f, y_test_f = y_feat[:train_size], y_feat[train_size:]
X_train_t = torch.tensor(X_train_f, dtype=torch.float32)
y_train_t = torch.tensor(y_train_f, dtype=torch.float32)
X_test_t = torch.tensor(X_test_f, dtype=torch.float32)
y_test_t = torch.tensor(y_test_f, dtype=torch.float32)
# LSTM model for multi-feature input (same architecture as before, wider input)
class LSTMModelMulti(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_layers=2):
        super(LSTMModelMulti, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, (hn, cn) = self.lstm(x)
        return self.fc(hn[-1])

# Instantiate with input_size=5 (open, high, low, close, volume)
num_features = X_feat.shape[2]
model_feat = LSTMModelMulti(input_size=num_features)
# Train and evaluate... (the loop is the same as before; a full sketch follows below)
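For completeness, here's a sketch of that elided loop. It mirrors the single-feature training exactly, just with the new tensors and the separate target scaler (test_pred_multi and y_test_multi_real are our own names):
# Same training recipe as before, applied to the multi-feature model
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model_feat.parameters(), lr=0.01)

for epoch in range(100):
    model_feat.train()
    optimizer.zero_grad()
    loss = criterion(model_feat(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

# Evaluate and convert predictions back to dollars with the target's own scaler
model_feat.eval()
with torch.no_grad():
    test_pred_multi = scaler_y.inverse_transform(model_feat(X_test_t).numpy())
y_test_multi_real = scaler_y.inverse_transform(y_test_t.numpy())
print(f"Multi-feature Test R2: {r2_score(y_test_multi_real, test_pred_multi):.4f}")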
The result is... underwhelming. The R² score comes out at 0.927, essentially identical to our single-feature model's score.
This is a fascinating result. It suggests that for this dataset and prediction task, the extra features didn't add much new, useful information. The historical sequence of closing prices alone was powerful enough to capture the underlying patterns. The other features, in this case, were mostly redundant noise.
This is a valuable lesson in machine learning: more data is not always better data. Sometimes, a simple, clean signal is all you need.
Wrapping It Up: What Did We Actually Learn?
So, here we are at the end of our journey. We've wrestled with data, tuned hyperparameters, and pitted different AI models against each other in a battle for predictive supremacy. What's the final takeaway? Can we retire to a private island funded by our stock-predicting LSTM?
Well, not quite. But we did learn some incredibly valuable lessons along the way.
The Triumphs and Truths of Our LSTM
LSTMs Are Excellent Time Travelers (for the Near Future): Our final single-step LSTM performed brilliantly, achieving an R² of nearly 0.93. It proved that for time-series data, an architecture that understands and respects sequence is paramount. It successfully captured the trends, dips, and climbs in the stock prices with impressive accuracy.
Hyperparameters Are Not Suggestions, They're Rules: Our tuning experiments were not just academic exercises. We saw firsthand that the right sequence length (7) and architecture (2 layers, 64 hidden units) can make the difference between a good model and a great one. We also saw that too much complexity can be catastrophic, as our 3-layer, 256-unit model completely failed.
Simpler Isn't Always Better... But Sometimes It's Good Enough: The basic RNN came surprisingly close to the LSTM's performance. This reminds us that while LSTMs are powerful, they aren't always strictly necessary for simpler, short-term sequences. However, both an RNN and an LSTM soundly beat the MLP, proving that for time-series forecasting, memory is king.
More Data ≠ Better Predictions: In a counterintuitive twist, feeding our model more features (like open, high, low, and volume) didn't improve its performance. The closing price alone was the most powerful signal. This is a crucial lesson in feature engineering: always question whether more information is actually useful information.
Final Thoughts
So, can you predict stocks with AI? The answer is a resounding "yes, but..."
An LSTM can be an incredibly powerful tool for understanding and forecasting time-series data. It can spot patterns that are invisible to the human eye and make short-term predictions with a high degree of accuracy. However, it's not a magic crystal ball. The stock market is a complex beast, influenced by everything from earnings reports to global events to the whims of human emotion.
What we've built here is a fantastic starting point. A real-world trading model would need to incorporate far more data (like news sentiment, economic indicators, and longer price histories) and would require continuous retraining and validation.
But for today, we can be proud. We've tamed a messy dataset, built a sophisticated neural network, and taught it to see patterns in the chaos of the market. And hopefully, we've demystified a little bit of the magic behind AI-powered forecasting.
Thanks for coming along for the ride.
"Predicting the future will always be a fool's errand. But understanding the past? That's a data scientist's art. With LSTMs, we've shown that the signal isn't just in the numbers, but in their sequence. The journey isn't about finding the perfect prediction, but about building a better question."


