Using Sentiment Analysis to Predict the Stock Market

Does sentiment analysis of financial news headlines (using Python) have predictive power over stock market movements?

Philippe Ostiguy, M. Sc.
4 min read · Mar 21, 2021

In my previous article, I explained how to get historical financial news headlines with Python for free. Now that we have these data, we can test whether news headlines can predict stock market movements.

The difference between this article and many others on the same topic is that we can run our test over a period of up to one year because of how we get our data. As discussed in my previous article, many sources or APIs offer few historical news headlines or none. Furthermore, we generally have more than 30 headlines per day, which many sources or APIs don’t provide.

1- Obtaining historical daily closing stock prices

The first step is to obtain the historical daily closing prices, from which we calculate the stock's daily return over the period we want to test.

As the saying goes:

Garbage in, garbage out

For this reason, having a good data set is an essential part of data analysis, if not the most critical part.

This is why the daily historical stock prices are extracted from the Alpha Vantage API: the Finnhub API does not specify whether its daily closes are adjusted for stock splits and dividends.

To make calls to the Alpha Vantage’s API, you need your unique key that you can obtain here.

A good practice with an API key is to store it as an environment variable (good article here on how to do it). In the code, the key is read from an environment variable defined in the .env file.
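A minimal sketch of reading the key at runtime, assuming it has been exported under the hypothetical name ALPHA_VANTAGE_KEY (the variable name used in the project itself is not shown here). It relies only on the standard library and expects the variable to be set already, for example by your shell or by a package that loads the .env file.

```python
import os


def get_api_key(var_name: str = "ALPHA_VANTAGE_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a placeholder chosen for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "define it in your shell or in the .env file."
        )
    return key
```

Failing early when the variable is missing avoids a confusing error later, when the API rejects the request.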

We can import the required module to get the daily historical stock price from the Alpha Vantage API.

Then we can build a class to initialize the required attributes for the Alpha Vantage API.

Finally, we create a class that gets the data from the Alpha Vantage API and calculates the daily return (next two gists).
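The daily return is the percentage change between two consecutive adjusted closes, that is, today's close divided by the previous close, minus 1. The sketch below illustrates that calculation on an already-downloaded response; it does not make the HTTP request. The field names "Time Series (Daily)" and "5. adjusted close" are assumptions about the JSON returned by Alpha Vantage's daily adjusted endpoint, so check them against the API documentation. The function names are hypothetical and are not taken from the project.

```python
from typing import Dict


def adjusted_closes(payload: dict) -> Dict[str, float]:
    """Extract {date: adjusted close} from a daily adjusted response.

    The two keys used here are assumed field names, not verified ones.
    """
    series = payload["Time Series (Daily)"]
    return {
        day: float(fields["5. adjusted close"])
        for day, fields in series.items()
    }


def daily_returns(closes: Dict[str, float]) -> Dict[str, float]:
    """Return {date: close_today / close_previous - 1}.

    ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    The first date has no previous close, so it gets no return.
    """
    days = sorted(closes)
    return {
        today: closes[today] / closes[previous] - 1.0
        for previous, today in zip(days, days[1:])
    }
```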

2- Run sentiment analysis and calculate a score

We run the financial news headlines' sentiment analysis with the VADER sentiment analyzer (nltk.sentiment.vader).

For each headline, VADER returns a compound score, a normalized value between -1 (the most extreme negative headline) and 1 (the most extreme positive headline). Then, we create a daily score by averaging the scores of all the individual headlines that we obtained that day.
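The averaging step can be sketched as follows. This is an illustration, not the project's code: daily_score and make_vader_scorer are hypothetical names. The scorer is passed in as a function so the averaging logic can be tried without NLTK installed. Using the VADER scorer requires nltk and a one-time download of the vader_lexicon resource.

```python
from statistics import mean
from typing import Callable, Iterable


def make_vader_scorer() -> Callable[[str], float]:
    """Build a function that returns VADER's compound score for a text.

    Requires nltk and a one-time nltk.download("vader_lexicon").
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with neg, neu, pos and compound keys.
    return lambda text: analyzer.polarity_scores(text)["compound"]


def daily_score(headlines: Iterable[str], scorer: Callable[[str], float]) -> float:
    """Average the per-headline scores into a single daily score."""
    scores = [scorer(headline) for headline in headlines]
    if not scores:
        raise ValueError("no headlines to score")
    return mean(scores)
```

With NLTK available, the daily score for a list of headlines would be obtained with daily_score(headlines, make_vader_scorer()).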

To be as accurate as possible, the script analyses the news headlines published from 9:30 am (EST) on the previous day until 9:29 am (EST) on the current day, just before regular trading begins on North American markets.
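One way to express this window in code is to map each headline's timestamp to the trading day it informs. The sketch below assumes the window runs from 9:30 am on one day to 9:29 am on the next, and that timestamps are naive datetimes already expressed in Eastern time. The function name is hypothetical, and weekends and market holidays are deliberately ignored.

```python
from datetime import date, datetime, time, timedelta

MARKET_OPEN = time(9, 30)  # regular trading starts at 9:30 am Eastern


def session_date(published_at: datetime) -> date:
    """Return the trading day whose pre-open window contains the headline.

    A headline published at or after 9:30 am counts toward the next
    day's score; one published before 9:30 am counts toward the same day.
    """
    if published_at.time() >= MARKET_OPEN:
        return published_at.date() + timedelta(days=1)
    return published_at.date()
```

Grouping headlines by session_date before averaging ensures that each daily score only uses information available before that day's open.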

First, we import the required modules.

Then, we create a class for the VADER sentiment analyzer. The first thing we do is initialize the attributes and parameters.

We can now perform the sentiment analysis on the news headlines using VADER.

Note that this method uses two decorators: @init_sql, which opens, saves and closes the database, and @iterate_day, which performs the sentiment analysis on the news headlines for the desired period.
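For readers unfamiliar with this pattern, here is a minimal sketch of a decorator that opens, commits and closes a database around a function call. It is a hypothetical stand-in written with sqlite3, not the project's @init_sql. The names with_database and count_headlines are invented for illustration, and @iterate_day is not sketched.

```python
import functools
import sqlite3
from typing import Any, Callable


def with_database(db_path: str) -> Callable:
    """Hypothetical decorator factory: open, commit and close a database."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            conn = sqlite3.connect(db_path)  # open
            try:
                # The connection is passed as the first argument.
                result = func(conn, *args, **kwargs)
                conn.commit()  # save
                return result
            finally:
                conn.close()  # close, even if func raised

        return wrapper

    return decorator


@with_database(":memory:")
def count_headlines(conn: sqlite3.Connection, headlines: list) -> int:
    """Store the headlines and return how many rows were written."""
    conn.execute("CREATE TABLE headlines (title TEXT)")
    conn.executemany(
        "INSERT INTO headlines VALUES (?)", [(h,) for h in headlines]
    )
    return conn.execute("SELECT COUNT(*) FROM headlines").fetchone()[0]
```

The benefit of the pattern is that the decorated method only contains the analysis logic, while the connection handling lives in one place.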

The method check_size() makes sure that we have enough headlines on a given day for the average score to be statistically meaningful (the default minimum is 30).
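The idea behind such a check can be sketched as a filter that drops the days with too few headlines. This is an assumption about what the check is for, not the project's check_size() implementation, and filter_days is a hypothetical name.

```python
from typing import Dict, List

MIN_HEADLINES = 30  # default minimum mentioned in the text


def filter_days(
    headlines_by_day: Dict[str, List[str]], min_size: int = MIN_HEADLINES
) -> Dict[str, List[str]]:
    """Keep only the days that have at least min_size headlines."""
    return {
        day: headlines
        for day, headlines in headlines_by_day.items()
        if len(headlines) >= min_size
    }
```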

3- Correlate lagged score index against the daily close return

This is the last step. Note that, as mentioned previously, this article follows my previous article, in which we obtained the historical financial news headlines. The current code is written using the directories and the SQL database from that article. If you decide to use a different source for the financial news headlines, make sure to adapt the directory and database handling accordingly.

First, we need to import the required module.

We create a class to initialize the “global” values required for the project.

Finally, we plot the stock's daily return (Amazon in this case) against the previous day's sentiment score for the financial news headlines on Amazon:

init_.pd_data.plot(x=init_.sentiment_name, y=init_.daily_return, style="o")

We also calculate the correlation between the two:

print(init_.pd_data[init_.daily_return].corr(init_.pd_data[init_.sentiment_name]))
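To make the lag explicit, the sketch below pairs each day's return with the sentiment score of the previous observation and computes the Pearson correlation by hand, using only the standard library. It assumes two equally long lists ordered by date. The function names are hypothetical, and the project itself relies on the pandas corr() call shown above.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    sentiment: Sequence[float], returns: Sequence[float], lag: int = 1
) -> float:
    """Correlate sentiment at position t - lag with the return at position t."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    lagged_sentiment = sentiment[: len(sentiment) - lag]
    aligned_returns = returns[lag:]
    return pearson(lagged_sentiment, aligned_returns)
```

A value near 1 or -1 would indicate a strong linear relationship between yesterday's sentiment and today's return, while a value near 0 indicates none.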

The resulting chart shows little to no relationship between the daily return and the previous day's news headline sentiment score.

The calculated correlation was close to 0, which confirms that there is almost no linear relationship between the two.

I hope you find this helpful. Please leave comments, feedback, and ideas if you have any. I would appreciate it.

The link to the project repo is here.
