
FT Alphaville Disproportionally Interesting Compared to General News in Predicting Stock Returns

Posted: 21st July 2010
By: CHRIS

At Recorded Future, we’re using the Web to find new and interesting predictive signals. Central to our research is the concept of a Temporal Analytics Engine, effectively a massive database of structured news data. We’ve harvested articles from thousands of blogs and news sources over the last several years, processed this data in detail, and loaded it into our database, ready for further analysis.

Recently, we’ve made this data available to our customers via an advanced news analytics API. With this web-based interface, analysts can query our index and make use of our hard work to find their own signals embedded in the web data – from traditional sources, mainstream media, blogs, SEC filings, niche publications and other data sources.

Let’s look at an example: sparing some of the technical details, it’s pretty easy to ask our API to “Give me all the news, along with sentiment information, related to Goldman Sachs, Amazon.com, Exxon, or Microsoft, published on FT Alphaville, for the month of May 2010.” What you’ll get back is a highly structured data set containing fragments of article text, publish dates, sentiment data (good news or bad news?), a measure of news importance, and lists of other entities involved in each particular document. This data set is machine-readable, and without too much work should be ready to load into your favorite analytics software.
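To make the shape of such a result concrete, here is a minimal Python sketch of one returned record and of the filter that the example query expresses. The field names (`fragment`, `published`, `source`, `sentiment`, `importance`, `entities`), their value ranges and the toy records are all hypothetical stand-ins for illustration. They are not the actual schema of the Recorded Future API.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical record shape: field names and ranges are illustrative
# assumptions, not the real API schema.
@dataclass
class NewsItem:
    fragment: str        # snippet of article text
    published: date      # publication date
    source: str          # e.g. "FT Alphaville"
    sentiment: float     # assumed range: -1 (bad news) .. +1 (good news)
    importance: float    # assumed range: 0 (minor) .. 1 (major)
    entities: list[str] = field(default_factory=list)  # companies mentioned

def matches_query(item: NewsItem, companies: set[str], source: str,
                  start: date, end: date) -> bool:
    """True if the item mentions any target company, comes from the given
    source, and was published within [start, end] inclusive."""
    return (item.source == source
            and start <= item.published <= end
            and any(e in companies for e in item.entities))

# Toy records (made up for the example).
items = [
    NewsItem("(fragment A)", date(2010, 5, 12), "FT Alphaville", -0.4, 0.9, ["Goldman Sachs"]),
    NewsItem("(fragment B)", date(2010, 4, 30), "FT Alphaville", 0.2, 0.3, ["Exxon"]),
    NewsItem("(fragment C)", date(2010, 5, 3), "Some Blog", 0.5, 0.6, ["Amazon.com"]),
]
targets = {"Goldman Sachs", "Amazon.com", "Exxon", "Microsoft"}
may = [i for i in items
       if matches_query(i, targets, "FT Alphaville", date(2010, 5, 1), date(2010, 5, 31))]
```

Only the first toy record survives. The second falls outside May 2010, and the third comes from a different source.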

For the purpose of this article, I’ve run a query similar to the one above, but for the entire S&P 500 over the period from January 2009 to May 2010. I’ve also asked our API to consolidate the data to make it a bit more manageable.

I loaded this into the popular R statistics software and merged the dataset with stock price data to investigate the relationship between news sentiment in FT Alphaville and future stock returns. For comparison, I also ran the same query against ALL news sources in our database, which spans thousands of internet sources such as government filings, blogs, news, and niche media.
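The merge step amounts to an inner join on company and date. Below is a minimal Python sketch that assumes both sides are dicts keyed by hypothetical `(ticker, date)` tuples, with `None` marking a missing value. Rows that lack either value are dropped, much as R's `lm` drops incomplete observations under its default `na.action`. The values are toy numbers for illustration only.

```python
from datetime import date

# Hypothetical inputs: a daily news signal and a 1-week forward relative
# return, both keyed by (ticker, trading date). Toy values only.
signal = {
    ("GS", date(2009, 3, 9)): 0.8,
    ("GS", date(2009, 3, 10)): -0.2,
    ("C", date(2009, 3, 9)): 0.5,
}
fwd_relret = {
    ("GS", date(2009, 3, 9)): 0.031,
    ("GS", date(2009, 3, 10)): None,   # missing, e.g. no forward prices yet
    ("MS", date(2009, 3, 9)): 0.012,
}

def merge_complete(x: dict, y: dict) -> list[tuple]:
    """Inner-join two (ticker, date)-keyed dicts, keeping only keys present
    in both with non-missing values; returns sorted (key, x, y) rows."""
    rows = []
    for key in sorted(x.keys() & y.keys()):
        xv, yv = x[key], y[key]
        if xv is not None and yv is not None:
            rows.append((key, xv, yv))
    return rows

merged = merge_complete(signal, fwd_relret)
```

Here only `("GS", 2009-03-09)` survives. `C` and `MS` appear on one side only, and the second GS row has a missing return.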

For my analysis, I set up a model regressing 1-week-ahead market-relative total returns on a daily “big news” signal, to find out whether “big news” predicts excess performance. If a company has highly important news associated with it on a given day, and that news carries positive sentiment, the signal is strongly positive for that day. If the news is significant and negative, the signal is negative.
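The exact formula for the signal isn't given here, so the following is only one plausible construction. It assumes per-item sentiment in [-1, 1] and importance in [0, 1], multiplies each item's sentiment by its importance, and sums the results per company per day. Important good news then pushes the day's score up, important bad news pushes it down, and minor news barely moves it. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import date

def big_news_scores(items):
    """items: iterable of (ticker, day, sentiment, importance) tuples.
    Assumed scales: sentiment in [-1, 1], importance in [0, 1].
    Returns {(ticker, day): score}, where each item adds
    sentiment * importance to its company's score for that day."""
    scores = defaultdict(float)
    for ticker, day, sentiment, importance in items:
        scores[(ticker, day)] += sentiment * importance
    return dict(scores)

# Toy inputs for illustration.
scores = big_news_scores([
    ("GS", date(2010, 5, 3), -0.8, 1.0),    # major, bad
    ("GS", date(2010, 5, 3), 0.1, 0.2),     # minor, mildly good
    ("JPM", date(2010, 5, 3), 0.6, 0.9),    # major, good
])
```

With these inputs GS scores about -0.78 for the day and JPM about 0.54.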

The results of two related regressions are presented below. The first uses data on S&P 500 companies covered by FT Alphaville over the period mentioned above. The second uses data on all S&P 500 companies appearing anywhere in our database over the same period. One-week forward market-relative returns are calculated as the arithmetic difference between the stock’s return over the following week and the return of the S&P 500 over the same week.
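Let P_t be the stock's level on trading day t and M_t the S&P 500's level on the same day. The target is then (P_{t+5}/P_t - 1) - (M_{t+5}/M_t - 1). The five-trading-day horizon is our reading of the variable name `relret5d`, and the use of dividend-adjusted (total-return) levels is an assumption drawn from the phrase "total returns". A minimal Python sketch follows.

```python
def forward_relative_returns(stock, index, horizon=5):
    """stock, index: equal-length lists of daily total-return (dividend-
    adjusted) levels, aligned by trading day.
    Element t of the result is the stock's return from day t to
    t + horizon minus the index's return over the same window, or None
    for the last `horizon` days, where no forward window exists."""
    if len(stock) != len(index):
        raise ValueError("series must be aligned and of equal length")
    out = []
    for t in range(len(stock)):
        if t + horizon >= len(stock):
            out.append(None)
        else:
            r_stock = stock[t + horizon] / stock[t] - 1.0
            r_index = index[t + horizon] / index[t] - 1.0
            out.append(r_stock - r_index)
    return out

# Toy series: the stock gains 10% over days 0..5 while the index gains 5%.
stock = [100, 101, 102, 103, 104, 110, 111]
index = [1000, 1001, 1002, 1003, 1004, 1050, 1060]
rel = forward_relative_returns(stock, index)
```

The first value is 0.10 - 0.05 = 0.05. The last five entries are `None`, which would later be dropped as missing.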

Against FT Alphaville news only:

Call:
lm(formula = relret5d ~ Bignews, data = seriesdf_alphaville)

Residuals:
      Min        1Q    Median        3Q       Max
-0.489444 -0.027078 -0.004297  0.017999  0.924871

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0001799  0.0026073   0.069 0.944997
Bignews     0.4461833  0.1229725   3.628 0.000298 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08602 on 1124 degrees of freedom
Multiple R-squared: 0.01158,  Adjusted R-squared: 0.0107
F-statistic: 13.16 on 1 and 1124 DF,  p-value: 0.0002981

Against all news:

Call:
lm(formula = relret5d ~ Bignews, data = seriesdf_full)

Residuals:
      Min        1Q    Median        3Q       Max
-0.543658 -0.021138 -0.002031  0.018349  2.305285

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002077   0.000184  11.285   <2e-16 ***
Bignews     -0.035947   0.030810  -1.167    0.243
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05083 on 81148 degrees of freedom
  (70 observations deleted due to missingness)
Multiple R-squared: 1.678e-05,  Adjusted R-squared: 4.453e-06
F-statistic: 1.361 on 1 and 81148 DF,  p-value: 0.2433
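The Python sketch below shows what `lm` computes for a single-regressor model like `relret5d ~ Bignews`: ordinary least squares with one predictor and an intercept. It mirrors the Estimate, Std. Error, t value, residual standard error and Multiple R-squared figures in the R summaries. It omits p-values, which need the t-distribution. This illustrates the technique only and is not the code behind the results above.

```python
import math

def simple_ols(x, y):
    """OLS fit of y = a + b*x. Returns the intercept and slope, their
    standard errors and t statistics, the residual standard error on
    n - 2 degrees of freedom, and R-squared, as summary(lm(y ~ x))
    reports them."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)    # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)
    df = n - 2
    sigma = math.sqrt(sse / df)        # residual standard error
    se_b = sigma / math.sqrt(sxx)
    se_a = sigma * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return {
        "intercept": a, "slope": b,
        "se_intercept": se_a, "se_slope": se_b,
        "t_intercept": a / se_a, "t_slope": b / se_b,
        "sigma": sigma, "df": df,
        "r_squared": 1.0 - sse / sst,
    }

# Toy data: slope 0.8, intercept 1.4, R-squared 0.64 on 3 residual df.
fit = simple_ols([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
```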

The results are fairly interesting. In the FT Alphaville regression, our “big news” signal is statistically significant at the 0.001 level (p ≈ 0.0003). This indicates that, on average, good news on FT Alphaville is associated with 1-week-ahead outperformance. Conversely, on average, bad news for a company on FT Alphaville is associated with 1-week-ahead underperformance. The same effect is not evident in our database as a whole (p ≈ 0.24), suggesting that there is something “special” about the news that comes from FT Alphaville.
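These significance claims can be checked by hand from the printed `Bignews` coefficients. The t value is the estimate divided by its standard error. With more than a thousand residual degrees of freedom, the two-sided p-value is closely approximated by the standard normal. The normal approximation is our simplification; `lm` itself uses the t-distribution, so the first p-value comes out slightly smaller than the printed 0.000298.

```python
import math

def t_and_normal_p(estimate, std_error):
    """Return (t, two-sided p), with p from the standard normal
    approximation to the t-distribution (reasonable for large df)."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|t|))
    return t, p

# Bignews coefficients as printed in the two regressions.
t_fta, p_fta = t_and_normal_p(0.4461833, 0.1229725)   # FT Alphaville only
t_all, p_all = t_and_normal_p(-0.035947, 0.030810)    # all news
```

This gives t ≈ 3.628 with p well below 0.001 for FT Alphaville, and t ≈ -1.167 with p ≈ 0.243 for all news. With one predictor, t squared also reproduces the printed F-statistics (13.16 and 1.361).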

Of course, we need to consider potential bias in the results. Let’s look at a breakdown of content from FT Alphaville’s articles by S&P 500 stock. This data is also pretty easy to get via our API.

Ticker   Item count
GS       1395
C        277
MCO      244
MS       239
BAC      238
JPM      237
KFT      195
AIG      151
GOOG     116
PRU      89
NYT      67
WFC      52
SLM      48

You can see that FT Alphaville has had a definite slant toward covering financial companies over the last 18 months. This shouldn’t be a surprise, given the aftermath of the financial crisis and the parties involved in it. Further research into this effect should account for FT Alphaville’s bias toward covering financial institutions.
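The Python sketch below tallies the ticker counts listed above to put a number on that slant. The split into financial and non-financial names is our own rough grouping, not a classification from the API, and the shares refer only to the tickers listed rather than to all S&P 500 names.

```python
# Item counts from the ticker breakdown (top tickers only).
counts = {
    "GS": 1395, "C": 277, "MCO": 244, "MS": 239, "BAC": 238, "JPM": 237,
    "KFT": 195, "AIG": 151, "GOOG": 116, "PRU": 89, "NYT": 67, "WFC": 52,
    "SLM": 48,
}
# Our own rough grouping: banks, brokers, insurers, a student lender,
# and the rating agency Moody's (MCO).
financials = {"GS", "C", "MCO", "MS", "BAC", "JPM", "AIG", "PRU", "WFC", "SLM"}

total = sum(counts.values())
fin_share = sum(n for t, n in counts.items() if t in financials) / total
gs_share = counts["GS"] / total
print(f"{total} items; financials {fin_share:.1%}; GS alone {gs_share:.1%}")
```

On these counts, the ten financial names account for about 88.7% of the 3,348 listed items, and Goldman Sachs alone for about 41.7%.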

The coverage is fairly spread out over time, but many of these mentions occurred near the bottom of the market, and financial stocks rallied off that bottom harder than most. This could be significantly skewing the model’s estimates.

One thing that jumps out is the very low R-squared of the model: about 0.012 for the FT Alphaville regression, meaning the signal explains only around 1% of the variation in one-week relative returns. Finding factors that predict excess returns is tough, and this model is a first pass at that effort. The purpose here is to show that our API enables this sort of exploration, and that the ability to quantify the impact of news on stock returns is potentially quite powerful for practitioners.
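A low R-squared can coexist with a significant coefficient. For a regression with a single predictor, the F-statistic and R-squared are tied together by F = R² / ((1 - R²) / df), where df is the residual degrees of freedom, so R² = F / (F + df). The Python sketch below recovers the printed R-squared values from the printed F-statistics and degrees of freedom.

```python
def r_squared_from_f(f_stat, df_resid):
    """R-squared implied by the F statistic of a single-predictor
    regression: F = R2 / ((1 - R2) / df)  =>  R2 = F / (F + df)."""
    return f_stat / (f_stat + df_resid)

r2_fta = r_squared_from_f(13.16, 1124)    # printed: 0.01158
r2_all = r_squared_from_f(1.361, 81148)   # printed: 1.678e-05
```

The relationship shows why a modest F of about 13 is highly significant with 1,124 residual degrees of freedom even though it corresponds to an R-squared of roughly 1%.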

This research, however preliminary, suggests that there may be a relationship between the occurrence of significant news in FT Alphaville and subsequent excess stock returns. While this may not surprise readers, the real value added here is the ability to quantify the news and analyze it statistically.

Further research could include refining this signal and analyzing the data driving its predictive power: are there other sources that are more reliable or timely, and are there market-cap, style, or segment effects? Once satisfied with the construction of the signal, one could incorporate it into a backtest to estimate its effectiveness in a trading strategy.
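The following Python sketch is only an illustration of the backtest idea and not a strategy tested here. It takes a long position in the stock relative to the index when the signal is positive and a short one when it is negative, then averages the resulting one-week relative returns. The function name and inputs are hypothetical. A real backtest would also need transaction costs, position sizing, and handling of overlapping one-week holding periods, all of which this sketch ignores.

```python
def sign_strategy_mean(signal, fwd_relret):
    """Naive sign-based backtest sketch. For each observation with a
    nonzero signal, earn +fwd_relret if the signal is positive
    (long stock / short index) or -fwd_relret if it is negative
    (short stock / long index). Returns (mean return per trade,
    number of trades). Ignores costs, sizing and overlapping windows."""
    pnl = [(1 if s > 0 else -1) * r
           for s, r in zip(signal, fwd_relret) if s != 0]
    if not pnl:
        return 0.0, 0
    return sum(pnl) / len(pnl), len(pnl)

# Toy inputs: the zero-signal day is skipped, leaving three trades.
mean_ret, n_trades = sign_strategy_mean(
    [0.5, -0.3, 0.0, 0.8], [0.02, -0.01, 0.05, -0.005])
```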

If you’re interested in Recorded Future’s statistical applications, predictive analysis and trading signals, visit our Predictive Signals quant blog. Or visit our site to learn more about Recorded Future’s news and media analytics capabilities.
