In my last post I showed research on how optimization results can be mean reverting. Sometimes my research gets sidetracked as I think of random ideas to look at. In this post, we follow the random walk my research took, starting from my mean-reverting optimization research. I will show how changing the start date can make a big difference in the results, how the late 1990s correlate to now, and how random data correlates.
One thing we researchers sometimes forget is how the date range of a test can change our interpretation. If the test confirms what we expect, we rarely think about other dates. In the previous post, I picked 2000 as the start date randomly, meaning I had no good reason for picking that date. I could just as easily have picked some other date. So my first random thought was: what happens with different start dates? Let us see what kind of results we get with two other dates.
Here are the original results from 2000. I have also added the correlation of 2000-2012 vs 2013-2015.
The correlation is .74, so it appears that the decile ranking in 2000-2012 does a good job of predicting the results in 2013-2015.
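For readers who want to reproduce this kind of number, here is a minimal sketch of one way to compute it. It assumes each optimization run has one result for the earlier period and one for the later period, ranks the runs into deciles by the earlier result, and takes the Pearson correlation of the decile averages. The function names and the bucketing rule are illustrative assumptions, not necessarily how the spreadsheet was built.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def decile_means(in_sample, out_sample, buckets=10):
    """Rank runs by in-sample result and average both periods per bucket.

    in_sample[i] and out_sample[i] are the results of the same run
    (hypothetical inputs) in the earlier and later period.
    """
    if len(in_sample) != len(out_sample):
        raise ValueError("both periods must cover the same runs")
    order = sorted(range(len(in_sample)), key=lambda i: in_sample[i])
    n = len(order)
    in_means, out_means = [], []
    for b in range(buckets):
        idx = order[b * n // buckets:(b + 1) * n // buckets]
        if not idx:  # fewer runs than buckets
            continue
        in_means.append(sum(in_sample[i] for i in idx) / len(idx))
        out_means.append(sum(out_sample[i] for i in idx) / len(idx))
    return in_means, out_means
```

Calling `pearson(*decile_means(results_2000_2012, results_2013_2015))` with your own two lists of run results would then give a single correlation figure for that pair of date ranges.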
What happens if we start in 2003? That would give us 10 years of data before the 2013-2015 period, which makes it a very reasonable start date, and one I could easily have picked first.
Uh-oh. We get a negative correlation. If I had started with this, my conclusion would have been much different. Or, depending on the conclusion I wanted to reach, I could have shown one result and not the other. Keep this in mind when reading any research: a simple change in starting date can reverse the conclusion.
My next random thought was: what if I start in 1995? More data is better, right? I have always argued against using data from the late 90s because those markets do not exist anymore: a wild bull market, before decimalization and before high-frequency trading.
My jaw dropped when I saw the correlation number. After verifying the results several times, I had to conclude that the number is right. One rarely sees a number that high. Imagine my original post if I had used this as my date range.
But then came my next random thought. Wait, why are the results getting better as we go farther back? I decided to look at three-year windows and see what the correlation was to the 2013 to 2015 timeframe. Below are a couple of the results. To see all the timeframes, get the spreadsheet.
The timeframe with the highest correlation was 1996 to 1998, with an insane .99 correlation. I am not sure what the story is here, but I know I could not use information from 20 years ago to trade now. What about you?
The 2003 to 2005 timeframe has a high negative correlation, which I found interesting because that was another bull market timeframe. Look at the major change in that bottom decile.
Random Strategy Results
My next random thought was: what if I created a random strategy? What kind of correlation numbers would I see? Using the original strategy as a template, I created a random strategy with a similar number of trades, average hold and exposure. This may come as a surprise, but creating a random strategy to use as a comparison is not a simple task. My expectation for the results was low correlation leaning towards the negative side. Here is what I got for the same ranges that I tested above.
Well, that at least met my expectations: mostly low to negative correlations.
Fill in the form below to get the spreadsheet with lots more information.
You can see how my mind likes to wander from one question to the next with no real goal in mind other than finding interesting information and trying to understand it. The thing that I am still puzzling over is the high correlation of the late nineties to the most recent three years. I have no good story for this. Do you? Should you or I be using data from the late nineties? I am not sure at this point.
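To give a feel for why building a random comparison strategy takes some care, here is a minimal sketch of one possible approach. It is not the method used for the results above. It assumes trades are placed on a calendar of `n_days` trading days, that hold lengths are drawn uniformly so they average `avg_hold` days, and that exposure is controlled by capping the number of simultaneously open positions at `max_positions`. Every name and parameter here is hypothetical.

```python
import random


def random_trades(n_days, n_trades, avg_hold, max_positions, seed=0,
                  max_tries=100_000):
    """Place random trades as (entry_day, exit_day) pairs, both inclusive.

    Hold lengths are uniform on 1 .. 2*avg_hold-1, so their mean is
    avg_hold before any rejections. A candidate trade is rejected if it
    would push the number of open positions above max_positions.
    """
    if avg_hold < 1 or n_days < 2 * avg_hold - 1:
        raise ValueError("avg_hold must be >= 1 and fit inside n_days")
    rng = random.Random(seed)  # seeded so a run can be repeated
    open_count = [0] * n_days
    trades = []
    tries = 0
    while len(trades) < n_trades:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all trades; relax limits")
        hold = rng.randint(1, 2 * avg_hold - 1)
        entry = rng.randrange(0, n_days - hold + 1)
        days = range(entry, entry + hold)
        if any(open_count[d] >= max_positions for d in days):
            continue  # no free slot on at least one day, try again
        for d in days:
            open_count[d] += 1
        trades.append((entry, entry + hold - 1))
    return trades


def exposure(trades, n_days, max_positions):
    """Fraction of available position-days that were actually used."""
    held = sum(exit_day - entry_day + 1 for entry_day, exit_day in trades)
    return held / (n_days * max_positions)
```

One caveat with this sketch: longer holds are more likely to be rejected when the position cap is tight, so the realized average hold can come out somewhat below `avg_hold`. Checking the realized trade count, average hold and `exposure(...)` against the real strategy before comparing results is part of what makes this less simple than it looks.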
Good Quant Trading,