Start Dates, Correlation and Random Strategy

In my last post I showed research on how optimization results can be mean reverting. My research often gets sidetracked as I think of random ideas to look at. In this post, we follow the random walk my research took, starting from that mean-reverting optimization research. I will show how changing the start date can substantially change the results, how results from the 1990s correlate with recent years, and how the results of a random strategy correlate.

Start Date

One thing we researchers sometimes forget is how the date range of our test can change our interpretation. If we confirm what we are expecting, we rarely think about other dates. In the previous post, I picked a 2000 start date randomly, meaning I had no good reason for picking that date. I could have just as easily picked some other date. So my first random thought was: what happens with different start dates? Let us see what kind of results we get with two other dates.

Correlation

Here are the original results from 2000. I have also added the correlation of 2000-2012 vs 2013-2015.

160803a

The correlation is 0.74. So it appears that the decile ranking in 2000-2012 does a good job of predicting the results in 2013-2015.
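To make the decile comparison concrete, here is a minimal Python sketch. It is not the AmiBroker code behind the charts. It assumes each optimization variation has one in-sample result and one out-of-sample result, that variations are ranked into deciles on the in-sample result, and that the correlation is taken across the ten decile averages; the post does not spell out the exact calculation. The data is synthetic and the names `decile_averages` and `pearson` are made up for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def decile_averages(in_sample, out_of_sample, buckets=10):
    """Rank variations by in-sample result (best first), split them into
    buckets, and return the average in-sample and out-of-sample result
    of each bucket."""
    order = sorted(range(len(in_sample)), key=lambda i: in_sample[i], reverse=True)
    size = len(order) // buckets
    in_avg, out_avg = [], []
    for b in range(buckets):
        # The last bucket absorbs any remainder.
        idx = order[b * size:] if b == buckets - 1 else order[b * size:(b + 1) * size]
        in_avg.append(sum(in_sample[i] for i in idx) / len(idx))
        out_avg.append(sum(out_of_sample[i] for i in idx) / len(idx))
    return in_avg, out_avg

# Synthetic stand-in for 2520 variations (hypothetical numbers, not real results).
random.seed(1)
in_sample = [random.gauss(10.0, 3.0) for _ in range(2520)]
out_of_sample = [0.5 * x + random.gauss(0.0, 3.0) for x in in_sample]

in_avg, out_avg = decile_averages(in_sample, out_of_sample)
print("decile correlation:", round(pearson(in_avg, out_avg), 2))
```

Swapping in results from a different start date only changes the `in_sample` list, which is why the same ranking procedure can give very different correlations.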

What happens if we start in 2003? That would give us 10 years of data. A very reasonable start date and one I am not surprised that I did not pick first.

160803b

Uh-oh. We get a negative correlation. If I had started with this, my conclusion would have been much different. Or, depending on the conclusion I wanted to reach, I could have shown one and not the other. Keep this in mind when reading any research. A simple change in starting date could reverse the conclusion.

My next random thought was: what if I start in 1995? More data is better, right? I have always argued against using the late 90s data because those markets do not exist anymore: a wild bull market, before decimalization and before high frequency trading.

160803c

My jaw dropped when I saw the correlation number. After verifying the results several times, I had to conclude that the number is right. One rarely sees a number that high. Imagine my original post if I had used this as my date range.

But now came my next random thought. Wait, why are the results getting better as we go farther back? I decided to look at three-year windows and see what the correlation of each was to the 2013 to 2015 timeframe. Below are a couple of the results. To see all the timeframes, get the spreadsheet.

160803d

The timeframe with the highest correlation was 1996 to 1998, with an insane 0.99 correlation. I am not sure what the story is here. But I know I could not use the information from 20 years ago to trade now. What about you?

The 2003 to 2005 window has a high negative correlation, which I found interesting because that was another bull market timeframe. Look at the major change in that bottom decile.
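The three-year window comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the windows step forward one year at a time, each variation has a yearly return, and the correlation is taken directly across variations (the decile bucketing is skipped for brevity). The `yearly` data is random noise, so the printed correlations carry no meaning; `rolling_windows`, `compound`, and `window_correlations` are hypothetical helper names.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def rolling_windows(first_year, last_year, length=3):
    """All windows of `length` calendar years, stepping one year at a time."""
    return [(start, start + length - 1)
            for start in range(first_year, last_year - length + 2)]

def compound(returns):
    """Compound yearly returns (0.10 means 10%) into one total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def window_correlations(yearly, reference, first_year, last_year):
    """Correlate each variation's return in every earlier window with its
    return in the reference window. `yearly` maps year -> list of returns,
    one per variation, in the same variation order for every year."""
    n = len(yearly[reference[0]])

    def window_result(window):
        start, end = window
        return [compound([yearly[y][v] for y in range(start, end + 1)])
                for v in range(n)]

    ref = window_result(reference)
    return {w: pearson(window_result(w), ref)
            for w in rolling_windows(first_year, last_year)}

# Synthetic yearly returns for 200 hypothetical variations, 1995 through 2015.
random.seed(0)
yearly = {y: [random.gauss(0.08, 0.15) for _ in range(200)]
          for y in range(1995, 2016)}

correlations = window_correlations(yearly, (2013, 2015), 1995, 2012)
for (start, end), corr in correlations.items():
    print(f"{start}-{end}: {corr:+.2f}")
```

Keeping every window the same length as the reference window avoids comparing an average over many market regimes with a single regime.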

Random Strategy Results

My next random thought was: what if I created a random strategy? What kind of correlation numbers would I see? Using the original strategy as a template, I created a random strategy with a similar number of trades, average hold and exposure. This may come as a surprise, but creating a random strategy to use as a comparison is not a simple task. My expectation for the results was low correlation leaning towards the negative side. Here is what I got for the same ranges that I tested above. Click on the images to see larger versions.

160803f

Well that at least met my expectations. Mostly low to negative correlations.
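For readers who want to experiment, here is a heavily simplified Python sketch of a random strategy. It is not the strategy used for the results above. It assumes a single price series, picks entry days at random, and holds every trade for a fixed number of bars, so it matches only the trade count and hold length; exposure is not controlled, which is part of what makes a proper random benchmark hard to build. The price data is a synthetic random walk and the function names are hypothetical.

```python
import random

def random_trades(num_days, num_trades, hold_days, rng):
    """Pick `num_trades` distinct random entry bars; each trade exits
    `hold_days` bars after entry. Returns (entry, exit) index pairs."""
    # Entries must leave room for the exit bar inside the series.
    entries = sorted(rng.sample(range(num_days - hold_days), num_trades))
    return [(entry, entry + hold_days) for entry in entries]

def trade_returns(closes, trades):
    """Close-to-close return of each (entry, exit) trade."""
    return [closes[exit_bar] / closes[entry_bar] - 1.0
            for entry_bar, exit_bar in trades]

# Synthetic random-walk closes (about 10 years of daily bars), not market data.
rng = random.Random(42)
closes = [100.0]
for _ in range(2519):
    closes.append(closes[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

# Hypothetical targets standing in for the original strategy's statistics.
trades = random_trades(len(closes), num_trades=300, hold_days=5, rng=rng)
returns = trade_returns(closes, trades)
print("trades:", len(trades))
print("average return per trade:", sum(returns) / len(returns))
```

Running this with many different seeds gives a distribution of random results to compare a real strategy against, rather than a single random draw.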

Spreadsheet

Fill in the form below to get the spreadsheet with a lot more information.

Final Thoughts

You can see how my mind likes to wander from one question to the next with no real goal in mind other than finding interesting information and trying to understand it. The thing that I am still puzzling over is the high correlation between the late nineties and the most recent three years. I have no good story for this. Do you? Should you or I be using data from the late nineties? I am not sure at this point.

Backtesting platform used: AmiBroker. Data provider: Norgate Data (referral link)

Good Quant Trading,


Pablo - August 3, 2016

Hello Alvarez, great post. When you say “But I know I could not use the information from 20 years ago to trade now. What about you?”, just a thought: couldn’t it be used for an average prediction of the current market’s next move? I mean, if both market intervals are so correlated, maybe the next months’ moves are too, and maybe they could repeat an overall cyclical move. Does that make sense?

    Cesar Alvarez - August 3, 2016

    That is an interesting thought. If that is right, that is not a good sign, because after a large run-up came a big bear market.

Neha - August 3, 2016

Greetings of the day Sir,

I have always enjoyed reading your posts. This one is great too. It shows how statistics can easily take you for a ride. We have to be more aware, since the human mind likes to see what it wants to, and here is a scientific way of fooling yourself. I also like the cyclical comment Mr. Pablo made; that actually might be one of the ways of looking at it. However, given the number of changes the markets have undergone since then, whatever the results turn out to be, they will be interesting to note.

Thanks.
Neha

    Cesar Alvarez - August 4, 2016

    I agree it is really easy to fool ourselves with statistics.

ted thedog - August 4, 2016

When train set results correlate well with test set results, this means that the market conditions in the train set and in the test set were similar *as seen by the algorithm*. That statement is a tautology, but I bet ‘similar as seen by algorithm’ probably also corresponds with some human intuition of similarity. Perhaps the high correlation occurs mainly for ‘bull periods’, or mainly for ‘bull periods with low vol’, or something like that. But this is hard to see because the train set and test set are of different lengths, and the train set covers many market regimes which are getting averaged, while the test set mainly covers one market regime. It might be easier to find correspondence if train and test sets were matched in length. So perhaps match all three-year train sets against three-year test sets? The bet is that the subset of three-year periods where the algorithm matched well on train and test also has a simple human-obvious similarity (although this isn’t guaranteed).
BTW, how did you arrive at 2520 variations?

    Cesar Alvarez - August 4, 2016

    I did also test using equal three-year period lengths. I guess I did not make that clear in the post. Yeah, I was thinking about quiet bull markets, but then the correlation was not there in the 2003-2005 timeframe. Interesting either way. I should have linked to the original post because there I cover a little bit of how I got to 2520 variations. I wanted each decile to have at least 100 variations to give me a big enough sample size to compare with. So I expanded my parameters to get above this. After my first stab, I got to 2520. I figured I could try to drop it back down to around 1000, but decided it should be fine as is.

Chet - August 4, 2016

Cesar, it’s hard to say without examining the systems, and I haven’t looked into it closely. My gut feeling is that the markets in the two highlighted time frames were similar in one aspect. The Alan Greenspan speech in 1996 known as “the irrational exuberance speech” was pointing to concerns that the markets were getting overstretched. We all know what happened three years later. 2013-2015 is also a period where markets behaved in ways that fundamentalists scratch their heads over. Both periods had the feeling of ‘manufactured’ market elevation. As a previous commenter mentioned, that correlation could be an ominous coincidence. Roughly three years after Greenspan’s speech the dot-com bubble was no more. That puts us at what, 2016? 🙂

    Cesar Alvarez - August 5, 2016

    I like your idea. It should be interesting to revisit this post in a year or so.

Ola - August 9, 2016

Hi Cesar,
thanks for an interesting post.
I have found that a good way to verify backtest results is to move your starting date to a different day. Of course, it depends on the type of strategy you are testing, but sometimes a small change can have surprising results, e.g. you get a quite different set of trades.
Cheers,
Ola

    Cesar Alvarez - August 9, 2016

    I find that changing the starting date by a month or two can have a huge impact on strategies that have long hold periods, meaning more than a couple of months.

Enrique Romualdez - March 18, 2019

Hi Cesar,

Fantastic research. I’m a huge fan of these kinds of insights.

I know another reader mentioned something similar to this in a previous comment above, but what are your thoughts on the ability of previous market data (regardless of the period, seeing as markets are Fractal as per Benoit Mandelbrot) that’s highly correlated to the current market to predict future market returns?

The reason I ask is that this really reminds me of how Paul Tudor Jones and Peter Borish anticipated the ’87 crash via a method similar to this. One of the scenes in the ’86 PBS documentary “Trader” shows Peter Borish talking about how the years leading up to the Great Depression (1929+) were something along the lines of 93-94% correlated to the market of 1986. They then used this premise to justify their views on the timing of the ’87 crash.

Thoughts?

    Cesar Alvarez - March 18, 2019

    I have never been a fan of this particular method. The problem is that markets often rhyme but do not necessarily work out as in the past. I often see articles about how the market is like some crash in the past, but then it never happens. We don’t see all the wrong predictions. Just the right ones.

Enrique Romualdez - March 29, 2019

Hi Cesar,

I still think about this post, even after a couple of weeks.

I’m really concerned about the varying results when one changes the start date for backtests. What are your thoughts on using a fixed backtest window (say 10 years) and simply rolling that window forward with each new backtest as one updates a system?

As of right now, I’ve been backtesting with a fixed start date and update the system with each day’s new data. I’ve set my fixed start date based on my desire to retain a certain investment universe (I needed enough historical data to include them).

Thoughts?

    Cesar Alvarez - March 30, 2019

    I like your rolling 10 year idea better than the fixed start date. But a lot depends on what type of strategy you have. The longer the hold period, the more important the start date is.

Rob - April 15, 2020

Cesar,

Can you redo this research with global warming data?
And publish your results?

Rob
