April 21, 2014

Backtesting is Hard

Why don’t I make more frequent posts? The easy answer is that backtesting is hard.

A test has three parts. First, coming up with the idea. I have more ideas than I can test; I have a notebook full of them. The hard part here is picking one. Second, writing the code and running it. This takes me anywhere from a couple of hours to a couple of days. Writing code is the fun and mostly easy part, though sometimes it can be insanely hard. Third, verifying that the results are correct. It is this last step that can take days to weeks. Then writing the post takes a couple of days.

Potential errors in a backtest include: looking into the future, trades entered that should not be entered, trades not entered that should have been entered, bad data, over-fitting the data, picking a date range that makes the test look good, and many more.

Step One: The Idea

With lots of ideas to test, how does one pick one? It is a balance between how long the idea will take to code and verify and how likely the results are to be interesting. For the blog, I look at what others have been writing about or which posts produced interesting comments and ideas. For my personal work, I am looking for new and different ideas.

Step Two: The Code

I have spent the last 15 years writing code in AmiBroker, nine of those years as my job, so this step is the fun part. With all this experience, I have developed a large bag of tricks for performing most tests. This allows me to write tests quickly and to have confidence in the results quickly. But even with all my experience, sometimes I get a serious challenge, as happened a few weeks ago when a consulting client came to me with a unique trading idea.

Step Three: The Verification

Verification is the hard and time-consuming part. I often tell my old programming friends from Microsoft that writing code for backtesting is harder than writing code for Excel was. When I wrote code for Excel charting, if the code did what I wanted it to do, I could have some confidence that it was correct. The problem with backtesting results is that they can look right and feel right but still be horribly wrong. There are so many errors that one must be on the lookout for.

1-      Data errors – One can pay for a good, reliable data provider, but the data can still have errors. I am always checking my data for large price moves. As an example, my data provider messed up the split for GOOG when it became GOOGL. On the day it happened, the price went down 50% and then up 100% the next day. Oops. Fortunately, they corrected it quickly. Imagine what that could have done to test results.

2-      Future looking errors – This is what I call the Holy Grail bug. Whenever I get results that are too good to be true, they fall in this category. For example, suppose one of your rules is that the setup day is an up day, but your code instead looks one day into the future. Now the day you enter is an up day and all your trades automatically start off as winners. In general, these tend to be easy to find.

3-      Trades entered that should not have been entered

4-      Trades not entered that should have been entered

5-      Position sizing errors – For portfolio tests, is the position size of each entered trade correct?
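The large-move check behind error 1 is easy to script. The Python sketch below is only an illustration: it assumes daily closes arrive as a plain list of floats, and the 40% threshold is an arbitrary choice for the example, not a recommended value. A bad split tends to show up as a big move followed the next day by a big move in the opposite direction, which is what `flag_reversals` looks for.

```python
def flag_large_moves(closes, threshold=0.40):
    """Return (index, pct_change) for each bar whose close-to-close move
    exceeds `threshold` in absolute terms (threshold is an assumption)."""
    flagged = []
    for i in range(1, len(closes)):
        prev, curr = closes[i - 1], closes[i]
        if prev <= 0:
            continue  # skip missing or bad prices instead of dividing by zero
        change = curr / prev - 1.0
        if abs(change) > threshold:
            flagged.append((i, change))
    return flagged


def flag_reversals(closes, threshold=0.40):
    """Return the index of each large move that is immediately undone by a
    large move in the opposite direction on the next bar, e.g. down 50%
    and then up 100%, the signature of a mishandled split."""
    moves = flag_large_moves(closes, threshold)
    reversals = []
    for (i, a), (j, b) in zip(moves, moves[1:]):
        if j == i + 1 and a * b < 0:
            reversals.append(i)
    return reversals
```

Flagged bars still need a human look: a genuine earnings gap can also exceed the threshold, and a split that was never adjusted at all shows up as a single large move rather than a reversal.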

How I Try To Avoid Errors

When I worked for Connors Research, we verified results by giving another researcher the English version of the test rules. He would then code his own version using a duplicate database, and we would compare results. It would often take us several iterations. Most errors came from the English rules not being clear, but we also had errors in our code. In the end, our results would match closely and we would publish them. Did this guarantee our results were error free? No. Only once did both researchers make the same error and we published ‘wrong’ results. Fortunately, the error was minor and barely changed the final results.
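The comparison step of that process can be partly automated. The sketch below is a hypothetical helper, not the tooling used at Connors Research. It assumes each researcher exports closed trades with a symbol, entry date, entry price, exit date and exit price, matches trades on (symbol, entry date), and lists the trades only one side took plus matched trades whose exit dates or prices disagree by more than a tolerance.

```python
from typing import NamedTuple


class Trade(NamedTuple):
    """One closed trade as exported by a backtest (hypothetical layout)."""
    symbol: str
    entry_date: str  # ISO date string, e.g. "2014-04-21"
    entry_price: float
    exit_date: str
    exit_price: float


def compare_trade_lists(ours, theirs, price_tol=0.01):
    """Match trades on (symbol, entry_date) and report disagreements.

    Assumes at most one trade per symbol per entry date; duplicates
    would silently collapse in the dicts below.
    """
    a = {(t.symbol, t.entry_date): t for t in ours}
    b = {(t.symbol, t.entry_date): t for t in theirs}
    only_ours = sorted(a.keys() - b.keys())    # trades only our code took
    only_theirs = sorted(b.keys() - a.keys())  # trades only their code took
    mismatched = []
    for key in sorted(a.keys() & b.keys()):
        x, y = a[key], b[key]
        if (x.exit_date != y.exit_date
                or abs(x.entry_price - y.entry_price) > price_tol
                or abs(x.exit_price - y.exit_price) > price_tol):
            mismatched.append(key)
    return only_ours, only_theirs, mismatched
```

An empty result on all three lists only says the two implementations agree; as the story above shows, both can still share the same mistake.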

Without a second researcher, how do I verify results? If I get results that tell me it is time to retire, then something is wrong. But what about results that just look good? First is the gut check: do they feel right? I am surprised how often my gut is right about results being wrong. After that, I follow a three-step process.

Trade Review

I randomly pick 10-20 trades to examine. Should these trades have been entered? Did they enter at the expected time and price? Did they exit at the expected time and price? Is the position size as expected? I also look at the biggest winners and losers. Here I find errors 1, 2, 3 & 5.
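Pulling the review sample can be scripted so that it is both random and repeatable. A minimal sketch, assuming trades are dicts with a `pnl` field (a hypothetical layout) and defaulting to 15 random trades, inside the 10-20 range above. Passing a seed returns the same sample again, which helps when re-checking the same trades after a fix. The questions themselves still have to be answered by hand.

```python
import random


def pick_trades_for_review(trades, n_random=15, n_extremes=3, seed=None):
    """Return a random sample of trades plus the biggest winners and losers.

    `trades` is a list of dicts with a 'pnl' key (assumed layout).
    """
    rng = random.Random(seed)  # seeded so the same sample can be pulled again
    sample = rng.sample(trades, min(n_random, len(trades)))
    by_pnl = sorted(trades, key=lambda t: t["pnl"])
    losers = by_pnl[:n_extremes]                # most negative P&L first
    winners = by_pnl[::-1][:n_extremes]         # most positive P&L first
    return sample, winners, losers
```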

Signal Review

If I am running a portfolio backtest, I then look at all the signals for a given day. Am I entering the stocks I expect based on my ranking? Is the position size correct? Here I find errors 3 & 4.
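For a portfolio test, much of the signal review comes down to recomputing which symbols the ranking says should be bought on a given day and diffing that list against what the backtest actually bought. The sketch assumes higher scores rank better, ties are broken alphabetically, and positions are sized equal-weight in whole shares; all three are illustrative assumptions, so substitute the test's real ranking and sizing rules.

```python
def expected_entries(scores, open_slots):
    """Symbols the ranking says to enter on one day.

    `scores` maps symbol -> ranking score for that day's candidates.
    Assumes higher is better and breaks ties alphabetically.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ranked[:max(0, open_slots)]]


def review_day(scores, entered, open_slots):
    """Diff the expected entries against what the backtest entered."""
    expected = set(expected_entries(scores, open_slots))
    actual = set(entered)
    missing = sorted(expected - actual)     # error 4: should have been entered
    unexpected = sorted(actual - expected)  # error 3: should not have been entered
    return missing, unexpected


def expected_shares(equity, max_positions, price):
    """Equal-weight sizing (an assumed rule): whole shares per position slot."""
    return int((equity / max_positions) // price)
```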

Code Review

I walk through my code several times looking for errors in logic or typos. Here I find errors 1, 2, 3, 4 & 5.
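As an illustration of what a code review is hunting for, here is error 2 in miniature. The rule is "the setup day is an up day," and the sketch assumes entry at the setup day's close. The buggy version compares tomorrow's close with today's, so every signal is followed by a rising close by construction and every trade starts off as a winner. In AmiBroker AFL, `Ref()` with a positive offset references future bars, so the same slip can appear there.

```python
def setup_signals(closes):
    """Correct: bar i is a setup if it closed above the prior bar.
    Only data known at the close of bar i is used."""
    return [i for i in range(1, len(closes)) if closes[i] > closes[i - 1]]


def setup_signals_lookahead(closes):
    """BUG: compares bar i+1 with bar i, peeking at tomorrow's close."""
    return [i for i in range(len(closes) - 1) if closes[i + 1] > closes[i]]


def first_day_returns(closes, signals):
    """Return over the first holding day, entering at the close of each signal bar."""
    return [closes[i + 1] / closes[i] - 1.0
            for i in signals if i + 1 < len(closes)]
```

Running both on the same prices makes the difference obvious: every first-day return from `setup_signals_lookahead` is positive, while the correct version carries no such guarantee.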

If I find any errors during the above process, I need to start all over. I give myself a day or two off before looking for errors again, so I have fresh eyes looking for mistakes. It can sometimes take several iterations before I feel comfortable with the results.


Closing Comments

I fear the day when I make a post with an error in it. It cannot be avoided. It almost happened with the “DTAYS Post.” It took me several more iterations than normal to get it right. I was getting results that looked good but my gut said no. I came within a day of publishing the results before finding my error.

Whether you do your own backtesting, have someone else do it for you, or read a blog about someone else’s tests, you should always keep in the back of your mind what they have done to root out errors.

Back to Research

Now that I have settled back in from my vacation to Madrid, it is back to doing posts on stock research.


mhp - April 21, 2014

Excellent and informative! I’d be glad to see more posts on your work process, especially about how you look for “new and different ideas”.

A subtle error I’ve made that set off a “holy grail” alarm: entering positions using stop orders, and including today’s volume in the position score.

For example: Stop = Ref(HHV(H,20),-1); Buy = Cross(H,Stop); BuyPrice = Stop; PositionScore = V / MA(V,20);

In fact, I’ve never found a way to correctly model this type of strategy using daily bars.

    Cesar Alvarez - April 21, 2014

    I am glad you liked it. I was not sure if people would like this or not.

    Yes, I have made the same error too, using PositionScore and volume. Are you trying to model that the breakout day has large volume? If so, that is not possible with daily data.

      mhp - April 22, 2014

      I solved the problem by writing software that integrates daily and 1-min bars, and compares volume as of a given minute to average volume as of that same minute.

Mike - May 17, 2014

Hi Cesar,

I really enjoyed reading this article and also found it very useful. Thanks for sharing such great advice.

Cheers

Mike

Marco - September 29, 2014

This is completely true. Nearly everything is based on backtesting. And it takes a huge amount of time and effort…
http://nightlypatterns.wordpress.com

Five ETF Monthly Rotation Strategy | Alvarez Quant Trading - November 23, 2015

[…] I was working on another blog post when I saw this post Inferences From Backtest Results Are False Before Proven True on Price Action Lab. Mike has a challenge to replicate a very simple test. I often get email from people trying to replicate results from one of my blog posts and thought this would be fun to do. I cover some of this topic on my post Backtesting is Hard. […]

Backtesting is Hard | SAMUELSSONS RAPPORT - December 10, 2015

[…] Source: Backtesting is Hard | Alvarez Quant Trading […]

Helena - June 14, 2017

You are soooo right about this. Been working for 6 weeks on a new system (with plenty of walk-aways). Trigger fantastic, exit great but adding code to stop getting into lousy trades is a Herculean undertaking. I’m end of day so looking forward is not an issue as I get in next day – just as well. But the checking and rechecking is heart-breaking. Makes discretionary trading so appealing! Glad you posted this – I don’t feel totally inadequate now.

    Cesar Alvarez - June 15, 2017

    Glad it helped. At times it feels like 99% of the ideas I test are complete garbage.
