- in General by Cesar Alvarez
Backtesting is Hard
Why don’t I make more frequent posts? The easy answer is backtesting is hard.
A test has three parts. First, coming up with the idea. I have more ideas than I can test; I have a notebook full of them. The hard part here is picking one. Second, writing the code and running it. This takes me a couple of hours to a couple of days. Writing code is the fun and mostly easy part, though sometimes it can be insanely hard. Third, verifying the results are correct. It is this last step that can take days to weeks. Then writing the post takes a couple of days.
Potential errors in a backtest include: looking into the future, trades entered that should not be entered, trades not entered that should have been entered, bad data, over-fitting the data, picking a date range that makes the test look good, and many more.
Step One: The Idea
With lots of ideas to test, how does one pick one? This is a balance between how long it will take to code and verify vs how likely the results will be interesting. For the blog, I see what others have been writing about or what posts produced interesting comments and ideas. For my personal work, I am looking for new and different ideas.
Step Two: The Code
Having spent the last 15 years writing code in AmiBroker, with nine of those years it being my job to write code, this step is the fun part. With all this experience, I have developed a large bag of tricks to perform most tests. This allows me to quickly write the tests and quickly have confidence in the results. But even with all my experience I sometimes get a serious challenge, as happened a few weeks ago when a consulting client came to me with a truly unique trading idea.
Step Three: The Verification
Verification is the hard and time-consuming part. I often comment to my old programming friends from Microsoft that writing code for backtesting is harder than it was for Excel. When I wrote code for Excel charting, if the code did what I wanted it to do, I knew I could have some confidence in the code. The problem with backtesting results is that they can look right and feel right but still be horribly wrong. There are so many errors that one must be on the lookout for.
1- Data errors – One can pay for a good, reliable data provider, but they can still have errors. I am always checking my data for large price moves. As an example, my data provider messed up the split for GOOG when it became GOOGL. On the day it happened, the price went down 50% and then the next day went up 100%. Oops. Fortunately, they quickly corrected it. Imagine what that could have done to test results.
2- Future-looking errors – This is what I call the Holy Grail bug. Whenever I get results that are too good to be true, they fall into this category. For example, one of your rules is that the setup day is an up day, but instead you look one day into the future. Now the day you enter is an up day and all your trades automatically start off as winners. In general these tend to be easy to find.
3- Trades entered that should not have been entered
4- Trades not entered that should have been entered
5- Position sizing errors – For portfolio tests, is the position size of each entered trade correct?
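Error 1 is the kind of thing a simple automated scan can catch. A minimal sketch of such a check, assuming daily closing prices and an illustrative 40% threshold (both the function name and the threshold are my choices, not from the post):

```python
# Hypothetical data sanity check: flag day-over-day closes that move more
# than a threshold. Moves this large are often a mishandled split (like the
# GOOG/GOOGL example) rather than a real price change.

def flag_large_moves(closes, threshold=0.40):
    """Return (bar index, pct change) for each move larger than threshold."""
    flags = []
    for i in range(1, len(closes)):
        pct = closes[i] / closes[i - 1] - 1.0
        if abs(pct) > threshold:
            flags.append((i, pct))
    return flags

# A botched split: price halves one day, then doubles back the next.
closes = [1000.0, 1005.0, 502.5, 1005.0, 1010.0]
print(flag_large_moves(closes))
```

Running this over every symbol after each data update is cheap, and it would have flagged the GOOG split error immediately.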
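To make error 2 concrete, here is a minimal sketch of the up-day rule described above, in both its correct and its future-looking form. The function and data are illustrative, not the author's code:

```python
# A minimal sketch of a future-looking (lookahead) bug. The "setup day is
# an up day" rule should compare today's close to yesterday's; the buggy
# version accidentally peeks at tomorrow's close.

def up_day_signals(closes, lookahead_bug=False):
    """Return the indices of days flagged as 'up day' setups."""
    signals = []
    for i in range(1, len(closes) - 1):
        if lookahead_bug:
            up = closes[i + 1] > closes[i]   # BUG: uses tomorrow's close
        else:
            up = closes[i] > closes[i - 1]   # correct: today vs yesterday
        if up:
            signals.append(i)
    return signals

closes = [10, 9, 11, 10, 12]
print(up_day_signals(closes))                      # correct signals
print(up_day_signals(closes, lookahead_bug=True))  # too-good-to-be-true signals
```

Every signal from the buggy version is immediately followed by an up day, which is exactly why results with this bug look like the Holy Grail.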
How I Try To Avoid Errors
When I worked for Connors Research, the way we verified results was by giving another researcher the English version of the test rules. He would then code his own version using a duplicate database. We would then compare results. It would often take us several iterations. Most errors were in the English rules not being clear, but we also had errors in our code. In the end our results would match closely and we would publish them. Did this guarantee our results were error free? No. Only once did both researchers make the same error and we published ‘wrong’ results. Fortunately the error was minor and barely changed the final results.
Without a second researcher, how do I verify results? If I get results that tell me it is time to retire, then something is wrong. But what about results that merely look good? First is the gut check. Do they feel right? I am surprised how often my gut is right about results being wrong. Then I use the three-step process that follows.
Trade Review
I will randomly look at 10-20 trades. Should these trades have been entered? Did they enter when and at the price expected? Did they exit at the time and price expected? Is the position size as expected? I also look at the biggest winners and losers. Here I find errors 1, 2, 3 & 5.
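The sampling part of this review can be partly automated. A sketch of the idea, assuming simple dictionary structures for trades and prices (these structures, names, and the tolerance are my assumptions for illustration):

```python
# Illustrative trade-review helper: sample a handful of trades and check
# each recorded entry price against the raw price data. Trades whose
# entry price disagrees with the data are returned for manual inspection.

import random

def review_sample(trades, prices, n=10, tolerance=0.01):
    """Return sampled trades whose entry price disagrees with the data."""
    sample = random.sample(trades, min(n, len(trades)))
    mismatches = []
    for t in sample:
        actual = prices[t["symbol"]][t["entry_date"]]
        if abs(t["entry_price"] - actual) > tolerance:
            mismatches.append(t)
    return mismatches

trades = [
    {"symbol": "AAPL", "entry_date": "2020-01-02", "entry_price": 75.0},
    {"symbol": "MSFT", "entry_date": "2020-01-02", "entry_price": 150.0},
]
prices = {
    "AAPL": {"2020-01-02": 75.0},
    "MSFT": {"2020-01-02": 160.0},  # trade price disagrees with the data
}
print(review_sample(trades, prices, n=2))
```

A check like this only catches price mismatches; the judgment questions (should the trade have been taken at all?) still need human eyes.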
Signal Review
If I am running a portfolio backtest, I then look at all the signals for a given day. Am I entering the stocks I expect based on my ranking? Is the position size correct? Here I find errors 3 & 4.
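A sketch of what a one-day signal check might look like, assuming a rank-and-select portfolio that sizes each position as equity divided by the number of slots (the field names and equal-weight sizing rule are my assumptions, not from the post):

```python
# Sketch of a one-day signal review: given all candidate signals with a
# rank, confirm the backtester entered exactly the top-N and sized each
# position as equity / N. Returns a list of problems found.

def check_day(signals, entered, n_positions, equity):
    """Compare entered trades for one day against the expected top-ranked signals."""
    expected = sorted(signals, key=lambda s: s["rank"])[:n_positions]
    expected_syms = {s["symbol"] for s in expected}
    entered_syms = {t["symbol"] for t in entered}
    size = equity / n_positions
    problems = []
    if expected_syms != entered_syms:
        # symmetric difference: symbols that appear on only one side
        problems.append(("wrong symbols", expected_syms ^ entered_syms))
    for t in entered:
        if abs(t["size"] - size) > 0.01 * size:  # allow 1% rounding slack
            problems.append(("bad size", t["symbol"]))
    return problems

signals = [{"symbol": "A", "rank": 1}, {"symbol": "B", "rank": 2}, {"symbol": "C", "rank": 3}]
entered = [{"symbol": "A", "size": 5000.0}, {"symbol": "C", "size": 5000.0}]
print(check_day(signals, entered, n_positions=2, equity=10000.0))
```

Here the review would flag that C was entered instead of the higher-ranked B, which is exactly the kind of error 3/error 4 pair this step is meant to catch.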
Code Review
I walk through my code several times looking for errors in logic or typos. Here I find errors 1, 2, 3, 4 & 5.
If I find any errors during the above process, I need to start all over. I give myself a day or two of a break before looking again for errors, so I have a pair of fresh eyes looking for mistakes. This can sometimes take several iterations before I feel comfortable with the results.
Closing Comments
I fear the day when I make a post with an error in it. It cannot be avoided. It almost happened with the “DTAYS Post.” It took me several more iterations than normal to get it right. I was getting results that looked good but my gut said no. I came within a day of publishing the results before finding my error.
Whether you do your own backtesting, have someone else do it for you, or read a blog about someone else’s tests, you should always keep in the back of your mind what has been done to root out errors.
Back to Research
Now that I have settled back in from my vacation to Madrid, it is back to doing posts on stock research.