November 16, 2016

Is synthetic XIV/VXX data safe to use?

I have done several posts about trading XIV & VXX. In these posts (here, here and here) I refer to using synthetic data for the period before these ETFs started trading. I supported using that data because of the very high correlation of daily returns during the overlap period. With a correlation of 0.97, I thought, great, the data should be good to use for backtesting.

Then came the head-slapping moment: run the strategy during the overlap period using both the real data and the synthetic data. If the synthetic data were a good substitute, I would expect only a small change in the results.
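To make the two checks concrete, here is a minimal Python sketch. It is not the code behind this post (the tests were run in AmiBroker), and the function names are hypothetical. It assumes you have two lists of closing prices covering the same dates, one real and one synthetic, and two lists of daily positions produced by running the same rules on each series.

```python
import math


def daily_returns(closes):
    """Simple close-to-close returns for a list of closing prices."""
    return [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def position_mismatch_rate(real_positions, synth_positions):
    """Fraction of days on which the two runs hold different positions."""
    if len(real_positions) != len(synth_positions):
        raise ValueError("position lists must cover the same days")
    differing = sum(1 for r, s in zip(real_positions, synth_positions) if r != s)
    return differing / len(real_positions)
```

The first check is pearson(daily_returns(real_closes), daily_returns(synth_closes)). The second is position_mismatch_rate on the positions each run produced. A high value on the first does not guarantee a low value on the second, which is the point of running the strategy on both data sets.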

Overlap Period

The overlap period where I have real VXX & XIV data and synthetic data is from 12/1/2010 to 12/12/2014. Four years should generate enough trades to see how running a strategy on each compares. A downside of synthetic data is that we only have closing prices. Because of this, the strategies will be tested entering/exiting on the close of the signal day with the rules only using the closing price.

First Strategy

The first strategy tested is the one from VXX & XIV Strategies. See that post for the rules. An interesting point is that the strategy uses the SPY and VIX to trigger signals. Using the values that I focused on in the earlier post, we get the following.

161116a

This is a good sign. The CAR and MDD are very close to each other. But there are not too many trades. Using the optimization results from the earlier post, I filtered the spreadsheet to focus only on variations with more than 50 trades and a CAR greater than 20. This gives 155 variations to compare.

161116b

Comparing the CAR and MDD, we can see that on average the difference is very small. This is great to see and gives me confidence that using the synthetic data in this case is OK. But remember we are using the SPY & VIX to trigger trades. What if we are using the VXX & XIV to trigger trades?
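The filter and the average comparison can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet itself. The column names (trades, car_real, car_synth and so on) are hypothetical, and the sketch assumes the trade count and CAR filter are applied to the real-data results, which the text above does not specify.

```python
def filter_variations(rows, min_trades=50, min_car=20.0):
    """Keep variations with more than min_trades trades and CAR above min_car.

    Assumption: the filter looks at the real-data columns.
    Each row is a dict with hypothetical keys "trades" and "car_real".
    """
    return [r for r in rows if r["trades"] > min_trades and r["car_real"] > min_car]


def mean_difference(rows, real_key, synth_key):
    """Average of (synthetic - real) for one statistic across variations."""
    if not rows:
        raise ValueError("no variations left after filtering")
    diffs = [r[synth_key] - r[real_key] for r in rows]
    return sum(diffs) / len(diffs)
```

For example, mean_difference(kept, "car_real", "car_synth") gives the average CAR gap across the filtered variations, and the same call with the hypothetical keys "mdd_real" and "mdd_synth" does it for drawdown.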

Just another Volatility SPY strategy

Someone sent me a link to this strategy. I am always looking for something new to test and figured this would also be good to use. I was able to closely replicate the results. Next, I did an optimization around their parameters to see whether they were stable and whether there were better ones to pick. What I discovered is that the parameters used were the best ones from my optimization. I never like trading the best ones.

Original Rules

  • WVF = Using VXX, 100 * (the highest Close in the last 28 days minus today’s Low) divided by the Highest Close in the last 28 Days
  • Buy XIV when WVF crosses above 14. Exit VXX.
  • Buy VXX when WVF crosses below 14. Exit XIV.

The original method invests different amounts in XIV (30%) and VXX (10%) when they signal. To make the strategy work for this test, I changed the rules slightly. In the calculation of WVF they use the low of VXX. Instead I used the close, because the synthetic data has only closing prices. I also invest 100% in either ETF.
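The WVF calculation and the crossover rules can be sketched in Python as follows. This is a hedged illustration rather than the AmiBroker code used for the tests. Two details are assumptions because the rules above do not spell them out: the 28-day window includes today, and a cross above means the previous value was at or below the threshold while today's is above it (and the reverse for a cross below). The function names are hypothetical.

```python
def wvf(closes, lows=None, lookback=28):
    """WVF = 100 * (highest close in lookback - price) / highest close.

    With lows=None the close is used as the price (the modified rule);
    pass the lows to get the original rule. Returns None until a full
    lookback window is available. Assumption: the window includes today.
    """
    price = closes if lows is None else lows
    out = []
    for i in range(len(closes)):
        if i + 1 < lookback:
            out.append(None)
            continue
        highest_close = max(closes[i - lookback + 1 : i + 1])
        out.append(100.0 * (highest_close - price[i]) / highest_close)
    return out


def positions(wvf_values, threshold=14.0):
    """Holding after each day's close: "XIV", "VXX", or None before any signal."""
    held = None
    prev = None
    out = []
    for value in wvf_values:
        if value is not None and prev is not None:
            if prev <= threshold < value:    # crosses above: buy XIV, exit VXX
                held = "XIV"
            elif prev >= threshold > value:  # crosses below: buy VXX, exit XIV
                held = "VXX"
        out.append(held)
        if value is not None:
            prev = value
    return out
```

Running positions(wvf(closes)) once on the real VXX closes and once on the synthetic closes gives the two position lists to compare. Because the signal is computed from the VXX series itself, small price differences near the threshold can flip a cross on one data set and not the other.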

These are the results of the strategy with my changes

161116c

Now if that does not make you do a double take, nothing will. These results are not even close to each other. The CAR drops 40 points and the drawdown gets 36 points worse. Even the number of trades changes dramatically. Using the synthetic data to generate signals is not something I would do. Maybe it is just this variation that did poorly. These are the optimization results for variations with more than 50 trades and a CAR greater than 20. This gave 117 variations.

161116d

The difference in the results is still huge. In this case using synthetic data is clearly wrong.

Spreadsheet

Fill in the form below to get the spreadsheet, which contains all the variations tested and additional statistics for both methods. You can compare and see how each of the yearly results differs.

Final Thoughts

I am developing another VXX/XIV strategy to trade which uses VXX/XIV to signal. When I compared the real vs synthetic data, the results were dramatically different. Be careful when using synthetic data: you may throw out a good strategy or, worse, decide to trade a bad strategy.

Backtesting platform used: AmiBroker. Data provider: Norgate Data (referral link)

Good Quant Trading,


Sergey V. - November 17, 2016 Reply

Hi Caesar,

Thanks for the interesting post. Sure, the credibility of a backtest on simulated data depends on how precise your synthetic data is and how quickly your signal changes.

For 1-year momentum it is one story, and you may use less precise data; for 5-day reversion it is a completely different story, and you need much better data to test it.

BTW, Six Figure Investing has OHLC data on volatility ETPs: https://sixfigureinvesting.com/2014/09/simulating-open-high-low-vxx-vixy-tvix-uvxy-xiv-svxy/. Maybe you could use this to avoid trading on the close of the same day (which may not be that realistic, given the wild nature of the instruments involved).

    Cesar Alvarez - November 17, 2016 Reply

    I am aware of the OHL simulated data but the amount of error he describes is too much for me. The main thing I want to make sure people are clear on is that the data may or may not work for you depending on the strategy. Just be careful using this data.

Michael - November 18, 2016 Reply

hi cesar, would you consider adding a search functionality to your blog so we can easily look up past blogs or topics?

    Cesar Alvarez - November 18, 2016 Reply

    I can see when I am logged in as my WordPress admin but when I look at the site logged out I can’t see the search feature. I will have to look around and figure out how to get it back. Thanks for pointing this out.

michael - May 24, 2017 Reply

hi cesar, did you build your own synthetic data to run your tests? i recently ran some tests using the data from six figures investing. although the results over the overlap period were qualitatively similar, good years were good and worse years were worse etc, quantitatively they were very different with variations of 40% or more at times. what do you think?

    Cesar Alvarez - May 24, 2017 Reply

    No, I used the data from Six Figure Investing. I found that it really depends on the strategy whether one can use this data or not.
