Stock Exchange: How to Use Backtests Effectively


Everyone interested in managing their own money ought to keep an old idiom in mind: if it seems too good to be true, it probably is. Unfortunately, greed is a powerful motivator. It’s tempting to see a new model with an incredible backtest and think this could be the answer.

Experienced investors know that there’s often a drastic change between a model’s backtest and its first live run. You can usually find that point by checking for where the 45-degree angle increase in value drops off into sideways movement (and generally underperforms the market).

This week, we’ll take a deeper dive into how you can minimize these problems using professional techniques.


Our last Stock Exchange revisited a common theme: making stock picks according to a set time frame. Our models suggested finance and software stocks for the short term, and energy for the long term.

Let’s turn to this week’s ideas.

This Week— *crickets*

We usually arrive to find the gang happily enjoying their weekly poker night. Instead, all we’ve got are Felix and Oscar’s weekly rankings with a “gone fishin’” note on the counter. Strange behavior.

We decided to give Vince Castelli a call to investigate. Vince is our modeling guru, a brilliant scientist who spent the bulk of his career as a civilian employee for the U.S. Navy. During his time there, he gained hands-on experience with modeling techniques vital to national security – not something you can find in the classroom. He knows these models better than anyone; after all, he designed them.

Jeff: Vince! What is this about giving everyone the week off?

V: I didn’t give them the week off. There were no fresh signals.

J: Is there something wrong with the gang? They encompass five different methods. How can there be no fresh ideas?

V: A key feature of all models is recognizing the best times to trade. When volatility increases, trades become less predictable.

J: What do you mean? The VIX is lower this week. Volatility is down.

V: That measure is for amateurs. Trading volatility includes both upside and downside. That bogus fear gauge emphasizes only the downside. Predictions are affected by extreme movements in either direction.

J: Since we do not have current picks to feature, maybe we could discuss how you developed these models.  There was an excellent recent article from Ben Carlson about what you cannot learn from a backtest. It reminded me of the TV ads suggesting that anyone can discover a system and trade their way to a fortune.

V:  If only it were that easy.

J:  Ben’s description made me think of some suburbanites wandering into the woods with massive chain saws.  The power of the tools far exceeds their skill in using them.

V: I see it all of the time.

J:  I would like to take up a few of Ben’s points and get your reaction.  How about this:  How many bad backtests came before the good ones?

V:  This is a great question.  Most people do not know the right questions to ask the model developer.  That one is crucial.

J:  How would you answer?

V:  Our method preserves multiple out-of-sample periods.   We develop models on our development data, saving pristine time periods for the test.  We verify that the strength of results continues.  You cannot just look at backtest results; you must know the developer’s method.

J:  Here is another good point: data availability at the time.  Isn’t it easy to “peek ahead” or to exclude data from failing companies?

V:  It certainly is.  You need to have data that includes the failed and merged companies.  The average person at home will not pay up to get this.  It introduces a deceptive, positive bias.

J: Ben also raises an interesting point about friction.  He writes:

It’s almost impossible in a backtest to completely account for costs and frictions such as taxes, commissions, market impact from trading, market liquidity, etc. Sure, you can estimate these frictions, but you never truly understand how these things will affect your bottom line until you actually have to execute buy and sell orders.

V:  This is the first point where I really disagree with him.  Why is it almost impossible?  You should definitely include commissions and a slippage factor.  If your trades are a small percentage of the market volume, the impact from trading is negligible.  Taxes vary by the type of account and the investor.

J:  Interesting point!  “Almost impossible” is strong language.  For someone who knows the ropes, this kind of test might represent real edge.

V:  That is what I do with each of my creations!

J:  Some of Ben’s other points relate to psychological factors.  The trader bailing out of the system in the face of losses.  Or concern about real money.

V:  That is strictly a matter of confidence in the system.  If it has been developed properly, you should not do a lot of fretting.

J: Thanks for joining us, Vince. I’m sure your comments will help readers make more sense of our series.

V: Any time!
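Two of Vince’s points, holding back pristine data for the test and charging commissions and slippage on every trade, can be sketched in a few lines of Python. This is only a rough illustration, not Vince’s actual method: the cost rates, the 30% holdout fraction, the per-trade returns, and the function names are all hypothetical.

```python
from statistics import mean


def net_return(gross_return, commission_rate=0.001, slippage_rate=0.0005):
    """Return of one round-trip trade after costs.

    Costs are charged twice (entry and exit). Both rates are
    illustrative assumptions, not figures from the interview.
    """
    round_trip_cost = 2 * (commission_rate + slippage_rate)
    return gross_return - round_trip_cost


def split_periods(trade_returns, holdout_fraction=0.3):
    """Split chronologically ordered returns into a development set and
    a held-out set that is never looked at while tuning the model."""
    cut = len(trade_returns) - round(len(trade_returns) * holdout_fraction)
    return trade_returns[:cut], trade_returns[cut:]


def evaluate(trade_returns):
    """Average per-trade return after commissions and slippage."""
    return mean(net_return(r) for r in trade_returns)


# Hypothetical gross per-trade returns, oldest first.
gross = [0.012, -0.004, 0.009, 0.015, -0.007,
         0.011, 0.003, -0.002, 0.008, 0.001]

development, holdout = split_periods(gross)
dev_score = evaluate(development)
holdout_score = evaluate(holdout)

print(f"development: {dev_score:.4f}  holdout: {holdout_score:.4f}")
```

With these made-up numbers the held-out period looks worse than the development period once costs are charged. That is the kind of drop-off a developer should be checking for before any real money is involved.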


One important point was not mentioned in Ben’s article – simplicity.  The temptation for the untrained modeler is to introduce as many variables as possible, hoping to find correlations that others have missed.  What they find is misleading. Computers are powerful enough to discover apparent links between variables when there is actually no relationship. A great model uses as few variables as possible.  The backtest may not seem as good, but the real-time trading will be much better.
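A small experiment makes the danger concrete. The sketch below assumes nothing about any real model: it generates a target series and 200 candidate “indicators” that are all pure random noise, picks the candidate with the strongest correlation in the development period, and then checks the same candidate on fresh data. The sample sizes and the seed are arbitrary choices.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
n_obs, n_candidates = 30, 200


def noise():
    """One series of independent random draws (no signal at all)."""
    return [rng.gauss(0, 1) for _ in range(n_obs)]


target_dev, target_live = noise(), noise()

# Each candidate has a development-period series and a later series.
candidates = [(noise(), noise()) for _ in range(n_candidates)]

# "Discover" the variable that best fits the development data.
best_dev, best_live = max(
    candidates, key=lambda c: abs(correlation(c[0], target_dev))
)

in_sample = correlation(best_dev, target_dev)
out_of_sample = correlation(best_live, target_live)

print(f"best in-sample correlation:  {in_sample:+.2f}")
print(f"same variable on fresh data: {out_of_sample:+.2f}")
```

By construction there is no relationship to find, yet searching through enough candidates reliably turns up one that looks meaningful in the development data. The fewer variables a model tries, the less room there is for this kind of accident.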

Quantitative modeling is an extraordinarily complicated field. In some ways, the way to find success here is similar to finding success in the investment world as a whole. Find the right experts, learn their methods, and try to make sense of the data for yourself. Backtesting can be effective or dangerous – it depends on the skill of the developer.

Background on the Stock Exchange

Each week Felix and Oscar host a poker game for some of their friends. Since they are all traders, they love to discuss their best current ideas before the game starts. They like to call this their “Stock Exchange.” (Check it out for more background). Their methods are excellent, as you know if you have been following the series. Since the time frames and risk profiles differ, so do the stock ideas. You get to be a fly on the wall from my report. I am the only human present, and the only one using any fundamental analysis.

The result? Several expert ideas each week from traders, and a brief comment on the fundamentals from the human investor. The models are named to make it easy to remember their trading personalities.


If you want an opinion about a specific stock or sector, even those we did not mention, just ask! Put questions in the comments. Address them to a specific expert if you wish. Each has a specialty. Who is your favorite? (You can choose me, although my feelings will not be hurt very much if you prefer one of the models).

Getting Updates

We have a new (free) service for subscribers to our Felix/Oscar update list. You can suggest three favorite stocks and sectors. Sign up with email to “etf at newarc dot com”. We keep a running list of all securities our readers recommend. The “favorite fifteen” are top ranking positions according to each respective model. Within that list, green is a “buy,” yellow a “hold,” and red a “sell.”  Suggestions and comments are welcome. Please remember that these are responses to reader requests, not necessarily stocks and sectors that we own. Sign up now to vote your favorite stock or sector onto the list!

Why You Never See the Best Employment Data

On the first Friday of each month the Bureau of Labor Statistics releases the Employment Situation Report. The data – especially the payroll employment change – is the subject of much speculation, forecasting, and spinning once it is announced. Most sophisticated analysts (like me) regularly report that the sampling error is +/- 120K jobs or so. And that is after the second revision. Few realize that the revisions mostly “top off” the sample responses. There is also non-sampling error, of course, if the current universe of employers is not representative.

The BLS method involves attempting a “count” of the total number of jobs, via a survey, in one month and subtracting the count from the prior month. It is not a direct count of change in the number of jobs. ADP attempts a similar estimate using payroll data from their private clients. Today they reported a gain of 246K private jobs. Both are estimates – and only estimates!

The most accurate employment report comes from a source you never hear about, the quarterly Business Employment Dynamics (BED) report. It is based upon the Quarterly Census of Employment and Wages (QCEW), the authoritative final count of all things labor. The QCEW is the basis for the final benchmarking of all the major BLS reports. Why? The data is drawn from local employment offices, not surveys. Businesses are legally required to report all workers. It is the basis for unemployment insurance, and there is obviously no incentive to overstate employment.

Why Don’t We Hear About This?

No one reports the results of the Business Employment Dynamics report or the QCEW because we do not have this great and accurate data until eight months later. From the Wall Street perspective, it is “old news.” Here is an important table from the last report.

For our current purposes, the key number is the net employment change of 307,000. I am going to compare that to the estimates made at the time of the original releases.

We should also observe that overall job creation in the quarter was almost 7.5 million jobs. This is very important, but no one seems to know it. Jobs destroyed were over seven million, leaving the net of 307 thousand. This is around 100K per month, and that is all you will hear about.
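The arithmetic is simple enough to check in a few lines of Python. The net change of 307,000 is the figure quoted above. The gross gains figure is only my rounding of “almost 7.5 million,” so the implied losses and the ratio are approximations rather than numbers from the report.

```python
NET_CHANGE = 307_000            # net employment change for the quarter
APPROX_GROSS_GAINS = 7_500_000  # rounded from "almost 7.5 million" (approximate)
MONTHS_IN_QUARTER = 3

# Net change is gross gains minus gross losses, so losses are implied.
implied_gross_losses = APPROX_GROSS_GAINS - NET_CHANGE

# The per-month figure that matches the headline payroll numbers.
monthly_net = NET_CHANGE / MONTHS_IN_QUARTER

# How many jobs were created for every net job added.
gross_per_net = APPROX_GROSS_GAINS / NET_CHANGE

print(f"implied gross losses: {implied_gross_losses:,}")
print(f"average monthly net change: {monthly_net:,.0f}")
print(f"gross jobs created per net job: {gross_per_net:.1f}")
```

The implied losses come out a little above seven million, consistent with the text, and the headline net figure is a small residual of two very large flows.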

Please also note that the new jobs come from both additions at current establishments and opening establishments. New jobs from new businesses were 1.4 million for the quarter. The data from this series proves that those complaining about the BLS birth/death adjustment are wrong now, and always have been.

The Estimates

If we fire up the Wayback machine, we can look at the reported employment data from this period. To understand the data, we must realize that the BLS, ADP, (and others) are all making an estimate of the “true job growth.” Their estimates represent different methods, all with pluses and minuses. Let’s see how the two estimates did against what we now know to be “the truth.”

We do not have monthly data for the BED series, but we can see how the two sources did for the entire three-month period. “Truth” was a gain of 307K. Both estimating sources were a bit too high, with the BLS doing better for this round. I have occasionally done this comparison, concluding that the ADP method should also be considered. It would be useful to do this analysis over a longer period. It takes a lot of careful work. (Perhaps if I get a good summer intern, this will be one of the projects. Applications welcome).

Implications for Investors

I understand that investors generally tune out educational posts, especially when a “deep dive” is involved. This is discouraging, since one of my missions is to help people “navigate the noise.” In the case of employment data, it is nearly all noise!

Here are conclusions I have reached, and which you might consider:

  • BLS and ADP both provide useful estimates of employment change. It is a mistake to regard (as most do) the BLS as the “official” result.
  • We should expect variation in the monthly BLS numbers. The survey has a confidence interval of +/- 120K! If the data are real, then the reports should fluctuate around truth.
  • Traders focus on the BLS. They must, since that will be the trading flow. If you are a trader and want to game that announcement, you are on your own. If you are an investor, you should include both reports in your thinking.
  • Do not be bamboozled by those who claim that seasonal adjustments or estimates of new jobs are misleading. I have studied dozens of these claims. None of the writers show any real expertise in data analysis or a proven track record. They are all men on a mission or women on the warpath.
  • The overall path of employment growth remains solid. That will be true even if we get a “weak” payroll employment number on Friday.
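To get a feel for how much the monthly reports should bounce around, here is a minimal simulation in Python. It rests on assumptions I am supplying for illustration: that the +/- 120K figure is a 90% confidence interval, that sampling error is normally distributed, and that true job growth is a steady 102K per month (roughly the 307K quarterly figure divided by three).

```python
import random
from statistics import NormalDist

TRUE_MONTHLY_CHANGE = 102_000  # illustrative: about 307K / 3
HALF_WIDTH = 120_000           # the +/- 120K sampling error
CONFIDENCE = 0.90              # assumption: 120K is a 90% interval

# Convert the interval half-width into a standard error.
z = NormalDist().inv_cdf(0.5 + CONFIDENCE / 2)
standard_error = HALF_WIDTH / z

# Simulate many monthly reports around the same underlying truth.
rng = random.Random(1)
reports = [rng.gauss(TRUE_MONTHLY_CHANGE, standard_error)
           for _ in range(10_000)]

within = sum(abs(r - TRUE_MONTHLY_CHANGE) <= HALF_WIDTH
             for r in reports) / len(reports)
negative = sum(r < 0 for r in reports) / len(reports)

print(f"standard error: {standard_error:,.0f}")
print(f"share of reports within +/- 120K of truth: {within:.1%}")
print(f"share of reports showing job losses: {negative:.1%}")
```

Under these assumptions a noticeable share of simulated months report outright job losses even though true growth never changes. A single “weak” number is exactly what sampling error predicts from time to time.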

And Finally

This topic is (yet another) example of how difficult it is to find real experts. It takes real skill and knowledge. You cannot just read the newspaper.

Other Reading

Your Employment Report IQ – No one knows even 25% of these answers, despite the importance. My favorite prof and greatest teacher introduced me to labor economics. He “approved this message” and said that everyone should read it. While I appreciate the encouragement from a great mentor, the viewership was about 10% of my WTWA pieces – and far less than other pseudo-experts. Trying to help people is an uphill battle!

My best single piece on the monthly employment report. Guessing beans in a jar?

The Quest for Investing Excellence and the Lesson of Dow 20K

The new movement to passive investments is a sharp break from the historical quest for excellence. Many articles claim that no one can do better than the market average. If that is true, you should just throw out your investment library and skip the popular lists of “best investment books.”

This post will suggest a short list of books that would have needed quite different titles. They also would not have become best-sellers! In the conclusion, I will provide some ideas about why this is important for your investment decisions. Here are the hypothetical titles followed by a cover shot of the real book. Suggestions for more examples are quite welcome!


In Search of Mediocrity

Market Sheep

The Average IQ Investor

The Little Book that Equals the Market

Common Stocks and Average Profits

Buffett: The Making of a Lucky Investor

Stay Even with Wall Street


In this series on investment expertise I have (so far) covered the following:

  • There are indeed experts. Sometimes it is obvious, and sometimes they are difficult to find. Consider the case of Phil Mickelson.
  • Forecasting is not always folly. I provide specific examples of expertise, and a checklist for finding the best modeling experts.
  • Dow 20K. The round-number milestone has finally been achieved – at least for today! There are many who are stepping up to claim some credit for their prediction on this front. Some were way too early, and others made the call as we got much closer. Each prognosticator had a method.

My own Dow 20K forecast came when the Dow was at 10,000 and many prominent pundits were calling for Dow 5000! My opinion was controversial at the time. Check out the history of the forecast to remind yourself of how bad things were (unemployment over 10%, and I was ridiculed for suggesting it might fall to 8%).

While it is nice to get some recognition (like this spot from CNBC when we got close to the milestone last month), I see it more as a validation of my methodology. I seek out the best experts. I am constantly looking for excellence. I know that I do not have all of the answers, but my background taught me how to search and to learn. Following superior methods helped to keep my readers and clients on the right side of the market through a long rally hated by most of the punditry and many traders.

There are many paths to trading and investment success. Mine was not the only way, but it was a good way. Having strong evidence and indicators is crucial for confidence.

What Now?

Most of the key factors I see as important are still in place. I summarize them each week. The list of worries has changed a lot but it is still there. The time will come to pull back – but it is not here yet.