STANDARDS BOARD FOR ALTERNATIVE INVESTMENTS
Backtesting
Toolbox
Key Questions for Investors to Ask
Introduction
Some investment strategies follow a mechanistic, rule-based implementation that is fixed over time, for example alternative risk premia strategies (aka dynamic beta). These strategies are usually developed using backtesting (historical simulation), which applies the strategy to historical data to assess how it would have performed in the past. The output of such a backtest is a time series of profits and losses for the strategy, which can then be summarised by risk-adjusted return metrics (such as the Sharpe Ratio). The correlation of the strategy with the return streams of other asset classes can also be calculated.
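As a minimal illustration of the summary statistics described above, the sketch below computes an annualised Sharpe Ratio and the Pearson correlation with another return stream from a list of periodic backtest returns. The function names, the zero default risk-free rate and the 252-periods-per-year convention (daily data) are assumptions made for this example, not part of any particular provider's methodology.

```python
import math
import statistics


def annualised_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Annualised Sharpe Ratio of periodic returns.

    risk_free is a per-period rate; 252 periods per year assumes daily data.
    """
    excess = [r - risk_free for r in returns]
    mean = statistics.mean(excess)
    sd = statistics.stdev(excess)  # sample standard deviation
    return mean / sd * math.sqrt(periods_per_year)


def correlation(x, y):
    """Pearson correlation between two equally long return series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily P&L of a backtested strategy and of an equity index
    strategy = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002]
    equities = [0.010, -0.012, 0.004, 0.006, -0.008, 0.003]
    print(f"Sharpe Ratio: {annualised_sharpe(strategy):.2f}")
    print(f"Correlation with equities: {correlation(strategy, equities):.2f}")
```

Annualising by the square root of the number of periods per year assumes returns are roughly independent over time; for strongly autocorrelated strategies this figure can be misleading.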
A key challenge with using backtesting in strategy development is “statistical overfitting bias”: this arises when a strategy is fitted so closely to the underlying historical dataset that it may not work well in the future.
Illustration: Overfitting vs. underfitting bias
In particular, where computers can test thousands or millions of different strategy configurations on a given sample dataset to find the optimum approach (e.g. the highest risk-adjusted return), it is very likely that the chosen configuration will be overfitted: it looks best on the sample but has no superior predictive power in the future. Similar issues can arise where deep learning techniques/neural networks are employed (in conjunction with classic Fama-French-type factor models) to extract “deep factors” hidden in the unexplained alphas of the benchmark model.
Therefore, managers and broker-dealers using backtesting in strategy development need frameworks and methods for vetting strategies against backtest overfitting, so that only strategies deemed to have significant predictive power going forward are implemented. From an investor’s perspective, where backtests are presented for a given investment strategy, it is important to assess the validity of the backtesting results, including the underlying assumptions and approach, in order to identify potential overfitting bias.
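The selection effect described above can be reproduced with a small simulation. The sketch below, using hypothetical helper names, generates many strategy configurations whose true expected return is zero, picks the one with the highest in-sample Sharpe Ratio and then measures that same configuration on fresh data. The assumptions (normally distributed, independent returns and no genuine edge) are deliberately simplistic; the point is only that the best in-sample result grows with the number of configurations tried, while the expected out-of-sample result stays at zero.

```python
import math
import random


def sharpe(returns):
    """Per-period Sharpe Ratio (mean / sample standard deviation)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var)


def best_of_n_trial(n_configs, n_in, n_out, seed=0):
    """Pick the best of n_configs skill-less configurations in sample.

    Each configuration's returns are i.i.d. normal with zero mean, so any
    positive Sharpe Ratio is luck. Returns the in-sample and out-of-sample
    per-period Sharpe Ratios of the configuration that looked best in sample.
    """
    rng = random.Random(seed)
    best_in, best_out = -math.inf, 0.0
    for _ in range(n_configs):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_in)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_out)]
        s_in = sharpe(in_sample)
        if s_in > best_in:
            best_in, best_out = s_in, sharpe(out_sample)
    return best_in, best_out


if __name__ == "__main__":
    ann = math.sqrt(252)  # annualisation factor, assuming daily data
    for n in (1, 10, 100, 1000):
        s_in, s_out = best_of_n_trial(n, n_in=252, n_out=252, seed=42)
        print(f"{n:5d} configs: in-sample SR {s_in * ann:5.2f}, "
              f"out-of-sample SR {s_out * ann:5.2f}")
```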
_______________________________
The SBAI Toolbox is an additional aid to complement the SBAI’s standard-setting activities. While alternative investment fund
managers sign up to the Alternative Investment Standards on a comply-or-explain basis, the SBAI Toolbox materials serve as a
guide only and are not formally part of the Standards or a prescriptive template.
SBAI Toolbox – Backtesting: Key Questions for Investors to Ask – [15 July 2020] 1
Framework to assess backtesting results (including key questions for investors to ask)
1: Backtesting results ≠ past performance
Backtesting results constitute hypothetical performance information, not actual past performance
• Managers/broker-dealers have to provide clear disclosure explaining how the backtesting results were derived, that the result is not the performance of any actual account and that it is not a guarantee of future results¹
• Investors need to distinguish between actual historical performance and backtesting results; the two are not comparable
• For funds with an actual performance history, managers should be careful to clearly delineate
backtest results from actual performance in both graphical and textual presentations. After a fund
has a live performance history of a sufficiently long duration, managers should consider whether
backtest results should be presented in marketing materials at all.
Observation: “When evaluating a trading strategy, it is routine to discount the Sharpe ratio from a historical backtest. The reason is simple: there is inevitable data mining by both the researcher and by other researchers in the past.”²
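One way to make the discount less arbitrary is to treat the reported Sharpe Ratio as a test statistic and adjust it for the number of strategies tried, in the spirit of Harvey and Liu (2015). The sketch below is a simplified, hypothetical implementation that uses only the Bonferroni adjustment, a normal approximation and the assumption that all trials are independent; Harvey and Liu also discuss other adjustments (such as Holm and BHY), which this example does not implement.

```python
import math
from statistics import NormalDist


def bonferroni_haircut_sharpe(sr_annual, n_obs, n_trials, periods_per_year=12):
    """Haircut an annualised Sharpe Ratio for multiple testing.

    sr_annual: annualised Sharpe Ratio of the selected strategy.
    n_obs: number of return observations in the backtest (monthly by default).
    n_trials: number of (assumed independent) strategies tried.
    Uses a normal approximation and the Bonferroni adjustment only.
    """
    if sr_annual <= 0:
        return 0.0  # nothing to haircut
    years = n_obs / periods_per_year
    std_normal = NormalDist()
    t_stat = sr_annual * math.sqrt(years)      # t-statistic of the mean return
    p_single = 2.0 * std_normal.cdf(-t_stat)   # two-sided p-value, single test
    p_adj = min(1.0, n_trials * p_single)      # Bonferroni adjustment
    if p_adj >= 1.0:
        return 0.0                             # not significant after adjustment
    if p_adj == 0.0:
        return sr_annual                       # p-value underflowed; no visible haircut
    t_adj = -std_normal.inv_cdf(p_adj / 2.0)   # t-statistic implied by adjusted p
    return t_adj / math.sqrt(years)


if __name__ == "__main__":
    # Hypothetical example: 10 years of monthly data, Sharpe Ratio of 1.0
    for trials in (1, 10, 100):
        hsr = bonferroni_haircut_sharpe(1.0, n_obs=120, n_trials=trials)
        print(f"{trials:4d} trials: haircut SR {hsr:.2f} ({1 - hsr:.0%} haircut)")
```

Because the adjustment operates on p-values, marginal Sharpe Ratios are cut proportionally much harder than high ones, which is consistent with the non-linear haircut described in Appendix A.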
2: Detecting statistical overfitting bias
Assessing whether the strategy configuration has been fitted too closely to the sample data
• Has the strategy been tested on out-of-sample data? (applying the strategy to data that has not
been used in the initial backtesting phase)
• What backtesting techniques have been employed to avoid overfitting (train/test split, multiple train/test splits, rolling window approach…)?
• How many trials have been undertaken to come up with the strategy?³ (incl. looking at minimum backtest length for a given number of trials)
• Calculate metrics such as probability of backtest overfitting, performance degradation and probability of loss, stochastic dominance, etc.⁴
• Assess performance (and risk) impact of strategy enhancements (“naked” versus enhanced
strategy back-testing performance) to assess the risk of overfitting due to excessive complexity or
parameters too closely fitted to the specific sample set
• What adjustments to the backtest are undertaken (e.g. applying a haircut to Sharpe Ratios or introducing a “profit hurdle” that a strategy must clear to be deemed “significant”)?⁵
• What governance arrangements are in place to prevent statistical overfitting (e.g. “Index
Validation Committee”)?
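For the minimum backtest length question above, Bailey, Borwein, López de Prado and Zhu (see footnote 3) approximate the expected maximum of N independent standard normal draws as E[max_N] ≈ (1 − γ)Z⁻¹[1 − 1/N] + γZ⁻¹[1 − 1/(Ne)], where Z⁻¹ is the inverse standard normal CDF and γ is the Euler-Mascheroni constant, and derive a minimum backtest length in years of roughly (E[max_N] / SR)² for a target annualised Sharpe Ratio SR. The sketch below encodes that approximation under its stated assumptions (independent trials, zero true Sharpe Ratio, normally distributed estimates); the function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials):
    """Approximate expected maximum of n_trials independent standard normal draws."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist().inv_cdf
    return ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
            + EULER_GAMMA * z(1 - 1 / (n_trials * math.e)))


def min_backtest_length(n_trials, target_sharpe):
    """Years of data needed so that the best of n_trials skill-less, independent
    configurations is not expected to show an annualised in-sample Sharpe Ratio
    of target_sharpe purely by chance."""
    return (expected_max_sharpe(n_trials) / target_sharpe) ** 2


if __name__ == "__main__":
    for n in (10, 45, 100, 1000):
        print(f"{n:5d} trials: MinBTL for SR 1.0 = {min_backtest_length(n, 1.0):.1f} years")
```

With 45 independent trials the approximation gives about five years: over a five-year backtest, the best of 45 configurations with no skill is expected to show an in-sample Sharpe Ratio of about 1.0.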
3: Assessing underlying assumptions
It is important that the backtest be run using realistic “real life” trading assumptions
1 See [Link] on advertising actual vs. model vs. backtested performance; specific SEC advertisement prohibitions regarding modelled and actual results: No-Action Letter, Clover Capital Management, Inc., October 28, 1986
2 See Backtesting, Harvey, Liu, 2015
3 How to spot backtest overfitting? [Link]; The Probability of Backtest Overfitting: [Link]; and Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance, Bailey, Borwein, López de Prado, Zhu, 2013
4 Predicting and preventing overfitting of financial models, Chalana, 05/2017 [Link]
5 See Backtesting, Harvey, Liu (07/2015); also see Appendix A
• Does the strategy assume that trades can be implemented at the same closing price as the one generating the trading signal, or is some delay/“slippage” accounted for?⁶
• What other assumptions are being used (e.g. inclusion of transaction costs, fees, financing costs, stock lending fees, etc.)?
• What transaction fees have been used in the backtest?
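The effect of the execution assumption and of transaction costs can be checked directly by re-running the same rule with different settings. The sketch below is a hypothetical single-asset backtest: `lag=0` corresponds to trading at the same close that generated the signal, `lag=1` delays execution by one period, and costs are charged in basis points on each change in position. The signal, price series and parameter values are placeholders chosen for illustration.

```python
def momentum_sign(history):
    """Hypothetical signal: +1 after an up close, -1 otherwise."""
    if len(history) < 2:
        return 0.0
    return 1.0 if history[-1] > history[-2] else -1.0


def backtest(prices, signal_fn, lag=1, cost_bps=5.0):
    """Net per-period returns of a single-asset long/short rule.

    signal_fn(prices[: t + 1]) gives the position decided at close t.
    lag=0 assumes the trade fills at the same close that generated the
    signal; lag=1 delays execution by one period.
    cost_bps is charged per unit of position change (turnover).
    """
    signals = [signal_fn(prices[: t + 1]) for t in range(len(prices))]
    pnl, prev_pos = [], 0.0
    for t in range(1, len(prices)):
        period_return = prices[t] / prices[t - 1] - 1  # close t-1 -> close t
        decided_at = t - 1 - lag                        # close whose signal is held
        pos = signals[decided_at] if decided_at >= 0 else 0.0
        cost = abs(pos - prev_pos) * cost_bps / 10_000
        pnl.append(pos * period_return - cost)
        prev_pos = pos
    return pnl


if __name__ == "__main__":
    prices = [100, 101, 100.5, 102, 101, 103, 104, 102.5, 105, 106]
    for lag in (0, 1):
        for cost in (0.0, 10.0):
            total = sum(backtest(prices, momentum_sign, lag=lag, cost_bps=cost))
            print(f"lag={lag} cost={cost:>4} bps: sum of returns {total:+.4f}")
```

Comparing the combinations shows how much of the backtested return depends on the fill and cost assumptions; a result that only survives with `lag=0` and zero costs deserves scrutiny.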
4: Backtesting time series
Assessing length of backtesting time series and cross-sectional approach
• Has the longest available dataset been used? If not, why not?
Observation: there are many ways to define the longest available dataset (e.g. data available for all markets vs. only for some markets), and using the absolute longest may not always be the most representative approach.
• Have all the available markets in the asset class been tested? If not, why not?
Observation: All the markets in an asset class should exhibit the risk premium (or at least not be counter-indicative), irrespective of their liquidity level.
• Was a hypothesis based on an economic rationale formed prior to backtesting? If not, why not?
Observation: Reasonable parameter ranges should have been defined prior to backtesting.
• What is the sensitivity of the model to changing the parameters/markets/history?⁷
• Does the backtest use any proxy data? If so, what assumptions and adjustments have been made
(and potential impact of such versus actual data)?
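A simple way to answer the sensitivity question is to re-run the rule over a grid of parameter values and inspect the spread of results. The sketch below uses a hypothetical time-series momentum rule and synthetic random-walk prices (both assumptions for illustration only) and reports the annualised Sharpe Ratio for each lookback.

```python
import math
import random
import statistics


def tsmom_returns(prices, lookback):
    """Hypothetical time-series momentum rule: hold the sign of the trailing
    `lookback`-period return, decided at close t-1 and held over period t."""
    out = []
    for t in range(lookback + 1, len(prices)):
        trailing = prices[t - 1] / prices[t - 1 - lookback] - 1
        pos = 1.0 if trailing > 0 else -1.0
        out.append(pos * (prices[t] / prices[t - 1] - 1))
    return out


def sensitivity(prices, lookbacks, periods_per_year=252):
    """Annualised Sharpe Ratio for each lookback. A robust rule should show a
    broad plateau of similar values, not one isolated peak."""
    table = {}
    for lb in lookbacks:
        r = tsmom_returns(prices, lb)
        table[lb] = statistics.mean(r) / statistics.stdev(r) * math.sqrt(periods_per_year)
    return table


if __name__ == "__main__":
    rng = random.Random(3)
    prices = [100.0]
    for _ in range(2000):  # synthetic random-walk prices, for illustration only
        prices.append(prices[-1] * (1 + rng.gauss(0.0002, 0.01)))
    results = sensitivity(prices, lookbacks=[5, 10, 20, 40, 60, 120, 250])
    for lb, sr in results.items():
        print(f"lookback {lb:3d}: SR {sr:+.2f}")
    values = list(results.values())
    print(f"range across lookbacks: {min(values):+.2f} to {max(values):+.2f}")
```

If performance is attractive only for one lookback and drops sharply for neighbouring values, the chosen parameter is more likely to be fitted to noise than a result that holds across a broad range.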
5: Backtesting results versus actual performance
• Disclosure of back-testing results (for launch, and subsequent strategy adjustments)⁸ – see orange boxes in illustration below
• Disclosure of realised track record (between adjustment intervals)⁹ – see blue boxes in illustration below
• Disclosure of “ghost” performance (for previous strategy implementations)¹⁰ – see green boxes in illustration below
• Are the environments in which the strategy performed well/badly in backtests similar to those during its “live” period?
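The comparison between backtest and live periods can be made explicit by splitting a single track record at the launch date. The sketch below, with hypothetical function names and example returns, reports the Sharpe Ratio of each segment, the relative deterioration (similar in spirit to the backtest-to-live deterioration reported by Suhonen, Lennkh and Perez in Appendix A) and the average return by a user-supplied market-regime label, so that the environments in the two periods can be compared. Monthly data and a zero risk-free rate are assumed.

```python
import math
import statistics


def sharpe(returns, periods_per_year=12):
    """Annualised Sharpe Ratio (zero risk-free rate assumed)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def backtest_vs_live(returns, launch_index, periods_per_year=12):
    """Compare the simulated and live segments of one track record.

    returns[:launch_index] is the backtest, returns[launch_index:] the live
    record. Returns (backtest SR, live SR, relative deterioration).
    """
    bt = sharpe(returns[:launch_index], periods_per_year)
    live = sharpe(returns[launch_index:], periods_per_year)
    deterioration = (bt - live) / bt if bt > 0 else float("nan")
    return bt, live, deterioration


def mean_by_regime(returns, regimes):
    """Average return per market-regime label, to compare the environments
    in which the strategy did well or badly in backtest and live periods."""
    buckets = {}
    for r, label in zip(returns, regimes):
        buckets.setdefault(label, []).append(r)
    return {label: statistics.mean(v) for label, v in buckets.items()}


if __name__ == "__main__":
    # Hypothetical monthly returns: 8 backtested months, then 6 live months
    track = [0.02, 0.01, 0.03, 0.02, 0.015, 0.025, 0.01, 0.02,
             0.01, -0.01, 0.02, 0.0, -0.005, 0.01]
    regime = ["up", "down", "up", "up", "down", "up", "down", "up",
              "up", "down", "up", "down", "down", "up"]
    bt, live, det = backtest_vs_live(track, launch_index=8)
    print(f"backtest SR {bt:.2f}, live SR {live:.2f}, deterioration {det:.0%}")
    print("backtest by regime:", mean_by_regime(track[:8], regime[:8]))
    print("live by regime:    ", mean_by_regime(track[8:], regime[8:]))
```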
6 An uncertainty quantification framework for the achievability of backtesting results of trading strategies, Raymond Hon-Fu Chan, Alfred Ka-Chun Ma and Lanston Lane-Chun Yeung ([Link] quantification-framework-for-the-achievability-of-backtesting-results-of-trading-strategies)
7 E.g. a strategy looks good on average, but its returns occurred mainly in the distant past rather than recently; or factors have time-varying market betas.
8 Where significant alterations to the strategy, overall objectives or design are made, new backtests are required; this is not the case for gradual (small) alterations.
9 Where a realised track record constitutes a carve-out return (subcomponent of a fund), the carve-out returns are not necessarily achievable on a stand-alone basis (e.g. a different approach to risk management, such as different drawdown controls, or a different diversification benefit).
10 Caveat: where certain input factors cease to be available, it might not be possible to keep a strategy running (e.g. LIBOR, discontinued markets or other input factors).
Illustration: Time series of relevant back-testing, actual performance and ghost performance results
Key questions for investors to ask
1. Disclosure
• Has the provider explained how the backtesting results were derived, that the result is not the
performance of any actual account and that it is not a guarantee of future results?
• Has the track record been separated into simulated/theoretical performance and realised/live performance?
• Have all the strategy adjustments been correctly disclosed?
• If the strategy has been adjusted since launch, which strategy implementation has been used for showing the simulated track record (prior to launch) and why?
• How do the strategy implementations differ over history (backtest, live and ghost performance)?
• Is any of the historic realised performance based on carve-out returns?
2. Back-testing process
Assumptions
1. Was a hypothesis formed prior to backtesting? If yes, what is it (with academic references/rationale if relevant)? If not, why not?
2. Does the strategy assume that trades can be implemented at the same closing price as the one generating the trading signal, or is some delay/“slippage” accounted for?
3. What other assumptions are being used (e.g. inclusion of transaction costs, fees, financing costs, stock lending fees, etc.)?
4. How are the transaction costs incorporated in the backtest (e.g. which bid/ask spreads are used)?
Data
1. Have all the available markets in the asset class been tested? If not, why not? What is the impact on the strategy of adding all the markets?
2. Has the longest available dataset been used? If not, why not? What is the performance of the strategy if it is back-extended?
3. How sensitive is the performance to changes in the parameters/markets/history?
4. Are any of the datasets proprietary or otherwise exclusive to the manager?
5. Are any of the datasets proprietary to third parties, such that they could become unavailable in the
future?
Approach
1. Has the strategy been tested on out-of-sample data (applying the strategy to data that has not been used in the initial backtesting phase)?
2. What backtesting techniques have been employed to avoid overfitting (train/test split, multiple train/test splits, rolling window approach…)?
3. How many trials have been undertaken to come up with the strategy (incl. looking at minimum backtest length for a given number of trials)?
4. Have metrics such as probability of backtest overfitting, performance degradation and probability of loss, stochastic dominance, etc. been evaluated?
5. Has the performance (and risk) impact of strategy enhancements (“naked” versus enhanced strategy
back-testing performance) been assessed?
6. How many degrees of freedom does the model contain¹¹ and what is the risk of overfitting due to excessive complexity or number of parameters?
7. Is the strategy expected to perform the same at scale? Are there capacity limitations? How was this
evaluated?
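Question 4 above refers to the probability of backtest overfitting (PBO). A minimal sketch of the combinatorially symmetric cross-validation (CSCV) procedure described in The Probability of Backtest Overfitting (see footnote 3) is shown below: the return history of all tried configurations is split into S blocks, every half/half combination of blocks is used once as the in-sample set, and PBO is the share of combinations in which the in-sample winner ranks at or below the median out of sample. Using the Sharpe Ratio as the performance measure and contiguous, equal-size blocks are assumptions of this example, and the function names are illustrative.

```python
import itertools
import math
import random


def _sharpe(x):
    """Per-period Sharpe Ratio; 0.0 if the series has no variance."""
    n = len(x)
    m = sum(x) / n
    v = sum((a - m) ** 2 for a in x) / (n - 1)
    return m / math.sqrt(v) if v > 0 else 0.0


def pbo_cscv(returns, n_blocks=8):
    """Probability of backtest overfitting via combinatorially symmetric
    cross-validation (CSCV).

    returns: T rows, each a list of N per-period returns (one per strategy
    configuration). n_blocks (S) must be even; rows beyond S * (T // S)
    are dropped.
    """
    if n_blocks % 2:
        raise ValueError("n_blocks must be even")
    n_cfg = len(returns[0])
    size = len(returns) // n_blocks
    blocks = [returns[i * size:(i + 1) * size] for i in range(n_blocks)]
    logits = []
    for train_ids in itertools.combinations(range(n_blocks), n_blocks // 2):
        train = [row for i in train_ids for row in blocks[i]]
        test = [row for i in range(n_blocks) if i not in train_ids for row in blocks[i]]
        is_perf = [_sharpe([row[j] for row in train]) for j in range(n_cfg)]
        oos_perf = [_sharpe([row[j] for row in test]) for j in range(n_cfg)]
        best = max(range(n_cfg), key=lambda j: is_perf[j])  # in-sample winner
        # Relative out-of-sample rank of the winner, strictly inside (0, 1)
        rank = sum(1 for p in oos_perf if p <= oos_perf[best])
        omega = rank / (n_cfg + 1)
        logits.append(math.log(omega / (1 - omega)))
    # Share of splits where the winner ranks at or below the OOS median
    return sum(1 for lam in logits if lam <= 0) / len(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    # 30 skill-less configurations over 240 periods: PBO is expected near 0.5
    noise = [[rng.gauss(0.0, 0.01) for _ in range(30)] for _ in range(240)]
    print(f"PBO = {pbo_cscv(noise, n_blocks=8):.2f}")
```

For configurations with no genuine edge, the in-sample winner's out-of-sample rank is essentially random and PBO is expected to be close to 0.5; values near zero indicate that the in-sample ranking carries over out of sample.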
11 The model’s degrees of freedom correspond to the number of coefficients estimated minus 1.
3. Interpretation
• What is an appropriate discount factor to use for the particular back-tested track record?
• What were the environments in which the strategy performed well/badly in the backtests? (Same question for realised/live performance.)
• If realised/live performance deviates from backtest results in similar market environments, what
accounts for the difference?
• Is there anything significantly different in the current market conditions compared to the backtest
which could have an impact on the strategy going forward?
• What is the back-tested track record of the model for any set of parameter values provided by the investor?
Appendix A
Other areas of asset management / finance where back-testing is being used
Back-testing is used in many areas of finance. Existing industry practices, regulatory guidance and academic literature provide insights into what good back-testing practice should look like.
Key areas of focus
• Prevention of deceptive communication (e.g. mixing back-testing results with actual performance),
requirement to clearly label back-tests
• Approaches to discounting back-test Sharpe Ratios
• Prevention of false assumptions regarding “tradeable prices” (i.e. using a closing price both to generate the trading signal and as the price at which the trade is executed)
• Assessing / understanding dispersion of indices which seek to model similar underlying risk premia
• Ongoing comparison of model projections against realised values (applicable in context of banking
risk models)
See below for an overview of regulations and academic papers.
Regulations/practices

US: Advisers Act Rule 206(4)-1
• Prohibition of fraudulent, deceptive, or manipulative (communication) practices

No-Action Letter, Clover Capital Management, Inc., October 28, 1986
• Guidelines for advertising with actual and model performance. The letter specifically lays out the standards the SEC staff uses to determine whether the advertising is fair and not misleading.
• Prohibits mixing models/back-tests with actual performance.

CFA Institute: GIPS Standards
• Hypothetical and back-tested composite returns do not satisfy the requirements of the GIPS standards
• To be GIPS compliant, performance data must only contain actual portfolios managed by the firm
• Hypothetical or back-tested results can only be included when clearly labelled as supplemental information

Basel II: Sound practices for backtesting counterparty credit risk models
• Focus on the quantitative comparison of the forecasts of IMM¹² banks’ models against realised values.
12 Internal Model Method

Select research papers

Alternative Risk Premia: Is the Selection Process Important? The Journal of Wealth Management, 22 (1) 25-38 (Summer 2019). Francesc Naya, Nils Tuchschmid
• Many ARP indices have been proposed by different providers that claim to capture the same underlying risk premia. Some of these categories of indices show risk-return characteristics that are rather homogeneous, others are highly heterogeneous. Hence, performance is provider dependent, making the choice of an index an important component of the allocation process
• A proposed index may not automatically mimic an existing risk premium whose performance is sustainable or persistent: differences between simulated past results and live data for individual indices suggest significant overfitting bias. Once launched, the performance of ARP indices dropped significantly
• Conclusion: When it comes to allocating capital to ARP, an extensive due diligence/selection process is required

Backtesting, The Journal of Portfolio Management, 42 (1) 13-28 (Fall 2015). Campbell R. Harvey, Yan Liu
• Paper develops an analytical way to determine the magnitude of the haircut to be applied to back-test results (Sharpe Ratios)
• It suggests that the “common” practice of discounting reported Sharpe Ratios of trading strategies by 50% (rule of thumb) is not adequate and should be replaced by a non-linear approach that only moderately penalises the highest Sharpe Ratios while heavily penalising marginal Sharpe Ratios

An uncertainty quantification framework for the achievability of backtesting results of trading strategies (Raymond Hon-Fu Chan, Alfred Ka-Chun Ma and Lanston Lane-Chun Yeung, September 2012)
• Back-testing has always been indispensable in analysing the profitability of trading strategies in the empirical finance literature. When measuring return, most of the literature implicitly assumes that a trade can be implemented at the same closing price as the one generating the trading signal; some empirical evidence suggests that this assumption presents a significant challenge to the robustness of those results
• The results show that a significant number of technical trading strategies with positive returns are found to be unviable in the presence of implementation uncertainty

Quantifying Backtest Overfitting in Alternative Beta Strategies, Journal of Portfolio Management, Vol. 43, No. 2 (Winter 2017). Antti Suhonen (Aalto University School of Business), Matthias Lennkh (Clear Alpha Limited), Fabrice Perez (Clear Alpha Limited)
• Assessment of the biases in the back-tested performance of “alternative beta” strategies using a sample of 215 commercially promoted trading strategies across five asset classes
• Results lend support to the cautions in recent literature regarding back-test overfitting and lack of robustness in trading strategy performance during the “live” period (out of sample)
• Median 73% deterioration in Sharpe ratios between back-tested and live performance periods for the strategies in the sample
• Establishment of a link between performance deterioration and strategy complexity, with the realised reduction in live vs. back-tested Sharpe ratios of the most complex strategies exceeding that of the simplest ones by over 30 percentage points
• Robustness of strategy exposure to risk factors varies between asset classes and strategies, and appears reasonable in equity volatility and FX carry strategies, but quite weak in the equity value strategy in particular

Alice’s Adventures in Factorland: Three Blunders That Plague Factor Investing (Arnott, Harvey, Kalesnik, Linnainmaa)
• Paper assesses problems that might be underappreciated by investors (factor performance expectations, downside risks, diversification)
Appendix B
Working group members
Name | Title | Organisation
Iivo Paukkeri | Portfolio Manager | Aalto University Foundation
Duncan Moir | Senior Investment Manager, Alternative Investment Strategies | Aberdeen Asset Managers Limited
Avgustina Sarkizova | Partner, Dynamic Beta | Albourne Partners
Evelina Klerides | Partner, Dynamic Beta | Albourne Partners
Walter Cegarra | Founder | Arch Ventures
Deepak Gurnani | Founder | ARP Americas
Christopher Reeve | Director of Risk | Aspect
Andre Breedt | Research Associate | Capital Fund Management
Apostolos Katsaris | CIO | CdR Capital Ltd
Melissa Hill | Co-Founder | Eleos Capital Advisors Limited
Nicolas Papageorgiou | CIO, Public Markets | Fiera Capital
Hugues Bessette | Chief Investment & Risk Officer | Innocap
Steven Desmyter | Global Co-Head Sales & Marketing, Man Group and Global Co-Head of Responsible Investing | Man Group
Lisa Fridman | Portfolio Manager | Martlet Asset Management
Scott Treloar | CEO | Noviscient
Matt Talbert | Senior Investment Manager | Teacher Retirement System of Texas
Jerome Teiletche | Head of Cross Asset Solutions, Managing Director | Unigestion
Samantha Foster | Managing Director, Investments Office | USC University of Southern California
Dr. Sushil Wadhwani | CIO | QMA Wadhwani
Neal Howe | Partner & Director of Investor Solutions | Welton Investment Partners
Rodney Livingston | Senior Investment Officer | West Virginia Investment Management Board
Thomas Deinet | Executive Director | SBAI