The Red, Amber, and Green of Performance Tests

It is a central tenet of this site (and the affiliated due diligence course) that performance distorts the manager selection process more than anything else.  In one sense that’s not surprising, since performance is quantifiable (at least after the fact) and it is what everyone most desires in their heart of hearts.

Extrapolating performance is the easiest thing there is to do.  Allocators, even experienced ones, have trouble ignoring the past numbers and focusing on the attributes that deliver future performance.

Therefore, performance tests — both explicit and implicit ones — are common.  And problematic.

United Kingdom

In 2019, the Financial Conduct Authority (FCA) instituted rules that required the managers of investment vehicles to conduct an annual value assessment for each of their products.  The analyses don’t just cover performance; here are the dimensions that must be reviewed:

Quality of service

Performance

General costs

Economies of scale

Comparable market rates

Comparable services

Charges across different vehicle classes

Last year, the FCA summarized the first assessments that were produced by managers, indicating the range of output that it saw.  The performance-related shortcomings fell into predictable categories:  gross rather than net performance; higher fee classes not being properly assessed; benchmarks being used that didn’t reflect a fund’s investment policy and strategy; questionable comparisons (as when “absolute return” funds got automatic credit because of a rising market); underperformance being attributed to a style that’s “out of favor;” and comparisons being made to peers rather than an appropriate index.

In combining the categories for an overall assessment of value, “some [manager] frameworks gave a much heavier weighting to a fund’s performance than to the other 6 considerations,” although the FCA “saw examples where little emphasis was placed on poor fund performance.”  When managers grade themselves, you can assume that the weight given to performance in the overall assessment will depend on whether that performance was good or bad, which points to the need for consistent treatment going forward.

An example is in order:  Baillie Gifford’s 2022 value assessment report.  This attractive document shows that a regulatory requirement can do double duty as a piece of marketing material.  The firm meets the terms of the FCA framework, while taking the opportunity to explain its beliefs and approach.

During the year under review (ending in March 2022), “many of the funds underperformed significantly,” yet only a few got bad performance marks, due to the explosive returns of the prior years, when the firm “acknowledged the exceptional returns were unlikely to be repeated.”

As those good years fade into the past, how will the assessments — boiled down to the colors of red, amber, and green (RAG) — change?  (Baillie Gifford uses three- or five-year periods, as appropriate for each fund objective, and most are compared to an index-plus target when assigning one of those colors.)
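To make the mechanics concrete, here is a minimal sketch of how a color might be assigned against an index-plus target.  The function name, the one-percentage-point “amber” tolerance, and the return figures are assumptions for illustration only; they are not Baillie Gifford’s actual methodology or anything prescribed by the FCA.

```python
# A hypothetical illustration of assigning a red/amber/green rating against an
# index-plus target.  The 1-percentage-point "amber" tolerance is an assumption,
# not a published rule.

def rag_rating(fund_return, index_return, target_margin, tolerance=1.0):
    """Green if the fund beats index + target, amber if within `tolerance`
    percentage points of that hurdle, red otherwise (annualized, in percent)."""
    hurdle = index_return + target_margin
    if fund_return >= hurdle:
        return "green"
    if fund_return >= hurdle - tolerance:
        return "amber"
    return "red"

# Hypothetical five-year annualized figures for a single fund.
print(rag_rating(fund_return=6.3, index_return=5.0, target_margin=2.0))  # amber
```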

The colors are assigned to each dimension evaluated, as well as for the aggregate assessment of value, but our focus here is on performance.  What will happen if investors, advisors, and governing bodies see row after row of red dots in the performance column?  Concern (fear?) will be pervasive, just as comfort was when they all were green.

Australia

Another relatively new performance test is supervised by the Australian Prudential Regulation Authority as part of the “Your Future, Your Super” (YFYS) reform package.  Unlike the UK regime, it has specific calculations and remedies.  The initial test covered seven years of performance, a window that has since been expanded to eight.  Two July Pensions & Investments articles (here and here) address the implications of the performance test.

MySuper funds that lag by an average of fifty basis points per annum are required “to inform their members of that fact and point them to super fund competitors with superior results.”  Not surprisingly, that “has made tracking error a first-order consideration,” leading to changes in strategy:

The propensity of super funds to allocate to contrarian, high-active-share managers has diminished in an environment where benchmarks have gone from an afterthought to being front and center.

We can argue the merits of holding managers to some semblance of performance success, but it seems that the test will result in more closet indexing, which is hardly a desirable outcome.  And consider TWUSUPER, which passed with only 47 basis points of underperformance.  Wherever you draw a hard line, there will be slim differences between those facing “severe penalties” and those who escape them for at least another year.
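For readers who want to see the arithmetic, here is a minimal sketch of the pass/fail logic described above, using hypothetical return figures.  The actual APRA methodology is more involved (it uses tailored benchmark portfolios and nets out fees), so treat this strictly as an illustration of the fifty-basis-point threshold.

```python
# A simplified sketch of a YFYS-style threshold test.  Return figures are
# hypothetical; the actual test uses tailored benchmarks and nets out fees.

def average_annual_excess(fund_returns, benchmark_returns):
    """Average of (fund - benchmark) over the assessment period, in percent."""
    diffs = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    return sum(diffs) / len(diffs)

THRESHOLD = -0.50  # fail if the average lag exceeds 50 basis points per annum

# Eight hypothetical years for a fund trailing its benchmark slightly.
fund = [7.1, 6.8, 9.5, 4.2, 11.0, 3.9, 8.4, 6.0]
benchmark = [7.5, 7.2, 9.9, 4.8, 11.3, 4.5, 8.8, 6.5]

excess = average_annual_excess(fund, benchmark)
verdict = "fails" if excess < THRESHOLD else "passes"
print(f"Average excess return: {excess:+.2f}% per annum -> fund {verdict} the test")
```

In this made-up case the fund lags by an average of 45 basis points and squeaks by, much as TWUSUPER did at 47.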

Naturally, there are advocates and critics of the approach.  A 2021 Firstlinks article asked the question, “Is this really the best way to remove the super underperformers?”  On the flip side, CEM Benchmarking applied the YFYS rules to its database of pension plans and arrived at this conclusion:

Our research has found that the YFYS test, over the long term, is likely to contribute to improvement in system-wide performance.  It has also highlighted the characteristics of funds that tend to perform well (and less well).

General practice

Beyond these regulator-mandated initiatives, many organizations have performance tests of their own.  In some cases performance hurdles for existing managers are spelled out in investment policy statements (usually accompanied by “watch list” rules to be followed).  Monthly, quarterly, and yearly reviews often feature those RAG colors or some other scheme to make the presentation more salient to the reader.

You’ll also sometimes see requirements spelled out for prospective investment vehicles — that they must be in the first or second quartile, or have four or five stars.  But that’s usually not necessary for the same result to occur.  Screens filter out past losers early in the selection process, and almost no one wants to go to bat for a manager that will be challenged because of its historical results.

Explicit and implicit performance rules are often applied without consideration of whether they add value (they just seem right), even though they are frequently counterproductive.  Therefore, it makes sense to identify and assess them, and to design organizations to minimize the bad decisions that can result.  This is especially difficult given the chains of agents that reinforce those tendencies.

For example, an institutional asset owner may have to deal with layers of tests applied by consultants, in-house due diligence analysts, the CIO, and the investment committee.  Someone at an advisory firm may use an outside research firm for ideas, need to hew to the buy list of a centralized due diligence function, make a presentation to their investment committee — and then have to sell the idea to clients.  At each stage, performance exerts a gravitational pull on those involved.

Here’s an idea:  Deconstruct your due diligence and manager selection processes and find out where the performance tests are and whether they make sense.  You can even color the steps red, amber, and green if you’d like.

Published: September 7, 2022
