
JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS VOL. 29, NO. 3, SEPTEMBER 1994

A Study of Monthly Mutual Fund Returns and Performance Evaluation Techniques

Mark Grinblatt and Sheridan Titman*

Abstract

This paper empirically contrasts the Jensen Measure, the Positive Period Weighting Measure, developed in Grinblatt and Titman (1989b), and a measure developed from the Treynor-Mazuy (1966) quadratic regression on a sample of 279 mutual funds and 109 passive portfolios, using a variety of benchmark portfolios. The study finds that the measures generally yield similar inferences when using the same benchmark and that inferences can vary, even from the same measure, when using different benchmarks. This paper also analyzes the determinants of mutual fund performance. Tests of fund performance that employ fund characteristics, such as net asset value, load, expenses, portfolio turnover, and management fee are reported. These tests surprisingly suggest that turnover is significantly positively related to the ability of fund managers to earn abnormal returns.

I. Introduction

The development of the Capital Asset Pricing Model (CAPM) in the mid-1960s provided financial economists with a tool for adjusting returns for risk. An important application of this model, implemented by Jensen (1968), (1969),¹ is the evaluation of the performance of managed portfolios. However, this approach to evaluating portfolio performance has been the subject of a great deal of controversy.

There are three major reasons for this controversy: benchmark efficiency, timing, and statistical power. This paper seeks to empirically assess the importance of each of these three issues. We do this by studying the performance of a sample of 109 passive portfolios constructed from securities characteristics and industry groups, as well as a sample of 279 mutual funds. Differences in the performance of the passive portfolios with the various evaluation techniques would confirm that performance was potentially sensitive to the technique used. Evidence that the abnormal performance, as measured by a particular evaluation technique, systematically deviates from zero would indicate a bias in the technique, given that uninformed investors can easily mimic these passive portfolios. Differences in the performance of the mutual funds would indicate that the set of strategies followed by mutual fund managers is sensitive to the technique used.

*Anderson Graduate School of Management, University of California, Los Angeles, Los Angeles, CA 90024, and Carroll School of Management, Boston College, Chestnut Hill, MA 02167, respectively. The authors thank Julian Franks, Bruce Lehmann, David Mayers, Rena Repetti, Jay Shanken, JFQA Referee and Associate Editor Rex Thompson, and seminar participants at University of California, Los Angeles, University of California, Berkeley, University of British Columbia, University of Washington, Duke University, Rutgers University, and the Wharton School, University of Pennsylvania, for valuable comments on earlier drafts. The authors also appreciate the contributions of Jim Brandon, Nick Crew, Pierre Hillion, Khai Kan, Haeyon Kim, Erik Sirri, and Mark Tsesarsky, who provided excellent research assistance, and of Bruce Lehmann and David Modest, who supplied monthly factor returns. Titman gratefully acknowledges financial support from the Batterymarch Fellowship program. Both authors acknowledge financial support from the UCLA Academic Senate.

¹An equivalent approach was developed by Treynor (1965). The issues discussed in this paper that apply to Jensen's Measure also apply to Treynor's Measure.

A. Benchmark Efficiency

The first source of controversy is that the CAPM approach (and analogous multifactor approaches) to performance evaluation requires the use of a benchmark portfolio(s).² As Roll (1978) and others have noted, performance evaluation with these methods is likely to be sensitive to the benchmark choice.³ In particular, benchmarks that are mean-variance inefficient provide erroneous inferences. The well-known size and dividend-yield biases, documented in tests of the CAPM, provide one set of recipes for managers who wish to game an evaluation with CAPM-based benchmarks.

To assess the benchmark issue, the sensitivity of the different performance measures to the choice of the benchmark is analyzed. The benchmarks examined include the CRSP equally-weighted index (EW), the CRSP value-weighted index (VW), a benchmark consisting of ten factor portfolios (F10) constructed by Lehmann and Modest (1988), and an eight-portfolio benchmark (P8) developed by Grinblatt and Titman (1988) and used in Grinblatt and Titman (1989a).

B. Timing Ability

The second source of controversy in this literature is a statistical bias in Jensen's evaluation technique that arises whenever an evaluated portfolio successfully times the market.⁴ This bias can result in successful timers generating negative performance numbers, even in large samples.

In response to this problem, Grinblatt and Titman (1989b) proposed a new measure, the Positive Period Weighting Measure, that is not subject to the timing-related perversities of traditional evaluation techniques. An alternative to the Positive Period Weighting Measure is a measure developed here that uses the Treynor and Mazuy (1966) quadratic regression to aggregate the effects of timing and selectivity ability.⁵ This "Treynor-Mazuy Total Performance Measure" is specifically designed to pick up beta variations that are linearly related to the return of the benchmark portfolio. If returns are normally distributed, then in the absence of timing ability, these two new measures generate the same inferences on average as the Jensen Measure. However, funds that either time the market or pick portfolios with returns that are co-skewed with the benchmark returns will exhibit different performance with these three measures.

²See Grinblatt and Titman (1993) for a performance measurement approach that does not require a benchmark portfolio.

³See also the discussion in Elton, Gruber, Das, and Hlavka (1993).

⁴See Jensen (1972), Admati and Ross (1985), Dybvig and Ross (1985), and Grinblatt and Titman (1989b).

⁵See Admati et al. (1986) for the conditions under which this is true.

C. Statistical Power

The final source of controversy is statistical power. Portfolio returns are noisy, which makes it difficult to detect abnormal performance when it exists. For example, a portfolio manager of a billion dollar fund (of which there are many) who was able to consistently generate a 2-percent abnormal return would thus be creating over $20 million per year in value for the fund's investors. However, an excess return of 2 percent per year is generally not statistically significant, even with ten years of monthly return observations. Studies that employ a large sample of mutual funds often exacerbate this problem. This is because the levels of statistical significance must be adjusted to account for the fact that out of a sample of 200 funds, a few will exhibit extreme performance simply by chance.
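The power problem can be made concrete with a back-of-the-envelope calculation. The sketch below treats the abnormal return as a constant plus independent monthly noise and computes the resulting t-statistic. The residual standard deviation of 2 percent per month is an assumed, illustrative figure, not one reported in this paper, and the function name is hypothetical.

```python
import math


def alpha_t_statistic(annual_alpha, monthly_residual_sd, n_months):
    """t-statistic of a constant monthly abnormal return measured against i.i.d. noise."""
    monthly_alpha = annual_alpha / 12.0
    standard_error = monthly_residual_sd / math.sqrt(n_months)
    return monthly_alpha / standard_error


# Value created per year by a 2 percent abnormal return on a $1 billion fund.
value_created = 0.02 * 1_000_000_000

# ASSUMPTION: residual volatility of 2 percent per month (illustrative only).
t_stat = alpha_t_statistic(annual_alpha=0.02, monthly_residual_sd=0.02, n_months=120)
print(round(t_stat, 2))
```

Under this assumption the t-statistic is sqrt(120)/12, or roughly 0.9, well short of conventional significance. Halving the assumed noise to 1 percent per month only raises it to about 1.8.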

The strategy implemented in this paper requires that we have some prior beliefs about the determinants of superior (or inferior) portfolio performance. Even when performance measures in isolation are not sufficiently powerful to reject the null hypothesis of no performance, tests using prior beliefs about the determinants of performance may have power to reject. For example, if we suspect that funds with low net asset values can outperform funds with high net asset values (because they have a smaller effect on market prices with their offers to buy and sell), we can estimate cross-sectional regressions of performance on net asset value.

The paper is organized as follows. Section II describes the data. Section III describes the measures and benchmarks. Section IV assesses the benchmark issue. Section V assesses the timing issue. Section VI describes the relation between performance and fund characteristics such as turnover ratio, management fee, expense ratio, and load. Section VII concludes the paper.

II. The Data

A. Mutual Fund Data

Mutual fund data were obtained from CDA Investment Technologies, Inc., of Silver Springs, Maryland. The data consist of monthly cash-distribution-adjusted returns and investment goals for 279 funds that existed from December 31, 1974, to December 31, 1984. The data were spot checked with data collected by hand and found to be accurate. As with most mutual fund studies, the mutual fund return data are subject to survivorship bias. Since CDA's nonacademic clients have little interest in mutual funds that no longer exist, funds that went out of business prior to December 31, 1984, are excluded from the CDA data set. Grinblatt and Titman (1989a) estimated the survivorship bias in this sample and it does not appear to be large, on the order of 0.5 percent per year.⁶

Fund characteristics were obtained from the Wiesenberger Investment Companies Service (1975). This includes annual data on net asset value, load, management fee, expense ratio, and portfolio turnover at the beginning of the sample period.

⁶See also the analysis in Brown, Goetzman, Ibbotson, and Ross (1992).

B. Stock Data

In addition to the sample of mutual funds, we also obtained stock returns from the CRSP Daily Returns File. The daily returns were compounded to calculate the monthly portfolio returns used to form and test the benchmark portfolios, as well as evaluate the performance of 109 passive investment strategies. In addition, data on cash dividends and interest rates, used to form some of the passive strategies and compute a risk-free rate, were respectively obtained from the CRSP Daily Master and Bond Files.

Since the passive strategies do not use private information, they should generate zero performance with properly designed measures and benchmarks. The 109 portfolios include 37 industry portfolios and 72 portfolios formed on the basis of six characteristics that are related to CAPM and APT "anomalies." Firms are divided into 37 industry portfolios based on their two-digit SIC codes at the beginning of the sample period. All "two-digit" industries with at least 20 firms are included in the analysis. The 72 characteristic portfolios are formed by ranking the stocks on the basis of the different characteristics and then dividing them into 12 equally-weighted portfolios based on their rankings. For a given characteristic, portfolio 1 represents the portfolio formed from firms with the lowest rankings of that characteristic. (For example, portfolio 1 of the size portfolios consists of firms with sizes among the lowest 8⅓ percent.) Specifically, the six characteristics are:

1) Firm size, determined by the most recent capitalization available on the CRSP Master File prior to the month of the observed return.

2) Dividend yield, calculated from the CRSP Master File using the calendar year prior to the observed return.

3) Past returns, computed from the CRSP Daily Returns File using the three calendar years prior to the observed return.

4) Interest rate sensitivity, as measured by the slope coefficient on an equally-weighted index of 16- to 21-year government bonds in an excess return regression using this bond portfolio's return and the return of an equally-weighted portfolio of all CRSP-listed stocks as regressors. The time series uses the three calendar years prior to the observed return.

5) Co-skewness, as measured by the slope coefficient on the "squared term" in a regression using the excess return and squared excess return of the equally-weighted stock portfolio. The time series uses the three calendar years prior to the observed return.

6) Beta, as computed against the equally-weighted stock portfolio in a market model excess return regression using the three calendar years prior to the observed return.
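The ranking procedure behind the 72 characteristic portfolios can be sketched in a few lines. The function below sorts stocks on one characteristic, cuts the ranking into 12 groups of roughly equal size, and equally weights the returns within each group. The function name, the dictionary inputs, and the handling of group sizes that do not divide evenly are assumptions made for illustration, not details taken from the paper.

```python
def form_rank_portfolios(characteristic, returns, n_groups=12):
    """Sort stocks by a characteristic and return equal-weighted group returns.

    characteristic: dict mapping stock id -> characteristic value
    returns: dict mapping stock id -> return for the month
    The first element corresponds to portfolio 1 (lowest-ranked stocks).
    Assumes there are at least n_groups stocks.
    """
    ranked = sorted(characteristic, key=characteristic.get)
    n = len(ranked)
    portfolio_returns = []
    for g in range(n_groups):
        # Integer breakpoints spread any remainder across the groups.
        members = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        portfolio_returns.append(sum(returns[s] for s in members) / len(members))
    return portfolio_returns


# Hypothetical data: 24 stocks whose size and return both increase with the index.
sizes = {f"s{i}": float(i) for i in range(24)}
month_returns = {f"s{i}": i / 100 for i in range(24)}
size_portfolios = form_rank_portfolios(sizes, month_returns)
```

With 24 stocks each portfolio holds two names, so portfolio 1 contains the two smallest firms, which is the lowest 8⅓ percent of the ranking.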


III. Measures of Performance and Benchmark Portfolios

A. Performance Measure Description

Three performance measures will be compared in this study: the Jensen Measure, the Positive Period Weighting Measure, and the Treynor-Mazuy Measure of Total Performance. Each calculates performance relative to a benchmark, which is a portfolio or a group of portfolios, and computes abnormal returns by using the benchmark to adjust the average return of a portfolio for risk. If the methodology behind the measures is correct, we interpret the measures as the average amount per month by which a manager beats a passive portfolio with equivalent risk per dollar invested.

The Jensen Measure is the intercept in a regression of the time series of excess returns (above the one-month Treasury Bill rate) of the evaluated portfolio against the time series of excess returns of the benchmark portfolio(s). This is the traditional measure used in most previous studies of fund performance.
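For a single-index benchmark such as EW or VW, the Jensen Measure reduces to the intercept of a univariate least squares regression. The following is a minimal sketch under that single-benchmark assumption; the function name and the sample numbers are hypothetical, and a multiple-portfolio benchmark such as F10 or P8 would require a multiple regression instead.

```python
def jensen_measure(fund_excess, bench_excess):
    """Intercept and slope from regressing fund excess returns on benchmark excess returns."""
    n = len(fund_excess)
    mean_f = sum(fund_excess) / n
    mean_b = sum(bench_excess) / n
    cov = sum((b - mean_b) * (f - mean_f) for b, f in zip(bench_excess, fund_excess))
    var = sum((b - mean_b) ** 2 for b in bench_excess)
    beta = cov / var
    alpha = mean_f - beta * mean_b  # the Jensen Measure, in return per period
    return alpha, beta


# Hypothetical monthly excess returns: a fund with beta 1.2 and 0.1 percent monthly alpha.
bench = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
fund = [0.001 + 1.2 * r for r in bench]
alpha, beta = jensen_measure(fund, bench)
```

Because the hypothetical fund is an exact linear function of the benchmark, the regression recovers the intercept and slope used to construct it.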

The Positive Period Weighting Measure, developed in Grinblatt and Titman (1989b), is obtained in two steps. First, one selects a vector of weights, $w_1, \ldots, w_T$. Each element of the vector corresponds to one time series observation. Second, the performance of a fund is computed by taking the dot product of the weight vector and the excess return vector of the mutual fund, i.e.,

$$\alpha = \sum_t w_t R_{pt}.$$

The weight vector is chosen to have nonnegative weights that make the weighted sum of the excess returns of the benchmark portfolio(s) sum to zero, i.e.,

$$\sum_t w_t R_{It} = 0, \quad w_t > 0,$$

where $R_{It}$ = period $t$ excess return of the index portfolio used as a benchmark. Thus, the weight vector is both benchmark specific and sample period specific.

Obviously, there are many sets of weights with the properties mentioned above. The weights employed in this study can be interpreted as the marginal utilities of an uninformed investor with power utility. Given this interpretation, uninformed investors with power utility prefer to add to their existing optimal passive portfolio a small amount of any mutual fund return with a measure that is positive. Grinblatt and Titman (1989b) provide conditions under which positive values for these measures imply that the mutual fund manager has superior information. This measure is discussed in more detail in Appendix A.
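One way to build weights with these properties is sketched below. It is a simplified illustration and not the Appendix A procedure: it assumes the weights are proportional to (1 + R_It) raised to the power -theta, which is the marginal utility of a power-utility investor who holds the benchmark, and it searches for the theta that sets the weighted benchmark excess return to zero. The function names, the bisection search, and the normalization of the weights to sum to one are all assumptions made for this example.

```python
def positive_period_weights(bench_excess, iterations=200):
    """Positive weights that sum to one and satisfy sum(w_t * R_It) = 0.

    ASSUMPTION: w_t is proportional to (1 + R_It) ** (-theta). The weighted sum
    is decreasing in theta, so theta can be found by bisection. Requires a
    positive mean excess return and at least one negative excess return.
    """
    def weighted_sum(theta):
        return sum((1.0 + r) ** (-theta) * r for r in bench_excess)

    if weighted_sum(0.0) <= 0 or min(bench_excess) >= 0:
        raise ValueError("need a positive mean and at least one negative excess return")
    lo, hi = 0.0, 1.0
    while weighted_sum(hi) > 0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if weighted_sum(mid) > 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    raw = [(1.0 + r) ** (-theta) for r in bench_excess]
    total = sum(raw)
    return [w / total for w in raw]


def positive_period_weighting_measure(fund_excess, bench_excess):
    """Weighted average of fund excess returns using the benchmark-specific weights."""
    weights = positive_period_weights(bench_excess)
    return sum(w * r for w, r in zip(weights, fund_excess))


# Hypothetical data: the fund earns 0.2 percent per month beyond 0.8 times the benchmark.
bench = [0.04, -0.03, 0.05, -0.02, 0.03, 0.01]
fund = [0.002 + 0.8 * r for r in bench]
ppw_alpha = positive_period_weighting_measure(fund, bench)
```

Because the weights sum to one and zero out the benchmark, any fund return that is a constant plus a fixed multiple of the benchmark is assigned exactly that constant. The weights also place more mass on months with low benchmark returns, as marginal utilities should.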

The Treynor-Mazuy (1966) quadratic regression is similar to the Jensen Measure regression. Here, however, there are two explanatory variables: the excess return of the benchmark portfolio and the square of that excess return. The intercept in this regression provides an estimate of selectivity ability; the product of the quadratic term slope coefficient and the variance of the benchmark return (henceforth, the Treynor-Mazuy Timing Measure), provides an estimate of timing ability. We call the sum of the timing and selectivity terms the Treynor-Mazuy Measure of Total Performance. The latter measure is discussed in more detail in Appendix B.
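The decomposition just described can be sketched for a single-index benchmark as follows. The code fits the quadratic regression by solving the normal equations directly and then forms the selectivity, timing, and total terms. The function names are hypothetical, and the use of the sample variance (dividing by T - 1) is an assumption, since the text does not specify the variance estimator.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def treynor_mazuy(fund_excess, bench_excess):
    """Return (selectivity, timing, total) from the quadratic regression."""
    rows = [[1.0, r, r * r] for r in bench_excess]
    # Normal equations (X'X) coef = X'y for the regressors 1, R_I, and R_I squared.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, fund_excess)) for i in range(3)]
    intercept, _beta, gamma = solve_linear_system(xtx, xty)
    n = len(bench_excess)
    mean_b = sum(bench_excess) / n
    var_b = sum((r - mean_b) ** 2 for r in bench_excess) / (n - 1)  # ASSUMPTION: sample variance
    timing = gamma * var_b
    return intercept, timing, intercept + timing


# Hypothetical fund with selectivity of 0.1 percent per month and a quadratic coefficient of 2.
bench = [0.04, -0.03, 0.05, -0.02, 0.03, 0.01]
fund = [0.001 + 0.9 * r + 2.0 * r * r for r in bench]
selectivity, timing, total = treynor_mazuy(fund, bench)
```

Here the timing term is the fitted quadratic coefficient multiplied by the variance of the benchmark excess return, so a fund whose beta rises with the benchmark return receives credit that the intercept alone would miss.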


In principle, if we use the Treynor-Mazuy regression to analyze performance with multiple portfolio benchmarks, a large number of cross-product terms must be included in the regression. For the P8 and F10 benchmarks, the number of cross-product terms would be very large, making this infeasible. What we do instead is calculate P8 and F10 Treynor-Mazuy measures that use the returns of the ex post efficient combination of the portfolios included in the benchmarks. However, since no prior research has offered theoretical justification for doing this, one should interpret any Treynor-Mazuy performance results with multiple portfolio benchmarks cautiously.

B. Benchmark Portfolio Description

There are four benchmarks: the first two are the monthly rebalanced equally-weighted index computed from all CRSP securities and the CRSP value-weighted index. The third benchmark is a factor portfolio benchmark, created from factor portfolio weights used in Lehmann and Modest (1988). The input portfolio weights were derived from a 10-factor maximum likelihood factor analysis over the 1978-1982 period. The portfolios contain 750 securities in the 1978-1982 period and slightly fewer in the 1975-1977 and 1983-1984 periods since some of the securities from the middle period did not exist in the early and later periods. Although this method of forming factor portfolios can potentially create survivorship bias, (unreported) comparisons with the equally-weighted index suggest that this bias is not large.

Past research suggests that none of these benchmarks is mean-variance efficient. In particular, they generate biased performance measures that relate to size (Banz (1981), Reinganum (1981)), dividend yield (Litzenberger and Ramaswamy (1979), (1982)), and beta (Black, Jensen, and Scholes (1972)). In the 1975-1984 sample period, Grinblatt and Titman (1988) found the same size, dividend yield, and beta-related biases with the EW and F10 benchmarks.⁷

The fourth benchmark, the P8 benchmark developed in Grinblatt and Titman (1988) and used in Grinblatt and Titman (1989a), is not subject to any of the aforementioned biases. The basic idea underlying the formation of this benchmark is that various firm characteristics are correlated with their stocks' factor loadings. As a result, characteristic-based portfolios can be used as proxies for the factors. The eight-portfolio benchmark consists of four size-based portfolios, three dividend-yield-based portfolios, and the lowest past returns portfolio: the equal weighting of the smallest 8⅓ percent of firms is the first size-based portfolio; the average of the second and third smallest size portfolios (out of 12) is the second portfolio; the average of the fourth through ninth smallest size portfolios is the third portfolio; and the average of the three largest size portfolios is the fourth. An equal weighting of the two lowest dividend-yield portfolios (out of 12), the fifth and sixth lowest dividend-yield portfolios, and the tenth and eleventh dividend-yield portfolios comprise the three dividend-yield portfolios in the benchmark. The lowest past returns portfolio (out of 12) is the eighth portfolio in the benchmark.
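The composition of the P8 benchmark can be written down directly from the description above. The sketch assumes each characteristic is supplied as a list of 12 monthly return series ordered from portfolio 1 (lowest ranking) to portfolio 12. The function names and the input layout are hypothetical.

```python
def average(series_list):
    """Equal-weighted average of several return series, month by month."""
    return [sum(month) / len(series_list) for month in zip(*series_list)]


def build_p8(size, dividend_yield, past_returns):
    """Assemble the eight benchmark portfolios from three sets of 12 rank portfolios.

    Each argument is a list of 12 return series; index 0 is portfolio 1.
    """
    return [
        size[0],                        # smallest 8 1/3 percent of firms
        average(size[1:3]),             # second and third smallest size portfolios
        average(size[3:9]),             # fourth through ninth smallest size portfolios
        average(size[9:12]),            # three largest size portfolios
        average(dividend_yield[0:2]),   # two lowest dividend-yield portfolios
        average(dividend_yield[4:6]),   # fifth and sixth lowest dividend-yield portfolios
        average(dividend_yield[9:11]),  # tenth and eleventh dividend-yield portfolios
        past_returns[0],                # lowest past returns portfolio
    ]


# Hypothetical inputs: each series simply repeats its own portfolio number,
# which makes the grouping easy to check by eye.
labels = [[float(k + 1)] * 2 for k in range(12)]
p8 = build_p8(labels, labels, labels)
```

With these placeholder inputs, each averaged element equals the mean of the portfolio numbers it combines, for example 6.5 for the fourth through ninth size portfolios.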

⁷Grinblatt and Titman (1988) did not test the VW benchmark for mean-variance efficiency, but there is a well-known (and by historical standards, large) size-related bias with this benchmark over the sample period.


There is some evidence for the assertion that the P8 benchmark better reflects true performance than does the factor benchmark. First, as we stated earlier, there do not appear to be biases in the P8 benchmark associated with the well-known CAPM and APT anomalies in finance (see Grinblatt and Titman (1988)). Second, the performance statistics reported in Grinblatt and Titman (1989a) with the P8 benchmark are very similar to the scores in Grinblatt and Titman (1993), which do not require a benchmark, and instead make use of the fund's prior portfolio holdings to risk-adjust a fund's average return.⁸

IV. The Sensitivity of Performance to Different Benchmarks

Tables 1-3 analyze the sensitivity of performance to benchmarks using the samples of 109 passive portfolios and 279 mutual funds. Table 1 presents the average monthly abnormal returns in these samples with the three measures and the four benchmarks. Table 2 presents correlation matrices that examine the extent to which benchmarks matter. Table 3 presents regression coefficients and F-tests of the null hypothesis that various pairs of measures are identical.

A. The Sensitivity of Average Performance to Different Benchmarks

The average abnormal returns in Table 1 can be generated either with a cross section (find performance for each fund, then average) or with a time series (equally-weight the returns of the funds, then find the performance of the equally-weighted portfolio). The t-statistics, presented below the abnormal returns, are computed from the time series standard errors.⁹ Thus, under the random walk hypothesis and the null hypotheses of no performance and homoskedastic residual variances, the t-statistics should be generated by the Student t-distribution. The procedures for calculating the t-statistic for the Positive Period Weighting and Treynor-Mazuy Measures are described in Appendices A and B, respectively.
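The equivalence of the cross-sectional and time series routes follows from the linearity of least squares in the dependent variable. The short check below uses the Jensen intercept with a single benchmark; the return figures and the helper name are hypothetical and serve only to illustrate the identity.

```python
def ols_intercept(y, x):
    """Intercept from a univariate least squares regression of y on x."""
    n = len(y)
    mean_y, mean_x = sum(y) / n, sum(x) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return mean_y - (cov / var) * mean_x


# Hypothetical monthly excess returns for one benchmark and three funds.
bench = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
funds = [
    [0.025, -0.012, 0.028, 0.003, -0.015, 0.011],
    [0.018, -0.005, 0.040, -0.002, -0.030, 0.009],
    [0.010, 0.001, 0.015, 0.004, -0.008, 0.006],
]

# Cross section: estimate each fund's intercept, then average.
cross_section_average = sum(ols_intercept(f, bench) for f in funds) / len(funds)

# Time series: equally weight the fund returns each month, then estimate once.
equal_weighted = [sum(month) / len(funds) for month in zip(*funds)]
time_series_alpha = ols_intercept(equal_weighted, bench)
```

The two numbers agree up to floating-point rounding. The time series route has the advantage of delivering a standard error that reflects the correlation of returns across funds.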

Note that with either Panel A (passive portfolios) or Panel B (mutual funds) in Table 1, both the average performance and t-statistic in each column (same benchmark, different measures) vary much less than the numbers in each row (same measure, different benchmarks). Hence, for the average passive portfolio and average mutual fund, benchmarks seem to matter much more for performance than do measures.

1. Passive Portfolios

With the exception of the performance results with the value-weighted index and the Treynor-Mazuy Measure with the P8 benchmark, all of the passive portfolios' average performance numbers in Table 1, Panel A are small. The largest

⁸This ignores the fact that neither of these two papers focused on actual fund returns, but only on hypothetical returns constructed from the CRSP-listed portion of their holdings. If non-CRSP listed securities are important components of the numbers we report here, then the P8 benchmark could have hidden biases.

⁹The cross-sectional standard deviations and standard errors are biased because the returns of different funds are correlated. For the passive portfolios, the 12 cross-sectional standard deviations of the abnormal monthly returns range from 0.0021 to 0.0027. For the mutual funds, the 12 range from 0.0030 to 0.0036.


TABLE 1

Means and t-Statistics for the Three Performance Measures Using Four Benchmarks

                                       Benchmark
Measure   EW Index            VW Index           F10                  P8

Panel A. 109 Passive Portfolios

JM         0.0002 (1.50)      0.0080 (2.83)**     0.0004 (0.50)        0.0001 (0.42)
PW         0.0003 (1.66)      0.0080 (2.82)**     0.0007 (0.88)        0.0001 (0.46)
TM         0.0003 (1.70)      0.0080 (2.82)**     0.0003 (0.39)       -0.0022 (-11.29)**

Panel B. 279 Equity Mutual Funds

JM        -0.0028 (-1.59)     0.0009 (1.07)      -0.0033 (-3.56)**    -0.0004 (-0.65)
PW        -0.0034 (-1.92)     0.0008 (1.01)      -0.0037 (-3.92)**    -0.0001 (-0.23)
TM        -0.0037 (-2.09)*    0.0008 (0.98)      -0.0043 (-4.48)**    -0.0025 (-4.23)**

This table presents the average performance of 109 passive portfolios formed on the basis of security characteristics as well as 279 mutual funds calculated with the Jensen (JM), the Positive Period Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. The benchmarks used include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8). The t-statistics, shown in parentheses, are calculated from the 120 time series observations. For the Jensen Measure, they are the standard intercept t-statistic derived from a regression of the excess returns of an equally-weighted portfolio of passive portfolios or mutual funds on the excess returns of the benchmark portfolio(s). For the other two measures, see Appendices A and B, respectively.

*Significant at 0.05 level (two-tailed test).
**Significant at 0.01 level (two-tailed test).

performance is seen with the value-weighted index, which exhibits a 10 percent per year abnormal return for the average passive portfolio. Since the average passive portfolios should exhibit zero performance, the magnitude of the performance numbers implies an inefficiency in the VW benchmark that can easily be gamed (for example, by buying small stocks). The P8 benchmark, by contrast, yields an average abnormal return of about 0.1 percent per year for the average passive portfolio with both the Jensen Measure and the Positive Period Weighting Measure.
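The annual figures quoted here can be recovered from the monthly entries in Table 1. The sketch below shows two conventions, simple multiplication by 12 and monthly compounding. Which convention the authors used is not stated, so both are shown as assumptions, and the function names are hypothetical.

```python
def annualize_simple(monthly_return):
    """Annual rate obtained by multiplying the monthly rate by 12."""
    return 12 * monthly_return


def annualize_compound(monthly_return):
    """Annual rate obtained by compounding the monthly rate over 12 months."""
    return (1 + monthly_return) ** 12 - 1


# Monthly averages read from Table 1, Panel A (Jensen Measure).
vw_passive_monthly = 0.0080
p8_passive_monthly = 0.0001

vw_simple = annualize_simple(vw_passive_monthly)      # about 9.6 percent per year
vw_compound = annualize_compound(vw_passive_monthly)  # about 10.0 percent per year
p8_simple = annualize_simple(p8_passive_monthly)      # about 0.12 percent per year
```

Either convention puts the VW figure close to the 10 percent per year cited in the text and the P8 figure close to 0.1 percent per year.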

2. Mutual Funds

The mutual funds exhibit average abnormal performance that ranges from about -4 percent or -5 percent to 1 percent in each of the rows. Once again, the P8 benchmark with the Positive Period Weighting Measure and Jensen Measure is closest to zero. If we add 2 percent per year in fees and transaction costs to this number, this benchmark would seem to indicate that the average mutual fund manager beats the market by 2 percent per year. This performance is consistent with the performance evaluation of gross returns in Grinblatt and Titman (1993), which employed fund holdings to adjust for risk, rather than employ a benchmark. None of the other benchmarks generate performance numbers that are consistent with these earlier findings.

The magnitudes of Table 1, Panel B's average performance scores for the mutual funds using the equally-weighted and factor analysis benchmarks (about -3.5 to -5 percent per year) are too large to be explained by the transaction costs and the expenses of the funds.¹⁰ This suggests that the negative performance must be either due to the funds systematically picking stocks that do poorly, or to the benchmarks being inefficient. Given the known size-related biases of these benchmarks, and the fact that mutual funds tend to invest in larger than average firms, the latter possibility is the more plausible. Indeed, since average fees and transaction costs are less than 3 percent per year, the highly negative abnormal returns of the mutual funds alone with the EW and F10 benchmarks should be sufficient to reject the benchmarks as valid indicators of a fund manager's ability.

Panel B's positive mutual fund performance with the value-weighted benchmark can be similarly explained by examining the composition of the average mutual fund portfolio. The average mutual fund tilts its portfolio toward small stocks more than the value-weighted index, but less than the equally-weighted index. Thus, the known size-related bias of the value-weighted benchmark over this period is probably sufficient to generate the observed results.

With the P8 benchmark, which is not subject to this size-related bias, average performance is virtually zero (except for the Treynor-Mazuy measure). Clearly, the conclusions that one would draw about the overall performance of the mutual fund industry are strongly influenced by the choice of benchmarks. Moreover, they are likely to be erroneous if the benchmark can be easily gamed by exploiting CAPM and APT anomalies.

B. Correlations between Performance Estimates Using Different Benchmarks

Table 2 reports correlations that, in combination with the results from Table 1,¹¹ illustrate the sensitivity of the different performance measures to the choice of the benchmark. Of particular note are the correlations between the performance scores generated with the P8 benchmark and the three other benchmarks, which are not very large. While the EW, VW, and F10 benchmarks also generate different inferences, the correlations are much higher. For example, with the Jensen Measure and the passive portfolios, the correlation between EW and F10 performance is 0.64, which is about twice as large as the correlation between P8 performance and performance with either of these two benchmarks.

Table 2 suggests that the performance of individual funds is likely to be sensitive to the choice of benchmark portfolios, even in cases where the average fund has similar performance with two benchmarks (e.g., P8 and VW in Panel B). The correlations between performance with any pair of benchmarks are not close to one. The passive portfolios are deliberately comprised of stock portfolios that differ as much as possible in a number of important dimensions and thus exhibit lower correlations than the mutual funds. However, even with the mutual funds, the maximum correlation does not exceed 0.9. Moreover, except for the F10 and EW comparison, the largest mutual fund correlations are with benchmark pairs that exhibit highly different average scores in Table 1.

¹⁰See Grinblatt and Titman (1989a) for estimates of these costs.

¹¹If, for two benchmarks (measures), the means and t-statistics are similar and the correlations are close to one, the scores of individual funds are virtually identical with the two benchmarks (measures).


TABLE 2

Pearson Correlations between Abnormal Returns Using Different Benchmarks

               109 Passive Portfolios        279 Equity Mutual Funds
Benchmark      VW      F10     P8            VW      F10     P8

Panel A. Jensen Measure

EW             0.67    0.64    0.35          0.91    0.86    0.60
VW                     0.51    0.17                  0.69    0.63
F10                            0.32                          0.42

Panel B. Positive Period Weighting Measure

EW             0.74    0.69    0.19          0.89    0.89    0.48
VW                     0.65    0.09                  0.71    0.57
F10                            0.21                          0.37

Panel C. Treynor-Mazuy Total Performance Measure

EW             0.75    0.71    0.37          0.88    0.87    0.57
VW                     0.59    0.22                  0.65    0.61
F10                            0.41                          0.43

This table presents the Pearson correlations between the performance numbers generated with four different benchmarks on a sample of 109 passive portfolios formed on the basis of security characteristics as well as 279 mutual funds. The benchmarks include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8). These correlations are calculated separately for three performance measures: the Jensen, the Positive Period Weighting, and the Treynor-Mazuy Total Performance Measures.

C. A Formal Test of the Similarity of Performance between Benchmarks

A limitation of the analysis in the previous subsection is the lack of reported significance levels for the correlations. Because the vectors used to compute the correlations contain correlated random elements, the standard significance tests are biased. This section presents a formal F-test of whether the performance numbers with two different benchmarks are similar. The procedure is based on an extension and aggregation of the techniques developed in Fama and MacBeth (1973), Sefcik and Thompson (1986), and Gibbons, Ross, and Shanken (1989), and makes use of the time series to compute significance levels for the similarity of the measures.

For N funds, let $\alpha_1$ and $\alpha_2$ denote the two $N \times 1$ vectors of performance computed using two different benchmarks (and the same measure). A univariate cross-sectional regression of $\alpha_1$ on $\alpha_2$ has an intercept of zero and a slope coefficient of one under the null hypothesis that $\alpha_2$ is an unbiased estimate of the performance measure $\alpha_1$. The reverse regression tests whether $\alpha_1$ is an unbiased estimate of $\alpha_2$. The two performance measures can only be unbiased estimates of each other when they are identical, since forecast error in either regression biases the slope coefficient in the other regression toward zero. As a result, either the regression or the reverse regression will reject the hypothesis of unbiasedness whenever the measures are sufficiently different. A small sample statistic with a known distribution exists only when the test is performed separately for the regression and reverse regression (assuming bivariate normality). Unfortunately, no similar test exists for the joint hypothesis that the coefficients have these values in both the regression and reverse regression simultaneously.
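The claim that both slopes can equal one only when the two measures coincide rests on a standard identity: the product of the forward and reverse least squares slopes equals the squared correlation. The sketch below verifies the identity on hypothetical performance vectors and then applies it to two slope estimates as read from Table 3 (Jensen Measure, EW and VW benchmarks). The helper names and the sample vectors are assumptions for illustration.

```python
def ols_slope(y, x):
    """Slope from a univariate least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sum((a - mean_x) ** 2 for a in x)


def correlation(x, y):
    """Pearson correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical performance scores for five funds under two benchmarks.
alpha_1 = [0.001, -0.002, 0.003, 0.000, -0.001]
alpha_2 = [0.002, -0.001, 0.002, 0.001, -0.003]

slope_forward = ols_slope(alpha_1, alpha_2)  # regression of alpha_1 on alpha_2
slope_reverse = ols_slope(alpha_2, alpha_1)  # reverse regression
r = correlation(alpha_1, alpha_2)

# Slopes read from Table 3, Panel A: EW on VW (0.842) and VW on EW (0.980).
implied_r = (0.842 * 0.980) ** 0.5
```

The implied correlation of roughly 0.91 matches the EW and VW entry for mutual funds under the Jensen Measure in Table 2, which provides a useful cross-check between the two tables.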


TABLE 3

F-Tests of the Similarity between Performance Scores with Different Benchmarks

Independent            Dependent Variable Benchmark and Degrees of Freedom for F-Test
Variable
Benchmark              EW (2,117)    VW (2,117)    F10 (2,108)   P8 (2,110)

Panel A. Jensen Measure

EW          γ₀                        0.004        -0.001         0.001
            γ₁                        0.980         0.928         0.647
            F                         9.97**        0.58         13.75**

VW          γ₀         -0.004                      -0.004        -0.001
            γ₁          0.842                       0.695         0.632
            F           6.04**                     18.46**        9.72**

F10         γ₀         -0.000         0.003                       0.001
            γ₁          0.793         0.691                       0.421
            F           1.08         12.98**                     19.76**

P8          γ₀         -0.003         0.001        -0.003
            γ₁          0.548         0.623         0.418
            F           4.05*        10.74**       19.73**

Panel B. Positive Period Weighting Measure

EW          γ₀                        0.004        -0.001         0.002
            γ₁                        0.931         0.928         0.556
            F                        13.46**        0.51         18.95**

VW          γ₀         -0.004                      -0.004        -0.001
            γ₁          0.843                       0.704         0.635
            F           7.39**                     20.07**        8.03**

F10         γ₀         -0.000         0.003                       0.001
            γ₁          0.847         0.709                       0.416
            F           0.61         15.55**                     22.24**

P8          γ₀         -0.003         0.001        -0.004
            γ₁          0.408         0.514         0.334
            F           6.29**       12.20**       22.48**

Panel C. Treynor-Mazuy Total Performance Measure

EW          γ₀                        0.004        -0.001        -0.000
            γ₁                        0.905         0.956         0.581
            F                        15.29**        0.51          7.59**

VW          γ₀         -0.004                      -0.005        -0.003
            γ₁          0.855                       0.701         0.605
            F           7.78**                     23.49**       23.35**

F10         γ₀         -0.000         0.003                      -0.001
            γ₁          0.785         0.609                       0.399
            F           0.95         19.62**                     12.67**

P8          γ₀         -0.002         0.002        -0.003
            γ₁          0.552         0.608         0.461
            F           5.81**       22.90**       18.14**

This table presents the intercepts (γ₀), slope coefficients (γ₁), and time series F-tests (described in Appendix C) from cross-sectional regressions of performance with one benchmark against performance with another benchmark. The benchmarks include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8). These tests are calculated separately for three performance measures: the Jensen, the Positive Period Weighting, and the Treynor-Mazuy Total Performance Measures.

*Significant at 0.05 level.
**Significant at 0.01 level.


The F-test of whether the coefficient vector in a single regression significantly deviates from (0,1) is biased when the cross section is used to compute the F-statistic. To remedy this, we employ a time series procedure to compute the relevant F-statistic. The steps required for computing this time series F-statistic are reported in Appendix C.

Table 3 reports the coefficient estimates for cross-sectional regressions and reverse regressions for the mutual funds along with results of an F-test of whether the intercept is zero and the slope coefficient is one in these regressions.¹² The results are presented in three matrices, where each matrix corresponds to a measure. The rows in each matrix correspond to the independent variables in the cross-sectional regression. The columns correspond to the dependent variable. Each element in the matrix consists of a triplet: respectively, the least squares intercept estimate, the slope coefficient estimate, and the time series F-statistic.

The results, which are consistent between each regression and its corresponding reverse regression, provide conclusions that are similar to those derived from Tables 1 and 2: benchmarks generally do matter. There are significant deviations from an intercept of zero and a slope coefficient of one for all of the regressions, except for results with the factor benchmark versus the equally-weighted benchmark.

V. The Sensitivity of Performance to Different Measures

Tables 1, 4, and 5 indicate that the Jensen and Positive Period Weighting Measures provide very similar inferences. Table 4 examines correlations between the performance scores using the different measures, but the same benchmark. The most striking observation is the magnitude of these numbers. For any given benchmark, the correlation between the performance scores exceeds 0.94 for the passive portfolios and 0.97 for the mutual funds.

In combination with the means and t-statistics in Table 1, these results indicate highly similar scores between the Jensen and Positive Period Weighting Measures.¹³ This is true for both the passive portfolios and the mutual funds. The Treynor-Mazuy Total Performance Measure, which also exhibits near perfect correlation with the other two measures, is also highly similar for three of the four benchmarks. However, it appears to shift performance downward by 2.5 percent to 3 percent per year for the P8 benchmark, as indicated by the means in Table 1. This downward shift appears to be virtually constant on a fund by fund basis, whether looking at individual passive portfolios or looking at individual mutual funds. One has to conclude from this that the Treynor-Mazuy Measure is generating spurious inferences with the P8 benchmark. As was suggested earlier, this is possible because of the ad hoc innovation employed to adapt this measure to a multiple portfolio benchmark.

¹² F-statistics cannot be computed for the passive portfolios because computing the F would require inversion of a singular or nearly singular matrix.

¹³ Unreported cross-sectional standard errors are also virtually identical across measures, irrespective of the benchmark.


TABLE 4

Pearson Correlations between Abnormal Returns Using Different Measures

                  109 Passive Portfolios      279 Equity Mutual Funds
Measure           PW          TM              PW          TM

Panel A. EW Benchmark
JM                0.97        0.96            0.99        0.98
PW                            1.00                        1.00

Panel B. VW Benchmark
JM                1.00        1.00            1.00        1.00
PW                            1.00                        1.00

Panel C. F10 Benchmark
JM                0.95        0.94            0.99        0.98
PW                            0.99                        0.99

Panel D. P8 Benchmark
JM                0.95        0.94            0.98        0.97
PW                            0.94                        0.97

This table presents the Pearson correlations between the performance numbers generated with three different performance measures on a sample of 109 passive portfolios formed on the basis of security characteristics as well as 279 mutual funds. The measures include the Jensen (JM), the Positive Period Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. The benchmarks include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).

A. A Formal Test of the Similarity between Performance Measures

Table 5, which follows the same format as Table 3, formally tests whether measures matter. It reports the intercepts and slope coefficients from cross-sectional regressions of performance with one measure against performance with another measure. Time series F-statistics, analogous to those produced in Table 3, test whether the intercept and slope coefficients in a particular regression are, respectively, zero and one. Each matrix in Table 5 corresponds to a benchmark. Rows represent independent variables in the cross-sectional regression. Columns correspond to the dependent variables. Once again, both the regressions and the reverse regressions all seem to yield the same inferences.

The F-statistics are generally far below the critical F that demarcates the 5-percent significance level (most are close to zero). This indicates that different measures generally yield the same performance scores. The only significant Fs, consistent with our discussion in the last section, are those indicating a distinction between the Treynor-Mazuy Measure and the other two measures with the P8 benchmark.
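One way to cross-check the slope coefficients in Table 5 against the correlations in Table 4 uses the identity that the slope of a regression times the slope of its reverse regression equals the squared correlation. The sketch below applies it to the P8 panel for the mutual funds. The slope pairs are read from Table 5 on the assumption that each pair is a regression and its reverse; the dictionary names are hypothetical.

```python
# Slope pairs (regression, reverse regression) for the P8 benchmark,
# read from Table 5 under the assumption stated in the lead-in.
p8_slopes = {
    ("JM", "PW"): (0.888, 1.074),
    ("JM", "TM"): (0.987, 0.960),
    ("PW", "TM"): (1.084, 0.872),
}

# Correlations reported for the 279 mutual funds in Table 4, Panel D.
table4_p8 = {("JM", "PW"): 0.98, ("JM", "TM"): 0.97, ("PW", "TM"): 0.97}

# Implied correlation: square root of the product of the two slopes.
implied = {pair: (a * b) ** 0.5 for pair, (a, b) in p8_slopes.items()}

for pair in p8_slopes:
    print(pair, "implied:", round(implied[pair], 3), "reported:", table4_p8[pair])
```

The implied correlations agree with the reported ones to two decimals, which suggests the significant F-statistics for the Treynor-Mazuy Measure with P8 reflect the nonzero intercepts (a level shift) rather than a weak cross-sectional relation.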

B. When Do the Jensen and Positive Period Weighting Measures Differ?

This section has concluded that for most funds, the different measures provide similar inferences. In particular, the Jensen Measure and the Positive Period Weighting Measure are virtually identical for most mutual funds, irrespective of the benchmark. The analysis presented in this subsection suggests that the observed similarity between these two measures arises because most mutual funds fail to successfully time the market. The Positive Period Weighting and Jensen Measures could nonetheless differ for a few mutual funds that do time the market. This would indicate that, for some purposes, employing the Positive Period Weighting Measure in lieu of the Jensen Measure could still be worthwhile.

TABLE 5

F-Tests for the Similarity between Performance Scores Using Different Measures

Independent                          Dependent Variable Measure
Variable      JM                        PW                        TM
Measure       γ0, γ1, F                 γ0, γ1, F                 γ0, γ1, F

Benchmark: EW (F-Test Degrees of Freedom: 2,117)
JM                                      -0.001, 1.011, 0.062      -0.001, 1.024, 0.130
PW            0.000, 0.967, 0.060                                 -0.000, 1.019, 0.015
TM            0.001, 0.940, 0.138       0.000, 0.978, 0.015

Benchmark: VW (F-Test Degrees of Freedom: 2,117)
JM                                      0.000, 0.996, 0.002       -0.000, 0.995, 0.005
PW            0.000, 1.003, 0.002                                 -0.000, 0.999, 0.001
TM            0.000, 1.004, 0.005       0.000, 1.001, 0.001

Benchmark: F10 (F-Test Degrees of Freedom: 2,108)
JM                                      -0.000, 0.975, 0.211      -0.001, 1.043, 0.502
PW            0.000, 0.996, 0.115                                 -0.000, 1.066, 0.223
TM            0.001, 0.921, 0.513       0.000, 0.922, 0.253

Benchmark: P8 (F-Test Degrees of Freedom: 2,110)
JM                                      0.000, 1.074, 0.208       -0.002, 0.960, 7.367*
PW            -0.000, 0.888, 0.428                                -0.002, 0.872, 8.289*
TM            0.002, 0.987, 8.103*      0.003, 1.084, 7.956*

This table presents the intercepts (γ0), slope coefficients (γ1), and time series F-tests (described in Appendix C) from cross-sectional regressions of performance with one measure against performance calculated with a different measure but using the same benchmark. The measures include the Jensen (JM), the Positive Period Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. These tests are calculated separately using four benchmarks: the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).

*Significant at 0.05 level. **Significant at 0.01 level.

To test whether, for some mutual funds, the difference between the Jensen and Positive Period Weighting Measures arises from timing, we regress this difference against the Treynor-Mazuy Timing Measure: the product of the coefficient of the Treynor-Mazuy quadratic term times the variance of the benchmark portfolio (or the ex post efficient combination of the portfolios in the benchmark). This cross-sectional regression, which is reported in Table 6, documents a statistically significant relation between these variables.¹⁴ The significance is robust with respect to the benchmark used.

¹⁴ The time series procedure for calculating the t-statistics in this table is described in Appendix C.


TABLE 6

How Timing Performance Relates to the Difference between the Positive Period Weighting and Jensen Measures

                                            Benchmark
                             EW          VW          F10         P8
Slope Coefficient            0.194       0.025       0.204       0.168
Time Series t-Statistic      5.84**                  3.45**      2.22*

Using the sample of 279 mutual funds, this table reports the slope coefficients and the time series t-statistics (described in Appendix C) of regressions of the difference between the Positive Period Weighting Measure and the Jensen Measure on the Treynor-Mazuy Timing Measure. These tests are calculated separately using four benchmarks: the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).

*Significant at 0.05 level. **Significant at 0.01 level.

VI. The Power Issue: Fund Characteristic Tests

Grinblatt and Titman (1989a) used joint intercept tests to document significant differences in the performance of mutual funds. These tests, however, offer no evidence about the determinants of these differences. Regressions of cross-sectional performance measures on fund characteristics may offer more powerful and more interesting tests of performance against certain alternative hypotheses. They may also offer insights into how inferences about the determinants of performance are affected by the benchmark choice.

Five fund characteristics reported in the 1975 edition of The Wiesenberger Investment Companies Service are used in these regressions:¹⁵

1) Net Asset Value in millions of dollars on December 31, 1974.

2) Load, computed at the end of 1974, which is the range of sales charges per dollar investment in the fund, in percent terms.

3) Expense ratio (%), which is the fee-inclusive expenses of the fund divided by average net asset value over the fiscal year prior to 1975.

4) Turnover (%), which is the minimum of the total market value of purchases or sales (excluding transactions in U.S. government securities) divided by average net asset value in the fiscal year prior to 1975.

5) Management fee (%), which is the schedule of (annualized) fees charged for various net asset values. We took the fee percentage that is relevant for the December 31, 1974, net asset value.

The analysis of how performance relates to these characteristics is based on two benchmarks, the reliable P8 and the unreliable F10 benchmark. The latter benchmark is included to illustrate how benchmark inefficiency can lead to spurious inferences about the determinants of true performance in studies of this kind. Table 7 presents a multiple regression of Jensen Measure performance on net asset value, a load dummy, expenses net of fees, turnover, and fees. A load dummy is used instead of the load itself because for some of the funds, the load charges depend on the amount invested and whether one is a new or old customer. The regression also includes dummy variables to control for investment objective, since prior work has shown that investment objective is related to performance.

¹⁵ A sixth characteristic reported by Wiesenberger (1975), cash yield, which is cash divided by net asset value, was not analyzed because it is highly related to fund objective. The relation between fund performance and fund investment objective has already been documented in Grinblatt and Titman (1989a).

These regressions provide insights into whether differences in performance between funds are due to differences in transaction costs. If transaction costs are important, we expect significant negative estimates for the fees, expenses, and turnover coefficients, and significant positive estimates for the net asset value and load coefficients, since large funds or funds whose marketing expenses are borne by brokers collecting load-related commissions may economize on transaction costs. One might also observe a negative coefficient on net asset value if small funds have less impact on the market with their buy and sell orders than do large funds. The coefficient on net asset value may also provide insights about survivorship bias in the sample, which could induce a negative correlation between net asset value and performance. Positive coefficients on fees and expenses would be indicative of the existence of performance. If investors were aware that a fund manager was capable of earning abnormal returns, they would be willing to compensate the manager with higher fees and expenses (which might provide nonpecuniary benefits). A positive coefficient on turnover would also be indicative of the existence of superior performance, implying that better managers trade more to take advantage of their superior information.

A. Calculating Appropriate t-Statistics and F-Statistics

Statistical significance cannot be inferred from the ordinary t-statistics derived from cross-sectional regressions of fund performance on fund characteristics. These t-statistics are biased because the residuals are correlated across mutual funds. For this reason, the t-statistics we report are derived from a time series procedure that is an extension of the procedure used by Fama and MacBeth (1973) and Sefcik and Thompson (1986). Similar time series F-statistics test whether the five fund characteristics (but excluding the investment objective dummies) jointly explain performance. These time series procedures are described in Appendix C.

B. Empirical Results

The results of two multiple regressions are reported in Table 7. The regressions with the factor benchmark indicate that (controlling for all the other variables) there is no statistically reliable relation between the performance scores and expense ratios or net asset values, but there is a significant relation between performance and both management fees and load. However, these are most likely to be due to the inefficiency of the factor benchmark.

A coefficient of -0.000833 (1/1200) on the fee variable is consistent with fees merely reducing performance by the amount of the fee. For the P8 benchmark, the marginally insignificant fee coefficient is less than 1.3 standard deviations away from -0.000833. For the F10 benchmark, the estimated fee coefficient is more than 3.5 standard deviations away from this value. While it is possible that low fee managers are superior portfolio performers, one would most likely have expected the opposite result. Moreover, if low fee managers are superior performers, one would not expect their performance to be on the order of 50 basis points per year for every 10 basis point reduction in the fee. Yet, this is what the F10 fee coefficient implies.

TABLE 7

Cross-Sectional Slope Coefficients for Two Multiple Regressions with Fund Characteristics as the Regressors

                                        Characteristic
             Net Asset     Expense      Mgt.          Fund         Load         F-Stat.       No. of Funds
Benchmark    Value         Ratio        Fee           Turnover     Dummy        (p-Value)     in Regression

F10          -6.96E-7      4.15E-4      -4.37E-3      1.32E-5      8.32E-4      5.87          209
             (-1.60)       (0.719)      (-4.40**)     (2.29*)      (2.50**)     (0.000**)

P8           -4.03E-7      -4.08E-4     -2.55E-3      1.31E-5      6.15E-5      2.40          209
             (-0.84)       (-0.67)      (-1.99*)      (2.46*)      (0.16)       (0.042*)

This table presents the five slope coefficients and time series t-statistics for the cross-sectional regression of mutual fund performance on net asset value, expense ratio, management fee, fund turnover, a load dummy, and six dummy variables for investment objective. To calculate performance, we use the Jensen Measure with both the eight characteristic benchmark (P8) and the 10 factor portfolio benchmark (F10). The time series t-statistics (described in Appendix C) are below the slope coefficients in parentheses. Time series F-statistics and significance levels (p-values) are also reported. These test whether the five slope coefficients for the noninvestment objective variables are jointly zero.

*Significant at 0.05 level. **Significant at 0.01 level.
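The fee-coefficient arithmetic discussed in the text can be checked with a few lines of code. This is a sketch under two assumptions that the text does not state explicitly: that each reported t-statistic is the coefficient divided by its standard error, and that the fee regressor is expressed in percent per year while performance is a monthly decimal return. The coefficient and t-statistic values are those reported in Table 7; the function and variable names are hypothetical.

```python
# A 1 percent annual fee that simply reduces returns one-for-one lowers
# monthly decimal performance by 0.01 / 12 = 1/1200.
FEE_NEUTRAL = -1.0 / 1200.0

# (fee coefficient, time series t-statistic) as reported in Table 7.
fee = {"F10": (-4.37e-3, -4.40), "P8": (-2.55e-3, -1.99)}

def distance_in_std_errors(coef, t_stat, target=FEE_NEUTRAL):
    """Distance between coef and target in units of the implied standard error."""
    std_error = coef / t_stat          # assumes t = coef / std_error
    return abs(coef - target) / std_error

f10_dist = distance_in_std_errors(*fee["F10"])
p8_dist = distance_in_std_errors(*fee["P8"])

# Annual basis points of performance per 10 basis point (0.10 percent) fee cut
# implied by the F10 coefficient: monthly decimal -> annual -> basis points.
f10_bp = -fee["F10"][0] * 0.10 * 12 * 10_000

print("fee-neutral coefficient:", round(FEE_NEUTRAL, 6))
print("F10 distance (std errors):", round(f10_dist, 2))
print("P8 distance (std errors):", round(p8_dist, 2))
print("F10 bp per year per 10 bp of fee:", round(f10_bp, 1))
```

With the rounded inputs from the table, the F10 distance comes out above 3.5 standard errors and the P8 distance at roughly 1.3, while the F10 coefficient implies a little over 50 basis points per year for each 10 basis points of fee.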

Our conclusions about benchmark inefficiency being responsible for the load coefficient with the factor benchmark are largely driven by heretofore unreported work. In this work, we found that the "load portfolio" (long in load funds and short in no load funds) is significantly negatively correlated with the returns of large firms and firms with high dividend yields, and significantly positively correlated with low dividend yield firms. Hence, the significant relation between performance and load may have been caused by the negative large firm bias and positive dividend yield bias of the factor benchmark (see Grinblatt and Titman (1988)).

Table 7 also reports significant positive relations between portfolio turnover and performance. The time series t-statistics were 2.46 with the P8 benchmark, and 2.50 for the factor benchmark. The evidence from the P8 benchmark, which has no apparent performance bias, is inconsistent with the null hypothesis of no performance ability. Under this hypothesis, there should be a negative relation between turnover and measured performance when trading costs are netted out of returns. In addition, the time series F-statistic for the joint significance of the five characteristics with the P8 benchmark is at the margin for the 5-percent significance level cutoff, indicating that multiple comparisons are not responsible for our finding that fund characteristics, particularly turnover, may be important determinants of performance.

The turnover coefficient for the P8 benchmark implies that a strategy of buying high turnover funds and shorting low turnover funds achieves positive risk-adjusted returns. Note, however, that investors could not have earned these abnormal returns, since low turnover open-end mutual funds cannot be sold short. For this reason, using the P8 benchmark, we examined portfolio strategies that consisted of equally-weighted portfolios of either the top 20 percent or the bottom 20 percent of mutual funds ranked on these characteristics.¹⁶ The results from these strategies, reported in Table 8, represent abnormal returns that an investor could have earned. They indicate that the positive relation between turnover and performance with the P8 benchmark is due to both high turnover funds doing well and to low turnover funds doing poorly. The abnormal return for the high turnover portfolio is about 0.8 percent per year and the low turnover portfolio's performance is about -1.3 percent per year.¹⁷ The positive performance of the high turnover portfolios is largest for the aggressive growth high turnover funds, about 2.8 percent per year.

There may have been a profit opportunity for mutual fund investors who bought high turnover funds. However, there is not enough statistical power to determine it with these tests. For example, if we break the sample of funds down by investment objective and examine the pairs of extreme turnover funds, there are no significant t-statistics in any investment objective category. As a general rule, this is true for the other fund characteristics as well. Other than the low fee aggressive growth funds (at a significance level that does not survive a multiple comparisons hurdle), there are no significant t-statistics for the performance of funds with extreme amounts of any characteristic.

The differences between the extreme portfolios in Table 8 represent an alternative functional form for the relation between these characteristics and performance. These numbers, reported in the rows labeled "difference" in Table 8, are thus related to those in Table 7. With the P8 benchmark, the differences between the risk-adjusted returns of the extreme turnover portfolios are significant, supporting the findings of the cross-sectional regressions. However, the extreme turnover portfolio test does not appear to have enough power to determine whether turnover is an important determinant of performance for subgroups of funds with the same investment objective.

The Table 8 tests appear to have enough power to conclude that fund characteristics matter as a group, even amongst funds with the same investment objective. The F-statistics in the rightmost column, which test whether the four portfolios grouped by fund characteristic have zero performance, are significant for both the aggressive growth and growth income categories. Although we observed three investment objective categories to arrive at two significant ones, the F-statistic of 4.05 for the growth income category has a significance level of 0.004, which is significant at the 5-percent level after adjusting for the multiple comparison with the Bonferroni inequality.¹⁸
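The Bonferroni adjustment mentioned here amounts to multiplying each p-value by the number of comparisons before comparing it with the significance level. A minimal sketch follows, using the p-values for the "difference" rows reported in Table 8 and assuming three comparisons (one per investment objective category); the function name is hypothetical.

```python
def bonferroni_significant(p_value, n_comparisons, level=0.05):
    """True if p_value survives a Bonferroni adjustment for n_comparisons tests."""
    return p_value * n_comparisons <= level

n_categories = 3  # aggressive growth, growth, growth income

growth_income_ok = bonferroni_significant(0.004, n_categories)      # F = 4.05
aggressive_growth_ok = bonferroni_significant(0.023, n_categories)  # F = 2.98

print("growth income survives:", growth_income_ok)
print("aggressive growth survives:", aggressive_growth_ok)
```

The growth income result survives because 3 x 0.004 = 0.012 is below 0.05, whereas the aggressive growth difference (3 x 0.023 = 0.069) would not survive on its own.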

¹⁶ The analysis of funds with extreme loads was not carried out because, in 1974, virtually all load funds had the same maximum load of 8.5 percent.

¹⁷ The top 10 percent of funds, ranked by turnover, have performance that is about twice as large, or about 1.5 percent per year.

¹⁸ We also examined t-statistics from extreme portfolios reconstructed each year using characteristics from the previous year rather than characteristics from the 1975 edition of Wiesenberger. In contrast to the portfolios tested in Table 8, the weights of these portfolios change over time as the fund characteristics change. These dynamic investment strategies in mutual funds have about the same abnormal returns as the static strategy that is based on the 1975 characteristics.


TABLE 8

Jensen Measures of Equally-Weighted Portfolios of Funds for Extreme Deciles of Funds Sorted by Four Characteristics

                                                            Characteristic
                                  Net Asset           Expense             Mgt.                Fund                F-Test
Sample              Criterion     Value               Ratio               Fee                 Turnover            (p-Value)

All Funds           Top 20%       -0.0004 (-0.67)     0.0005 (0.60)       -0.0005 (-0.50)     0.0007 (0.66)       3.61 (0.008)**
                    Bottom 20%    -0.0000 (-0.056)    -0.0014 (-2.38)     -0.0006 (-0.98)     -0.0011 (-1.92)     3.82 (0.006)**
                    Difference    -0.0004 (-0.48)     0.0019 (2.39)       0.0001 (0.12)       0.0017 (2.01)*      7.41 (0.000)**

Aggressive Growth   Top 20%       -0.0000 (-0.01)     0.0029 (1.83)       0.0027 (1.39)       0.0024 (1.20)       1.83 (0.128)
                    Bottom 20%    0.0007 (0.47)       -0.0005 (-0.38)     0.0031 (2.09)*      -0.0000 (-0.00)     3.53 (0.010)**
                    Difference    -0.0007 (-0.50)     0.0034 (2.64)**     -0.0004 (-0.34)     0.0024 (1.45)       2.98 (0.023)*

Growth              Top 20%       -0.0002 (-0.28)     -0.0013 (-1.38)     -0.0015 (-1.58)     -0.0000 (-0.01)     1.48 (0.222)
                    Bottom 20%    -0.0007 (-0.90)     -0.0012 (-1.69)     0.0001 (0.13)       -0.0008 (-1.11)     1.44 (0.225)
                    Difference    0.0005 (0.56)       -0.0001 (-0.13)     -0.0016 (-1.73)     0.0008 (0.79)       1.71 (0.152)

Growth Income       Top 20%       -0.0003 (-0.37)     0.0004 (0.53)       -0.0003 (-0.31)     0.0003 (0.31)       0.52 (0.822)
                    Bottom 20%    -0.0009 (-1.17)     -0.0013 (-1.67)     -0.0009 (-1.19)     -0.0003 (-0.40)     1.35 (0.255)
                    Difference    0.0007 (0.93)       0.0016 (2.50)*      0.0007 (0.72)       0.0006 (0.65)       4.05 (0.004)**

This table presents the average monthly performance (in decimal form) and time series t-statistics for mutual funds ranked in the top 20 percent and the bottom 20 percent in terms of net asset value, expense ratio, management fee, and fund turnover. Results for three subsamples based on a fund's investment objective classification are also reported. The difference between the performance of the top 20 percent and bottom 20 percent is also reported. To calculate performance we use the Jensen Measure with the eight characteristic benchmark (P8). The time series t-statistics are in parentheses beside the abnormal return. The time series t-statistics are the standard intercept t-statistics from a regression of the returns of equally-weighted portfolios (or the return difference between two equally-weighted portfolios) against the excess returns of the benchmark portfolio(s). In addition, this table reports time series F-tests and associated significance levels (p-values). These test whether the performance of the four portfolios in the row are jointly zero.

*Significant at 0.05 level. **Significant at 0.01 level.

VII. Conclusion

This study contains three contributions to the literature on portfolio performance evaluation. First, it examines the sensitivity of performance inferences to benchmark choice. Second, it compares the Jensen Measure with two new measures that were developed to overcome the timing-related biases of the Jensen Measure. Finally, it analyzes whether fund performance is related to fund attributes.

We find that the choice of a benchmark can have a large effect on inferences about performance. However, this does not mean that all results about mutual fund performance are spurious. Rather, it means that care must be taken to avoid inefficient benchmarks. For instance, the mutual funds display strong negative performance, on average, with either the equally-weighted index or the factor-based benchmark and virtually zero performance with the benchmark formed from securities characteristics. This difference is most probably due to the size-related biases of the equally-weighted and factor-based benchmarks. In our sample period, the stocks of large firms perform poorly relative to these benchmarks and, as a result, mutual funds, which on average purchase larger than average stocks, also perform poorly relative to those benchmarks. In addition to affecting average performance, benchmarks have a large effect on how funds perform relative to each other. In particular, the correlations between the performance numbers generated by the characteristics-based benchmark and those generated with the other benchmarks were low.

The different measures of performance that were examined in the paper, the Jensen Measure, the Treynor-Mazuy Total Performance Measure, and the Positive Period Weighting Measure, displayed high cross-sectional correlations. This suggests that the concerns of Jensen (1972), Admati and Ross (1985), Dybvig and Ross (1985), and Grinblatt and Titman (1989b) regarding a timing-related problem in the Jensen Measure may not be important in practice since measures that eliminate this problem yield almost identical inferences.¹⁹ We believe, however, that the measures are similar because very few funds successfully time the market. In fact, the measures are significantly different for those funds that appear to have successfully timed the market.

The latter part of the paper presented tests to examine the determinants of mutual fund performance. These tests analyzed whether performance, as measured by the only reliable benchmark, the P8 benchmark, is related to fund size, expenses, management fee, portfolio turnover, and load. We found that performance is positively related to portfolio turnover, but not to the size of the mutual funds or to the expenses that the funds generate. This suggests that the funds that spend the most on research and trade the most may, in fact, be uncovering underpriced stocks.

These results are related to the earlier work of Grinblatt and Titman (1989a), (1993), which found evidence of abnormal performance with the P8 benchmark based on hypothetical returns constructed from the funds' exchange-listed equity holdings. Because these hypothetical returns are computed without deducting fees and transaction costs, the results do not necessarily imply that investors could gain by buying shares in the funds. The evidence of abnormal performance found in this paper may be more surprising since we are measuring actual returns net of transaction costs.

Appendix A. The Positive Period Weighting Measure

The period weights used in this study can be interpreted as the marginal utilities of an investor with power utility. Since this utility function does not exhibit satiation, period weights derived from it satisfy the nonnegativity constraint. A risk aversion parameter of eight was chosen because it generated an optimal portfolio that required almost no holdings of the risk-free asset. To calculate a set of weights for each of the six benchmarks, we:

¹⁹ The high correlations also imply that the Positive Period Weighting Measure is robust to our specification of the weights. This is because the Jensen Measure and the Treynor-Mazuy Total Performance Measure are Period Weighting Measures (without the nonnegativity constraint) that are based on utility functions that differ substantially from the power utility function used to calculate the weights in this study. For further discussion of this, see Grinblatt and Titman (1988), (1989b).

1) Applied an algorithm that searched for the utility optimal combination of the portfolios in the benchmark and the risk-free asset, i.e., solved for $\gamma$, s.t.

$$\gamma = \arg\max\{E(U)\} = \arg\max\left\{-\frac{1}{7}\sum_t \left(1 + r_{ft} + \gamma R_{It}\right)^{-7}\right\},$$

where $R_{It}$ is the excess return of the benchmark. (In the case of multiple portfolio benchmarks, $\gamma$ is a row vector and $R_{It}$ is a column vector of period t excess returns with element i corresponding to the ith portfolio in the benchmark.)

2) Calculated the time series of gross returns of the optimal portfolio,

$$1 + r_{ft} + \gamma R_{It}.$$

3) Interpreted the gross returns as wealth levels (i.e., WLOG set initial wealth to one for each observation), and calculated the marginal utility of this wealth level with the power function, i.e.,

$$\text{marginal utility at } t = \left(1 + r_{ft} + \gamma R_{It}\right)^{-8}.$$

4) Rescaled the marginal utilities to be weights that sum to one,²⁰ i.e.,

$$w_t = \frac{\left(1 + r_{ft} + \gamma R_{It}\right)^{-8}}{\sum_{\tau}\left(1 + r_{f\tau} + \gamma R_{I\tau}\right)^{-8}}.$$

5) Computed performance as the dot product of the weight vector and the excess return vector of the portfolio to be evaluated, i.e.,

$$PW = \sum_t w_t R_{pt},$$

where PW = Positive Period Weighting Measure.

Since the first order condition for utility maximization requires that

$$E(U' R_{It}) = 0$$

for the excess return, $R_{It}$, of each portfolio in the benchmark, the five-step procedure described above derives weights that approximately satisfy the weighted excess return condition

$$\sum_t w_t R_{It} = 0.$$
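The five steps above can be sketched in code for the single-portfolio benchmark case. This is an illustration only, not the authors' implementation: the bisection search on the first order condition, the simulated return series, and every function and variable name are assumptions introduced here, and the multiple-portfolio case (where the optimal holding is a vector) is not handled.

```python
import random

A = 8  # risk aversion parameter of the power utility, as stated in the text

def foc(gamma, rf, rb):
    """First order condition: sum over t of marginal utility times benchmark excess return."""
    return sum((1.0 + f + gamma * r) ** (-A) * r for f, r in zip(rf, rb))

def optimal_gamma(rf, rb, iters=200):
    """Step 1: bisection on the first order condition, which is decreasing in gamma."""
    # Bounds on gamma that keep wealth strictly positive in every period
    # (requires at least one positive and one negative benchmark return).
    hi = 0.999 * min((1.0 + f) / -r for f, r in zip(rf, rb) if r < 0.0)
    lo = -0.999 * min((1.0 + f) / r for f, r in zip(rf, rb) if r > 0.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(mid, rf, rb) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def positive_period_weights(rf, rb):
    gamma = optimal_gamma(rf, rb)
    # Steps 2 and 3: gross returns read as wealth levels, then marginal utilities.
    marginal = [(1.0 + f + gamma * r) ** (-A) for f, r in zip(rf, rb)]
    total = sum(marginal)
    # Step 4: rescale so the weights sum to one.
    return [m / total for m in marginal]

def pw_measure(weights, rp):
    """Step 5: dot product of the weights and the evaluated portfolio's excess returns."""
    return sum(w * r for w, r in zip(weights, rp))

# Hypothetical monthly data: risk-free rate, benchmark and fund excess returns.
random.seed(1)
T = 120
rf = [0.005] * T
rb = [random.gauss(0.006, 0.045) for _ in range(T)]
rp = [0.001 + 0.9 * r + random.gauss(0.0, 0.01) for r in rb]

w = positive_period_weights(rf, rb)
print("weights sum to:", sum(w))
print("weighted benchmark excess return:", sum(wi * r for wi, r in zip(w, rb)))
print("PW measure:", pw_measure(w, rp))
```

Because the weights are rescaled marginal utilities evaluated at the optimum, they are all positive and the benchmark itself receives a Positive Period Weighting Measure of approximately zero, which is the weighted excess return condition stated above.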

The Positive Period Weighting Measure, like the Jensen Measure, is a linear weighting of returns. The standard t-statistic can thus be used to test whether it significantly differs from zero, when conditioned on the excess returns of the portfolios in the benchmark. The test statistic, which has a t-distribution²¹ with T - K - 1 degrees of freedom if there are T returns and K benchmark portfolios, is

where s = std error of the excess return regression used to compute the portfolio's Jensen Measure.

²⁰ The weights are scaled to sum to one so that observed Positive Period Weighting Measures can be interpreted as monthly abnormal returns.

The t-statistic formula above applies whether the test is based on the returns of a single fund or the returns of a portfolio of funds. In Table 1, the returns of an equally-weighted portfolio of the respective funds (passive funds or mutual funds) are used to generate a t-test of whether average performance across funds is zero.
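A short derivation may help show why a standard t-statistic applies to a linear weighting of returns. It is an illustrative sketch rather than the paper's own formula. It assumes a single-index regression $R_{pt} = \alpha_p + \beta_p R_{It} + \epsilon_{pt}$ whose residuals are i.i.d. with variance $\sigma^2$ conditional on the benchmark returns, and weights that satisfy $\sum_t w_t = 1$ and $\sum_t w_t R_{It} = 0$ exactly.

```latex
\begin{aligned}
PW &= \sum_t w_t R_{pt}
    = \alpha_p \sum_t w_t + \beta_p \sum_t w_t R_{It} + \sum_t w_t \epsilon_{pt}
    = \alpha_p + \sum_t w_t \epsilon_{pt}, \\
\operatorname{Var}(PW \mid R_I) &= \sigma^2 \sum_t w_t^2 .
\end{aligned}
```

Under these assumptions, a ratio of the form $PW / \left(s\sqrt{\sum_t w_t^2}\right)$, with $s$ estimating $\sigma$, would be a natural candidate for the test statistic.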

Appendix B. The Treynor-Mazuy Quadratic Regression

The Treynor-Mazuy quadratic regression is

$$R_p = \alpha_p + \beta_{1p} R_I + \beta_{2p} R_I^2 + \epsilon_p,$$

where $R_I$ is the excess return of the benchmark portfolio and $R_p$ is the excess return of the portfolio being evaluated. Jensen (1972) and Admati et al. (1986), among others, have analyzed the asymptotic properties of the two slope coefficients in the regression when the portfolio strategy has linear risk adjustments to timing signals. They noted that the second slope coefficient, which measures co-skewness with the benchmark portfolio, is related to timing ability. In this special case, it is trivial to prove that the contribution of timing information to the excess return of the portfolio is proportional to the coefficient on the quadratic term in large samples. The Treynor-Mazuy Total Performance Measure is defined to be

$$TM = \alpha_p + \beta_{2p}\,\mathrm{Var}(R_I),$$

which, again, is easily demonstrated to be the added return from superior information under the Admati et al. (1986) assumptions (i.e., exponential utility and multivariate normality).

The Treynor-Mazuy Measure, like the Jensen Measure, is a linear weightingof retums. The standard f-statistic can thus be used to test whether it is significantlydifferent from zero, when conditioned on the excess retums ofthe portfolios in thebenchmark. The test statistic, which has a /-distribution with T - K - I degreesof freedom if there are T retums and K benchmark portfolios is

TM/s(TM),

where s(TM) is the standard error of the Treynor-Mazuy total performance measure.

To compute this standard error, we:

1) Computed $s(\epsilon)$, the standard error of the regression from the excess return regression used to compute the Jensen Measure for the portfolio being evaluated.

^21 This assumes that the benchmark portfolios add up to a point on the efficient frontier of the 109 test portfolios and that the test portfolios have normally distributed residuals.


2) Computed the variance-covariance matrix of the three coefficients in the quadratic regression, conditional on the benchmark excess return.^22 This is

$$V = s^2(\epsilon)\,(X'X)^{-1},$$

where X is the T x 3 matrix of regressors in the Treynor-Mazuy quadratic regression.

3) Computed s(TM) as the square root of q'Vq, where the 1 x 3 row vector

$$q' = \begin{pmatrix} 1 & 0 & \mathrm{Var}(R_i) \end{pmatrix}$$

and $\mathrm{Var}(R_i)$ is the variance of the benchmark portfolio's excess return. (In the case of multiple portfolio benchmarks, we employ returns from the ex post efficient combination of the portfolios in the benchmark.)
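The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `tm_t_statistic`, the parameter `n_benchmarks` (standing in for K), the use of the sample variance of the benchmark, and the simulated returns are all hypothetical. As in step 1, the residual variance comes from the linear regression rather than the quadratic one.

```python
import numpy as np

def tm_t_statistic(r_p, r_b, n_benchmarks=1):
    """Return (TM, s(TM), t-statistic) for one portfolio and one benchmark series."""
    r_p = np.asarray(r_p, dtype=float)
    r_b = np.asarray(r_b, dtype=float)
    T = len(r_p)
    dof = T - n_benchmarks - 1

    # Step 1: squared standard error of the linear (Jensen) regression.
    X_lin = np.column_stack([np.ones(T), r_b])
    b_lin, *_ = np.linalg.lstsq(X_lin, r_p, rcond=None)
    resid = r_p - X_lin @ b_lin
    s2 = resid @ resid / dof

    # Step 2: V = s^2 (X'X)^{-1} with the T x 3 quadratic regressors.
    X = np.column_stack([np.ones(T), r_b, r_b ** 2])
    V = s2 * np.linalg.inv(X.T @ X)

    # Step 3: s(TM) = sqrt(q'Vq) with q' = (1, 0, Var(benchmark)).
    coef, *_ = np.linalg.lstsq(X, r_p, rcond=None)
    q = np.array([1.0, 0.0, np.var(r_b, ddof=1)])
    tm = q @ coef  # alpha + beta2 * Var(benchmark)
    se = np.sqrt(q @ V @ q)
    return tm, se, tm / se

# Simulated monthly excess returns (made-up numbers, not data from the paper).
rng = np.random.default_rng(0)
r_b = rng.normal(0.005, 0.04, size=120)
r_p = 0.001 + 0.9 * r_b + 1.5 * r_b ** 2 + rng.normal(0.0, 0.01, size=120)
tm, se, t_stat = tm_t_statistic(r_p, r_b)
```

The returned t-statistic would then be compared with a t-distribution with T - K - 1 degrees of freedom, as described in the text.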

The t-statistic formula above applies whether the test employs the returns of a single fund or the returns of a portfolio of funds. The t-statistics reported in Table 1 for the passive portfolios and mutual funds use the returns of an equally-weighted portfolio of the respective funds to determine whether average performance across funds is zero.

Appendix C. Time-Series Test Statistics for Tables 3, 5, 6, and 7

Given N funds, let

a = N x 1 vector of "performance" (for Table 6, the difference in performance with two measures),

R = T x N matrix with the entry in row t and column n comprised of fund n's excess return in month t.

By assumption, the row vectors of R are i.i.d. normal conditional on the excess returns of the portfolio(s) in the benchmark. We note that

(A-1) a = R'w,

where w is a T x 1 weight vector with elements that sum to one (zero for Table 6) and depend solely on the benchmarks' excess returns and the measure used.
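For the Jensen Measure, one concrete weight vector satisfying Equation (A-1) is the first row of the OLS projection matrix from the regression on an intercept and the benchmark. The sketch below illustrates this with simulated data; the variable names and the simulated returns are assumptions for the example, and only the single-benchmark Jensen case is shown.

```python
import numpy as np

# Simulated data (made-up numbers, not data from the paper).
rng = np.random.default_rng(1)
T, N = 60, 4
r_b = rng.normal(0.005, 0.04, size=T)  # benchmark excess returns
betas = rng.uniform(0.5, 1.5, size=N)
R = 0.002 + np.outer(r_b, betas) + rng.normal(0.0, 0.01, size=(T, N))  # T x N

# Regressors for the Jensen regression: intercept and benchmark.
X = np.column_stack([np.ones(T), r_b])

# Row 0 of (X'X)^{-1} X' maps any return series to its OLS intercept,
# so it serves as the T x 1 weight vector w for the Jensen Measure.
w = np.linalg.solve(X.T @ X, X.T)[0]

# Equation (A-1): performance of all N funds in one product.
alpha = R.T @ w

# Cross-check against the intercepts of fund-by-fund OLS regressions.
alpha_ols = np.linalg.lstsq(X, R, rcond=None)[0][0]
```

In this construction the weights sum to one and are orthogonal to the benchmark excess returns, which is why they depend only on the benchmark and not on the fund being evaluated.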

Let

A = N x K matrix comprised of a column of ones and K - 1 columns of fund attributes, denoted $a_k$, k = 1, ..., K - 1,

$$\begin{pmatrix} M_0 \\ \vdots \\ M_{K-1} \end{pmatrix} = K \times N \text{ partitioned matrix} = (A'A)^{-1}A',$$

^22 We compute the standard error of the regression conditional only on the excess return of the benchmark, but do not additionally condition on the squared excess return, to permit fair comparisons with the Positive Period Weighting and Jensen Measures.


$$R_{pk} = T \times 1 \text{ vector} = R M_k', \quad k = 0, \ldots, K-1, \tag{A-2}$$

$\epsilon_k$ = T x 1 residual vector from regressing $R_{pk}$ on the benchmark portfolio(s). $\epsilon_{kt}$ is its t-th element.

Consider the cross-sectional regression,

$$a = \gamma_0 + \sum_{k=1}^{K-1} \gamma_k a_k + \text{residual}.$$

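The coefficient

$$\gamma_k = M_k a = M_k R'w = R_{pk}'w, \quad k = 0, \ldots, K-1 \tag{A-3}$$

by Equations (A-1) and (A-2).

The K x K covariance matrix of $\epsilon_{0t}, \ldots, \epsilon_{K-1,t}$ is

$$V = \mathrm{Var}\left( (\epsilon_{0t}, \ldots, \epsilon_{K-1,t})' \right),$$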

which is the same for all t by the earlier i.i.d. assumption. $\hat{V}_{ij}$ denotes the unbiased estimate of element ij of this matrix and $\hat{V}(i,j)$ denotes the (j - i + 1) x (j - i + 1) matrix consisting of the unbiased sample estimate of the submatrix of V comprised of rows and columns i through j. By Equation (A-3),

$$\mathrm{Var}\begin{pmatrix} \gamma_0 \\ \vdots \\ \gamma_{K-1} \end{pmatrix} = (w'w)V.$$

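To test whether $\gamma_k = 0$, we use the test statistic,

$$\frac{\gamma_k}{\sqrt{(w'w)\,\hat{V}_{kk}}} \sim t(T - P - 1),$$

where P is the number of portfolios in the benchmark and $t(T - P - 1)$ denotes the t-distribution with T - P - 1 degrees of freedom.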

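A joint test of whether $\gamma_k = g_1, \gamma_{k+1} = g_2, \ldots, \gamma_{k+j-1} = g_j$, $j \le K - k$, is given by the test statistic,

$$F = \frac{T - P - j}{j\,(T - P - 1)\,(w'w)}\,(\gamma - g)'\,\hat{V}(k, k+j-1)^{-1}\,(\gamma - g),$$

where, in this expression, $\gamma$ denotes the j x 1 vector $(\gamma_k, \ldots, \gamma_{k+j-1})'$ and $g = (g_1, \ldots, g_j)'$.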


Under our assumptions, F has a small sample central F-distribution with j and T - P - j degrees of freedom. The proof is a trivial extension of Gibbons, Ross, and Shanken (1989).
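The time-series tests of this appendix can be sketched numerically as follows. This is a hedged illustration, not the authors' implementation: the function `cross_sectional_tests`, its argument names, and the simulated inputs are hypothetical, and the t- and F-statistics are computed as the coefficient (or quadratic form in the coefficients) scaled by (w'w) times the unbiased residual covariance estimate, which is the form implied by Var(γ) = (w'w)V.

```python
import numpy as np

def cross_sectional_tests(R, bench, w, attrs, k, g):
    """Return (gamma, t_stats, F) for the cross-sectional regression.

    R: T x N fund excess returns; bench: T x P benchmark excess returns;
    w: length-T weight vector; attrs: N x (K-1) fund attributes;
    k: index of the first coefficient tested; g: hypothesized values (length j).
    """
    T, N = R.shape
    P = bench.shape[1]
    A = np.column_stack([np.ones(N), attrs])   # N x K
    M = np.linalg.solve(A.T @ A, A.T)          # K x N, equals (A'A)^{-1} A'
    R_pk = R @ M.T                             # T x K; column k is R_pk, as in (A-2)
    gamma = R_pk.T @ w                         # Equation (A-3)

    # Residuals from regressing each R_pk on an intercept and the benchmark(s).
    X = np.column_stack([np.ones(T), bench])
    resid = R_pk - X @ np.linalg.lstsq(X, R_pk, rcond=None)[0]
    V_hat = resid.T @ resid / (T - P - 1)      # unbiased estimate of V

    ww = w @ w
    t_stats = gamma / np.sqrt(ww * np.diag(V_hat))

    g = np.asarray(g, dtype=float)
    j = len(g)
    d = gamma[k:k + j] - g
    V_sub = V_hat[k:k + j, k:k + j]
    F = (T - P - j) / (j * (T - P - 1) * ww) * (d @ np.linalg.solve(V_sub, d))
    return gamma, t_stats, F

# Simulated inputs (made-up numbers, not data from the paper).
rng = np.random.default_rng(2)
T, N, P = 120, 30, 1
bench = rng.normal(0.005, 0.04, size=(T, P))
attrs = rng.normal(size=(N, 2))
R = bench @ rng.uniform(0.5, 1.5, size=(P, N)) + rng.normal(0.0, 0.02, size=(T, N))
X = np.column_stack([np.ones(T), bench])
w = np.linalg.solve(X.T @ X, X.T)[0]           # Jensen Measure weights
gamma, t_stats, F = cross_sectional_tests(R, bench, w, attrs, k=1, g=[0.0, 0.0])
```

With a single restriction (j = 1) the F-statistic reduces to the square of the corresponding t-statistic, which gives a convenient consistency check on the two formulas.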

To apply these results to Tables 3 and 5, which regress performance with method c (denoted $a_c$) against performance with method d ($a_d$), we let the a above be $a_c$ and let the fund characteristic, $a_1$, be $a_d$. We test whether $\gamma_0 = 0$ and $\gamma_1 = 1$ in the F-test, which has K = 1 and j = 2.

In Table 6, the dependent variable, a, is the difference between two measures using the same benchmark. One merely substitutes the difference between the two measures for a above. The weight vector, w, is now the difference between two weight vectors. Since the test in Table 6 is a single coefficient restriction, the t-test described above is the one that is used.

In Table 7, the dependent variable is performance. The right side variables in the cross-sectional regression are a vector of ones (for the intercept), five fund characteristics, and a set of investment objective dummies. If we let $a_1$ through $a_5$ in the discussion above be the fund characteristic vectors, it is straightforward to apply an F-test that examines whether $\gamma_1 = \gamma_2 = \cdots = \gamma_5 = 0$. In this test, j = 5.

References

Admati, A.; S. Bhattacharya; P. Pfleiderer; and S. A. Ross. "On Timing and Selectivity." Journal of Finance, 46 (July 1986), 715-730.

Admati, A., and S. A. Ross. "Measuring Investment Performance in a Rational Expectations Equilibrium Model." Journal of Business, 58 (Jan. 1985), 1-26.

Banz, R. "The Relationship between Return and Market Value of Common Stocks." Journal of Financial Economics, 9 (March 1981), 3-18.

Black, F.; M. Jensen; and M. Scholes. "The Capital Asset Pricing Model: Some Empirical Tests." In Studies in the Theory of Capital Markets, M. Jensen, ed. New York, NY: Praeger (1972).

Brown, S. J.; W. Goetzmann; R. G. Ibbotson; and S. A. Ross. "Survivorship Bias in Performance Studies." Review of Financial Studies, 5 (4, 1992), 553-580.

Dybvig, P., and S. A. Ross. "Differential Information and Performance Measurement Using A Security Market Line." Journal of Finance, 40 (June 1985), 383-399.

Elton, E. J.; M. J. Gruber; S. Das; and M. Hlavka. "Efficiency with Costly Information: A Reinterpretation of Evidence from Managed Portfolios." Review of Financial Studies, 6 (1, 1993), 1-22.

Fama, E., and J. MacBeth. "Risk, Return and Equilibrium: Empirical Tests." Journal of Political Economy, 72 (May-June 1973), 607-636.

Gibbons, M.; S. A. Ross; and J. Shanken. "A Test of the Efficiency of a Given Portfolio." Econometrica, 57 (Sept. 1989), 1121-1152.

Grinblatt, M., and S. Titman. "Mutual Fund Performance: An Analysis of Monthly Returns." Working Paper, Univ. of California, Los Angeles (March 1988).

________. "The Evaluation of Mutual Fund Performance: An Analysis of Quarterly Portfolio Holdings." Journal of Business, 62 (July 1989a), 394-415.

________. "Portfolio Performance Evaluation: Old Issues and New Insights." Review of Financial Studies, 2 (No. 3, 1989b), 396-422.

________. "The Persistence of Mutual Fund Performance." Journal of Finance, 47 (Dec. 1992), 1977-1984.

________. "Performance Measurement without Benchmarks: An Examination of Mutual Fund Returns." Journal of Business, 66 (Jan. 1993), 47-68.

Jensen, M. "The Performance of Mutual Funds in the Period 1945-1964." Journal of Finance, 23 (May 1968), 389-416.

________. "Risk, the Pricing of Capital Assets, and the Evaluation of Investment Portfolios." Journal of Business, 42 (April 1969), 167-247.

________. "Optimal Utilization of Market Forecasts and the Evaluation of Investment Portfolio Performance." In Mathematical Methods in Investment and Finance, Szego and Shell, eds. Amsterdam: North Holland (1972).

Lehmann, B., and D. Modest. "Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons." Journal of Finance, 42 (June 1987), 233-265.

________. "The Empirical Foundations of the Arbitrage Pricing Theory." Journal of Financial Economics, 21 (Sept. 1988), 213-254.

Litzenberger, R., and K. Ramaswamy. "The Effects of Personal Taxes and Dividends on Capital Asset Prices: Theory and Empirical Evidence." Journal of Financial Economics, 7 (June 1979), 163-195.

________. "The Effects of Dividends on Common Stock Returns: Tax Effects or Information Effects?" Journal of Finance, 37 (May 1982), 429-443.

Reinganum, M. "Misspecification of Capital Asset Pricing: Empirical Anomalies Based on Earnings' Yields and Market Values." Journal of Financial Economics, 9 (March 1981), 19-46.

Roll, R. "Ambiguity when Performance is Measured by the Securities Market Line." Journal of Finance, 33 (Sept. 1978), 1051-1069.

Sefcik, S., and R. Thompson. "An Approach to Statistical Inference in Cross-Sectional Models with Security Abnormal Returns as the Dependent Variable." Journal of Accounting Research, 24 (Autumn 1986), 316-334.

Treynor, J. "How to Rate Management of Investment Funds." Harvard Business Review, 44 (Feb. 1965), 131-136.

Treynor, J., and F. Mazuy. "Can Mutual Funds Outguess the Market?" Harvard Business Review, 45 (July-Aug. 1966), 131-136.

Wiesenberger, A. Investment Companies Service. New York, NY: A. Wiesenberger and Co. (1975).
