The Core Factor Method
May 4, 2021

I arrived in 2011 knowing next to nothing about the fixed income markets, and was immediately put to work helping another colleague build the firm's attribution and risk models. The risk oversight system that we sought to replace fit inside a single Excel workbook. It took forever to open, and, having run up against Excel's row and column limits, could not be expanded or enhanced. For a firm rapidly growing assets and expanding its management capabilities, we had to move quickly.

We struggled with linear factor selection, the process of choosing drivers of risk and return that satisfied necessary statistical properties. Whatever we did, we had to assure portfolio managers and analysts that we were capturing factor returns in a way that did not seem too engineered or mathematical, but reflected some underlying fundamental property of how their markets functioned.

So we put off thinking about constructing correlation and covariance matrices, the mathematical objects that capture interactions between these factors. And when we did finally turn our attention to covariance matrix construction, we assumed we would do the simplest possible thing: take a sample correlation matrix, diagonalize it, increase its near-zero or negative eigenvalues to some small \(\epsilon > 0\) while reducing large eigenvalues to preserve the trace, and reconstruct a modified correlation matrix.
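Concretely, a minimal numpy sketch of that clipping step, assuming the floor \(\epsilon\) and a proportional rescaling of the remaining eigenvalues are the only adjustments:

```python
import numpy as np

def clip_eigenvalues(corr, eps=1e-4):
    """Eigenvalue clipping: floor small or negative eigenvalues at eps,
    rescale the spectrum to preserve the trace, and rebuild a
    unit-diagonal correlation matrix. A sketch of the generic recipe."""
    vals, vecs = np.linalg.eigh(corr)          # corr assumed symmetric
    clipped = np.maximum(vals, eps)            # raise near-zero/negative eigenvalues
    clipped *= vals.sum() / clipped.sum()      # shrink large ones so the trace is unchanged
    cleaned = vecs @ np.diag(clipped) @ vecs.T
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)            # renormalize to a unit diagonal
```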

While probably the most common technique to prepare correlation matrices for optimization---at least it was in 2011 when we were doing this work---the eigenvalue clipping approach suffers some very significant drawbacks. First, it modifies correlation matrices in ways that we might struggle to justify to a portfolio manager or analyst, or really anyone who can load two timeseries in Excel and compute sample correlations and covariances themselves. Second, it creates matrices that continue to perform poorly in portfolio optimization. The mean-variance portfolio is given by the product of the inverse of the covariance matrix and projected asset class returns. When inverting a covariance matrix, small, noisy eigenvalues near \(\epsilon\) overwhelm the better-estimated large eigenvalues, and their eigenvectors contribute in unexpected ways to the optimal portfolio weights.
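To see the effect, consider the unconstrained mean-variance direction \(w \propto \Sigma^{-1}\mu\). The spectrum, expected returns, and \(\epsilon\) in the toy sketch below are invented purely for illustration:

```python
import numpy as np

# Toy numbers, invented for illustration only.
eps = 1e-4
vals = np.array([2.5, 1.0, 0.5, eps])            # clipped spectrum with one floored eigenvalue
rng = np.random.default_rng(0)
vecs = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # a random orthonormal basis
sigma = vecs @ np.diag(vals) @ vecs.T            # covariance with a tiny, noisy direction
mu = np.array([0.04, 0.03, 0.02, 0.01])          # made-up projected returns

w = np.linalg.solve(sigma, mu)                   # unconstrained mean-variance direction
# Sigma^{-1} weights each eigenvector by 1/lambda, so the eps-direction is
# amplified by 1/eps = 10,000 and dominates the optimal weights.
print(np.round(w / np.abs(w).sum(), 3))
```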

It's worth remarking that the Marchenko-Pastur law tells us such small eigenvalues are statistically indistinguishable from noise. And indeed some practitioners set \(\epsilon\) to be greater than the Marchenko-Pastur upper bound. But this distorts the correlation matrix to an even greater degree and makes sanity-checking individual correlations untenable, even between critical factor return timeseries: 10-year Treasury returns and percentage changes in spreads, say.
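For reference, with \(N\) factors, \(T\) observations, and \(q = N/T\), the Marchenko-Pastur upper edge for a pure-noise correlation matrix is \(\lambda_+ = (1 + \sqrt{q})^2\). A quick sketch with illustrative dimensions:

```python
import numpy as np

N, T = 150, 500                                   # illustrative dimensions only
q = N / T
mp_upper = (1 + np.sqrt(q)) ** 2                  # Marchenko-Pastur upper edge, about 2.4 here

rng = np.random.default_rng(0)
noise = rng.standard_normal((N, T))               # pure-noise "factor returns"
vals = np.linalg.eigvalsh(np.corrcoef(noise))
print(mp_upper, (vals > mp_upper).sum())          # typically zero or one eigenvalue above the edge
```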

We also considered shrinkage, the textbook approach to eliminating negative eigenvalues of correlation matrices. At least with shrinkage you know how you have modified your correlation matrix. That is, choose some \(\alpha\) so that the constructed correlation matrix has the form \(\alpha E + (1 - \alpha)J_N\), where \(E\) is our sample correlation matrix and \(J_N\) is the all-ones matrix of dimension \(N\times N\).

The principal problem with shrinkage techniques---all shrinkage techniques that I am aware of anyway---is that some correlations ought to be 0 and others strongly negative. The mask \(J_N\), however, is an all-ones matrix, which is to say that in shrinking our sample correlation matrix we treat all correlations as if their true value is 1. That may be fine for equity- and spread-like asset classes, industries, and countries, but it absolutely will not capture interactions between rates and spreads, currencies and commodities, commodities and sectors, and others, precisely the interactions that we count on a computer to model better than we can in our own heads.
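A small sketch of the shrinkage described above, with the all-ones target \(J_N\) and an \(\alpha\) chosen only for illustration, makes the problem concrete:

```python
import numpy as np

def shrink_to_ones(E, alpha):
    """Shrinkage exactly as written above: alpha * E + (1 - alpha) * J_N,
    where J_N is the all-ones matrix of the same dimension as E."""
    return alpha * E + (1 - alpha) * np.ones_like(E)

# A made-up rates/spread-style correlation of -0.4 becomes
# 0.7 * (-0.4) + 0.3 * 1 = 0.02, erasing the interaction we most need to keep.
E = np.array([[1.0, -0.4],
              [-0.4, 1.0]])
print(shrink_to_ones(E, alpha=0.7))
```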

My colleague and I were doing this work in 2011 and 2012, just as rotationally invariant, optimal shrinkage techniques were being developed. A hybrid between standard shrinkage and eigenvalue clipping, these techniques ensure that the bilinear form induced by the cleaned matrix, when applied to each eigenvector of the sample matrix, approaches the corresponding value under the true correlation matrix with convergence \(O(T^{-1/2})\). A nice result, to be sure, and one I would like to play with if I one day find the time (a future post?). But what have we done to specific correlations between key factors?

Enter the core factor method. I thought for all the years I worked with him that my colleague had invented the technique. Later, when cleaning out my files, I discovered a Barclays POINT slide from 2011 that sketched it out. I was recently told that MSCI/Barra uses a version of the core factor method now. Certainly when we were mining their work for ideas, they used the then-standard eigenvalue clipping approach.

A sketch of a preliminary core factor method. From Barclays POINT "Factor Covariance Matrix Estimation," 2011.

I took my colleague's fragmentary memory of this slide, formalized the core factor method a bit more carefully, and produced a few interesting results that gave us great confidence in the cleaning technique. Here my notation will depart only slightly from Barclays's.

Assume representative core factors \(X = \{X_1, \cdots, X_M\}\) have each been observed \(T\) times so that \(X\) is an \(M\times T\) matrix. Regress our factor returns \(F = \{F_1, \cdots, F_N\}\) on \(X\) to find a best fit matrix \(B\) such that \begin{equation} F = BX + \Phi. \end{equation} \(F\) and \(\Phi\) are of course \(N\times T\). Calculate \(C_F^{\text{Core}} = C_{BX}\), the sample correlation matrix corresponding to \(BX\). The \(N \times N\) matrix \(C_F^{\text{Core}} \) has rank at most \(M \leq N\), so we have definitely not improved on \(C_F\), the sample correlation matrix we are trying to clean. At least not yet.
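A minimal sketch of this step, with the ordinary least-squares fit computed row by row and toy dimensions standing in for real factor and core-factor returns:

```python
import numpy as np

def core_correlation(F, X):
    """Regress factor returns F (N x T) on core factors X (M x T), F = B X + Phi,
    and return the correlation matrix of the fitted returns B X along with B.
    The resulting matrix has rank at most M."""
    # Least-squares fit of B: solve X^T B^T ~ F^T.
    B = np.linalg.lstsq(X.T, F.T, rcond=None)[0].T   # N x M
    fitted = B @ X                                    # B X, an N x T matrix
    return np.corrcoef(fitted), B

# Toy data with invented dimensions, standing in for real factor returns.
rng = np.random.default_rng(0)
M, N, T = 3, 8, 260
X = rng.standard_normal((M, T))                       # core factor returns
F = rng.standard_normal((N, M)) @ X + 0.5 * rng.standard_normal((N, T))
C_core, B = core_correlation(F, X)
print(np.linalg.matrix_rank(C_core))                  # at most M, here 3
```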

Now, partition our \(N\) factors into \(M\) sectors \(\mathcal{P}_m\), each with \(N_m = |\mathcal{P}_m|\) factors, such that \(\sum_{m=1}^{M} N_m = N\). We are going to associate to each core factor \(X_m\) a sector \(m\), and with it the \(N_m\) factors \(\{F_i\}_{i\in\mathcal{P}_m}\).

Take the low-fidelity matrix \(C_F^{\text{Core}}\) and substitute along its diagonal the sample correlation matrices corresponding to each sector \(m\) to create \(C_F^{\text{Estim}}\). In other words, \begin{equation} [C_F^{\text{Estim}}]_{i,j} = \left\{ \begin{array}{ll} \text{Corr}(F_i, F_j) & \text{if } i,j \in \mathcal{P}_m, \\ \text{Corr}([BX]_i, [BX]_j) & \text{if } i \in \mathcal{P}_m, j \in \mathcal{P}_{n}, m\neq n. \end{array} \right. \end{equation} The correlation matrix \(C_F^{\text{Estim}}\) is the final version of our cleaned correlation matrix.
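In code, the substitution might look like the sketch below, where `sectors` is a hypothetical list of index lists partitioning the \(N\) factors, one per core factor; it would be applied to the toy \(F\) and \(C_F^{\text{Core}}\) from the previous sketch:

```python
import numpy as np

def estimated_correlation(F, C_core, sectors):
    """Overwrite the diagonal blocks of the low-rank C_F^Core with the sample
    correlations of the factors inside each sector; cross-sector blocks keep
    their core-factor-implied values."""
    C_sample = np.corrcoef(F)                      # plain sample correlation matrix
    C_estim = C_core.copy()
    for idx in sectors:                            # e.g. sectors = [[0, 1, 2], [3, 4, 5], [6, 7]]
        block = np.ix_(idx, idx)
        C_estim[block] = C_sample[block]           # within a sector: sample correlations
    return C_estim
```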

It remains to show that \(C_F^{\text{Estim}}\) is of full rank, and that it sufficiently approximates the sample correlation matrix \(C_F\). As an estimator for the true correlation matrix, however, \(C_F^{\text{Estim}}\) cannot be expected to perform well.

This core factor method differs from Barclays's in some important ways. First, and most importantly, we do not assume that "a factor is correlated only to very specific factors from \(C\), imposing a structure on \(C_F^{\text{Core}}\)." Doing so, it can be shown, may cause our matrix \(C_F^{\text{Estim}}\) to become singular.

Second, the choice of core factors is not arbitrary. In fact, core factors must be chosen to achieve the properties we want from the matrix.

To be continued...

References
Bouchaud, Jean-Philippe and Marc Potters. "Financial applications of random matrix theory: a short review." The Oxford Handbook of Random Matrix Theory. Oxford University Press, 2011.
Ledoit, Olivier and Michael Wolf. "Nonlinear shrinkage estimation of large-dimensional covariance matrices." The Annals of Statistics, Vol. 40, No. 2 (2012): 1024–1060. http://ledoit.net/AOS989.pdf