Can machine learning algorithms improve strategies using asset class?

In-depth research by EPFR’s quantitative team using machine learning algorithms leads us to believe the answer is yes: it is entirely possible!

Machine learning algorithms are increasingly important tools for quantitative researchers because they make it possible to analyze larger data sets and to make predictions or draw conclusions against defined goals.

Utilizing EPFR’s Country Allocations data, our goal was to determine how machine learning algorithms would cluster the data based on the distribution of a fund’s country allocations.
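
Clustering on country allocations requires each fund to be represented as a numeric vector over a common list of countries. The sketch below shows one way to build that matrix in plain Python. The input layout (a dict mapping a fund ID to a dict of country weights), the function name `allocation_matrix`, and the choice to rescale each fund's weights to sum to 1 are assumptions for illustration, not EPFR's actual data schema or preprocessing.

```python
def allocation_matrix(funds):
    """Turn {fund_id: {country: weight}} into (fund_ids, countries, rows).

    Each row is one fund's allocation vector over a shared, sorted country
    list, rescaled so its weights sum to 1 (hypothetical preprocessing choice).
    """
    countries = sorted({c for alloc in funds.values() for c in alloc})
    fund_ids = sorted(funds)
    rows = []
    for fid in fund_ids:
        alloc = funds[fid]
        # Countries the fund does not hold get a weight of 0.
        vec = [alloc.get(c, 0.0) for c in countries]
        total = sum(vec)
        rows.append([v / total for v in vec] if total > 0 else vec)
    return fund_ids, countries, rows


# Toy input: allocations in % of assets for two made-up funds.
funds = {
    "fund_a": {"France": 30.0, "Germany": 70.0},
    "fund_b": {"United States": 60.0, "Japan": 40.0},
}
ids, countries, X = allocation_matrix(funds)
# countries -> ['France', 'Germany', 'Japan', 'United States']
# X[0]      -> [0.3, 0.7, 0.0, 0.0]
```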


Machine learning clustering algorithms

To answer this question, we tested several machine learning algorithms. For our purposes, the best performer was the K-Means algorithm.

K-Means is an unsupervised learning algorithm: it works on unlabeled data, so, unlike supervised methods, it does not require the data to be split into training and testing sets. The algorithm starts by randomly selecting k data points as the initial centers of k clusters. It then assigns each data point to the cluster whose centroid is the shortest Euclidean distance away, and recalculates each cluster’s centroid as the mean of its members. These two steps repeat until the groupings stop changing. The final output is k clusters of data points grouped by similarity.
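
The loop described above is Lloyd's algorithm, the standard K-Means procedure. Below is a minimal from-scratch sketch in plain Python, assuming each fund is already a list of country weights. The function name `kmeans`, its parameters, and the toy points are hypothetical, and this is not EPFR's implementation; library versions such as scikit-learn's `KMeans` add refinements like k-means++ initialization and multiple restarts.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (lists of floats) into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the centroid at the shortest Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy example: two well-separated groups of 2-D points.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(points, k=2)
```

Because the starting centroids are random, different seeds can converge to different clusterings; running several seeds and keeping the result with the lowest within-cluster sum of squares is a common safeguard.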

Before this method can be used, the optimal number of clusters (k) must be determined. Chart 1 plots the Total Within Sum of Squares (TWSS) for k ranging from 1 to 20. TWSS generally falls as k increases, so the goal is not simply the smallest TWSS; instead, we look for the “elbow,” the smallest k beyond which additional clusters no longer reduce TWSS substantially. Referring to Chart 1, we determined that the ideal k value is 3 clusters.
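
The TWSS curve and the elbow choice can be sketched as follows. `twss` computes the quantity plotted in Chart 1 for one clustering, and `elbow_k` applies one common heuristic, the largest second difference of the curve, to pick k. Both function names and the heuristic are assumptions, since the article reads the elbow from the chart rather than stating a rule, and the TWSS values in the example are made up.

```python
def twss(points, labels):
    """Total within-cluster sum of squares: squared Euclidean distance from
    every point to the mean of its own cluster, summed over all clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total


def elbow_k(twss_by_k):
    """Pick k at the sharpest bend of the TWSS curve (largest second difference).

    twss_by_k[i] is the TWSS for k = i + 1; returns 1 if the curve is too short.
    """
    best_k, best_bend = 1, float("-inf")
    for i in range(1, len(twss_by_k) - 1):
        # Big drop into k, small drop after k -> large positive bend.
        bend = twss_by_k[i - 1] - 2 * twss_by_k[i] + twss_by_k[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


illustrative_twss = [100.0, 60.0, 20.0, 17.0, 15.0, 14.0]  # made-up, k = 1..6
best_k = elbow_k(illustrative_twss)  # → 3
```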


K-Means and the Power of Three

Applying the K-Means algorithm to EPFR’s Country Allocations data with the optimal k produced the desired output: three distinct clusters based on each fund’s country allocations.

When analyzing the individual data points assigned to each cluster, very clear patterns emerge. For example, Table 1 shows the relationship between each cluster and the asset classes of the funds within it.

The first group contains a high count of funds with Global and Europe mandates, the second contains only Global funds, and the third contains a mix of funds, primarily with Emerging Markets mandates.
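
A table like Table 1 is a cross-tabulation of cluster labels against fund mandates. A sketch using `collections.Counter`, where the function name and the labels and mandates are hypothetical stand-ins for real fund data:

```python
from collections import Counter


def cluster_mandate_table(labels, mandates):
    """Return {cluster: {mandate: fund_count}} for parallel lists of labels and mandates."""
    counts = Counter(zip(labels, mandates))
    table = {}
    for (cluster, mandate), n in counts.items():
        table.setdefault(cluster, {})[mandate] = n
    return table


labels = [0, 0, 0, 1, 1, 2, 2, 2]  # hypothetical K-Means output for eight funds
mandates = ["Global", "Europe", "Europe", "Global", "Global",
            "Emerging Markets", "Emerging Markets", "Global"]
table = cluster_mandate_table(labels, mandates)
# table[0] -> {'Global': 1, 'Europe': 2}
```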

These patterns were reinforced when we examined the mean country allocations for each of the three groups, which are shown in Table 2.
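
Table 2's mean allocations are the cluster centroids expressed per country. The sketch below averages each cluster's allocation vectors and returns the highest-weighted countries, which is the information used to name the groups. The function name, the `n=5` default, and the toy rows are illustrative assumptions.

```python
def top_countries_by_cluster(countries, rows, labels, n=5):
    """Return {cluster: [(country, mean_weight), ...]} with the n largest mean weights."""
    result = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        # Mean allocation to each country across the cluster's funds (the centroid).
        means = [sum(col) / len(members) for col in zip(*members)]
        ranked = sorted(zip(countries, means), key=lambda cm: cm[1], reverse=True)
        result[cluster] = ranked[:n]
    return result


countries = ["France", "Germany", "United States"]
rows = [[0.5, 0.5, 0.0], [0.25, 0.75, 0.0], [0.0, 0.0, 1.0]]  # toy allocation vectors
labels = [0, 0, 1]
top = top_countries_by_cluster(countries, rows, labels, n=2)
# top[0] -> [('Germany', 0.625), ('France', 0.375)]
```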

Given the first group’s high share of funds with Global or European mandates, we expected its mean allocations to reflect those mandates. The highest mean country allocations are to France, Germany, the United Kingdom, Switzerland, and the Netherlands, which gives us confidence in naming this first group “Developed Global – ex US”.

For group 2, we expected Global funds to make up the entire cluster, and they do. However, the group’s mean country allocations are dominated by the US, so we named this group “Global – US”.

Finally, for group 3 we expected a mix of Emerging Markets countries with high allocations, which proved to be true. Group 3 was therefore labeled “Emerging Markets”.


Is four better than three?

When the same clustering algorithm splits the data into four clusters (k = 4), the results are largely the same.

The only difference with k = 4 is that our Developed Global – ex US group is split in two (Groups 1 and 2 in Table 4).

Because there is considerable overlap between the new Groups 1 and 2, a fourth cluster adds no meaningful distinction, so we continue to use the three groups defined above.
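
One way to check how a k = 4 run relates to the k = 3 groups is to cross-tabulate the two sets of labels for the same funds: a k = 3 cluster whose funds spread over two k = 4 clusters is the one that split. The function names and label lists below are hypothetical.

```python
from collections import Counter


def split_map(labels_k3, labels_k4):
    """Return {k3_cluster: {k4_cluster: fund_count}} for the same list of funds."""
    mapping = {}
    for (c3, c4), n in sorted(Counter(zip(labels_k3, labels_k4)).items()):
        mapping.setdefault(c3, {})[c4] = n
    return mapping


def split_clusters(mapping):
    """k = 3 clusters whose funds are spread across more than one k = 4 cluster."""
    return [c3 for c3, parts in mapping.items() if len(parts) > 1]


labels_k3 = [0, 0, 0, 0, 1, 1, 2, 2]  # hypothetical labels for eight funds
labels_k4 = [0, 0, 1, 1, 2, 2, 3, 3]
mapping = split_map(labels_k3, labels_k4)
# mapping -> {0: {0: 2, 1: 2}, 1: {2: 2}, 2: {3: 2}}
# split_clusters(mapping) -> [0]
```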


Strategy Uses

With the three dominant geographic focus groups from the k = 3 analysis established (Developed Global – ex US, Global – US, and Emerging Markets), the next step is to apply a strategy to the data.

Applying EPFR’s Flow-Percentage Country Strategy to these groups, using daily data dating back to 2008, produced promising returns. The strategy compounds daily percentage flows over a trailing 20-day period and then ranks the results. The weekly results of this strategy are shown in Table 5.
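
The article does not spell out the full construction of the Flow-Percentage Country Strategy, so the sketch below covers only the two steps it names: compounding daily percentage flows over a trailing 20-day window and ranking the results. It assumes flows are expressed in percent of assets (0.5 means +0.5%) and that a higher compounded flow ranks first; the function names and flow series are hypothetical, and portfolio construction and weekly rebalancing are omitted.

```python
def trailing_compounded_flow(daily_pct_flows, window=20):
    """Compound the last `window` daily % flows: (prod(1 + f/100) - 1) * 100."""
    if len(daily_pct_flows) < window:
        raise ValueError("need at least `window` days of flows")
    growth = 1.0
    for f in daily_pct_flows[-window:]:
        growth *= 1.0 + f / 100.0
    return (growth - 1.0) * 100.0


def rank_groups(flows_by_group, window=20):
    """Order groups from strongest to weakest trailing compounded flow."""
    scores = {g: trailing_compounded_flow(f, window) for g, f in flows_by_group.items()}
    return sorted(scores, key=scores.get, reverse=True)


flows = {  # made-up daily % flows for the three groups
    "Developed Global – ex US": [0.1] * 20,
    "Global – US": [0.2] * 20,
    "Emerging Markets": [-0.1] * 20,
}
ranking = rank_groups(flows)
# ranking -> ['Global – US', 'Developed Global – ex US', 'Emerging Markets']
```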

Table 5 shows positive average weekly returns. Chart 3 also shows that these groupings have produced largely positive returns over the years, particularly during the last four years.

By comparison, the Emerging Markets/Developed Markets switch strategy shown in Chart 4 produced mostly negative returns, especially in the most recent years.

Although the Sharpe ratio of the strategy using the Developed Global – ex US, Global – US, and Emerging Markets geo focuses (0.14) is not particularly high, the strategy still shows promise given the stronger returns it has delivered in recent years (shown in Chart 5).
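
The article does not state whether the Sharpe ratios quoted here are weekly or annualized, or which risk-free rate was used. This sketch of the standard formula (mean excess return divided by its standard deviation) therefore exposes both as parameters, with optional annualization by the usual square-root-of-time scaling; the function name and sample returns are illustrative assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=None):
    """Mean excess return divided by the sample standard deviation of excess returns.

    `returns` and `risk_free` must use the same units per period (e.g. weekly %).
    If `periods_per_year` is given (52 for weekly data), the ratio is annualized
    by multiplying by its square root.
    """
    excess = [r - risk_free for r in returns]
    ratio = statistics.mean(excess) / statistics.stdev(excess)
    if periods_per_year is not None:
        ratio *= math.sqrt(periods_per_year)
    return ratio


weekly = [1.0, -1.0, 1.0, -1.0, 3.0]  # made-up weekly returns, %
per_week = sharpe_ratio(weekly)
annualized = sharpe_ratio(weekly, periods_per_year=52)
```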

Another option is to run the same strategy on data from passive funds only. The average weekly returns are shown in Table 6.

The Sharpe ratio using passive funds only (0.20) is higher than the ratio using all funds. However, comparing the returns of all three strategies (see Chart 6) clearly shows that the best returns come from using all funds.
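
A higher Sharpe ratio does not guarantee a higher total return: a steadier strategy can be more efficient per unit of risk yet compound to less. The made-up weekly return series below (not EPFR results) illustrate this; the function names are hypothetical and the Sharpe ratio here assumes a zero risk-free rate.

```python
import statistics


def cumulative_return(weekly_returns_pct):
    """Compound weekly % returns into a total % return over the whole period."""
    growth = 1.0
    for r in weekly_returns_pct:
        growth *= 1.0 + r / 100.0
    return (growth - 1.0) * 100.0


def sharpe(weekly_returns_pct):
    """Per-week Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.mean(weekly_returns_pct) / statistics.stdev(weekly_returns_pct)


# 52 made-up weeks each: "steady" has low volatility, "volatile" a higher mean.
steady = [0.2, 0.3, 0.2, 0.3] * 13
volatile = [2.0, -1.0, 2.5, -1.5] * 13

# steady wins on Sharpe ratio, volatile wins on cumulative return.
print(sharpe(steady) > sharpe(volatile))
print(cumulative_return(volatile) > cumulative_return(steady))
```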


Case proven

In conclusion, machine learning algorithms proved very helpful in developing and strengthening our knowledge, because they classify information in ways that can either confirm existing groupings or reveal new clusters in the data.

Specifically, using EPFR’s Country Allocations data, we were able to confirm that EPFR’s existing geographic focus labels accurately reflect what is in the data, and to isolate an additional dominant geo focus group, Global – US.

The new information gathered from the K-Means algorithm led to an improved strategy. Since machine learning is particularly useful for grouping data into categories, this concept could be explored further with other data sets.
