Sometimes on this site, I like to do some small analyses and modeling, and when I encountered this GDP Predictor, I wanted to play around with the data and see what I could uncover. Unfortunately, the Excel spreadsheet doesn’t exist anymore, so I had to reconstruct the author’s dataset. I took this data and made a hierarchical categorization of nations based upon GDP per capita. These categories wound up having geographical and cultural meanings that went beyond the economic scope of the data.
While I used the link above for inspiration, I made some slight changes to the dataset. I included all of the sub-scores from the Fraser economic freedom index, which turned out to be very useful. I also used raw inputs, rather than converting everything to dollar values. My data included national IQ, the Economic Freedom Index, and its sub-scores: Size of Government, Legal System & Property Rights, Sound Money, Freedom to Trade Internationally, and Regulation. EU/NAFTA membership and oil production per capita rounded out the data for each of the 185 countries I used.
With this data in hand, I decided to make hierarchical clusters of nations with a greedy, tree-based algorithm. This procedure looks for the strongest division within the data and then recursively looks for strong divisions in the new groups. When the algorithm is presented with data from all of the nations, it forms two clusters that minimize the difference between each country’s GDP per capita and that of the cluster it belongs to. To judge which two clusters were the best choice, I tried every possible split that used only one variable.
Once the initial split was formed, this process was applied recursively to each cluster until the splits stopped being meaningful. I defined a meaningful split as one where the combined error of the two new clusters was at least 10% lower than the error of the initial cluster. This process allowed me to categorize nations that have similar GDPs for similar reasons.
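For readers who want to see the mechanics, here is a minimal sketch of the procedure described above. It is not my original software. It assumes that the "error" of a cluster is the sum of squared differences between each member's GDP per capita and the cluster mean, and that candidate thresholds sit halfway between consecutive observed values. The function names (`sse`, `best_split`, `grow`), the `min_gain` parameter, and the five toy rows are all made up for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the group mean (an assumed error measure)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, features):
    """Try every threshold on every single feature; return the lowest-error split."""
    best = None
    for f in features:
        levels = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[f] < t]
            right = [r for r in rows if r[f] >= t]
            err = sse([r[target] for r in left]) + sse([r[target] for r in right])
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    return best  # None when no feature has two distinct values

def grow(rows, target, features, min_gain=0.10):
    """Split recursively; stop when a split cuts the error by less than min_gain."""
    parent_err = sse([r[target] for r in rows])
    split = best_split(rows, target, features)
    if split is None or parent_err == 0 or split[0] > (1 - min_gain) * parent_err:
        return {"leaf": True,
                "members": [r["name"] for r in rows],
                "mean": mean(r[target] for r in rows)}
    _, f, t, left, right = split
    return {"leaf": False, "feature": f, "threshold": t,
            "low": grow(left, target, features, min_gain),
            "high": grow(right, target, features, min_gain)}

# Hypothetical toy rows (not real country data), GDP in arbitrary units.
toy = [
    {"name": "A", "iq": 100, "oil": 0,  "gdp": 40},
    {"name": "B", "iq": 99,  "oil": 1,  "gdp": 40},
    {"name": "C", "iq": 85,  "oil": 0,  "gdp": 10},
    {"name": "D", "iq": 84,  "oil": 2,  "gdp": 10},
    {"name": "E", "iq": 99,  "oil": 90, "gdp": 60},
]
tree = grow(toy, "gdp", ["iq", "oil"])
```

On the toy rows, the first split lands on `iq` and the next one on `oil`, which mirrors the shape of the real tree only by construction. The sketch has no minimum cluster size, so single-member clusters can appear.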
One of the nice things about this method is that it isolates the reasons for categorization. In the whole tree I’ve displayed below, no more than four variables go into describing each group. Trees like this are generated automatically, but they are much easier for humans to interpret than the output of most other fitting algorithms.
Before I go any further, I want to make a note about IQ. The original work that this is building on included IQ, so my inclusion is supported on that basis alone. However, I know that IQ is controversial, so I’ll briefly describe what the results look like without using IQ. Without IQ, there were four categories of countries: Oil-rich nations, NAFTA/EU nations, Former Pacific British Colonies (Australia, New Zealand, Hong Kong & Singapore) and everyone else.
Without further ado, here is the tree that was generated by my software.
That’s a bit too painful to look at all at once. There are three main areas of approximately equal complexity in the tree. The first covers nations with an average IQ greater than or equal to 97.
In this high-IQ sub-tree, five categories have been formed. These categories are based mostly on the EFI Legal System & Property Rights index, with the exception of one separation due to soundness of money. The categories fit pretty well. There’s a category made up of Scandinavia, the Asian city-states, Australia & New Zealand, and the Netherlands & Luxembourg. There’s a category that includes large Western nations, plus thoroughly westernized countries like Japan and South Korea. Another category includes Greater China plus the Baltics and Hungary. The Austria/Belgium/Switzerland/Taiwan category fits as small westernized nations. North Korea stands on its own as a high-IQ nation with a very low GDP per capita.
Moving on to the countries with either mid-range IQs or lots of oil, we find another five categories. The Islamic petro-monarchies form a category of their own because each produces at least 80 barrels of oil per person per year. Ireland stands on its own as a country that would have fit in with the Scandinavian category except for its lower IQ. Armenia & Georgia have low GDPs despite their IQ and economic freedom. The two big categories share the most heavily castizo South and Central American countries, the Eastern Mediterranean, the Balkans, Slavic nations, and some of Southeast Asia.
The final set of countries also makes up a large majority of the total. Puerto Rico, which isn’t really a country but got included anyway since it was listed in the data, is a large outlier here, with a large GDP per capita and no significant oil resources. As before, a major category here is second-tier oil-producing nations, which encompasses most of the remaining Latin American and Arab countries. The countries with the lowest measured IQs are all in sub-Saharan Africa and the Caribbean. These nations split on IQ and oil production. The final and largest category consists of nations that don’t fit any of the other categories. This includes the Indian subcontinent as a whole, along with Central American and island countries.
Looking at all of these categories, there are a few important insights to be gained. First, while high IQ separates the majority of wealthy nations from the rest, oil-producing nations are even richer. Second, Ireland, North Korea, Puerto Rico and the Caucasus are clear outliers with respect to their national IQ: Ireland and Puerto Rico have more wealth than their IQ would indicate, and the others have less. Third, legal systems and property rights are the most important part of economic freedom for maximizing GDP per capita.
While creating a fit for GDP per capita was not the intention of this tree, it does work as a regression model. The plot below shows the tree-based fit to the GDP data.