Tonight I decided it would be fun to build my own US Presidential Election prediction model. The goal is to show you how these models are built at their most basic level, and to show their limitations.

The goal for the model itself was to keep it simple, something most people can understand. All my source code is included in this post. It’s only 50 lines of MATLAB code, and I don’t mind if you steal it.

The model is pretty simple, and I know it leaves out all sorts of things and that this type of approach has fundamental limitations. It took me more time to write this article than to make the model, for what it’s worth.

For input data, I made a text file listing each state’s name, electoral college votes, poll average for Clinton, and poll average for Trump. I got these poll averages from RealClearPolitics, so take that as you will.
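To make the file layout concrete, here is a minimal sketch that writes a three-state sample in the same comma-separated layout and reads it back. The file name `sampleInputs.csv` is a placeholder for illustration only; the three rows are copied from the input table at the end of this post.

```matlab
% Write a tiny sample file in the layout StateName,ElectoralVotes,ClintonPoll,TrumpPoll.
% 'sampleInputs.csv' is a placeholder name, not the file behind the results in this post.
sampleFile = 'sampleInputs.csv';
fid = fopen(sampleFile, 'w');
fprintf(fid, 'Florida,29,0.45,0.45\n');
fprintf(fid, 'Georgia,16,0.415,0.455\n');
fprintf(fid, 'Virginia,13,0.443,0.408\n');
fclose(fid);

% Read it back: one string column followed by three numeric columns.
fid = fopen(sampleFile, 'r');
C = textscan(fid, '%s%f%f%f', 'Delimiter', ',');
fclose(fid);

stateNames     = C{1};   % cell array of state names
electoralVotes = C{2};   % electoral college votes per state
clintonPoll    = C{3};   % Clinton poll average, as a fraction
trumpPoll      = C{4};   % Trump poll average, as a fraction

fprintf('%s: %d votes, Clinton %.3f, Trump %.3f\n', ...
    stateNames{1}, electoralVotes(1), clintonPoll(1), trumpPoll(1));
```

Poll averages are stored as fractions (0.45 rather than 45), which is why the variability parameters later in the post are also fractions.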

On top of that, I used historical data from this BBC article, which popped up high in a Google search, to say that, on average, national polls were off by about 6% from election outcomes this far out from election day. (NB: I converted this to a standard deviation in my code.) If you’ve read Taleb, this step is probably where he’d start laughing, but we’ll move forward anyways.
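For reference, if the 6% figure is read as the average absolute miss of a normally distributed polling error (an assumption; the BBC figure may be defined differently), the two quantities are related by mean absolute error = sigma*sqrt(2/pi). The sketch below checks that relationship numerically and prints both scalings. Note that the listing at the end of this post multiplies 0.06 by sqrt(2/pi), which gives the smaller of the two values.

```matlab
% Relationship between the mean absolute error (MAE) and the standard
% deviation (sigma) of a zero-mean normal distribution: MAE = sigma*sqrt(2/pi).
% Reading the 6% figure as an MAE is an assumption made for this sketch.
mae = 0.06;

sigmaFromMae = mae*sqrt(pi/2);   % sigma that would give an MAE of 6% (about 0.075)
scaledDown   = mae*sqrt(2/pi);   % the scaling used in this post's listing (about 0.048)

% Numerical check: draw normal samples with sigmaFromMae and measure their MAE.
x = sigmaFromMae*randn(1e6, 1);
empiricalMae = mean(abs(x));

fprintf('sigma from MAE: %.4f, scaled down: %.4f, empirical MAE: %.4f\n', ...
    sigmaFromMae, scaledDown, empiricalMae);
```

Which value is appropriate depends on what the 6% actually measures, so treat the choice as one more input assumption to vary.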

I also added a random shift for each state, with a standard deviation of 5%, that is independent of the national trend. I made that number up. As always, being clear about your input data is important because of the GIGO (garbage in, garbage out) rule.

Those are all the inputs to the model. Using these inputs, I do one million runs of the election. It takes 2.35 seconds to run them all.

Here is the procedure for each run:

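- I choose a random national shift in the polls, drawn from a normal distribution.
- For each state, I choose a random local shift to the poll numbers, also drawn from a normal distribution.
- Each candidate’s poll number for a state gets shifted by the sum of the national and local shifts, in opposite directions: Clinton’s number goes up by that amount and Trump’s goes down by it (the shift can be negative).
- Whoever has the higher value after the random shifts wins the state for that run.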
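The steps above can be sketched in vectorized form, drawing every shift for every run at once. This is a minimal illustration rather than the code used for the results: the three states and the smaller run count are picked only to keep the example short, and the win threshold is a simple majority of the sample's votes rather than 270.

```matlab
% Three-state sample (rows copied from the input table in this post).
votes   = [29; 16; 13];            % Florida, Georgia, Virginia
clinton = [0.45; 0.415; 0.443];
trump   = [0.45; 0.455; 0.408];

stdOvr = 0.06*sqrt(2/pi);  % national standard deviation, as in the post
stdLoc = 0.05;             % state standard deviation, as in the post
N      = 1e5;              % number of runs (smaller than the post's 1e6)
Nstate = numel(votes);

% Step 1: one national shift per run, repeated down all states.
nationalShift = repmat(stdOvr*randn(1, N), Nstate, 1);
% Step 2: an independent local shift per state per run.
localShift = stdLoc*randn(Nstate, N);
% Step 3: Clinton moves up by the total shift, Trump moves down by it.
totalShift = nationalShift + localShift;
cand0 = repmat(clinton, 1, N) + totalShift;
cand1 = repmat(trump, 1, N) - totalShift;
% Step 4: Trump wins a state in a run when his shifted number is higher.
trumpWins = cand1 > cand0;

stateWinFrac = mean(trumpWins, 2);                              % per-state Trump win frequency
trumpVotes   = sum(double(trumpWins).*repmat(votes, 1, N), 1);  % Trump votes per run
winFrac      = mean(trumpVotes > sum(votes)/2);                 % majority of this sample only

disp(stateWinFrac');
```

With these inputs, Florida should come out near 50%, and Georgia and Virginia should land close to the 61% and 40% shown in the table of results, since each state's win frequency depends only on its own poll numbers and the two variability settings.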

Before I get into the results, I want to discuss the limitations of this model compared to more well-thought-out versions. First, I’m not doing anything special to improve my poll averages. Second, I don’t account for changes to underlying demographic groups and how changes at that level affect states as a whole. Third, I don’t account for the polling in some states being better than the polling in others. These are pretty big things that would affect the output of this model, and accounting for them would make it more accurate.

All of that said, as we get closer to election time, the accuracy of this forecast will probably converge with “expert” opinion anyways.

Once everything is finished executing, the main script takes these results and computes the output data. Here’s what it found.

- 35.9% chance of a Trump victory, which is not an unreasonable number given current polls
- Trump’s most likely electoral vote total right now is 216.
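The listing in this post computes the win fraction but does not show how the most likely total was found. One way to get both numbers from a vector of per-run totals is sketched below. Using `mode` is an assumption about the method, and the totals are made-up values purely to show the calls.

```matlab
% Hypothetical per-run Trump electoral vote totals (made-up values for illustration).
totals = [216 216 301 188 216 270 254 188 216 275];

winOdds    = mean(totals >= 270);  % fraction of runs with at least 270 votes
mostLikely = mode(totals);         % most frequent total across runs

fprintf('win odds: %.2f, most likely total: %d\n', winOdds, mostLikely);
```

In the real model, `totals` would be the column sums of the per-state electoral vote matrix, one entry per run.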

Here is a histogram of the outcomes of all 1 million runs. The red line marks 270 electoral college votes for Trump. I also include a table of each state and how often Trump won it.

State | Frequency of Trump Win |
Alabama | 100% |
Alaska | 74% |
Arizona | 55% |
Arkansas | 92% |
California | 8% |
Colorado | 39% |
Connecticut | 25% |
DC | 0% |
Delaware | 24% |
Florida | 50% |
Georgia | 61% |
Hawaii | 0% |
Idaho | 94% |
Illinois | 17% |
Indiana | 74% |
Iowa | 62% |
Kansas | 79% |
Kentucky | 74% |
Louisiana | 81% |
Maine | 31% |
Maryland | 1% |
Massachusetts | 5% |
Michigan | 35% |
Minnesota | 35% |
Mississippi | 74% |
Missouri | 71% |
Montana | 100% |
Nebraska | 100% |
Nevada | 56% |
NewHampshire | 36% |
NewJersey | 18% |
NewMexico | 28% |
NewYork | 11% |
NorthCarolina | 45% |
NorthDakota | 100% |
Ohio | 55% |
Oklahoma | 96% |
Oregon | 28% |
Pennsylvania | 32% |
RhodeIsland | 41% |
SouthCarolina | 69% |
Tennessee | 74% |
Texas | 70% |
Utah | 86% |
Vermont | 6% |
Virginia | 40% |
Washington | 13% |
WestVirginia | 95% |
Wisconsin | 40% |
Wyoming | 100% |

In any statistical modeling, it is important to understand the effects of your chosen input parameters. I varied the national and local variability effects and observed the changes. Trump’s odds of victory were between 18% and 41% for increasing amounts of variability. His most likely electoral vote score hovered in the 200-240 range.
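A sweep like this can be sketched as a loop over the two variability parameters. The grid values, the three-state sample, and the simple-majority threshold below are all assumptions chosen to keep the example small; they are not the settings behind the 18% to 41% range quoted above.

```matlab
% Three-state sample (rows from the input table); a stand-in for the full file.
votes  = [29; 16; 13];                                   % Florida, Georgia, Virginia
margin = [0.45; 0.455; 0.408] - [0.45; 0.415; 0.443];    % Trump poll minus Clinton poll
N      = 2e4;                                            % runs per parameter pair
Nstate = numel(votes);

stdOvrGrid = [0.02 0.05 0.08];   % hypothetical national standard deviations
stdLocGrid = [0.02 0.05 0.08];   % hypothetical state standard deviations
winOdds = zeros(numel(stdOvrGrid), numel(stdLocGrid));

for i = 1:numel(stdOvrGrid)
    for j = 1:numel(stdLocGrid)
        shift = repmat(stdOvrGrid(i)*randn(1, N), Nstate, 1) ...
              + stdLocGrid(j)*randn(Nstate, N);
        % Clinton gains the shift and Trump loses it, so Trump wins a state
        % when his poll lead exceeds twice the shift.
        trumpWins  = repmat(margin, 1, N) > 2*shift;
        trumpVotes = votes' * double(trumpWins);
        winOdds(i, j) = mean(trumpVotes > sum(votes)/2);
    end
end

disp(winOdds);   % rows: national variability, columns: state variability
```

Reading across a row or down a column shows how sensitive the headline number is to each made-up input on its own.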

Below is all of the MATLAB source code I used to generate this. Most of it should run in free MATLAB-compatible environments like Octave and FreeMat.

%% CQW's Open Election Model
% 9/21/2016 - Caleb Q Washington
tic;

paramFile = 'electionModelInputs.csv';
% 6% average national polling miss, scaled by sqrt(2/pi).
% Note: for a normal distribution, mean absolute error = sigma*sqrt(2/pi),
% so reading 6% as a mean absolute error would instead give 0.06*sqrt(pi/2).
nationalVariability = 0.06*sqrt(2/pi);
stateVariability = 0.05; % made-up state-level standard deviation
numberOfRuns = 1e6;

output = electionModel(paramFile,nationalVariability,stateVariability,numberOfRuns);

winOdds = mean(sum(output.electoralCollege,1) >= 270); % fraction of trump wins (270 votes is enough to win)
statePct = mean(output.outcomes,2); % fraction of trump wins by state
minElecVotes = min(sum(output.electoralCollege,1)); % minimum trump electoral votes
maxElecVotes = max(sum(output.electoralCollege,1)); % maximum trump electoral votes
toc

hist(sum(output.electoralCollege,1),50);
hold on;
line([270 270],[0 3.5e4],'Color','r'); % red line at 270 electoral votes
xlabel('Electoral College Votes for Trump')
ylabel('#/1,000,000');

% The function below goes in its own file, electionModel.m, so the script above can call it.
function output = electionModel(paramFile,stdOvr,stdLoc,N)
% param file is a text file of format:
% StateName,ElectoralVotes,ClintonPoll,TrumpPoll
%
% stdOvr is the standard deviation between polls and results, common to all
% states, and represents the national shift in polls between now and election day.
%
% stdLoc is the local standard deviation between polls and results,
% potentially different in each state. It represents states changing more
% or less than the national change
%
% N is the number of iterations to run
%
% CQW 09/21/2016

    % open param file and read it into cell array C
    fid = fopen(paramFile);
    C = textscan(fid,'%s%f%f%f','Delimiter',',');
    fclose(fid);

    % take cell array and turn into vectors
    stateNames = C{1};
    electoralVotes = C{2};
    candidate0Poll = C{3}; % Clinton
    candidate1Poll = C{4}; % Trump

    Nstate = length(stateNames);
    outcomes = zeros(Nstate,N);

    for n = 1:N
        nationalShift = stdOvr*randn; % one national shift per run
        for m = 1:Nstate
            totalShift = nationalShift + stdLoc*randn; % amount to shift poll
            cand0 = candidate0Poll(m) + totalShift; % Clinton result
            cand1 = candidate1Poll(m) - totalShift; % Trump result
            if cand1 > cand0
                outcomes(m,n) = 1; % mark Trump wins with a 1
            end
        end
    end

    % electoral votes won by Trump in each state for each run
    electoralCollege = outcomes.*repmat(electoralVotes,1,N);
    output.outcomes = outcomes;
    output.electoralCollege = electoralCollege;
end

Finally, here is the input data from electionModelInputs.csv, shown as a table:

State | Votes | Clinton Poll | Trump Poll |
Alabama | 9 | 0 | 1 |
Alaska | 3 | 0.30 | 0.39 |
Arizona | 11 | 0.40 | 0.416 |
Arkansas | 6 | 0.325 | 0.52 |
California | 55 | 0.51 | 0.317 |
Colorado | 9 | 0.427 | 0.39 |
Connecticut | 7 | 0.475 | 0.3825 |
DC | 3 | 1 | 0 |
Delaware | 3 | 0.42 | 0.32 |
Florida | 29 | 0.45 | 0.45 |
Georgia | 16 | 0.415 | 0.455 |
Hawaii | 4 | 1 | 0 |
Idaho | 4 | 0.23 | 0.44 |
Illinois | 20 | 0.43 | 0.30 |
Indiana | 11 | 0.36 | 0.45 |
Iowa | 6 | 0.387 | 0.430 |
Kansas | 6 | 0.345 | 0.4575 |
Kentucky | 8 | 0.36 | 0.45 |
Louisiana | 8 | 0.375 | 0.495 |
Maine | 4 | 0.438 | 0.37 |
Maryland | 10 | 0.603 | 0.270 |
Massachusetts | 11 | 0.55 | 0.32 |
Michigan | 16 | 0.445 | 0.393 |
Minnesota | 10 | 0.442 | 0.39 |
Mississippi | 6 | 0.41 | 0.50 |
Missouri | 10 | 0.383 | 0.460 |
Montana | 3 | 0 | 1 |
Nebraska | 5 | 0 | 1 |
Nevada | 6 | 0.42 | 0.44 |
NewHampshire | 4 | 0.437 | 0.387 |
NewJersey | 14 | 0.495 | 0.370 |
NewMexico | 5 | 0.41 | 0.33 |
NewYork | 29 | 0.508 | 0.338 |
NorthCarolina | 15 | 0.448 | 0.430 |
NorthDakota | 3 | 0 | 1 |
Ohio | 8 | 0.432 | 0.450 |
Oklahoma | 7 | 0.29 | 0.53 |
Oregon | 7 | 0.405 | 0.325 |
Pennsylvania | 20 | 0.468 | 0.402 |
RhodeIsland | 4 | 0.44 | 0.41 |
SouthCarolina | 9 | 0.3967 | 0.4667 |
Tennessee | 11 | 0.35 | 0.44 |
Texas | 38 | 0.378 | 0.450 |
Utah | 6 | 0.24 | 0.39 |
Vermont | 3 | 0.43 | 0.215 |
Virginia | 13 | 0.443 | 0.408 |
Washington | 12 | 0.46 | 0.3050 |
WestVirginia | 5 | 0.305 | 0.53 |
Wisconsin | 10 | 0.435 | 0.400 |
Wyoming | 3 | 0 | 1 |