CQW’s Simple Election Model

Tonight I decided it would be fun to build my own US Presidential Election prediction model. The goal is to show you how these models are built at their most basic level, and to show their limitations.

The goal for the model itself was to keep it simple enough that most people can understand it. All my source code is included in the post. It’s only 50 lines of MATLAB code, and I don’t mind if you steal it.

The model is pretty simple. I know there are all sorts of things it doesn’t take into account, and that there are fundamental limitations to this type of approach. It took me more time to write this article than to make the model, for what it’s worth.

For input data, I made a text file that lists each state’s name, electoral college votes, poll average for Clinton, and poll average for Trump. I got these poll averages from RealClearPolitics, so take that as you will.

On top of that, I used historical data from this BBC article, which popped up high in a Google search, to say that national polls are, on average, off by about 6% from election outcomes this far out from election day. (NB: I converted this to a standard deviation in my code.) If you’ve read Taleb, this step is probably where he’d start laughing, but we’ll move forward anyway.
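As a sanity check on that conversion, here is a minimal sketch. It assumes the 6% figure is a mean absolute error and that poll errors are normally distributed; neither assumption comes from the BBC article itself. Under those assumptions the mean absolute error equals sigma*sqrt(2/pi), so the matching standard deviation is about 0.075. The variable names are made up for this example, and the scaling factor in the posted script is different, so treat this as an independent check rather than a description of what the script does.

```matlab
% Assumption: 6% is a mean absolute error and errors are normal.
meanAbsError = 0.06;

% For X ~ N(0, sigma^2), E|X| = sigma*sqrt(2/pi), so invert that relation.
sigma = meanAbsError*sqrt(pi/2);

% Monte Carlo check: draw normal errors with this sigma and
% confirm that their average absolute size comes back near 6%.
x = sigma*randn(1e6,1);
empiricalMeanAbs = mean(abs(x));

fprintf('sigma = %.4f, simulated mean |error| = %.4f\n', sigma, empiricalMeanAbs);
```

The simulated mean absolute error should land very close to 0.06, which confirms the direction of the conversion.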

I also added a random 5% shift for each state that is independent of the national trend. I made that number up. As always, being clear about your input data is important because of the GIGO (garbage in, garbage out) rule.

Those are all the inputs to the model. Using these inputs, I do one million runs of the election. It takes 2.35 seconds to run them all.

Here is the procedure for each run:

  1. I choose a random national shift in the polls.
  2. For each state, I choose a random local shift to the poll numbers.
  3. Each candidate’s poll numbers for a state get shifted by the sum of the national and local shifts: one candidate moves up by that amount and the other moves down by the same amount.
  4. Whoever has the higher value after the random shifts wins the state for that run.
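The four steps above can be sketched for a single run as follows. This is a vectorized illustration, not the posted code: the three states, their poll numbers, and the variable names are all made up for the example. As in the posted function, the shift is added to Clinton's number and subtracted from Trump's.

```matlab
% Hypothetical three-state example; all numbers are made up for illustration.
electoralVotes = [10; 20; 5];
clintonPoll    = [0.45; 0.40; 0.50];
trumpPoll      = [0.43; 0.46; 0.38];
stdNational    = 0.05;   % assumed national standard deviation
stdLocal       = 0.05;   % assumed per-state standard deviation

% Step 1: one national shift shared by every state
nationalShift = stdNational*randn;

% Step 2: an independent local shift for each state
localShift = stdLocal*randn(size(electoralVotes));

% Step 3: the combined shift moves the candidates in opposite directions
totalShift    = nationalShift + localShift;
clintonResult = clintonPoll + totalShift;
trumpResult   = trumpPoll - totalShift;

% Step 4: the higher shifted number wins the state
trumpWins = trumpResult > clintonResult;
trumpElectoralVotes = sum(electoralVotes(trumpWins));

fprintf('Trump wins %d states and %d electoral votes in this run\n', ...
    sum(trumpWins), trumpElectoralVotes);
```

Repeating this many times and recording trumpElectoralVotes each time gives the distribution the results are based on.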

Before I get into the results, I want to discuss the limitations of this model compared to more well-thought-out versions. First, I’m not doing anything special to improve my poll averages. Second, I don’t account for changes in underlying demographic groups or how changes at that level affect states as a whole. Third, I don’t account for the polling in some states being better than the polling in others. These are pretty big things; accounting for them would affect the output of this model and probably make it more accurate.

All of that said, as we get closer to election time, the accuracy of this forecast will probably converge with “expert” opinion anyway.

Once everything is finished executing, the main script takes these results and computes the output data. Here’s what it found.

  • 35.9% chance of a Trump victory, which is not an unreasonable number given current polls
  • Trump’s most likely electoral vote total right now is 216.
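Both numbers are simple summaries of the per-run electoral vote totals. Here is a minimal sketch using a short made-up vector in place of the real one million totals; trumpVotes and its values are hypothetical, and a win is counted at 270 or more votes because 270 is enough to win.

```matlab
% Hypothetical stand-in for the per-run totals. In the real model this would
% be the column sums of the electoral votes won in each run.
trumpVotes = [216 216 301 198 216 270 244 216 180 275];

winProbability  = mean(trumpVotes >= 270);  % share of runs with 270 or more votes
mostLikelyTotal = mode(trumpVotes);         % most frequent electoral vote total

fprintf('Win probability: %.1f%%, most likely total: %d\n', ...
    100*winProbability, mostLikelyTotal);
```

With this toy vector, three of the ten runs reach 270 and the most frequent total is 216.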

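Here is a histogram of the outcomes of all 1 million runs. The red line marks 270 electoral college votes for Trump. I also include a table of each state and how often Trump won it.

[Figure: histogram of Trump’s electoral college vote totals across all 1 million runs, with a line at 270 votes (image 20160921)]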

State Frequency of Trump Win
‘Alabama’ 100%
‘Alaska’ 74%
‘Arizona’ 55%
‘Arkansas’ 92%
‘California’ 8%
‘Colorado’ 39%
‘Connecticut’ 25%
‘DC’ 0%
‘Delaware’ 24%
‘Florida’ 50%
‘Georgia’ 61%
‘Hawaii’ 0%
‘Idaho’ 94%
‘Illinois’ 17%
‘Indiana’ 74%
‘Iowa’ 62%
‘Kansas’ 79%
‘Kentucky’ 74%
‘Louisiana’ 81%
‘Maine’ 31%
‘Maryland’ 1%
‘Massachusetts’ 5%
‘Michigan’ 35%
‘Minnesota’ 35%
‘Mississippi’ 74%
‘Missouri’ 71%
‘Montana’ 100%
‘Nebraska’ 100%
‘Nevada’ 56%
‘NewHampshire’ 36%
‘NewJersey’ 18%
‘NewMexico’ 28%
‘NewYork’ 11%
‘NorthCarolina’ 45%
‘NorthDakota’ 100%
‘Ohio’ 55%
‘Oklahoma’ 96%
‘Oregon’ 28%
‘Pennsylvania’ 32%
‘RhodeIsland’ 41%
‘SouthCarolina’ 69%
‘Tennessee’ 74%
‘Texas’ 70%
‘Utah’ 86%
‘Vermont’ 6%
‘Virginia’ 40%
‘Washington’ 13%
‘WestVirginia’ 95%
‘Wisconsin’ 40%
‘Wyoming’ 100%

In any statistical modeling, it is important to understand the effects of your chosen input parameters. I varied the national and local variability and observed the changes. Trump’s odds of victory ranged from 18% to 41% as the amount of variability increased. His most likely electoral vote total hovered in the 200-240 range.
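A sweep like that can be sketched as below. This is a toy version under stated assumptions: four made-up states with 50 electoral votes in total (so 26 wins), three arbitrary variability settings, and a vectorized form of the model. None of these numbers or names come from the actual input file.

```matlab
% Hypothetical inputs, made up for illustration (not the post's data).
electoralVotes = [10; 20; 5; 15];
clintonPoll    = [0.45; 0.40; 0.50; 0.44];
trumpPoll      = [0.43; 0.46; 0.38; 0.42];
votesToWin     = 26;               % majority of the 50 toy electoral votes
N              = 2e4;              % runs per parameter setting
stdValues      = [0.02 0.05 0.10]; % variability settings to try
Nstate         = numel(electoralVotes);

winOdds = zeros(numel(stdValues)); % rows: national std, columns: state std
for i = 1:numel(stdValues)
    for j = 1:numel(stdValues)
        national = stdValues(i)*randn(1,N);       % one shift per run
        local    = stdValues(j)*randn(Nstate,N);  % one shift per state per run
        shift    = repmat(national,Nstate,1) + local;
        % Trump minus Clinton after the shift: the shift is subtracted from
        % Trump and added to Clinton, so the margin moves by twice the shift.
        margin     = repmat(trumpPoll - clintonPoll,1,N) - 2*shift;
        trumpVotes = electoralVotes' * double(margin > 0);
        winOdds(i,j) = mean(trumpVotes >= votesToWin);
    end
end
disp(winOdds);
```

Each entry of winOdds is the estimated win probability for one pair of settings, so reading across a row or down a column shows how sensitive the answer is to each input.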

Below is all of the MATLAB source code I used to generate this, followed by the input data. Most of it should be compatible with free MATLAB alternatives like Octave and FreeMat. Also, this is all better formatted on my computer; copy-paste took out all my tabs, so any indentation here is not the original.

%% CQW's Open Election Model
% 9/21/2016 - Caleb Q Washington
tic;
paramFile = 'electionModelInputs.csv';
% 6% poll error scaled by sqrt(2/pi). Note: for normally distributed errors,
% mean |error| = sigma*sqrt(2/pi), so treating 6% as a mean absolute error
% would instead give sigma = 0.06*sqrt(pi/2).
nationalVariability = 0.06*sqrt(2/pi);
stateVariability = 0.05;
numberOfRuns = 1e6;
output = electionModel(paramFile,nationalVariability,stateVariability,numberOfRuns);

trumpVotes = sum(output.electoralCollege,1); % Trump electoral votes in each run
winOdds = mean(trumpVotes >= 270); % fraction of Trump wins (270 votes is enough to win)
statePct = mean(output.outcomes,2); % fraction of Trump wins by state
minElecVotes = min(trumpVotes); % minimum Trump electoral votes
maxElecVotes = max(trumpVotes); % maximum Trump electoral votes
toc
hist(trumpVotes,50);
hold on;
line([270 270],[0 3.5e4],'Color','r'); % red line at 270 electoral votes
xlabel('Electoral College Votes for Trump')
ylabel('#/1,000,000');

 

function output = electionModel(paramFile,stdOvr,stdLoc,N)
% Save as electionModel.m, or keep it at the end of the script (MATLAB R2016b+).
%
% param file is a text file of format:
% StateName,ElectoralVotes,ClintonPoll,TrumpPoll
%
% stdOvr is the standard deviation between polls and results, common to all
% states, and represents the national shift in polls between now and election day.
%
% stdLoc is the local standard deviation between polls and results,
% potentially different in each state. It represents states changing more
% or less than the national change.
%
% N is the number of iterations to run
%
% CQW 09/21/2016

% open param file and read it into cell array C
fid = fopen(paramFile);
C = textscan(fid,'%s%f%f%f','Delimiter',',');
fclose(fid);

% take cell array and turn into vectors
stateNames = C{1};
electoralVotes = C{2};
candidate0Poll = C{3}; % Clinton
candidate1Poll = C{4}; % Trump

Nstate = length(stateNames);
outcomes = zeros(Nstate,N);

for n = 1:N
    nationalShift = stdOvr*randn; % one national shift per run
    for m = 1:Nstate
        totalShift = nationalShift + stdLoc*randn; % amount to shift poll
        cand0 = candidate0Poll(m) + totalShift; % Clinton result
        cand1 = candidate1Poll(m) - totalShift; % Trump result
        if cand1 > cand0
            outcomes(m,n) = 1; % mark Trump wins with a 1
        end
    end
end

% electoral votes won by Trump, per state (rows) and run (columns)
electoralCollege = outcomes.*repmat(electoralVotes,1,N);

output.outcomes = outcomes;
output.electoralCollege = electoralCollege;
end

State Votes Clinton Poll Trump Poll
Alabama 9 0 1
Alaska 3 0.30 0.39
Arizona 11 0.40 0.416
Arkansas 6 0.325 0.52
California 55 0.51 0.317
Colorado 9 0.427 0.39
Connecticut 7 0.475 0.3825
DC 3 1 0
Delaware 3 0.42 0.32
Florida 29 0.45 0.45
Georgia 16 0.415 0.455
Hawaii 4 1 0
Idaho 4 0.23 0.44
Illinois 20 0.43 0.30
Indiana 11 0.36 0.45
Iowa 6 0.387 0.430
Kansas 6 0.345 0.4575
Kentucky 8 0.36 0.45
Louisiana 8 0.375 0.495
Maine 4 0.438 0.37
Maryland 10 0.603 0.270
Massachusetts 11 0.55 0.32
Michigan 16 0.445 0.393
Minnesota 10 0.442 0.39
Mississippi 6 0.41 0.50
Missouri 10 0.383 0.460
Montana 3 0 1
Nebraska 5 0 1
Nevada 6 0.42 0.44
NewHampshire 4 0.437 0.387
NewJersey 14 0.495 0.370
NewMexico 5 0.41 0.33
NewYork 29 0.508 0.338
NorthCarolina 15 0.448 0.430
NorthDakota 3 0 1
Ohio 8 0.432 0.450
Oklahoma 7 0.29 0.53
Oregon 7 0.405 0.325
Pennsylvania 20 0.468 0.402
RhodeIsland 4 0.44 0.41
SouthCarolina 9 0.3967 0.4667
Tennessee 11 0.35 0.44
Texas 38 0.378 0.450
Utah 6 0.24 0.39
Vermont 3 0.43 0.215
Virginia 13 0.443 0.408
Washington 12 0.46 0.3050
WestVirginia 5 0.305 0.53
Wisconsin 10 0.435 0.400
Wyoming 3 0 1
