add uncertainty due to being far away, change README

2024-04-14 20:09:52 -04:00
commit c27c76ac60
parent cd0546b3e8
4 changed files with 754 additions and 836 deletions
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 ## About

-This is a simple model of the US electoral college. It aims to be conceptually simple and replicatable. Currently, it incorporates data from the electoral history in each state, since the 2000 election, as well as states-specific polls, if they exist.
+This is a simple model of the US electoral college. It aims to be conceptually simple and replicable. Currently, it incorporates data from state-specific polls, and otherwise defaults to the state's electoral history baserate.

 Other projects, like [538](https://en.wikipedia.org/wiki/FiveThirtyEight), [Nate Silver's substack](https://www.natesilver.net/) or [Gelman's model](https://github.com/TheEconomist/us-potus-model) are to this project as a sportscar is to a walking stick. They are much more sophisticated, and probably more accurate. However, they are also more difficult to understand and to maintain.

@@ -59,7 +59,9 @@ Essentially, Obama won by much more than Bush, Trump or Biden. But our naïve mo

 So the story here is that our model is not very sophisticated. But another might be that Obama was much more popular than Biden, and if Democrats can tap into that again, they will do better.

-### The polls story
+Still, *for states in which there is no polling*, the electoral history seems like a decent enough proxy: these are the states that are solidly Republican or solidly Democratic.
+
+### The unadjusted polls story

 If we only look at polls (and use baserates when there are no polls—which happens for states like Alabama, which lean strongly towards one party already), this time the Republicans win by a mile: with 95% probability. 

@@ -73,11 +75,21 @@ What's happening here is that:
  - In a normal democracy, like in Spain, a protest party could amass some electors, and use them as bargaining chips to govern together with one of the other major parties. For instance, this is what happened with Ciudadanos in Spain. Perhaps third parties performing strongly could conceivably create pressure to reform the US electoral system.
  - In the US, with the system as it currently exists, these votes seem to favour Trump.

+However, this 95% really doesn't feel right. It accounts only, and very naively, for the sample size of the poll. It assumes not only that the poll is a representative sample, but also that opinions will not drift between now and election time. This latter assumption is fatal.
+
+### The adjusted polls story 
+
+If we look at how [Gallup presidential election polls](https://news.gallup.com/poll/110548/gallup-presidential-election-trial-heat-trends.aspx) did between 1936 and 2008, we get a sense that polls in mid-April just aren't very informative about the eventual result. Tallying the errors for Republicans, polls have a 15% relative standard error, which is huge when races in battleground states tend to be close to 50/50 (49/51, 48/52, 47/53, etc.).
+
+Moreover, these are national polls: state polls will have smaller samples and thus more uncertainty. And current pollsters are not as good as Gallup. And... there might be other sources of uncertainty that I'm missing.
+
+But after incorporating reasonable estimates of uncertainty, the model gives a 51% probability of a Republican win. This is now in line with [prediction markets](https://electionbettingodds.com/PresidentialParty2024.html).
+
 ## Notes on other models

 **FiveThirtyEight** [2020](https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/), [2016](https://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/)

-Notes on 2020:
+Notes on 2020 model:

 - Adjusted for COVID pandemic
  - Manually increased uncertainty
@@ -144,20 +156,13 @@ Notes on 2020:

 ## Roadmap

-It's not clear to me what I will do with this. As a result of writting down the model, I've realized that 80/20-ing a 538 would involve more effort than what I was expecting. I may just add the national drift + election day error + idiosyncratic error terms and then just call it a day.
+It's not clear to me what I will do with this. After starting to program this, I realized that creating a model that was in the same ballpark as The Economist's or 538's would just be too much effort. After adding national drift + election day error + idiosyncratic error terms, this isn't quite at the 80/20 stage, but it feels like it's at a good point, and I may just leave it here.

 ### To do

 General:

 - [ ] Share with Samotsvety
-- [ ] Implement possible next steps:
-  - Uncertainty due to drift between now and the election
-  - Uncertainty due to difference between last election poll and final vote share
-- [ ] Better prior by incorporating more past elections
-- Think about how to:
-  - [ ] Inject error
-  - [ ] Inject correlated error
 - [ ] Think about whether I want to monetize this
  - Maybe with Vox?
  - Otherwise: add MIT license & publish
@@ -166,15 +171,17 @@ General:

 Steps to make this more accurate:

+- [ ] Better prior by incorporating more past elections
+- Think about how to:
+  - [x] Inject error
+  - [ ] Inject correlated error
 - [ ] Think about correlation between states. 
  - How?
  - [ ] Consider conditional probabilities
  - See how other models account for the correlation
 - [ ] Add more years
 - [ ] Polling company errors
-- [ ] Make polling errors wider?
 - [ ] Economic fundamentals?
-- [ ] Print more data for polls

 ### Done 

@@ -197,6 +204,12 @@ Consider polls:
 - [x] Exclude polls older than one month?
 - [x] Inspect polling stderrs

+Uncertainty
+
+- [x] Implement key possible next steps:
+  - [x] Uncertainty due to drift between now and the election
+  - [x] Uncertainty due to difference between last election poll and final vote share
+
 General

 - [x] Work on README
@@ -204,6 +217,8 @@ General
 - [x] Histogram distributions of electoral college votes
 - [x] Think about next steps
 - [x] Get clarity on next steps
+- [x] Make polling errors wider?
+- [x] Print more data for polls

 ### Discarded

--- a/data/poll-errors/note.txt
+++ b/data/poll-errors/note.txt
@@ -0,0 +1,3 @@
+Strictly speaking, the 0.15 relative standard deviation is not computed over the *normalized* Republican/Democrat share of the vote (i.e., the share excluding independents).
+
+However, I think that's a relatively unimportant effect, so I'm ignoring it for now.
--- a/main.go
+++ b/main.go
@@ -224,7 +224,7 @@ func getChanceCandidateWinsFromPollShare(candidate_p float64, poll_sample_size f
 	return getProbabilityAboveX(0.5, candidate_p, std)
 }

-func getChanceRepublicanWinFromPoll(poll Poll, pretty_print bool) float64 {
+func getChanceRepublicanWinFromPoll(poll Poll, pretty_print bool, std_additional_uncertainty float64) float64 {

 	biden_percentage, biden_exists := poll.PollResults["Biden"]
 	trump_percentage, trump_exists := poll.PollResults["Trump"]
@@ -240,7 +240,8 @@ func getChanceRepublicanWinFromPoll(poll Poll, pretty_print bool) float64 {
 	joint_trump_biden_sample_size := (biden_share + trump_share) * float64(poll.SampleSize)
 	std_error_poll_mean := math.Sqrt((normalized_trump_share * normalized_biden_share) / joint_trump_biden_sample_size)

-	p_republican_win := getProbabilityAboveX(0.5, normalized_trump_share, std_error_poll_mean)
+	std_error := std_error_poll_mean + std_additional_uncertainty
+	p_republican_win := getProbabilityAboveX(0.5, normalized_trump_share, std_error)

 	if pretty_print {
 		fmt.Printf("\n\t\tSample size: %f", joint_trump_biden_sample_size)
@@ -259,18 +260,18 @@ func printStates(states []State) {
 		fmt.Printf("\n\tVotes: %d", state.Votes)
 		fmt.Printf("\n\tHistory: %s", state.PresidentialElectoralHistory)

-		p_baserate_republican := 0.0
+		p_baserate_republican_win := 0.0
 		for _, party := range state.PresidentialElectoralHistory {
 			if party == "R" {
-				p_baserate_republican++
+				p_baserate_republican_win++
 			}
 		}
-		fmt.Printf("\n\tHistorical base rate of R win: %f", p_baserate_republican)
+		fmt.Printf("\n\tHistorical base rate of R win: %f", p_baserate_republican_win)

 		// Individual poll
 		for _, poll := range state.Polls {
 			fmt.Printf("\n\tPoll: %+v", poll)
-			_ = getChanceRepublicanWinFromPoll(poll, true)
+			_ = getChanceRepublicanWinFromPoll(poll, true, 0.0)
 		}

 		// Aggregate poll
@@ -292,7 +293,7 @@ func printStates(states []State) {
 			aggregate_poll.PollResults["Trump"] = 100.0 * num_trump_votes / aggregate_sample_size

 			fmt.Printf("\n\tAggregate poll: %+v", aggregate_poll)
-			_ = getChanceRepublicanWinFromPoll(aggregate_poll, true)
+			_ = getChanceRepublicanWinFromPoll(aggregate_poll, true, 0.0)
 		}

 	}
@@ -335,21 +336,19 @@ func sampleFromState(state State) VotesForEachParty {
 	default:
 		{
 			/* Consider the base rate for the state */
-			p_baserate_republican := 0.0
+			p_baserate_republican_win := 0.0
 			for _, party := range state.PresidentialElectoralHistory {
 				if party == "R" {
-					p_baserate_republican++
+					p_baserate_republican_win++
 				}
 			}
-			p_baserate_republican = p_baserate_republican / float64(len(state.PresidentialElectoralHistory))
-			p_republican := p_baserate_republican // if no polls
+			p_baserate_republican_win = p_baserate_republican_win / float64(len(state.PresidentialElectoralHistory))
+			p_republican_win := p_baserate_republican_win // if no polls

 			/* Consider polls */
 			num_biden_votes := 0.0
 			num_trump_votes := 0.0
 			for _, poll := range state.Polls {
-				// p_republican_win_poll = getChanceRepublicanWinFromPoll(poll)
-
 				biden_percentage, biden_exists := poll.PollResults["Biden"]
 				trump_percentage, trump_exists := poll.PollResults["Trump"]
 				if !biden_exists || !trump_exists {
@@ -364,12 +363,20 @@ func sampleFromState(state State) VotesForEachParty {
 				var aggregate_poll = Poll{SampleSize: int(aggregate_sample_size), PollResults: make(map[string]float64)}
 				aggregate_poll.PollResults["Biden"] = 100.0 * num_biden_votes / aggregate_sample_size
 				aggregate_poll.PollResults["Trump"] = 100.0 * num_trump_votes / aggregate_sample_size
-				p_republican_win_aggregate_polls := getChanceRepublicanWinFromPoll(aggregate_poll, false)
-				weight_polls := 0.5
-				p_republican = weight_polls*p_republican_win_aggregate_polls + (1.0-weight_polls)*p_baserate_republican
+
+				national_drift := 0.15
+				state_more_uncertain_than_national := 0.03
+				not_as_good_as_gallup := 0.03
+				idiosyncratic := 0.03
+				std_additional_uncertainty := national_drift + state_more_uncertain_than_national + not_as_good_as_gallup + idiosyncratic
+				p_republican_win_aggregate_polls := getChanceRepublicanWinFromPoll(aggregate_poll, false, std_additional_uncertainty)
+
+				// weight_polls := 0.75
+				// p_republican = weight_polls*p_republican_win_aggregate_polls + (1.0-weight_polls)*p_baserate_republican_win
+				p_republican_win = p_republican_win_aggregate_polls
 			}

-			if r.Float64() < p_republican {
+			if r.Float64() < p_republican_win {
 				return VotesForEachParty{Democrats: 0, Republicans: state.Votes}
 			} else {
 				return VotesForEachParty{Democrats: state.Votes, Republicans: 0}
@@ -418,7 +425,12 @@ func printElectoralCollegeHistogram(samples []int) {
 		p := float64(count) / float64(len(samples)) * 100
 		cp += p

+		if i > 130 && i < 400 {
 			fmt.Printf("[ %2d, %4d): %s %.2f%% (%.0f%%)\n", i, i+1, barString(bar_length), p, cp)
+		} else if p >= 0.01 {
+			fmt.Printf(">0.01 probability outside of domain, you might want to change histogram parameters\n")
+
+		}
 	}

 }
@@ -430,7 +442,7 @@ func main() {
 		return
 	}

-	n_sims := 100_000
+	n_sims := 1_000_000

 	printStates(states)
 	fmt.Printf("\n\n")
@@ -444,8 +456,9 @@ func main() {
 			p_republicans++
 		}
 	}
-	p_republicans = p_republicans / float64(n_sims)
-	fmt.Printf("\n%% republicans: %f\n", p_republicans)
 	printElectoralCollegeHistogram(results)

+	p_republicans = p_republicans / float64(n_sims)
+	fmt.Printf("\n%% republicans: %f\n", p_republicans)
+
 }
--- a/output.txt
+++ b/output.txt