Update Self-experimentation-calibration.md

Nuño Sempere, 2019-05-06

If, like before, I train that model 1000 times on a randomly selected 80% of my dataset and test it on the other 20%, I get an average Brier score of 0.07545493, slightly *better* than my own 0.0755985, but not by much. Perhaps it gets that slight advantage because the p*1.0005 - 1.2% correction fixes my uncalibrated 1:15 odds without muddying the rest too much. Surprisingly, if I instead train it on a randomly selected 50% of the dataset (again 1000 times), its average Brier score improves to 0.07538371. I do not think that a difference of about 0.0002 tells me much.
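To make the procedure concrete, here is a minimal Python sketch of that resampling loop. The synthetic probabilities and outcomes, and the least-squares fit of a linear correction a*p + b, are my assumptions standing in for the actual .csv and regression:

```python
import numpy as np

# Hypothetical stand-ins for the real data: stated probabilities
# and 0/1 outcomes that would normally be loaded from the .csv.
rng = np.random.default_rng(0)
probs = rng.uniform(0.5, 1.0, 500)          # my stated probabilities
outcomes = (rng.uniform(size=500) < probs)  # whether I was right

def brier(p, y):
    """Mean squared difference between forecasts and outcomes."""
    return np.mean((p - y) ** 2)

scores = []
for _ in range(1000):
    idx = rng.permutation(len(probs))
    cut = int(0.8 * len(probs))             # 80% train / 20% test split
    train, test = idx[:cut], idx[cut:]
    # Fit the correction p_corrected = a*p + b by least squares
    # on the training split, then score it on the held-out split.
    a, b = np.polyfit(probs[train], outcomes[train].astype(float), 1)
    corrected = np.clip(a * probs[test] + b, 0, 1)
    scores.append(brier(corrected, outcomes[test]))

print(np.mean(scores))  # average Brier score over the 1000 resamples
```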
## 3. Things I would do differently
If I were to redo this experiment, I'd:
- Gather more data: I only used half the questions of the aforementioned course, and 500 data points are really not that many.
- Program a function to enter the data for me much earlier (see the sketch after this list). Instead, I:
    1. Started by writing my probabilities in my lecture notes, with the intention of cribbing them later. I never got around to doing that.
    2. Then started writing them directly to a .csv myself.
    4. Saw that this still took too much time -> wrote a function to wrap the other functions. Everything went much more smoothly afterwards.
- Use a scale other than the BDC: it's not made for measuring daily moods.
- Think through which data I want to collect from the beginning; I could have added the BDC from the start, but didn't.
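As an illustration of the data-entry wrapper mentioned above, here is a minimal sketch; the file name, column layout, and prompts are hypothetical stand-ins for whatever the actual function did:

```python
import csv
from datetime import date

def log_prediction(path="predictions.csv"):
    # Hypothetical wrapper: prompt once per question and append a row,
    # instead of editing the .csv by hand after the fact.
    prob = float(input("Probability assigned (0-1): "))
    right = int(input("Was I right? (1/0): "))
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), prob, right])
```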
## 4. Conclusion
In conclusion, I am surprised that the dumb model beats the others most of the time, though this might be explained by the combination of not having that much data and having a lot of variables: the random errors in my regression are large. I see that I am in general well calibrated (in the particular domain analyzed here), but with room for improvement when giving 1:5, 1:6, and 1:15 odds.
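For reference, a quick sketch of the odds-to-probability conversion behind those numbers; I am assuming here that 1:n odds are read as n chances of being right to 1 of being wrong, which may not match the convention used above:

```python
def odds_to_prob(against, in_favor):
    # Implied probability of the in_favor side of "against:in_favor" odds.
    return in_favor / (against + in_favor)

for n in (5, 6, 15):
    print(f"1:{n} odds -> p = {odds_to_prob(1, n):.4f}")
# 1:5 odds -> p = 0.8333
# 1:6 odds -> p = 0.8571
# 1:15 odds -> p = 0.9375
```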