Update and rename Writeup.md to Self-experimentation-calibration.md

Nuño Sempere 2019-05-06 15:59:03 +02:00 committed by GitHub
parent ddfcf0927d
commit d54cff97b6

@ -21,11 +21,14 @@ I have 505 observations. The dataset is available if I know you or if you can ge
## 1. How well calibrated am I?
A picture is worth a thousand words:
In this case, two pictures. The second merges probabilities above and below 0.5 in the obvious way: it interprets having assigned a probability of, say, 0.33 to "X" as having assigned a probability of 0.67 to "Not X". Working with odds, this is even more straightforward: if you offer 1:2 odds in favor of "X", you're also offering 2:1 odds in favor of "Not X".
![](https://nunosempere.github.io/rat/Calibration1.png)
![](https://nunosempere.github.io/rat/Calibration2.png)
In this case, two pictures. The second merges probabilities above and below 0.5 in the obvious way: it interprets having assigned a probability of, say, 0.33 to "X" as having assigned a probability of 0.67 to "Not X". Working with odds, this is also straightforward: if you think that 1:2 are fair odds in favor of "X", you also think that 2:1 are fair odds in favor of "Not X".
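A minimal sketch of that merging step, not the code used for the plots. The function names `fold_probability`, `fold_outcome`, and `probability_to_odds` are made up for illustration. Note that when a probability is flipped to refer to "Not X", the recorded outcome has to be flipped too.

```r
# Fold a probability below 0.5 onto the complementary event "Not X".
fold_probability <- function(p) pmax(p, 1 - p)

# Flip the recorded outcome (1 = X happened) wherever the probability was folded.
fold_outcome <- function(p, outcome) ifelse(p < 0.5, 1 - outcome, outcome)

# Convert a probability into odds in favor, expressed as a single ratio.
probability_to_odds <- function(p) p / (1 - p)

p <- c(1/3, 0.5, 0.9)
outcome <- c(0, 1, 1)

fold_probability(p)        # 1/3 becomes 2/3; 0.5 and 0.9 are unchanged
fold_outcome(p, outcome)   # the first outcome flips from 0 to 1
probability_to_odds(1/3)   # 0.5, i.e. 1:2 in favor of "X"
probability_to_odds(2/3)   # 2, i.e. 2:1 in favor of "Not X"
```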
I notice that my 1:5 is closer to 1:2.5 in reality, with n=28 observations. My 1:15 is also closer to 1:5, but I think that this particularity can be explained by 1:15 being the default value, i.e., the value which got written when I left that cell blank. I'll nonetheless pay attention to that in the future. On the bright side, my 1:2 and 1:3 odds are exactly on point.
My Brier score is 0.0755985, which indicates that the questions were in general rather easy, or that I'm very sharp.
My Brier score is 0.0755985. The significant digits become relevant later.
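For reference, the Brier score is the mean squared difference between the stated probability and the binary outcome, so lower is better. The sketch below uses four made-up forecasts, not the real data, and the helper name `brier_score` is an assumption.

```r
# Brier score: mean squared error between stated probabilities and 0/1 outcomes.
brier_score <- function(predicted, outcome) mean((predicted - outcome)^2)

# Toy data, invented for illustration only.
predicted <- c(0.9, 0.8, 0.6, 0.95)
outcome   <- c(1, 1, 0, 1)

brier_score(predicted, outcome)               # 0.103125

# Always answering with the base rate (here 3/4) scores b * (1 - b).
base_rate <- mean(outcome)
brier_score(rep(base_rate, 4), outcome)       # 0.1875
```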
## 2. How do I compare to some simple regression models?
I create four simple linear regression models and interpret their output as probabilities. I also consider a really dumb predictor, for comparison purposes.
@ -37,14 +40,14 @@ I create four simple linear regression models and interpret their output as prob
Here is a table:
| | Which variable does the regression try to predict? | Variables regressed on (what information does this model work with) | Brier score tested & trained on the whole set | Trained on 80% and tested on the rest (average value, 1000+ times) | Trained on 50% and tested on the rest (average value, 1000+ times) |
|---------------------------------------------------------|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|--------------------------------------------------------------------|--------------------------------------------------------------------|
| 1. Dumbest model | Binary outcome | None. Empty regression, just the intercept | 0.095496 | 0.09538288 | 0.09542671 |
| 2. Regression without any subjective factors | Binary outcome | Type of question Homework vs Exam vs Lecture question First vs second try BDC | 0.082598 | 0.131296 | 0.1412152 |
| 3. Regression model with inner experience | Binary outcome | 1. Type of question 2. Homework vs Exam vs Lecture question 3. First vs second try 4. Becker Depression Checklist Score. 5. Inner experience: Hunch to Incredibly Confident | 0.076962 | 0.1040722 | 0.1149272 |
| 4. Full regression model | Binary outcome | 1. Type of question 2. Homework vs Exam vs Lecture question 3. First vs second try 4. Becker Depression Checklist Score. 5. Inner experience: Hunch to Incredibly Confident 6. Subjective probability | 0.073224 | 0.09260587 | 0.1020023 |
| 5. Regression model with only my subjective probability | Binary outcome | 1. Subjective probability | 0.075541 | 0.07545493 | 0.07538371 |
| 6. Subjective probability                               | Does not apply                                     | Not a regression model                                                                                                                                                                                | My Brier score was 0.0755985                  | -                                                                  | -                                                                  |
| | Which variable does the regression try to predict? | Variables regressed on (what information does this model work with) | Brier score tested & trained on the whole set | Trained on 80% and tested on the rest (average value, 1000+ times) | Trained on 50% and tested on the rest (average value, 1000+ times) |
|---------------------------------------------------------|----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|--------------------------------------------------------------------|--------------------------------------------------------------------|
| 1. Dumbest model | Binary outcome | None. Empty regression, just the intercept | 0.095496 | 0.09538288 | 0.09542671 |
| 2. Regression without any subjective factors | Binary outcome | 1. Type of question 2. Homework vs Exam vs Lecture question 3. First vs second try 4. BDC | 0.082598 | 0.131296 | 0.1412152 |
| 3. Regression model with inner experience | Binary outcome | 1. Type of question 2. Homework vs Exam vs Lecture question 3. First vs second try 4. Becker Depression Checklist Score. 5. Inner experience: Hunch to Incredibly Confident | 0.076962 | 0.1040722 | 0.1149272 |
| 4. Full regression model | Binary outcome | 1. Type of question 2. Homework vs Exam vs Lecture question 3. First vs second try 4. Becker Depression Checklist Score. 5. Inner experience: Hunch to Incredibly Confident 6. Subjective probability | 0.073224 | 0.09260587 | 0.1020023 |
| 5. Regression model with only my subjective probability | Binary outcome | 1. Subjective probability | 0.075541 | 0.07545493 | 0.07538371 |
| 6. Subjective probability                               | Does not apply                                     | Not a regression model                                                                                                                                                                                    | My Brier score was 0.0755985                  | -                                                                  | -                                                                  |
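The last two columns can be reproduced along the following lines. This is a sketch under stated assumptions, not the original script: the dataset is not public, so `D_toy` is synthetic, the column name `Subjective_probability` and the helper `holdout_brier` are hypothetical, and only `Result_Binary` is taken from the output shown further down. A linear model's predictions can fall outside [0, 1]; they are scored as they are.

```r
set.seed(1)

# Synthetic stand-in for the real dataset: 505 rows, a stated probability
# and a binary result drawn from it.
n <- 505
D_toy <- data.frame(Subjective_probability = runif(n, 0.5, 1))
D_toy$Result_Binary <- rbinom(n, 1, D_toy$Subjective_probability)

brier_score <- function(predicted, outcome) mean((predicted - outcome)^2)

# Fit on a random fraction of the rows, then score on the held-out rest.
holdout_brier <- function(data, train_fraction) {
  train_rows <- sample(nrow(data), size = floor(train_fraction * nrow(data)))
  model <- lm(Result_Binary ~ Subjective_probability, data = data[train_rows, ])
  held_out <- data[-train_rows, ]
  brier_score(predict(model, newdata = held_out), held_out$Result_Binary)
}

# Average over 1000 random splits, as in the table's last two columns.
mean(replicate(1000, holdout_brier(D_toy, 0.8)))
mean(replicate(1000, holdout_brier(D_toy, 0.5)))
```

Averaging over many random splits smooths out the luck of any single split, which is why models with many coefficients can look good on the whole set and worse here.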
### 2.1. Dumb model
@ -54,6 +57,7 @@ A dumb model which always outputs the overall base rate gets a Brier score of 0.
#### 2.2.1. Code and output
```
> summary(LM6)
Call:
@ -65,21 +69,23 @@ Residuals:
-1.00188 0.01493 0.04878 0.09309 0.49155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8408927 0.0893759 9.408 < 2e-16 ***
as.factor(D$Type.of.question)MC 0.1898829 0.0590873 3.214 0.001396 **
as.factor(D$Type.of.question)MS -0.0080026 0.0618861 -0.129 0.897163
as.factor(D$Type.of.question)TF 0.1560246 0.0631872 2.469 0.013875 *
Is_Normal 0.0342562 0.0453966 0.755 0.450848
Is_Homework -0.0003015 0.0512829 -0.006 0.995311
BDC -0.0020370 0.0015640 -1.302 0.193366
as.factor(Trial..1st.if.not.specified.)2 -0.2039517 0.0590451 -3.454 0.000599 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Estimate Std. Error t value Pr(>|t|)
Residual standard error: 0.2897 on 497 degrees of freedom
Multiple R-squared: 0.1349, Adjusted R-squared: 0.1227
F-statistic: 11.07 on 7 and 497 DF, p-value: 4.899e-13
(Intercept) 0.8408927 0.0893759 9.408 < 2e-16 ***
as.factor(D$Type.of.question)MC 0.1898829 0.0590873 3.214 0.001396 **
as.factor(D$Type.of.question)MS -0.0080026 0.0618861 -0.129 0.897163
as.factor(D$Type.of.question)TF 0.1560246 0.0631872 2.469 0.013875 *
Is_Normal 0.0342562 0.0453966 0.755 0.450848
Is_Homework -0.0003015 0.0512829 -0.006 0.995311
BDC -0.0020370 0.0015640 -1.302 0.193366
as.factor(Trial..1st.if.not.specified.)2 -0.2039517 0.0590451 -3.454 0.000599 ***
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.2897 on 497 degrees of freedom
Multiple R-squared: 0.1349, Adjusted R-squared: 0.1227
F-statistic: 11.07 on 7 and 497 DF, p-value: 4.899e-13
```
#### 2.2.2. Interpretation
I found it rather surprising that how depressed I was (BDC: Becker Depression Checklist) didn't seem to have that big of an effect. In particular, I try to adjust for my mood, but I wasn't particularly expecting to succeed. Anecdotally, I do see an effect of my mood on the extremity of my odds: the sadder I am, the more reluctant I am to give 1:1000, 1:10000 and higher odds, even about things which I'm really sure about.
@ -94,6 +100,7 @@ If I use this model to output predicted probabilities for each question, I get a
#### 2.3.1. Code and output
```
> lm(Result_Binary ~ as.factor(D$Type.of.question) + as.factor(D$One.word) + Is_Normal + Is_Homework + BDC + as.factor(Trial..1st.if.not.specified.), data=D) -> LM1
> summary(LM1)
@ -125,6 +132,7 @@ Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1
Residual standard error: 0.2813 on 492 degrees of freedom
Multiple R-squared: 0.193, Adjusted R-squared: 0.1733
F-statistic: 9.804 on 12 and 492 DF, p-value: < 2.2e-16
```
#### 2.3.2. Interpretation
As expected, the coefficients associated with a measure of my inner confidence check out: Hunch < Somewhat confident < Confident < Very confident < Incredibly confident (IC). Note that all of the factors are present, instead of one of them having been swallowed by the intercept, because there were 3 times when I just left that question blank, and I didn't want to remove that data.
@ -135,7 +143,7 @@ If I instead train the model on a random selection of 80% of the data points, an
### 2.4. Including my subjective probability & everything else.
#### 2.4.1. Code and output
```
> summary(LM3)
Call:
@ -169,6 +177,7 @@ Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1
Residual standard error: 0.2747 on 491 degrees of freedom
Multiple R-squared: 0.2316, Adjusted R-squared: 0.2113
F-statistic: 11.39 on 13 and 491 DF, p-value: < 2.2e-16
```
#### 2.4.2. Interpretation
All the other factors become slightly less relevant. It seems that my subjective probability does add information, a lot of it. After having seen the graph at the beginning, this is not surprising.
@ -179,6 +188,7 @@ If I use this model to output probabilities for each question, I get a Brier sco
### 2.5. Including only my subjective probability
#### 2.5.1. Code and output
```
> summary(LM2)
Call:
@ -198,6 +208,7 @@ Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1
Residual standard error: 0.2754 on 503 degrees of freedom
Multiple R-squared: 0.209, Adjusted R-squared: 0.2074
F-statistic: 132.9 on 1 and 503 DF, p-value: < 2.2e-16
```
#### 2.5.2. Interpretation
Multiply my probability by 1.005-ish and take 1.2% from that, and I'd be slightly more accurate. I'm not reading much into that. If I do this, I get a Brier score of 0.075541, slightly better than my own 0.0755985.
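One reason not to read much into it: when trained and tested on the same data, a least-squares fit on the stated probability cannot score worse than the stated probability itself. Leaving the probability untouched (intercept 0, slope 1) is one of the candidate lines, and least squares picks the line with the smallest mean squared error, which is the Brier score. A sketch on synthetic data, with hypothetical variable names:

```r
set.seed(2)

# Toy stated probabilities and toy binary results, invented for illustration.
n <- 505
p <- runif(n, 0.5, 1)
outcome <- rbinom(n, 1, p)

brier_score <- function(predicted, outcome) mean((predicted - outcome)^2)

# Linear correction: outcome ~ intercept + slope * stated probability.
recalibration <- lm(outcome ~ p)
coef(recalibration)

raw_score <- brier_score(p, outcome)
recalibrated_score <- brier_score(fitted(recalibration), outcome)

# In sample, the recalibrated score is never worse than the raw one.
recalibrated_score <= raw_score
```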