Update phacking.md

Nuño Sempere 2019-05-18 13:29:25 +02:00 committed by GitHub
parent abebbe2b62
commit cd2b8ea1c9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -29,11 +29,11 @@ b) If we only report A ~ Y, we find a huge effect; whereas male or female EAs ha
## A note on regressions and frequentist probability.
If you have 303 values for the variable A: {A1, A2, A3, ..., A303} and 303 values for the variable B: {B1, B2, B3, ..., B303}, you consider lines of the form A = I + C\*B, and look at their associated points {(I+C\*B1,B1),(I+C\*B2,B2), (I+C\*B3,B3), ..., (I+C\*B303,B303)}. Each of these is separated from the corresponding observed point in {(A1,B1),(A2,B2),(A3,B3),..., (A303,B303)} by some distance.
-For example, with I and C set, the point (I+C\*B1, B1) is separated from (A1,B1) by a distance of sqrt((I+C\*B1 - A1)^2 - (B1-B1)^2) = sqrt(((I+C\*B1 - A1)^2)) = abs(I+C\*B1 - A1). In the mathematical concept of distance, which is always greater than 0, but we do want the sign, so d1 = (I+C\*B1 - A1). All in all, you have 303 such distances: {d1, d2, d3, ..., d4}
+For example, with I and C set, the point (I+C\*B1, B1) is separated from (A1,B1) by a distance of sqrt((I+C\*B1 - A1)^2 + (B1-B1)^2) = sqrt((I+C\*B1 - A1)^2) = abs(I+C\*B1 - A1). For mathematicians, distances are never negative, but we'll let go of our prejudices and keep the sign, so d1 = (I+C\*B1 - A1). All in all, you'll have 303 such distances: {d1, d2, d3, ..., d303} for every choice of the pair (I,C).
-You then find the values I and C which minimize the sum of the distances from the point (Ai, Bi) to the point (I+C\*Bi, Bi). That is, you find the line A = I + C\*B which best fits your data. We'll want to distinguish between I and C as variables and (II,CC) as the point which solves their minimization problem.
+You then find the values I and C which minimize the sum of the squares of the distances: d1^2+d2^2 +...+d303^2. You could also have ^3 or ^4, or another transformation altogether. In any case, you find values for I and C so that the line A = I + C\*B approximates your data. We'll want to distinguish between I and C as variables and (II,CC) as the point which solves the minimization problem.
-Now, you can consider the distances which you calculated before: {d1, d2, d3, ..., d4}, treat them like rightful variables, and calculate it's mean and its standard deviation. Intuitively, the mean is going to be 0, because otherwise, you would have another better line (just change the intercept). Keep the standard deviation of the distances = SD in mind, though.
+Now, you can consider the distances which you calculated before: {d1, d2, d3, ..., d303}, treat them as a variable in their own right, and calculate their mean and standard deviation. Intuitively, the mean is going to be 0, because otherwise you would have another, better line (just change the intercept). Keep the standard deviation of the distances = SD in mind, though.
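The steps so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five data points are made up (the real data has 303), the helper names `fit_line` and `signed_distances` are hypothetical, and SD is computed with the population formula (`pstdev`), which the text does not specify.

```python
from statistics import mean, pstdev


def fit_line(a_values, b_values):
    """Return (II, CC): the intercept and slope of A = I + C*B that
    minimize the sum of squared distances d1^2 + d2^2 + ... + dn^2."""
    a_bar = mean(a_values)
    b_bar = mean(b_values)
    s_bb = sum((b - b_bar) ** 2 for b in b_values)
    s_ab = sum((a - a_bar) * (b - b_bar) for a, b in zip(a_values, b_values))
    cc = s_ab / s_bb          # closed-form least-squares slope
    ii = a_bar - cc * b_bar   # intercept that puts the line through the means
    return ii, cc


def signed_distances(ii, cc, a_values, b_values):
    """Signed distances d_i = (I + C*B_i) - A_i, with the sign kept."""
    return [ii + cc * b - a for a, b in zip(a_values, b_values)]


# Made-up toy data, purely for illustration.
b_values = [1.0, 2.0, 3.0, 4.0, 5.0]
a_values = [2.1, 3.9, 6.2, 7.8, 10.1]

ii, cc = fit_line(a_values, b_values)
d = signed_distances(ii, cc, a_values, b_values)

mean_d = mean(d)   # zero up to floating-point error
sd = pstdev(d)     # the SD to keep in mind (population formula: an assumption)
```

Shifting `ii` by any nonzero amount makes the mean of `d` nonzero and increases the sum of squares, which is the "just change the intercept" argument in code form.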
If you're a frequentist, you can then assign a p-value to that. Once you've found the point (II,CC), you pretend that you were a virtuous Bayesian all along, and it just happened that you had the following prior for C:
- You had previously assigned p=0.5 to it being a Gaussian distribution centered at 0, with standard deviation SD (the standard deviation of the mean of the distances)