Update readme.md

2019-10-12 19:34:51 +02:00 · 2019-10-12 19:34:51 +02:00 · e64d7ab559
commit e64d7ab559
parent a5012d52d0
1 changed files with 15 additions and 1 deletions
--- a/maths-prog/MachineLearningDemystified/readme.md
+++ b/maths-prog/MachineLearningDemystified/readme.md
@ -11,7 +11,7 @@ Otherwise, the current files in this directory are:
  - Algorithms: Naïve Bayes (Bernoulli & Gaussian), Nearest Neighbours, Support Vector Machines, Decision Trees, Random Forests (and Extrarandom forests), and multilayer perceptron (simple NN).
 - [AlgorithmsRegression.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/AlgorithmsRegression.py). I try to predict the healthcare costs of a particular individual, using all the features in the dataset.
  - Algorithms: Linear Regression, Lasso, Nearest Neighbours Regression, LinearSVR, SVR with different kernels, Tree regression, Random forest regression (and extra-random forest regression), and multilayer perceptron regression (simple NN).
- [Clustering.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/Clustering.py). I then studied some of the most common clustering algorithms. The area seems almost pre-Aristotelian. Clustering algorithms get the task to *[send a message to Garcia](https://courses.csail.mit.edu/6.803/pdf/hubbard1899.pdf)*, and they undertake the task, no questions asked. 
+- [Clustering.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/Clustering.py). I then studied some of the most common clustering algorithms. The area seems almost pre-Aristotelian. Clustering algorithms get the task to *[send a message to Garcia](https://courses.csail.mit.edu/6.803/pdf/hubbard1899.pdf)*, and they undertake the task, no questions asked. Heroically. I also take the opportunity here to create some visualizations, with the seaborn library.
  - Algorithms: KMeans, Affinity Propagation, Mean Shift, Spectral Clustering, Agglomerative Clustering, DBSCAN, Birch, Gaussian Mixture.

 ## Thoughts on sklearn
@ -23,3 +23,17 @@ The exercise proved highly, highly instructive, because sklearn is really easy t
 It came as a surprise to me that understanding and implementing the algorithm were two completely different steps.

 ## Some visualizations and findings about the dataset.
+
+- Those who have 4+ children get charged less by insurance, and smoke less.
+![](children-charge-smoking.png)
+
+- The disgreggation by age seems interesting, because there are three prongs, roughly: 1) normal people who don't smoke, 2) those who get charged more: made out of those who don't smoke, and 3) those who get charged a lot, which only comprises smokers. The Gaussian Mixture & K-Means algorithms do better than most others at discriminating between these threee groups, and made me realize the difference.
+
+![](GaussianMixture-age.png)
+![](GaussianMixture-smoker_numeric.png)
+
+![](age_charge_smoking.png)
+![](AgglomerativeClustering-age.png)
+
+- BMI is interesting, because there seems to be a line at BMI = 30, almost as if someone used that to make decisions about how much to charge, or what to diagnose. Normally, we'd expect something more continuous.
+![](AgglomerativeClustering-age.png)