From e64d7ab5596300ad28e7c39f7838d94ab96ea6a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nu=C3=B1o=20Sempere?=
Date: Sat, 12 Oct 2019 19:34:51 +0200
Subject: [PATCH] Update readme.md

---
 maths-prog/MachineLearningDemystified/readme.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/maths-prog/MachineLearningDemystified/readme.md b/maths-prog/MachineLearningDemystified/readme.md
index b2bf061..827dce3 100644
--- a/maths-prog/MachineLearningDemystified/readme.md
+++ b/maths-prog/MachineLearningDemystified/readme.md
@@ -11,7 +11,7 @@ Otherwise, the current files in this directory are:
 - Algorithms: Naïve Bayes (Bernoulli & Gaussian), Nearest Neighbours, Support Vector Machines, Decision Trees, Random Forests (and Extrarandom forests), and multilayer perceptron (simple NN).
 - [AlgorithmsRegression.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/AlgorithmsRegression.py). I try to predict the healthcare costs of a particular individual, using all the features in the dataset.
 - Algorithms: Linear Regression, Lasso, Nearest Neighbours Regression, LinearSVR, SVR with different kernels, Tree regression, Random forest regression (and extra-random forest regression), and multilayer perceptron regression (simple NN).
-- [Clustering.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/Clustering.py). I then studied some of the most common clustering algorithms. The area seems almost pre-Aristotelian. Clustering algorithms get the task to *[send a message to Garcia](https://courses.csail.mit.edu/6.803/pdf/hubbard1899.pdf)*, and they undertake the task, no questions asked.
+- [Clustering.py](https://github.com/NunoSempere/nunosempere.github.io/blob/master/maths-prog/MachineLearningDemystified/Clustering.py). I then studied some of the most common clustering algorithms. The area seems almost pre-Aristotelian. Clustering algorithms get the task to *[send a message to Garcia](https://courses.csail.mit.edu/6.803/pdf/hubbard1899.pdf)*, and they undertake it, no questions asked. Heroically. I also take the opportunity here to create some visualizations with the seaborn library.
 - Algorithms: KMeans, Affinity Propagation, Mean Shift, Spectral Clustering, Agglomerative Clustering, DBSCAN, Birch, Gaussian Mixture.
 
 ## Thoughts on sklearn
@@ -23,3 +23,17 @@ The exercise proved highly, highly instructive, because sklearn is really easy t
 It came as a surprise to me that understanding and implementing the algorithm were two completely different steps.
 
 ## Some visualizations and findings about the dataset.
+
+- Those who have 4+ children get charged less by insurance, and smoke less.
+![](children-charge-smoking.png)
+
+- The disaggregation by age seems interesting, because there are roughly three prongs: 1) normal people who don't smoke, 2) those who get charged more, also made up of people who don't smoke, and 3) those who get charged a lot, which comprises only smokers. The Gaussian Mixture and K-Means algorithms do better than most others at discriminating between these three groups, and made me realize the difference.
+
+![](GaussianMixture-age.png)
+![](GaussianMixture-smoker_numeric.png)
+
+![](age_charge_smoking.png)
+![](AgglomerativeClustering-age.png)
+
+- BMI is interesting, because there seems to be a line at BMI = 30, almost as if someone used that threshold to decide how much to charge, or what to diagnose. Normally, we'd expect something more continuous.
+![](AgglomerativeClustering-age.png)
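For context on the clustering comparison the README describes, here is a minimal, hypothetical sketch (not the repo's Clustering.py) comparing KMeans and GaussianMixture, two of the algorithms listed, on synthetic data standing in for the dataset's features:

```python
# Hypothetical sketch (not the repository's Clustering.py): compare two
# of the clustering algorithms mentioned in the README on synthetic data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Three well-separated blobs, standing in for the three "prongs"
# the README observes in the age/charges data.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Both estimators are asked for three groups; they "send the message
# to Garcia" and label every point, no questions asked.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

print(len(set(kmeans_labels)), len(set(gmm_labels)))
```

On well-separated synthetic blobs both methods recover three clusters; on the real insurance data, per the README, Gaussian Mixture and K-Means separated the three charge groups better than most of the other algorithms tried.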