nunosempere.github.io/maths-prog/MachineLearningDemystified
2019-10-09 20:53:29 +02:00
AlgorithmsClassification.py Create AlgorithmsClassification.py 2019-10-09 20:41:24 +02:00
AlgorithmsRegression,py Create AlgorithmsRegression,py 2019-10-09 20:42:03 +02:00
CleaningUpData.py Create CleaningUpData.py 2019-10-09 20:40:56 +02:00
readme.md Create readme.md 2019-10-09 20:53:29 +02:00

Machine Learning Demystified

Several friends encouraged me to apply for a Data Scientist position at ID Insights, an organization I greatly admire. It is a role I would be passionate about.

Unfortunately, they require Python, and I'm more of an R programmer. I decided to apply anyway, but first I familiarized myself thoroughly with numpy, pandas and sklearn, three of the most important libraries for machine learning in Python.

I used a dataset from Kaggle Health Care Cost Analysis, referenced as insurance.csv throughout the code. The reader will also have to change the directory variable to fit their needs.
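As a rough illustration of what changing the directory variable involves, here is a minimal sketch. It assumes that directory holds the folder containing insurance.csv; the helper load_insurance and the default value "." are hypothetical and not taken from the scripts in this repository.

```python
import os

import pandas as pd

# Assumption: `directory` is the folder where insurance.csv was saved.
# "." is only a placeholder; change it to fit your own setup.
directory = "."


def load_insurance(directory, filename="insurance.csv"):
    """Read the CSV stored under `directory` into a pandas DataFrame."""
    path = os.path.join(directory, filename)
    return pd.read_csv(path)


# Only try to load the file if it is actually there.
if os.path.exists(os.path.join(directory, "insurance.csv")):
    df = load_insurance(directory)
    print(df.head())
```

Building the path with os.path.join keeps the same code working on Windows, macOS and Linux, so only the directory value needs to change.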

The files currently in this directory are:

  • CleaningUpData.py. I couldn't work with the dataset directly, so I tweaked it somewhat.
  • AlgorithmsClassification.py. As a first exercise, I try to predict whether the medical bills of a particular individual are higher than the mean of the dataset. Some algorithms, like Naïve Bayes, are not really suitable for regression, but are great for predicting classes.
  • AlgorithmsRegression,py. I try to predict the healthcare costs of a particular individual, using all the features in the dataset.
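The three steps above can be sketched end to end as follows. This is not the code in the files themselves, only an illustration under stated assumptions: the data is a small synthetic stand-in for insurance.csv, the column names (age, bmi, smoker, charges) are assumed, and LinearRegression is simply one regressor picked for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for insurance.csv; the column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 5, size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
})
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.normal(0, 2000, size=n)
)

# Cleaning: sklearn needs numbers, so encode the text column as 0/1.
df["smoker"] = (df["smoker"] == "yes").astype(int)

X = df[["age", "bmi", "smoker"]]
y_reg = df["charges"]  # regression target: the cost itself
# Classification target: 1 if the cost is above the dataset mean, else 0.
y_cls = (df["charges"] > df["charges"].mean()).astype(int)

# One split shared by both tasks, so they are evaluated on the same rows.
(X_train, X_test,
 y_reg_train, y_reg_test,
 y_cls_train, y_cls_test) = train_test_split(
    X, y_reg, y_cls, test_size=0.25, random_state=0
)

# Classification with Naive Bayes: is the bill above the mean?
clf = GaussianNB().fit(X_train, y_cls_train)
accuracy = clf.score(X_test, y_cls_test)

# Regression: predict the cost from all the features.
reg = LinearRegression().fit(X_train, y_reg_train)
r2 = reg.score(X_test, y_reg_test)

print(f"classification accuracy: {accuracy:.2f}")
print(f"regression R^2: {r2:.2f}")
```

Both models are scored on held-out rows rather than on the training data, which gives a fairer picture of how well they would do on an individual they have not seen.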