Machine Learning Notes

Machine learning is the study of teaching machines to learn using data, with and without human supervision.

This microsite collects Jupyter notebooks I (@residentmario) created on Kaggle exploring scikit-learn and a few related traditional machine learning libraries.

The topics covered range from basic linear regression and train-test splits all the way to random forests, oversampling, and probability calibration. Here is the list:

Pumpkin Price Linear Regression
Pumpkin Price Polynomial Regression
Ridge Regression with Video Game Sales Prediction
Lasso Regression with Tennis Odds
Ridge Regression Proof and Implementation
Gradient Descent with Linear Regression
Ridge Regression Cost Function
Soft Thresholding with Lasso Regression
L1 Norms Versus L2 Norms
Model Fit Metrics
Variance Inflation Factors with NYC Building Sales
Pearson's R with Health Searches
Spearman Correlation with Montreal Bikes
Gaming Cross Validation and Hyperparameter Search
NYC Buildings Part 1: Elastic Net
NYC Buildings Part 2: Feature Scales and Grid Search
Hypothesis Testing with Firearm Licensees
Bootstrapping and Confidence Intervals with Veteran Suicides
Cross Validation Schemes with Food Consumption
Leakage, Especially Knowledge Leakage
Pipelines with Linux Gamers
Logistic Regression with WTA Tennis Matches
Convex Hulls with Vancouver Crimes
Wald Confidence Intervals with Iowa Liquor Sales
Wilson Confidence Intervals with Amazon Toys
Bias-Variance Tradeoff
Curse of Dimensionality
Learning Curves with Zillow Economics Data
Dimensionality Reduction and PCA for Fashion MNIST
Indirect Models and PLS Regression with F-MNIST
Linear Discriminant Analysis with Pokemon Stats
Classification Metrics with Seattle Rain
Log Loss with New York City Building Sales
Kernel Density Estimation with TED Talks
Model Optimism and Information Criteria
Primer on Naive Bayes Algorithm
Decision Trees with Animal Shelter Outcomes
Bagging with Animal Shelter Outcomes
Support Vector Machines and Stochastic Gradient Descent
Kernels and Support Vector Machine Regularization
ML Visualization with Yellowbrick, Part 1
ML Visualization with Yellowbrick, Part 2
ML Visualization with Yellowbrick, Part 3
Undersampling and Oversampling Imbalanced Data
Oversampling with SMOTE and ADASYN
Advanced Undersampling and Data Cleaning
Automated Feature Selection with Sklearn
Automated Feature Selection with Boruta
Notes on Matrix Factorization Machines
Simple Techniques for Missing Data Imputation
Notes on Semi-Supervised Learning
Non-Parametric Regression
Gaussian Process Regression and Classification
Notes on Multiclass and Multitask Schemes
Notes on Classification Probability Calibration