Machine Learning Notes

Machine learning is the study of teaching machines to learn from data, with or without human supervision.

This microsite collects Jupyter notebooks I (@residentmario) created on Kaggle exploring scikit-learn and a few related traditional machine learning libraries.

The topics covered range from basic linear regression and train-test splits all the way to random forests, oversampling, and probability calibration. Here is the list:

  1. Pumpkin Price Linear Regression
  2. Pumpkin Price Polynomial Regression
  3. Ridge Regression with Video Game Sales Prediction
  4. Lasso Regression with Tennis Odds
  5. Ridge Regression Proof and Implementation
  6. Gradient Descent with Linear Regression
  7. Ridge Regression Cost Function
  8. Soft Thresholding with Lasso Regression
  9. L1 Norms Versus L2 Norms
  10. Model Fit Metrics
  11. Variance Inflation Factors with NYC Building Sales
  12. Pearson's R with Health Searches
  13. Spearman Correlation with Montreal Bikes
  14. Gaming Cross Validation and Hyperparameter Search
  15. NYC Buildings, Part 1: Elastic Net
  16. NYC Buildings, Part 2: Feature Scales and Grid Search
  17. Hypothesis Testing with Firearm Licensees
  18. Bootstrapping and Confidence Intervals with Veteran Suicides
  19. Cross Validation Schemes with Food Consumption
  20. Leakage, Especially Knowledge Leakage
  21. Pipelines with Linux Gamers
  22. Logistic Regression with WTA Tennis Matches
  23. Convex Hulls with Vancouver Crimes
  24. Wald Confidence Intervals with Iowa Liquor Sales
  25. Wilson Confidence Intervals with Amazon Toys
  26. Bias-Variance Tradeoff
  27. Curse of Dimensionality
  28. Learning Curves with Zillow Economics Data
  29. Dimensionality Reduction and PCA for Fashion MNIST
  30. Indirect Models and PLS Regression with F-MNIST
  31. Linear Discriminant Analysis with Pokemon Stats
  32. Classification Metrics with Seattle Rain
  33. Log Loss with New York City Building Sales
  34. Kernel Density Estimation with TED Talks
  35. Model Optimism and Information Criteria
  36. Primer on the Naive Bayes Algorithm
  37. Decision Trees with Animal Shelter Outcomes
  38. Bagging with Animal Shelter Outcomes
  39. Support Vector Machines and Stochastic Gradient Descent
  40. Kernels and Support Vector Machine Regularization
  41. ML Visualization with Yellowbrick, Part 1
  42. ML Visualization with Yellowbrick, Part 2
  43. ML Visualization with Yellowbrick, Part 3
  44. Undersampling and Oversampling Imbalanced Data
  45. Oversampling with SMOTE and ADASYN
  46. Advanced Undersampling and Data Cleaning
  47. Automated Feature Selection with Sklearn
  48. Automated Feature Selection with Boruta
  49. Notes on Matrix Factorization Machines
  50. Simple Techniques for Missing Data Imputation
  51. Notes on Semi-Supervised Learning
  52. Non-Parametric Regression
  53. Gaussian Process Regression and Classification
  54. Notes on Multiclass and Multitask Schemes
  55. Notes on Classification Probability Calibration