Statistical/ML modeling with scikit-learn

The scikit-learn module (imported as sklearn) is a full-featured Python library for data analysis and predictive modeling. In the pcda class, one session at the end of the semester introduced this library and covered some basic statistical/ML modeling. We’ll pick up where that session left off and dive a little deeper into both some advanced modeling concepts and some of sklearn’s features, such as preprocessors and pipelines.

Through this module you will:

  • review using sklearn to train and use statistical and machine learning models,

  • learn about and build ensemble models in sklearn,

  • use cookiecutter templates to create project folder/file structures and learn concepts for data science project file management,

  • learn about regularization methods within the context of building classifier models for the Pump it Up competition,

  • build logistic and tree-based classifiers in sklearn using preprocessors and pipelines.
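
As a rough preview of the preprocessor and pipeline ideas listed above, here is a minimal sketch of a logistic regression classifier wrapped in an sklearn Pipeline. The data is synthetic and the column names (depth, age, source) are made up for illustration; this is not the Pump it Up data and not necessarily the workflow used later in the module. The C parameter is the inverse of the regularization strength, so smaller values mean stronger regularization.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with hypothetical column names (assumption, for illustration only)
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "depth": rng.normal(50, 10, n),
    "age": rng.integers(1, 40, n),
    "source": rng.choice(["spring", "river", "well"], n),
})
# Binary target loosely driven by age plus noise
y = (X["age"] + rng.normal(0, 5, n) > 20).astype(int)

# Preprocessor: scale the numeric columns, one-hot encode the categorical column
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["depth", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["source"]),
])

# Pipeline: preprocessing followed by an L2-regularized logistic regression
clf = Pipeline([
    ("prep", preprocessor),
    ("model", LogisticRegression(C=1.0, max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fitting the pipeline fits the preprocessor on the training data only
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the preprocessing steps live inside the pipeline, the same transformations learned from the training data are applied automatically when scoring or predicting on new data. Swapping the final step for a tree-based model such as DecisionTreeClassifier would leave the rest of the pipeline unchanged.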