Pump it Up: Data Prep

In this submodule, we’ll do a little EDA and data prep for the pumpitup project. This isn’t meant to be an exhaustive example, as our focus in Module 2 is really on classification modeling. Nevertheless, there are some useful tips in here, including:

  • automated EDA tools for Python,

  • doing factor lumping with a Python port of the R package forcats,

  • creating a data prep script,

  • getting your data ready for use with sklearn for classification models.
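To make the factor lumping and sklearn prep items concrete, here is a minimal sketch of the idea using plain pandas rather than the forcats port used in the notebook. The helper name `lump_rare_levels`, the `funder` column, and the toy values are all assumptions made up for illustration; the notebook's actual columns and function calls may differ.

```python
import pandas as pd


def lump_rare_levels(series: pd.Series, n: int, other_label: str = "Other") -> pd.Series:
    """Keep the n most frequent levels and replace everything else with other_label.

    Hypothetical helper for illustration; it is not part of pandas or of any forcats port.
    Note that missing values are not among the top levels, so they are also
    replaced with other_label.
    """
    # value_counts() sorts levels from most to least frequent.
    top_levels = series.value_counts().index[:n]
    # where() keeps values where the condition is True and substitutes elsewhere.
    return series.where(series.isin(top_levels), other_label)


# Made-up toy data standing in for a high-cardinality categorical column.
df = pd.DataFrame({"funder": ["A", "A", "A", "B", "B", "C", "D"]})

# Lump everything outside the two most common levels into "Other".
df["funder_lumped"] = lump_rare_levels(df["funder"], n=2)

# One-hot encode the lumped column so it is numeric, which is the form
# sklearn estimators expect for their feature matrix.
X = pd.get_dummies(df["funder_lumped"], prefix="funder", dtype=int)

print(df)
print(X)
```

Lumping before encoding keeps the number of dummy columns small: here four original levels become three columns instead of four, and the savings grow quickly for columns with hundreds of rare levels.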

You’ll be working in your newly created pumpitup project folder.

Start by opening the data_prep.ipynb notebook in Jupyter Lab.

Here is a screencast to help guide you through the notebook:

When you’re done, move on to the last submodule, Classification models for Pump it Up project.