Real data issues in Machine Learning, and how to handle them
Instructor: Giovane Cesar da Silva (IBM)
The example data sets used to teach Machine Learning methods are typically pre-processed and cleaned. In professional projects, by contrast, the state of the data is unknown, and it takes effort to make the data usable even for simple statistics.
In this hands-on workshop, you will:
- practice data understanding,
- ingest badly formatted data,
- apply standard data quality tests,
- apply data-specific quality tests, and
- use some data patching algorithms.
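As a rough illustration of the steps listed above, here is a minimal Pandas sketch. It is not taken from the workshop material: the sample data, the column names, the helper functions `load`, `quality_report`, and `patch`, the 0 to 120 age range, and the choice of median imputation as the patching method are all assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical raw export (not workshop data): semicolon-separated,
# decimal commas, an ad-hoc missing-value marker ("n/a"), an impossible
# age, and one duplicated row.
RAW = """id;age;income
1;34;2500,50
2;n/a;3100,00
3;-5;1800,75
3;-5;1800,75
4;51;n/a
"""


def load(raw: str) -> pd.DataFrame:
    """Ingest the badly formatted text by declaring its quirks explicitly."""
    return pd.read_csv(io.StringIO(raw), sep=";", decimal=",", na_values=["n/a"])


def quality_report(df: pd.DataFrame) -> dict:
    """Run standard checks plus one data-specific check (assumed age range)."""
    age_known = df["age"].notna()
    age_valid = df["age"].between(0, 120)
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "age_out_of_range": int((age_known & ~age_valid).sum()),
    }


def patch(df: pd.DataFrame) -> pd.DataFrame:
    """Patch the data: drop duplicates, blank invalid ages, impute medians."""
    out = df.drop_duplicates().copy()
    # Treat ages outside the assumed range as missing (NaN already fails between()).
    out.loc[~out["age"].between(0, 120), "age"] = float("nan")
    # Median imputation is one simple patching choice among many.
    for column in ("age", "income"):
        out[column] = out[column].fillna(out[column].median())
    return out.reset_index(drop=True)


if __name__ == "__main__":
    frame = load(RAW)
    print(quality_report(frame))
    print(patch(frame))
```

Running the report before patching keeps the two concerns separate: the checks describe the data as received, and the patched copy leaves the original frame untouched for comparison.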
Target audience: general
Duration: 3 hours
Level: beginner to intermediate
Prerequisites: Knowledge of the Python language, package management, and data processing with Pandas.
Laptop software: All attendees need to bring a laptop with wireless access and with Anaconda 2019.03 installed, updated, and configured in their preferred IDE.