Python Libraries for Researchers

Instructor: Ian Percel (Research Computing Services)

An introduction to some of the most fundamental and powerful data-structuring libraries in Python, including NumPy, Pandas, and SciPy.

This course will begin with an introduction to the key ideas of navigating some standard advanced data structures: NumPy Arrays, Pandas Series, and Pandas DataFrames. Building on this, we will discuss how Arrays interact with one another through Linear Algebra and how DataFrames interact through SQL-like operations. Finally, we will illustrate how these seemingly different techniques can be melded into a single complex analysis that weaves together relationships in numerical and categorical data. The type of analysis that we will illustrate is typical of preparing data to function as a training set for a machine learning classifier.
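
As a rough preview of the kind of workflow described above, the following minimal sketch joins two DataFrames with a SQL-like merge, applies a linear-algebra operation to the numeric columns as a NumPy array, and one-hot encodes a categorical column to produce a feature table. The data, column names (`sample_id`, `site`, `x1`, `x2`, `habitat`), and weight vector are all hypothetical and invented for illustration; this is not the course's actual material.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: every value and column name here is made up.
samples = pd.DataFrame({
    "sample_id": [1, 2, 3, 4],
    "site": ["A", "B", "A", "C"],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [0.5, 1.5, 2.5, 3.5],
})

# Hypothetical lookup table mapping each site to a categorical label.
sites = pd.DataFrame({
    "site": ["A", "B", "C"],
    "habitat": ["forest", "wetland", "forest"],
})

# SQL-like step: a left join on the shared "site" column.
joined = samples.merge(sites, on="site", how="left")

# Linear-algebra step: pull the numeric columns out as a NumPy array
# and take a matrix-vector product with an (assumed) weight vector.
X = joined[["x1", "x2"]].to_numpy()  # shape (4, 2)
w = np.array([2.0, -1.0])
joined["score"] = X @ w

# Categorical step: one-hot encode "habitat" so the table is fully
# numeric, as a classifier's training features typically need to be.
features = pd.get_dummies(joined[["score", "habitat"]], columns=["habitat"])

print(features)
```

The merge keeps the row order of `samples`, so each score stays aligned with its original sample.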

Target audience: researchers who work on complex structured data, numerical analysis, or machine learning

Duration: 3 hours

Level: intermediate

Prerequisites: This course assumes significant familiarity with basic Python syntax for variable declaration, function definition and use, and iteration. Prior experience with relational algebra (SQL) and linear algebra would be helpful but is not required.

Laptop software: All attendees will need to bring a laptop with wireless access and a remote SSH client installed. On Windows laptops we recommend the free edition of MobaXterm; on Mac and Linux laptops there is no need to install anything.