Instructor: Philip Austin (Earth, Ocean and Atmospheric Sciences, UBC)

Title: Parallelization in Python 3 with large datasets

The objective is to learn how to write parallel Python programs that use multiple cores on a single node. The tutorial will introduce several Python modules that schedule operations and manage data, which simplifies parallel programming in Python.

Target audience: Researchers interested in Python programming on multi-core machines

Course plan:

  1. Benchmarking parallel code
  2. Understanding the global interpreter lock (GIL)
  3. Multiprocessing and multithreading with joblib
  4. Checkpointing and restarting multiprocessing jobs
  5. Writing extensions that release the GIL:
    1. Using numba
    2. Using cython
    3. Using C++ and pybind11
  6. Using xarray to analyze out-of-core datasets
  7. Using dask and xarray to compute on multiple cores
  8. Visualizing parallelization with dask
  9. Setting up a conda-forge environment for parallel computing

Duration: 3 hours

Level: intermediate

Prerequisites: Some familiarity with Jupyter notebooks, Python and numpy at the level of Jake VanderPlas’s "A Whirlwind Tour of Python".

Setup: