Instructor: Philip Austin (Earth, Ocean and Atmospheric Sciences, UBC)
Title: Parallelization in Python 3 with large datasets
The objective is to learn how to write parallel Python programs that use multiple cores on a single node. The tutorial introduces several Python modules that schedule operations and manage data to simplify multiprocessing.
Target audience: Researchers interested in Python programming on multicore machines
Course plan:
- Benchmarking parallel code
- Understanding the global interpreter lock (GIL)
- Multiprocessing and multithreading with joblib
- Checkpointing/restarting multiprocessor jobs
- Writing extensions that release the GIL:
  - Using numba
  - Using cython
  - Using C++ and pybind11
- Using xarray to analyze out-of-core datasets
- Using dask and xarray to compute on multiple cores
- Visualizing parallelization with dask
- Setting up a conda-forge environment for parallel computing
Duration: 3 hours
Level: intermediate
Prerequisites: Some familiarity with Jupyter notebooks, Python and numpy at the level of Jake VanderPlas’s A Whirlwind Tour of Python.
Setup:
- All the examples in the tutorial should work on Windows, MacOS or Linux laptops that have Miniconda 3.6 installed.
- Python installation: https://clouds.eos.ubc.ca/~phil/courses/parallel_python/00_intro.html
- Course notes: https://clouds.eos.ubc.ca/~phil/courses/parallel_python/index.html
- Course notebooks: (under construction) download from https://github.com/phaustin/parallel_python_course