Up and Running With Fast.ai and Docker

Last Monday marked the start of the latest series of Fast.ai courses: Cutting Edge Deep Learning For Coders. If you have an interest in data science and haven’t heard of Fast.ai, you should check them out. Fast.ai is a community started by Jeremy Howard and Rachel Thomas in 2016. It now includes an impressive set of courses and a machine learning library of the same name. What sets them apart is their practical, no-nonsense approach to solving data science problems by example. In this post, I’m sharing two Fast.ai docker files which provide a data science environment based on the Fast.ai library, as well as some tips for getting up and running with docker quickly.

Why Docker?

Docker provides a software layer that sits above the operating system to support containerization. Virtual machines have been around for years but docker is more lightweight. With the development of Nvidia-Docker, GPU support is baked into the docker environment.

Three reasons you might want to use docker for data science are:

Plug and Play: Once you’ve installed the Nvidia-Docker server on your host machine, you run a docker image to create a container. There are thousands of docker images pre-built by companies like Nvidia, with software like CUDA already installed. Once you have the image you need, you’re ready to go.

Easy Configuration: Your docker image is based on a docker file, which is a script for building the image. Adding or removing software is as easy as modifying the docker file and rebuilding the image. In the docker file you can also import an existing image and add to it using the FROM command (see the Docker Files section below for details).
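To illustrate, a minimal docker file extending an existing image might look like the sketch below. The base image tag is one of Nvidia's published CUDA images; the package installed is just a placeholder, so swap in whatever you actually need:

```dockerfile
# Extend Nvidia's CUDA image and add one extra tool.
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Install git, then clear the apt cache to keep the image small.
RUN apt-get update && apt-get install -y git \
    && rm -rf /var/lib/apt/lists/*
```

Rebuilding after a change like this only re-runs the modified layers, so iteration is quick.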

Containerization: Because the docker container separates your operational software environment from the host operating system, you get the benefits of containerization. The one I really like is the ability to manage potential software conflicts or dependency issues at the container level without permanently changing the operating system. If things get ugly, you can erase the docker image as if it never existed.

Caveats

Firstly, Docker containers need root privileges to run, which may pose security issues in some corporate settings.

Secondly, Docker doesn’t run natively on Windows or OSX. To use docker on these platforms you need to introduce an additional layer of virtualization, such as Docker for Windows or Docker for Mac. According to this article, performance is improving, but it still lags bare metal Linux installations. Moreover, nvidia-docker is not supported on Windows or Mac at all. Given the importance of a GPU for deep learning, Docker probably doesn’t make sense on those platforms. The docker files in this post have been developed for systems with a GPU supported by nvidia-docker.

Installation on Host

The homepage for Nvidia-Docker provides a useful starting point for installing the software on your host machine. There are also many tutorials online covering the various flavors of Linux and other operating systems. Nvidia-Docker is an additional software package that supplements the core docker installation. Its job is to interface with the Nvidia drivers on the host, which control the GPU hardware. The Nvidia driver version on the host machine determines the version of CUDA you can run in the container. Once you know the driver version you have on the host, you can check the compatible CUDA version here.
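On the host, `nvidia-smi` prints the installed driver version in its header. As a rough guide, the check described above can be sketched in a few lines of Python. The minimum driver versions below are my reading of Nvidia's release notes at the time of writing; treat them as assumptions and confirm against the compatibility table linked above before relying on them:

```python
# Assumed minimum Linux driver versions for each CUDA release
# (verify against Nvidia's compatibility table).
MIN_DRIVER = {"8.0": (367, 48), "9.0": (384, 81)}

def parse_driver(version):
    """Split a driver string like '384.111' into an integer tuple.

    Tuples compare correctly where naive float comparison would not
    (e.g. driver 384.111 is newer than 384.81).
    """
    major, minor = version.split(".")
    return (int(major), int(minor))

def driver_supports_cuda(cuda_version, driver_version):
    """True if the host driver is new enough for this CUDA release."""
    return parse_driver(driver_version) >= MIN_DRIVER[cuda_version]

# A host running driver 384.111 should be able to run the cuda9 image:
print(driver_supports_cuda("9.0", "384.111"))
```

If your driver only supports CUDA 8, use the fastai.latest.cuda8 docker file instead.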

Docker Files

The repo contains two docker files:

  1. fastai.latest.cuda8
  2. fastai.latest.cuda9

You can access the repo here. The docker files support nvidia-docker for versions 8 and 9 of CUDA respectively. Both images inherit from an Ubuntu 16.04 image with CUDA pre-installed. The differences between the two files are:

  1. The CUDA version installed in the image: 8 and 9 respectively.

    FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
    
    FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
    
  2. The python package used to interface with CUDA: cuda80 and cuda90 respectively. The Fastai python environment is created from the environment.yml file included in the Fastai github repository. To support CUDA 8, we replace the cuda90 python package specified in environment.yml with the cuda80 package:

    #           FASTAI
    
    # clone fastai repo
    RUN git clone https://github.com/fastai/fastai.git /usr/local/fastai
    
    # replace cuda90 package with cuda80 for host machines supporting cuda8
    RUN sed -i -e 's/cuda90/cuda80/' /usr/local/fastai/environment.yml
    
    

When run, the docker container automatically starts the Jupyter Notebook server with a default password: fastai.

If you need to change the password, refer to the section of the docker file titled Start Up:

  1. run the code below in a Jupyter Notebook to generate a new password key

    from notebook.auth import passwd; passwd()
    
  2. use the key generated above to update the NotebookApp.password attribute in the docker file

    NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed'
    

  3. rebuild the docker image (refer to the Docker Quickstart Commands section below):

    #           START UP
    
    # start jupyter server specifying password: fastai
    # in jupyter notebook run the following to generate custom password key and update --NotebookApp.password=
    # from notebook.auth import passwd; passwd()
    
    CMD /bin/bash -c "source activate fastai && jupyter notebook --allow-root --no-browser --NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed'"
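If you’d rather not spin up a notebook just to call passwd(), the sketch below mimics, to the best of my knowledge, what notebook.auth.passwd does for its default sha1 scheme: it salts the passphrase and stores the result as sha1:&lt;salt&gt;:&lt;digest&gt;. Treat it as an illustration of the stored format rather than a replacement for the real function:

```python
import hashlib
import random

def notebook_passwd(passphrase, salt=None):
    """Hash a passphrase the way notebook.auth.passwd does (sha1 scheme).

    The stored value is 'sha1:<salt>:<sha1(passphrase + salt)>'.
    (A sketch of the notebook library's behaviour, not the library itself.)
    """
    if salt is None:
        salt = "%012x" % random.getrandbits(48)
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return "sha1:%s:%s" % (salt, digest)

def verify_passwd(passphrase, stored):
    """Check a passphrase against a stored 'sha1:<salt>:<digest>' value."""
    _, salt, _ = stored.split(":")
    return notebook_passwd(passphrase, salt) == stored
```

Whichever way you generate the key, the value you paste into the docker file must include the leading sha1: prefix.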
    
    

Docker Quickstart Commands

You’ll find a full command line reference at docker.com. Don’t be overwhelmed by the number of commands. Chances are you’ll only need to remember a handful of them.

docker build: Builds an image from a docker file.

Example

# Arguments
# ---------
# -f path to local docker file 
# -t tag(name) of new image
# . passes the current directory as the build context

cd /home/nick/docker/ml/ml-gpu/; 
docker build -f /home/nick/docker/ml/ml-gpu/fastai.latest.cuda9 -t ireland/fastai.cuda9:latest . 

docker run: Creates a new container from an image and starts it.

Example

# Here we run a container without passing in additional commands. 
# The fast.ai containers start a Jupyter Notebook server that runs in the background

# Arguments
# ---------
# --rm remove existing container if it exists
# -d detach from the terminal
# --name assign a name to the new container
# -p map port 8888 on the host to 8888 in the container
# -v map file path /home/nick on the host to /home in the container
# ireland/fastai.cuda9:latest the docker image to start

nvidia-docker run --rm -d --name fastai -p 8888:8888 -v /home/nick:/home -v /home/nick/data:/data ireland/fastai.cuda9:latest

docker exec: Runs a command in a running docker container.

Example

# Attach current command window to a running docker container. 
# This allows us to control the running container from the command window
# Arguments
# -i interactive mode
# -t allocate a terminal to the container
# fastai (container name)
# bash (command to run)

docker exec -it fastai bash

docker images: List docker images

docker ps: List running docker containers

docker system df: Show docker disk usage

docker system prune: Purges stopped containers, the build cache, and dangling images. A dangling image occurs when you rebuild an image without assigning a new tag; the old version is kept and continues to take up disk space.

Customize Docker Image

In the course of experimentation you will probably discover the need for additional python packages or software tools. Here are two simple examples of how you can modify the docker file to include new python packages.

Using Conda

# install package: numpy-indexed
# add the following before the cleanup section of the dockerfile
RUN /bin/bash -c "source activate fastai && conda install -y numpy-indexed"

Install From Source

# ML-From-Scratch
RUN git clone https://github.com/eriklindernoren/ML-From-Scratch /usr/local/ML-From-Scratch
RUN cd /usr/local/ML-From-Scratch && /bin/bash -c "source activate fastai && python setup.py install"

Summary

To get started, there are 5 steps you need to follow:

  1. Install Nvidia-Docker on your host machine.

  2. Download the docker file you need

  3. Build the docker file

    cd <path to dockerfile>; docker build -f <path to dockerfile> -t <img name> . 
    
  4. Run the docker image

    nvidia-docker run --rm -d --name fastai -p 8888:8888 -v <host workspace path>:/home -v <host data path>:/data <img name>
    
  5. In your web browser, navigate to localhost:8888 to access the Jupyter environment
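If the page doesn’t load, a quick way to tell whether the server is answering at all is a small probe run from the host. This is a throwaway helper of my own, assuming the 8888:8888 port mapping from step 4:

```python
import urllib.error
import urllib.request

def jupyter_up(url="http://localhost:8888", timeout=2):
    """Return True if something answers HTTP at the given URL."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # An HTTP error status (e.g. 403) still means a server answered.
        return True
    except Exception:
        return False
```

If this returns False, run docker ps to confirm the container is up and the port mapping is in place.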
