Infrastructure for Deep Learning

Last year I freed up some time to follow along with the MOOC “Practical Deep Learning for Coders” at fast.ai. It’s a great introduction if you want to dive right into the material and get your hands dirty. However, I am a little allergic to statements like “it only requires 7 lines of code!”, so the example notebooks are too magical for my taste. I quickly found myself peeling off the layers of abstraction and digging into the utility classes. This can be difficult because of all the abbreviated variable and function names, and the lack of comments in the code itself. After a while it started to feel like more work than building the whole thing from the ground up.

Infra

At the same time, I wanted to understand what is involved in actually running this code in the cloud. There are some very user-friendly services, such as Paperspace and Crestle, that you can use to start coding without having to worry about infrastructure. But I like to know my options and also save some money, so I opted for the DIY approach.

I started out on AWS, for which Jeremy provided some AMIs. However, with these AMIs you pay for the EBS volume even when the machine is not running. So I followed some guides that had a more sophisticated approach and mounted the volume on boot. This was fun to play around with, but took some time. The field is developing rapidly, and blogs and guides with installation instructions from even a few months ago are typically outdated. I found myself struggling to get the stack right: choosing the right version of CUDA (not too new, not too old), installing Theano, Keras, and all the other libraries.
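For the curious, mounting a data volume on boot mostly comes down to an /etc/fstab entry; a minimal sketch, where the device name and mount point are examples rather than the exact values from those guides:

# /etc/fstab entry: mount the EBS data volume at /data on boot, and keep booting if it is missing
/dev/xvdf   /data   ext4   defaults,nofail   0   2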

Then, halfway through, it became clear that part two of the course would use TensorFlow, so I installed the whole stack again. I learned a lot, but also spent a lot of time on infrastructure that I didn’t spend on actually applying deep learning.

So I decided to do a second pass of the course. As of 2018, the course is based on PyTorch, so I’ll start from scratch again. I will still be running my own infra, but opt for pre-built VM images or Docker images wherever possible. The aim is to do more coding this time.

My setup this year

I’m going to use Google Cloud Platform because it’s been a while since I last worked on GCP and I like to keep up with the clouds. I found this guide with Deep Learning VMs and booted one in my project. Next was installing Anaconda, cloning the fast.ai repo and installing all the Python packages. But while the installation appeared to be fine, I could not load the libraries in my Jupyter notebook. Possibly this is because the VM image ships both Python 2 and Python 3, which makes things unnecessarily complex. I would have much preferred two separate VM images, one for each Python version. I have gone down this road (debugging installations by other people) before and do not like it.
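For reference, booting such a VM from the command line looks roughly like this. Treat it as a sketch: the image family, machine type, zone and GPU type below are my own picks, not necessarily the values from that guide.

# create a GPU VM from a Deep Learning VM image (names are assumptions, adjust to taste)
gcloud compute instances create fastai-vm \
    --zone=europe-west1-b \
    --machine-type=n1-standard-8 \
    --image-family=pytorch-latest-gpu \
    --image-project=deeplearning-platform-release \
    --accelerator=type=nvidia-tesla-k80,count=1 \
    --maintenance-policy=TERMINATE \
    --metadata=install-nvidia-driver=True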

Docker

My work this year has involved tons of Docker/Kubernetes. I was always skeptical about Docker for production usage, but the tooling has improved a lot and it really has many benefits. My next thought was that I could use Docker to isolate the (rapidly changing) Python environment from the more stable VM.

So I went out to find a Docker image I could use for the fast.ai course. It turns out that paperspace/fastai:cuda9_pytorch0.3.0 exists. However, it has the whole cats-vs-dogs dataset baked in, which I don’t think is a great idea!

Via the forums I found this post, which refers to this container by Kai Lichtenberg, and it worked perfectly for me. It is based on nvcr.io/nvidia/pytorch:18.05-py3, which has the required drivers, Python 3 and even PyTorch. Note that this image is not available on Docker Hub, so you’ll need to head over to NVIDIA GPU Cloud, which is just another Docker registry. After signing up you get an API key that can be used to pull the Docker containers.
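For reference, pulling from that registry looks roughly like this (the username is literally $oauthtoken, and the API key from your account goes in as the password):

# log in to the NVIDIA registry and pull the base image
docker login nvcr.io          # Username: $oauthtoken, Password: <your API key>
docker pull nvcr.io/nvidia/pytorch:18.05-py3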

I mount my data and notebooks into the container on startup like this: docker run --runtime=nvidia -it --rm --ipc=host -v ~/data:/fastai/data -p 8888:8888 kailicht/fastdotai.

Note the --ipc=host flag; this is really important! Everything was working for me, but training a model was really slow. I found this explanation: “Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.”
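So if you would rather not share the host IPC namespace, bumping the shared memory size works too; a sketch, where 8g is an arbitrary value I picked, not something from the PyTorch docs:

# same run command, but with a larger /dev/shm instead of --ipc=host
docker run --runtime=nvidia -it --rm --shm-size=8g \
    -v ~/data:/fastai/data -p 8888:8888 kailicht/fastdotai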

So it still took me a few hours, but I now have a nice setup that I can easily reproduce.