
A containerized version of LambdaStack is available on the SAIL nodes. This container provides a working installation of PyTorch, TensorFlow, CUDA and cuDNN.

This page is a survival guide for getting started with this container and using it with your projects. It assumes you have an open SSH session on one of the machines.

Start the container

First check that the container image is available using the docker image ls command:

$ docker image ls
REPOSITORY        TAG                       IMAGE ID       CREATED         SIZE
lambda-stack      20.04                     abe4a492cee1   6 hours ago     12GB
ubuntu            latest                    df5de72bdb3b   3 weeks ago     77.8MB
ubuntu            20.04                     3bc6e9f30f51   3 weeks ago     72.8MB
debian            latest                    07d9246c53a6   3 weeks ago     124MB
nvidia/cuda       11.0.3-base-ubuntu20.04   8017f5c31b74   5 weeks ago     122MB
hello-world       latest                    feb5d9fea6a5   11 months ago   13.3kB

You should see the lambda-stack image in the list; this is the one we will use for now.

To start a program in the container, use the docker run --rm command:

$ docker run --rm lambda-stack:20.04 pwd
/root

The --rm flag tells Docker to automatically remove the container once the program exits, so stopped containers do not accumulate on the machine.
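
To see the difference, run a container without --rm and list the leftover containers:

$ docker run lambda-stack:20.04 true   # without --rm, the stopped container is kept
$ docker ps -a                         # the exited container appears in this list
$ docker container prune               # removes all stopped containers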

To run an interactive command like bash or the Python interpreter, add the -it flag:

$ docker run --rm -it lambda-stack:20.04 bash
root@fa416f6b82f5:~# pwd
/root
root@fa416f6b82f5:~# exit
$
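
Similarly, you can open an interactive Python session (exit with exit() or Ctrl-D):

$ docker run --rm -it lambda-stack:20.04 python3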

So far, the container does not have access to the GPUs. To grant access, change the runtime to nvidia and explicitly specify the list of visible GPUs. The following example uses the first two GPUs:

$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm lambda-stack:20.04 nvidia-smi
Thu Sep  1 05:40:13 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   48C    P0   166W / 400W |  50691MiB / 81920MiB |     73%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   48C    P0   244W / 400W |  34161MiB / 81920MiB |     96%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

This method does not prevent multiple containers from accessing the same GPUs. Therefore, make sure to check with other users which GPUs they are using.

It does, however, ensure that your container will not accidentally use any GPU other than the ones specified.
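
To confirm that frameworks inside the container only see the selected GPUs, you can query PyTorch directly; with the two GPUs above, this should print 2:

$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm lambda-stack:20.04 \
    python3 -c "import torch; print(torch.cuda.device_count())"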

Recent Docker versions also provide the --gpus flag as an alternative to setting --runtime=nvidia and NVIDIA_VISIBLE_DEVICES by hand. Assuming the NVIDIA Container Toolkit is installed on the host, the following should be equivalent to the nvidia-smi command above:
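
$ docker run --gpus '"device=0,1"' --rm lambda-stack:20.04 nvidia-smi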

Access a folder from the container

A container is isolated from the host environment by default. Bind mounts allow you to mount a folder from the host machine into the container.

Specify the path to the directory on the host and the corresponding path inside the container using the -v flag:

-v <path on host>:<path in container>

For example, assuming you have a project folder in $HOME/my_project and want to access it as /my_project in the container, you would use:

$ docker run -v $HOME/my_project:/my_project --rm lambda-stack:20.04 ls /

You should see my_project listed among the directories at the container's root.

Any program running in the container can then access and modify files in the /my_project folder.

You can repeat the -v flag to mount multiple folders in the container.
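
For example, to mount both your project folder and a data folder read-only (the :ro suffix makes a mount read-only; the /data/imagenet path is only an illustration):

$ # /data/imagenet below is an example path; adapt it to your machine
$ docker run --rm -v $HOME/my_project:/my_project \
    -v /data/imagenet:/data/imagenet:ro \
    lambda-stack:20.04 ls /my_project /data/imagenet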

Add packages to the container

The container may not have all the packages you need. To add more packages, you can build a new image based on the LambdaStack one.

To build an image, you need a Dockerfile. It names the base image and lists the installation instructions for the additional packages.

In the following Dockerfile example, the Transformers library (PyTorch version) from Hugging Face is added on top of the LambdaStack image:

FROM lambda-stack:20.04
RUN pip install transformers[torch]
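
System packages can be installed the same way with apt-get. A minimal sketch, where git and vim stand in for whatever tools you actually need:

FROM lambda-stack:20.04
# git and vim are example packages; replace them with what you need
RUN apt-get update && \
    apt-get install -y --no-install-recommends git vim && \
    rm -rf /var/lib/apt/lists/*
RUN pip install transformers[torch]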

To build the corresponding image, first create an empty folder and save the Dockerfile in it:

$ mkdir hugging_container
$ echo "FROM lambda-stack:20.04" > hugging_container/Dockerfile
$ echo "RUN pip install transformers[torch]" >> hugging_container/Dockerfile

then use the docker build command to generate the new image:

$ cd hugging_container
$ docker build -t pytorch-transformers .

The -t flag tags the image, making it easier to find and use later.
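
By default Docker assigns the latest tag; you can also pin an explicit version of your own:

$ docker build -t pytorch-transformers:1.0 .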

Use docker image ls to check the availability of the image:

$ docker image ls
REPOSITORY             TAG                       IMAGE ID       CREATED          SIZE
pytorch-transformers   latest                    432c6be0a999   13 seconds ago   12.1GB
lambda-stack           20.04                     abe4a492cee1   6 days ago       12GB
ubuntu                 latest                    df5de72bdb3b   4 weeks ago      77.8MB
ubuntu                 20.04                     3bc6e9f30f51   4 weeks ago      72.8MB
debian                 latest                    07d9246c53a6   4 weeks ago      124MB
nvidia/cuda            11.0.3-base-ubuntu20.04   8017f5c31b74   6 weeks ago      122MB
hello-world            latest                    feb5d9fea6a5   11 months ago    13.3kB

You can now use it in place of the LambdaStack image:

$ docker run --rm pytorch-transformers \
    python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading config.json: 100%|██████████| 629/629 [00:00<00:00, 1.45MB/s]
Downloading pytorch_model.bin: 100%|██████████| 255M/255M [00:10<00:00, 24.5MB/s]
Downloading tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 72.8kB/s]
Downloading vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 295kB/s]
[{'label': 'POSITIVE', 'score': 0.9998656511306763}]
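
The new image accepts the same GPU options as the original one. For instance, the following checks that PyTorch sees a GPU (it should print True):

$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --rm pytorch-transformers \
    python3 -c "import torch; print(torch.cuda.is_available())"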

Images can be deleted using the docker image rm command. For example, remove the pytorch-transformers image as follows:

$ docker image rm pytorch-transformers
Untagged: pytorch-transformers:latest
Deleted: sha256:432c6be0a999484db090c5d9904e5c783454080d8ad8bc39e0499ace479c4559
Deleted: sha256:623ae3b33709c2fc4c40bc2c3959049345fee0087d39b4f53eb95aefd1c16f7d
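
Rebuilding images can leave dangling layers behind; reclaim the space with:

$ docker image prune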

Next steps

To go beyond the basics, see the official Docker documentation (https://docs.docker.com/), which covers docker run, bind mounts, and Dockerfiles in depth, and the Lambda Stack page (https://lambdalabs.com/lambda-stack-deep-learning-software) for details on the software included in the stack.
