
[Docker] Using NVIDIA GPUs inside Docker (feat. nvidia-docker, NVIDIA Container Toolkit)

DrawingProcess 2024. 1. 28. 01:01
💡 This post summarizes how to use NVIDIA GPUs inside Docker with nvidia-docker and the NVIDIA Container Toolkit.
The steps are collected here for your reference.

1. Installation

1) Installing the NVIDIA Container Toolkit

Installing with Apt

Configure the production repository:

$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Optionally, configure the repository to use experimental packages:

$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the packages list from the repository:

$ sudo apt-get update

Install the NVIDIA Container Toolkit packages:

$ sudo apt-get install -y nvidia-container-toolkit

To install with a package manager other than apt, such as Yum, Dnf, or Zypper, refer to Installing the NVIDIA Container Toolkit.

Configuration Prerequisites

  • You installed a supported container engine (Docker, Containerd, CRI-O, Podman).
  • You installed the NVIDIA Container Toolkit.

Configuring Docker

Configure the container runtime by using the nvidia-ctk command:

$ sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.

Restart the Docker daemon:

$ sudo systemctl restart docker
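
After the restart, it can be useful to confirm that the runtime was actually registered. The snippet below is a minimal sketch, not part of the official steps: `has_nvidia_runtime` is a hypothetical helper, and it assumes that nvidia-ctk writes a runtime entry whose path contains the string `nvidia-container-runtime` into /etc/docker/daemon.json.

```shell
# Hypothetical helper: succeeds if the given daemon.json mentions nvidia-container-runtime.
# With no argument it checks /etc/docker/daemon.json, the file nvidia-ctk modifies.
# Assumption: the registered runtime entry contains the string "nvidia-container-runtime".
has_nvidia_runtime() {
  grep -q 'nvidia-container-runtime' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

if has_nvidia_runtime; then
  echo "nvidia runtime is registered with Docker"
else
  echo "nvidia runtime not found in /etc/docker/daemon.json"
fi
```

On a host with Docker running, the Runtimes line printed by `docker info` offers a second way to check the same thing.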

If you use a container engine other than Docker, such as containerd, refer to Installing the NVIDIA Container Toolkit.

2) Installing nvidia-docker

Installing with Apt

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

After adding the GPG key and the stable repository, run:

# apt-get update
# apt-get install nvidia-docker2 -y
# systemctl restart docker

As shown above, updating the repository, installing nvidia-docker2, and restarting Docker completes the installation.

2. Three ways to allocate GPUs with nvidia-docker

1) NV_GPU

$ NV_GPU=0,1 nvidia-docker run -it nvcr.io/nvidia/tensorflow:20.12-tf1-py3

To allocate only two of the available GPUs, I set NV_GPU=0,1, started a container from the nvcr.io/nvidia/tensorflow:20.12-tf1-py3 image, and checked the result from inside the container.

 

Are there other options? Here are two more ways to allocate GPUs.

2) NVIDIA_VISIBLE_DEVICES

$ docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=2,3 nvcr.io/nvidia/tensorflow:20.12-tf1-py3

With --runtime=nvidia, you can allocate GPUs by passing the NVIDIA_VISIBLE_DEVICES environment variable with -e, for example NVIDIA_VISIBLE_DEVICES=2,3. The value can be a list of GPU IDs or UUIDs.

Note that, unlike the NV_GPU form, this command starts with docker rather than nvidia-docker.

For comparison, I used different GPUs here than in the NV_GPU example. Comparing the Bus-Id values in the NV_GPU screenshot above with the current NVIDIA_VISIBLE_DEVICES output shows that different GPUs were used in each case.
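
Since NVIDIA_VISIBLE_DEVICES also accepts UUIDs, the following sketch shows one way to build such a list from `nvidia-smi -L` style output. The helpers `extract_gpu_uuids` and `join_with_commas` are hypothetical names, and the sample lines (model name and UUIDs) are made up to stand in for real output, so the snippet runs without a GPU.

```shell
# Hypothetical helper: reads `nvidia-smi -L` style lines on stdin, prints one UUID per line.
extract_gpu_uuids() {
  sed -n 's/.*(UUID: \([^)]*\)).*/\1/p'
}

# Hypothetical helper: joins stdin lines with commas, the list format NVIDIA_VISIBLE_DEVICES takes.
join_with_commas() {
  paste -sd, -
}

# Sample input in the shape `nvidia-smi -L` prints; the model name and UUIDs are invented.
sample='GPU 0: Example GPU (UUID: GPU-aaaaaaaa-0000-0000-0000-000000000000)
GPU 1: Example GPU (UUID: GPU-bbbbbbbb-1111-1111-1111-111111111111)'

printf '%s\n' "$sample" | extract_gpu_uuids | join_with_commas

# On a GPU host, the same pipeline could feed docker run (not executed here):
#   devices=$(nvidia-smi -L | extract_gpu_uuids | join_with_commas)
#   docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES="$devices" nvcr.io/nvidia/tensorflow:20.12-tf1-py3
```

UUIDs stay attached to the same physical card, which may make them a safer choice than numeric indexes on hosts where the device ordering can change.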

3) --gpus

$ docker run -it --gpus '"device=0,1,2,3"' nvcr.io/nvidia/tensorflow:20.12-tf1-py3

The third method is the --gpus option.

As shown above, the command again starts with docker run. The '"device="' value that follows --gpus must be wrapped in both double quotes and single quotes.
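
The nested quoting is easier to see by printing what the shell actually passes to a program. In this small sketch, `show_args` is a hypothetical helper that stands in for docker. As I understand it, Docker parses the --gpus value as comma-separated fields, so the inner double quotes are what keep the device list together as one field.

```shell
# Hypothetical helper: prints each argument on its own line, wrapped in brackets,
# so the exact strings the shell hands to a program are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes around double quotes: the double quotes reach the program literally.
show_args --gpus '"device=0,1"'

# Double quotes only: the shell strips them, leaving a bare comma-separated value.
show_args --gpus "device=0,1"
```

The first call passes "device=0,1" with its double quotes intact, while the second passes only device=0,1.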

Finally, to use all GPUs at once, simply pass --gpus all. As shown below, this allocates every GPU and enters the container.

$ docker run -it --gpus all nvcr.io/nvidia/tensorflow:20.12-tf1-py3

NV_GPU? NVIDIA_VISIBLE_DEVICES? --gpus?

You should now be able to allocate NVIDIA GPUs without trouble. Some of you may wonder how NV_GPU, NVIDIA_VISIBLE_DEVICES, and --gpus differ. With nvidia-docker2 installed, any of the three commands works.

The differences are as follows.

  • NV_GPU = nvidia-docker
  • NVIDIA_VISIBLE_DEVICES = nvidia-docker2
  • gpus '"device="' = nvidia-docker2

The difference comes down to the nvidia-docker version. Because the newer version is backward compatible, you can use whichever command you find most convenient.
