Things to watch out for when deploying LXD in production

I ran into trouble deploying to production

For the qualifying round of a certain contest, I needed about 50 hosts on which Linux commands could be run, so I decided to provide them with LXD/LXC.

This post is for anyone stuck on errors such as Failed to allocate directory watch: Too many open files.

Experiment

Environment

Setup

$ sudo apt -y install software-properties-common
$ sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable
$ sudo apt update && sudo apt dist-upgrade
$ sudo apt -y install lxd zfs
$ newgrp lxd
$ sudo lxd init

Launching 20 LXD containers

Typing the command 20 times by hand is tedious, so it may be faster to write a script.

ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c01
Creating c01
Starting c01
ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c02
Creating c02
Starting c02
...
ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c20
Creating c20
Starting c20
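
The repeated launches above could be scripted along these lines. This is only a minimal sketch: the function names container_names and launch_all and the DRY_RUN switch are made up for illustration, and only the lxc launch ubuntu:16.04 invocation comes from the transcript above. By default it just prints the commands it would run.

```shell
#!/bin/sh
# Print zero-padded container names c01..cNN, where NN is the first argument.
container_names() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'c%02d\n' "$i"
        i=$((i + 1))
    done
}

# Launch one container per name.
# DRY_RUN is a hypothetical switch: 1 (the default) only prints the command.
launch_all() {
    for name in $(container_names "$1"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "lxc launch ubuntu:16.04 $name"
        else
            lxc launch ubuntu:16.04 "$name"
        fi
    done
}

launch_all 20
```

Running it with DRY_RUN=0 in the environment would actually create the containers, one after another.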

Checking the launched containers

ubuntu@lxd01:~$ lxc list
+------+---------+----------------------+------+------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+----------------------+------+------------+-----------+
| c01  | RUNNING | 10.58.243.4 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c02  | RUNNING | 10.58.243.10 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c03  | RUNNING | 10.58.243.124 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c04  | RUNNING | 10.58.243.145 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c05  | RUNNING | 10.58.243.2 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c06  | RUNNING | 10.58.243.174 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c07  | RUNNING | 10.58.243.252 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c08  | RUNNING | 10.58.243.218 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c09  | RUNNING | 10.58.243.247 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c10  | RUNNING | 10.58.243.93 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c11  | RUNNING | 10.58.243.189 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c12  | RUNNING | 10.58.243.13 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c13  | RUNNING | 10.58.243.90 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c14  | RUNNING | 10.58.243.177 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c15  | RUNNING | 10.58.243.71 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c16  | RUNNING | 10.58.243.248 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c17  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c18  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c19  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c20  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+

Something is wrong

Containers launched after roughly the 16th have no IP address assigned. Let's compare the process lists of c16 and c17.

ubuntu@lxd01:~$ lxc exec c16 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2  37396  2552 ?        Ss   04:07   0:01 /sbin/init
root        46  0.0  0.1  33436  1652 ?        Ss   04:07   0:00 /lib/systemd/systemd-journald
root       214  0.0  0.0  16128    80 ?        Ss   04:09   0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df
root       306  0.0  0.1  26068  1096 ?        Ss   04:09   0:00 /usr/sbin/cron -f
root       308  0.0  0.1  20100  1268 ?        Ss   04:09   0:00 /lib/systemd/systemd-logind
root       310  0.0  0.3 436792  3176 ?        Ssl  04:09   0:00 /usr/lib/accountsservice/accounts-daemon
daemon     311  0.0  0.0  26044   880 ?        Ss   04:09   0:00 /usr/sbin/atd -f
syslog     313  0.0  0.1 186900  1428 ?        Ssl  04:09   0:00 /usr/sbin/rsyslogd -n
message+   314  0.0  0.1  42896  1552 ?        Ss   04:09   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       324  0.0  0.2  65520  2724 ?        Ss   04:09   0:00 /usr/sbin/sshd -D
root       325  0.0  0.7 263660  7796 ?        Ssl  04:09   0:00 /usr/lib/snapd/snapd
root       340  0.0  0.1  12844  1056 console  Ss+  04:09   0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 linux
root       363  0.0  0.3 277080  3292 ?        Ssl  04:09   0:00 /usr/lib/policykit-1/polkitd --no-debug
root       666  0.0  0.1  34424  1892 ?        Rs+  04:17   0:00 ps aux

ubuntu@lxd01:~$ lxc exec c17 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  37264   808 ?        Ss   04:07   0:00 /sbin/init
root       203  0.0  0.1  34424  1596 ?        Rs+  04:17   0:00 ps aux

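This shows that from c17 onward systemd has not come up properly: only /sbin/init is running, and none of its services were started.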
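
Comparing containers one at a time gets tedious, so the check could be looped over several containers. The sketch below is an assumption-laden helper, not part of the original procedure: count_procs and report_containers are hypothetical names, and it relies only on lxc exec ... -- ps aux as used above.

```shell
#!/bin/sh
# Count processes in `ps aux` output read from stdin, excluding the header line.
count_procs() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Print "<name> <process count>" for each container given as an argument.
# A container where only /sbin/init and ps itself show up reports 2.
report_containers() {
    for name in "$@"; do
        printf '%s %s\n' "$name" "$(lxc exec "$name" -- ps aux | count_procs)"
    done
}

# Example (requires a running LXD host):
#   report_containers c16 c17 c18
```

A count of 2 would point to a container in the same state as c17, while a healthy one reports a dozen or more processes.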

Let's try starting sshd by hand in one of the affected containers (c18 here).

root@c18:~# systemctl start ssh
Failed to allocate directory watch: Too many open files
Job for ssh.service canceled.

The container itself has hardly any files open, so it looks like we hit a host-wide resource limit. If you see an error like Failed to allocate directory watch: Too many open files, you may be running into the same problem.
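
One way to test this suspicion on the host is to look at the inotify limits and at how many inotify instances are already open, since inotify_init can fail with EMFILE ("Too many open files") when the per-user instance limit is reached. The following is a rough sketch under that assumption; inotify_limit and inotify_in_use are hypothetical helper names, and the count is a host-wide total while the kernel limit is applied per user.

```shell
#!/bin/sh
# Read one inotify limit from /proc; print "n/a" if the file is not readable.
inotify_limit() {
    cat "/proc/sys/fs/inotify/$1" 2>/dev/null || echo "n/a"
}

# Count inotify instances currently open on the host.
# Each one appears as an fd symlink pointing to "anon_inode:inotify".
# Run as root to see every process; -lname is a GNU find extension.
inotify_in_use() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l | tr -d ' '
}

echo "max_user_instances: $(inotify_limit max_user_instances)"
echo "max_user_watches:   $(inotify_limit max_user_watches)"
echo "instances in use:   $(inotify_in_use)"
```

If the number in use is close to max_user_instances, the inotify limit is a likely culprit.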

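On reading the documentation more carefully

Looking through the official LXD repository, I found that production-setup.md spells out exactly what needs to be considered for a production deployment.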


Fix

Append the following to /etc/security/limits.conf:

*               soft    nofile          1048576
*               hard    nofile          1048576
root            soft    nofile          1048576
root            hard    nofile          1048576
*               soft    memlock         unlimited
*               hard    memlock         unlimited

Append the following to /etc/sysctl.conf:

fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144

After making both changes, reboot.
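
After the reboot, the new values could be checked from a login shell on the host with something like the sketch below. The helper names sysctl_path and check_limit are hypothetical; the expected numbers are simply the ones listed in the two files above.

```shell
#!/bin/sh
# Map a sysctl key to its /proc/sys path,
# e.g. vm.max_map_count -> /proc/sys/vm/max_map_count.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# Compare an actual value with the expected one and print the result.
# usage: check_limit NAME EXPECTED ACTUAL
check_limit() {
    if [ "$2" = "$3" ]; then
        echo "OK $1=$3"
    else
        echo "MISMATCH $1=$3 (expected $2)"
    fi
}

# Open-file limit of the current shell (set through limits.conf at login).
check_limit nofile 1048576 "$(ulimit -n)"

# Kernel parameters set through sysctl.conf.
while read -r key expected; do
    check_limit "$key" "$expected" "$(cat "$(sysctl_path "$key")" 2>/dev/null)"
done <<EOF
fs.inotify.max_queued_events 1048576
fs.inotify.max_user_instances 1048576
fs.inotify.max_user_watches 1048576
vm.max_map_count 262144
EOF
```

Any MISMATCH line suggests that the corresponding setting was not picked up, for example because of a typo in the file or because the shell is not a fresh login session.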

Verification

Confirm that every container now gets an IP address and has its processes running properly.

ubuntu@lxd01:~$ lxc list
+------+---------+----------------------+------+------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+----------------------+------+------------+-----------+
| c01  | RUNNING | 10.58.243.4 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c02  | RUNNING | 10.58.243.10 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c03  | RUNNING | 10.58.243.124 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c04  | RUNNING | 10.58.243.145 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c05  | RUNNING | 10.58.243.2 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c06  | RUNNING | 10.58.243.174 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c07  | RUNNING | 10.58.243.252 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c08  | RUNNING | 10.58.243.218 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c09  | RUNNING | 10.58.243.247 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c10  | RUNNING | 10.58.243.93 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c11  | RUNNING | 10.58.243.189 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c12  | RUNNING | 10.58.243.13 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c13  | RUNNING | 10.58.243.90 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c14  | RUNNING | 10.58.243.177 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c15  | RUNNING | 10.58.243.71 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c16  | RUNNING | 10.58.243.248 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c17  | RUNNING | 10.58.243.36 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c18  | RUNNING | 10.58.243.184 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c19  | RUNNING | 10.58.243.211 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c20  | RUNNING | 10.58.243.23 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+


ubuntu@lxd01:~$ lxc exec c18 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.3  0.1  37396  1184 ?        Ss   04:39   0:00 /sbin/init
root        45  0.1  0.1  33436  1172 ?        Ss   04:39   0:00 /lib/systemd/systemd-journald
root        50  0.0  0.0  41724   208 ?        Ss   04:39   0:00 /lib/systemd/systemd-udevd
root       240  0.0  0.0  16120    52 ?        Ss   04:40   0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df
message+   312  0.0  0.0  42896   616 ?        Ss   04:40   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       317  0.0  0.0  20100   392 ?        Ss   04:40   0:00 /lib/systemd/systemd-logind
root       319  0.1  0.0 198124     0 ?        Ssl  04:40   0:00 /usr/lib/snapd/snapd
root       322  0.0  0.0  27728   148 ?        Ss   04:40   0:00 /usr/sbin/cron -f
daemon     323  0.0  0.0  26044   264 ?        Ss   04:40   0:00 /usr/sbin/atd -f
root       324  0.1  0.0 634952   404 ?        Ssl  04:40   0:00 /usr/lib/accountsservice/accounts-daemon
syslog     325  0.0  0.0 186900   580 ?        Ssl  04:40   0:00 /usr/sbin/rsyslogd -n
root       326  0.0  0.0  65520   212 ?        Ss   04:40   0:00 /usr/sbin/sshd -D
root       348  0.0  0.0  14476   312 console  Ss+  04:40   0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 linux
root       376  0.0  0.0 277180   368 ?        Ssl  04:41   0:00 /usr/lib/policykit-1/polkitd --no-debug
root       466  0.0  0.1  34424  1648 ?        Rs+  04:42   0:00 ps aux

All's well that ends well.