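Running into trouble when deploying to production
For the qualifying round of a certain contest, I needed about 50 hosts on which Linux commands could be run, so I decided to provide them with LXD/LXC.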
This post is for anyone stuck on an error like the following:
Failed to allocate directory watch: Too many open files
Experiment
Environment
Setup
$ sudo apt -y install software-properties-common
$ sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable
$ sudo apt update && sudo apt dist-upgrade
$ sudo apt -y install lxd zfs
$ newgrp lxd
$ sudo lxd init
Launching 20 LXD containers
Typing the command 20 times by hand is tedious, so writing a script is probably faster.
ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c01
Creating c01
Starting c01
ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c02
Creating c02
Starting c02
...
ubuntu@lxd01:~$ lxc launch ubuntu:16.04 c20
Creating c20
Starting c20
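The repeated launch commands above could be scripted roughly as follows. This is only a sketch: the function names `container_names` and `launch_all` and the `DRY_RUN` variable are made up for illustration, while the image (`ubuntu:16.04`) and the naming scheme (`c01` to `c20`) are taken from the session above.

```shell
#!/bin/sh
# Print zero-padded container names c01..cN (same naming as in the session above).
container_names() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'c%02d\n' "$i"
        i=$((i + 1))
    done
}

# Launch one container per name.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed, not run.
launch_all() {
    for name in $(container_names "$1"); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "lxc launch ubuntu:16.04 $name"
        else
            lxc launch ubuntu:16.04 "$name"
        fi
    done
}

# Usage (uncomment one):
# DRY_RUN=1 launch_all 20   # only show what would be run
# launch_all 20             # actually launch c01..c20
```

Running it with `DRY_RUN=1` first makes it easy to check the names before creating anything.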
Checking the launched containers
ubuntu@lxd01:~$ lxc list
+------+---------+----------------------+------+------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+----------------------+------+------------+-----------+
| c01  | RUNNING | 10.58.243.4 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c02  | RUNNING | 10.58.243.10 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c03  | RUNNING | 10.58.243.124 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c04  | RUNNING | 10.58.243.145 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c05  | RUNNING | 10.58.243.2 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c06  | RUNNING | 10.58.243.174 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c07  | RUNNING | 10.58.243.252 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c08  | RUNNING | 10.58.243.218 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c09  | RUNNING | 10.58.243.247 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c10  | RUNNING | 10.58.243.93 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c11  | RUNNING | 10.58.243.189 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c12  | RUNNING | 10.58.243.13 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c13  | RUNNING | 10.58.243.90 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c14  | RUNNING | 10.58.243.177 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c15  | RUNNING | 10.58.243.71 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c16  | RUNNING | 10.58.243.248 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c17  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c18  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c19  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c20  | RUNNING |                      |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
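Something is off
After about 16 containers had been launched, the remaining ones (c17 onward) were not assigned an IP address. Let's compare the process state of c16 and c17.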
ubuntu@lxd01:~$ lxc exec c16 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2  37396  2552 ?        Ss   04:07   0:01 /sbin/init
root        46  0.0  0.1  33436  1652 ?        Ss   04:07   0:00 /lib/systemd/systemd-journald
root       214  0.0  0.0  16128    80 ?        Ss   04:09   0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df
root       306  0.0  0.1  26068  1096 ?        Ss   04:09   0:00 /usr/sbin/cron -f
root       308  0.0  0.1  20100  1268 ?        Ss   04:09   0:00 /lib/systemd/systemd-logind
root       310  0.0  0.3 436792  3176 ?        Ssl  04:09   0:00 /usr/lib/accountsservice/accounts-daemon
daemon     311  0.0  0.0  26044   880 ?        Ss   04:09   0:00 /usr/sbin/atd -f
syslog     313  0.0  0.1 186900  1428 ?        Ssl  04:09   0:00 /usr/sbin/rsyslogd -n
message+   314  0.0  0.1  42896  1552 ?        Ss   04:09   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       324  0.0  0.2  65520  2724 ?        Ss   04:09   0:00 /usr/sbin/sshd -D
root       325  0.0  0.7 263660  7796 ?        Ssl  04:09   0:00 /usr/lib/snapd/snapd
root       340  0.0  0.1  12844  1056 console  Ss+  04:09   0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 linux
root       363  0.0  0.3 277080  3292 ?        Ssl  04:09   0:00 /usr/lib/policykit-1/polkitd --no-debug
root       666  0.0  0.1  34424  1892 ?        Rs+  04:17   0:00 ps aux
ubuntu@lxd01:~$ lxc exec c17 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  37264   808 ?        Ss   04:07   0:00 /sbin/init
root       203  0.0  0.1  34424  1596 ?        Rs+  04:17   0:00 ps aux
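So from c17 onward, systemd has not come up properly: only /sbin/init is running, and none of the usual services have been started.
As a test, let's try starting sshd in one of the affected containers (c18 in the session below).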
root@c18:~# systemctl start ssh
Failed to allocate directory watch: Too many open files
Job for ssh.service canceled.
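The container itself has almost no files open, so it looks like we hit a host-wide resource limit instead. The "directory watch" in the message is an inotify watch, and the inotify limits are kernel settings on the host that the containers share.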
If you see a message like
Failed to allocate directory watch: Too many open files
you may be hitting the same problem.
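To see whether the host-wide inotify limits are a plausible culprit, the current values can be read on the host. The following is a minimal sketch; the helper name `read_sysctl` is made up for illustration, and the keys are the `fs.inotify.*` settings that are raised in the fix later in this post.

```shell
#!/bin/sh
# Read a kernel setting from /proc/sys given its sysctl-style name,
# e.g. fs.inotify.max_user_instances -> /proc/sys/fs/inotify/max_user_instances.
# read_sysctl is a hypothetical helper name used only in this sketch.
read_sysctl() {
    path="/proc/sys/$(printf '%s' "$1" | tr '.' '/')"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "unavailable"
    fi
}

# Print the inotify limits currently in effect on this host.
for key in fs.inotify.max_queued_events \
           fs.inotify.max_user_instances \
           fs.inotify.max_user_watches; do
    printf '%s = %s\n' "$key" "$(read_sysctl "$key")"
done
```

Small values here, combined with many containers each running their own systemd, would be consistent with the failure shown above.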
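Reading the documentation carefully
While looking through the official LXD repository, I found production-setup.md,
which properly describes what has to be taken into account when deploying to production.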
Fix
Append the following to
/etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
root soft nofile 1048576
root hard nofile 1048576
* soft memlock unlimited
* hard memlock unlimited
Append the following to /etc/sysctl.conf
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
After making both changes, reboot the host.
Verification
Confirm that the processes are running properly in every container.
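After the reboot, it may be worth checking that the host actually picked up the new values before looking at the containers. The sketch below compares the settings listed above with what the kernel and the current shell report. The function names `check` and `verify_host_limits` are made up for illustration, and the `nofile` check assumes it is run from a fresh login shell, since limits.conf is applied at login.

```shell
#!/bin/sh
# Compare a current value with the expected one and report the result.
# check and verify_host_limits are hypothetical names used only in this sketch.
check() {
    name=$1
    expected=$2
    actual=$3
    if [ "$actual" = "$expected" ]; then
        echo "OK $name = $actual"
    else
        echo "MISMATCH $name: expected $expected, got $actual"
        return 1
    fi
}

# Check every value that was added to sysctl.conf and limits.conf above.
verify_host_limits() {
    status=0
    check fs.inotify.max_queued_events 1048576 \
        "$(cat /proc/sys/fs/inotify/max_queued_events 2>/dev/null)" || status=1
    check fs.inotify.max_user_instances 1048576 \
        "$(cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null)" || status=1
    check fs.inotify.max_user_watches 1048576 \
        "$(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null)" || status=1
    check vm.max_map_count 262144 \
        "$(cat /proc/sys/vm/max_map_count 2>/dev/null)" || status=1
    # Soft limit on open files for the current shell.
    check "nofile (soft)" 1048576 "$(ulimit -n)" || status=1
    return "$status"
}

# Usage (on the LXD host, after rebooting and logging in again):
# verify_host_limits
```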
ubuntu@lxd01:~$ lxc list
+------+---------+----------------------+------+------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+----------------------+------+------------+-----------+
| c01  | RUNNING | 10.58.243.4 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c02  | RUNNING | 10.58.243.10 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c03  | RUNNING | 10.58.243.124 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c04  | RUNNING | 10.58.243.145 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c05  | RUNNING | 10.58.243.2 (eth0)   |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c06  | RUNNING | 10.58.243.174 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c07  | RUNNING | 10.58.243.252 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c08  | RUNNING | 10.58.243.218 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c09  | RUNNING | 10.58.243.247 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c10  | RUNNING | 10.58.243.93 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c11  | RUNNING | 10.58.243.189 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c12  | RUNNING | 10.58.243.13 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c13  | RUNNING | 10.58.243.90 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c14  | RUNNING | 10.58.243.177 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c15  | RUNNING | 10.58.243.71 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c16  | RUNNING | 10.58.243.248 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c17  | RUNNING | 10.58.243.36 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c18  | RUNNING | 10.58.243.184 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c19  | RUNNING | 10.58.243.211 (eth0) |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
| c20  | RUNNING | 10.58.243.23 (eth0)  |      | PERSISTENT | 0         |
+------+---------+----------------------+------+------------+-----------+
ubuntu@lxd01:~$ lxc exec c18 -- ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.3  0.1  37396  1184 ?        Ss   04:39   0:00 /sbin/init
root        45  0.1  0.1  33436  1172 ?        Ss   04:39   0:00 /lib/systemd/systemd-journald
root        50  0.0  0.0  41724   208 ?        Ss   04:39   0:00 /lib/systemd/systemd-udevd
root       240  0.0  0.0  16120    52 ?        Ss   04:40   0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df
message+   312  0.0  0.0  42896   616 ?        Ss   04:40   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       317  0.0  0.0  20100   392 ?        Ss   04:40   0:00 /lib/systemd/systemd-logind
root       319  0.1  0.0 198124     0 ?        Ssl  04:40   0:00 /usr/lib/snapd/snapd
root       322  0.0  0.0  27728   148 ?        Ss   04:40   0:00 /usr/sbin/cron -f
daemon     323  0.0  0.0  26044   264 ?        Ss   04:40   0:00 /usr/sbin/atd -f
root       324  0.1  0.0 634952   404 ?        Ssl  04:40   0:00 /usr/lib/accountsservice/accounts-daemon
syslog     325  0.0  0.0 186900   580 ?        Ssl  04:40   0:00 /usr/sbin/rsyslogd -n
root       326  0.0  0.0  65520   212 ?        Ss   04:40   0:00 /usr/sbin/sshd -D
root       348  0.0  0.0  14476   312 console  Ss+  04:40   0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 linux
root       376  0.0  0.0 277180   368 ?        Ssl  04:41   0:00 /usr/lib/policykit-1/polkitd --no-debug
root       466  0.0  0.1  34424  1648 ?        Rs+  04:42   0:00 ps aux
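With 50 containers, scanning the table by eye gets error-prone, so a small filter over the `lxc list` output may help. The sketch below assumes the table layout shown in this post (NAME, STATE, IPV4 as the first three columns) and prints the names of running containers whose IPV4 column is empty. The function name `containers_without_ipv4` is made up for illustration.

```shell
#!/bin/sh
# Read an `lxc list` table on stdin and print the names of RUNNING containers
# that have no IPv4 address. Assumes the column order shown in this post:
# | NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
# containers_without_ipv4 is a hypothetical name used only in this sketch.
containers_without_ipv4() {
    awk -F'|' '
        NF >= 5 {
            name = $2; state = $3; ipv4 = $4
            # Strip the padding spaces around each cell.
            gsub(/ /, "", name)
            gsub(/ /, "", state)
            gsub(/ /, "", ipv4)
            # Border lines and the header row never match state == "RUNNING".
            if (state == "RUNNING" && ipv4 == "") print name
        }'
}

# Usage (on the LXD host):
# lxc list | containers_without_ipv4
```

No output means every running container has an address. Before the fix, the table earlier in this post would have yielded c17 through c20.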
And they all lived happily ever after.