Docker Resource Control

This article covers Docker's resource controls and the mechanisms behind them.

As usual~

Figure 0. A dopey grey bird

Now for the main text~

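Preface

This material was originally part of another article of mine, "Docker Security Mechanisms", but it grew too long and pulled that article off topic, so I decided to split it out into a post of its own.

The article covers memory, CPU, and disk resource controls in turn. Everything here is based on the official documentation.

Memory resource limits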

Option | Description | Note
-m or --memory= | The maximum amount of memory the container can use. If you set this option, the minimum allowed value is 4m (4 megabyte). | Maximum usable memory.
--memory-swap* | The amount of memory this container is allowed to swap to disk. See --memory-swap details. | Allowed swap size.
--memory-swappiness | By default, the host kernel can swap out a percentage of anonymous pages used by a container. You can set --memory-swappiness to a value between 0 and 100, to tune this percentage. See --memory-swappiness details. | Tunes the percentage of the container's anonymous pages that the host kernel may swap out.
--memory-reservation | Allows you to specify a soft limit smaller than --memory which is activated when Docker detects contention or low memory on the host machine. If you use --memory-reservation, it must be set lower than --memory for it to take precedence. Because it is a soft limit, it does not guarantee that the container doesn’t exceed the limit. | Memory reservation (a soft limit).
--kernel-memory | The maximum amount of kernel memory the container can use. The minimum allowed value is 4m. Because kernel memory cannot be swapped out, a container which is starved of kernel memory may block host machine resources, which can have side effects on the host machine and on other containers. See --kernel-memory details. | Maximum kernel memory the container can use.
--oom-kill-disable | By default, if an out-of-memory (OOM) error occurs, the kernel kills processes in a container. To change this behavior, use the --oom-kill-disable option. Only disable the OOM killer on containers where you have also set the -m/--memory option. If the -m flag is not set, the host can run out of memory and the kernel may need to kill the host system’s processes to free memory. | Controls whether container processes are killed on OOM to free memory. The OOM killer is on by default; setting this flag turns it off.

Important notes

  • If --memory-swap is set to a positive integer, then both --memory and --memory-swap must be set. --memory-swap represents the total amount of memory and swap that can be used, and --memory controls the amount used by non-swap memory. So if --memory="300m" and --memory-swap="1g", the container can use 300m of memory and 700m (1g - 300m) swap. That is, if --memory-swap is set to a positive integer, --memory must be set along with it. --memory-swap is the total of memory plus swap: with a total of 1000M and --memory at 300M, only 700M of swap is actually available.
  • If --memory-swap is set to 0, the setting is ignored, and the value is treated as unset. That is, 0 means unset.
  • If --memory-swap is set to the same value as --memory, and --memory is set to a positive integer, the container does not have access to swap. See Prevent a container from using swap. That is, if both options are set to the same positive value, the container is not allowed to use swap.
  • If --memory-swap is unset, and --memory is set, the container can use twice as much swap as the --memory setting, if the host container has swap memory configured. For instance, if --memory="300m" and --memory-swap is not set, the container can use 300m of memory and 600m of swap. That is, if only --memory is set, --memory-swap defaults to twice the --memory value. Since --memory-swap is a total, the quoted wording overstates the swap: in this example the container gets 600m of memory and swap combined, which leaves 300m of swap.
  • If --memory-swap is explicitly set to -1, the container is allowed to use unlimited swap, up to the amount available on the host system. That is, -1 means unlimited swap.
  • Inside the container, tools like free report the host’s available swap, not what’s available inside the container. Don’t rely on the output of free or similar tools to determine whether swap is present. This one deserves special attention.
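The rules above can be sketched as a small calculation. This is a minimal illustration, not Docker's own code: the function names parse_size and swap_allowance are made up for this example, sizes are assumed to use binary units (1m = 1024 * 1024 bytes), and the unset case is assumed to cap memory plus swap at twice --memory.

```python
# Hypothetical helpers illustrating the --memory / --memory-swap rules.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a Docker-style size such as '300m' or '1g' to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def swap_allowance(memory, memory_swap=None):
    """Return the swap a container may use in bytes; None means unlimited."""
    mem = parse_size(memory)
    if memory_swap is None or memory_swap == "0":
        # Unset (0 is treated as unset): assumed total is 2 * memory.
        return mem
    if memory_swap == "-1":
        # Unlimited swap, bounded only by what the host has.
        return None
    total = parse_size(memory_swap)
    if total < mem:
        raise ValueError("--memory-swap must not be smaller than --memory")
    # --memory-swap is memory plus swap, so swap is the difference.
    return total - mem


if __name__ == "__main__":
    print(swap_allowance("300m", "1000m") // UNITS["m"], "MiB of swap")
```

Setting both options to the same value yields zero swap, which matches the "prevent a container from using swap" case.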

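--memory-swappiness details

Before getting into this option, you need to know one concept: the anonymous page. An anonymous page is a memory page with no file mapped behind it, such as the heap and stack a program creates while it runs. Without swap, this data has to stay resident in memory; with swap, it can be written out to disk to reduce memory use, at the cost of extra I/O. Besides anonymous pages there are file-backed pages. For a small file, the usual approach is to open a file descriptor and read the whole file into memory. That is impractical for large files, which are typically mapped with mmap instead, avoiding an extra copy of the data into a separate user-space buffer. Further down, the operating system keeps its own cache: a file that hits the cache loads faster because the file system does not have to be read, and on a miss the request continues down to the file system's buffers, which also speed up file operations.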

  • A value of 0 turns off anonymous page swapping.
  • A value of 100 sets all anonymous pages as swappable.
  • By default, if you do not set --memory-swappiness, the value is inherited from the host machine.
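To make the distinction between anonymous and file-backed pages concrete, here is a small sketch using Python's standard mmap module. The function names are made up for this illustration. It only shows the two kinds of mapping; it does not change or measure swappiness.

```python
import mmap
import tempfile


def anonymous_mapping(payload=b"heap-like data", size=4096):
    """Map memory with no file behind it (anonymous pages)."""
    region = mmap.mmap(-1, size)  # -1 means "no file descriptor"
    region.write(payload)
    region.seek(0)
    data = region.read(len(payload))
    region.close()
    return data


def file_backed_mapping(payload=b"file-backed data"):
    """Map a temporary file into memory (file-backed pages)."""
    with tempfile.TemporaryFile() as fh:
        fh.write(payload)
        fh.flush()
        # Length 0 maps the whole file; pages are backed by the file.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as region:
            return region[:len(payload)]


if __name__ == "__main__":
    print(anonymous_mapping())
    print(file_backed_mapping())
```

Under memory pressure the kernel can drop clean file-backed pages and re-read them from the file, whereas anonymous pages can only be evicted to swap, which is the part --memory-swappiness tunes.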

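--kernel-memory details

Kernel memory limits are expressed in terms of the overall memory allocated to a container. Consider the following scenarios: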

  • Unlimited memory, unlimited kernel memory: This is the default behavior.
  • Unlimited memory, limited kernel memory: This is appropriate when the amount of memory needed by all cgroups is greater than the amount of memory that actually exists on the host machine. You can configure the kernel memory to never go over what is available on the host machine, and containers which need more memory need to wait for it.
  • Limited memory, unlimited kernel memory: The overall memory is limited, but the kernel memory is not.
  • Limited memory, limited kernel memory: Limiting both user and kernel memory can be useful for debugging memory-related problems. If a container is using an unexpected amount of either type of memory, it runs out of memory without affecting other containers or the host machine. Within this setting, if the kernel memory limit is lower than the user memory limit, running out of kernel memory causes the container to experience an OOM error. If the kernel memory limit is higher than the user memory limit, the kernel limit does not cause the container to experience an OOM. In short, the relative sizes matter: when the memory limit is smaller than the kernel memory limit, the kernel limit has no real effect; when it is larger, the container may hit OOM early, which can be handy for debugging.

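CPU resource limits

CPU resources can be controlled in two ways: through the default CFS scheduler or through the real-time scheduler.

CFS scheduler

CFS is the scheduler Linux normally uses for ordinary processes. It keeps runnable tasks in a red-black tree, and Docker exposes its settings to control how much CPU a container can use.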

Option | Description | Note
--cpus=<value> | Specify how much of the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set --cpus="1.5", the container is guaranteed at most one and a half of the CPUs. This is the equivalent of setting --cpu-period="100000" and --cpu-quota="150000". Available in Docker 1.13 and higher. | Sets how much CPU the container can use. For example, on a host with two cores, --cpus="1.5" has the same effect as --cpu-period="100000" together with --cpu-quota="150000". Only available in Docker 1.13 and later.
--cpu-period=<value> | Specify the CPU CFS scheduler period, which is used alongside --cpu-quota. Defaults to 100000 microseconds (100 milliseconds). Most users do not change this from the default. If you use Docker 1.13 or higher, use --cpus instead. | Sets the CFS scheduling period. On Docker 1.13 and later the docs recommend --cpus instead, although this flag is still accepted.
--cpu-quota=<value> | Impose a CPU CFS quota on the container. The number of microseconds per --cpu-period that the container is limited to before throttled. As such acting as the effective ceiling. If you use Docker 1.13 or higher, use --cpus instead. | On Docker 1.13 and later the docs recommend --cpus instead, although this flag is still accepted.
--cpuset-cpus | Limit the specific CPUs or cores a container can use. A comma-separated list or hyphen-separated range of CPUs a container can use, if you have more than one CPU. The first CPU is numbered 0. A valid value might be 0-3 (to use the first, second, third, and fourth CPU) or 1,3 (to use the second and fourth CPU). | Restricts which cores the container can use.
--cpu-shares | Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. --cpu-shares does not prevent containers from being scheduled in swarm mode. It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access. | The container's scheduling weight. (I don't fully understand this one.)
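Two of these options are easier to follow with numbers. The sketch below, with function names invented for this example, shows how a --cpus value maps to a quota for a given period, and how --cpu-shares weights translate into proportions when containers compete for a saturated CPU. It assumes every listed container is busy; as the description says, shares have no effect while CPU cycles are plentiful.

```python
from fractions import Fraction


def cpus_to_quota(cpus, period_us=100000):
    """Return the --cpu-quota (microseconds) equivalent to --cpus."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return round(cpus * period_us)


def share_split(shares):
    """Return each container's fraction of CPU time under contention.

    `shares` maps a container name to its --cpu-shares weight.
    """
    total = sum(shares.values())
    return {name: Fraction(weight, total) for name, weight in shares.items()}


if __name__ == "__main__":
    print(cpus_to_quota(1.5))
    for name, part in share_split({"web": 1024, "batch": 512}).items():
        print(name, part)
```

With the default weight of 1024 for one container and 512 for another, the first gets two thirds of the contended CPU time and the second one third. Neither figure is a guarantee or a cap.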

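Real-time scheduler

Available only in Docker 1.13 and later, for tasks that cannot use the CFS scheduler.

Using the real-time scheduler requires three things to be in place: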

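Make sure: the host kernel has the feature enabled

Verify that the CONFIG_RT_GROUP_SCHED option is enabled. The documentation gives two ways to check:

Method 1. zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED

Method 2. Check whether the file /sys/fs/cgroup/cpu.rt_runtime_us exists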
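Both checks can be combined in a few lines of Python. This is only a sketch: the function name is made up, the two paths are the ones given above and are passed in as defaults so they can be overridden, and a kernel built without /proc/config.gz simply falls through to the second check.

```python
import gzip
import os


def rt_group_sched_enabled(config_gz="/proc/config.gz",
                           cgroup_file="/sys/fs/cgroup/cpu.rt_runtime_us"):
    """Return True if either check suggests CONFIG_RT_GROUP_SCHED is on."""
    # Method 1: look for the option in the compressed kernel config.
    if os.path.exists(config_gz):
        with gzip.open(config_gz, "rt") as fh:
            for line in fh:
                if line.strip() == "CONFIG_RT_GROUP_SCHED=y":
                    return True
    # Method 2: the cgroup file only exists when the feature is built in.
    return os.path.exists(cgroup_file)


if __name__ == "__main__":
    print(rt_group_sched_enabled())
```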

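Make sure: the Docker daemon is configured

Add the --cpu-rt-runtime option to the command that starts the Docker service.

This option sets the maximum number of microseconds reserved for real-time tasks in each runtime period. With the default period of 1000000 microseconds (1 second), --cpu-rt-runtime=950000 lets containers that use the real-time scheduler run for 950000 microseconds out of every 1000000.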
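The arithmetic is simple enough to check by hand, but a tiny helper makes the relationship explicit. The function name is hypothetical, and the default period is the 1000000 microseconds mentioned above.

```python
def realtime_share(runtime_us, period_us=1000000):
    """Return the fraction of each period available to real-time tasks."""
    if not 0 <= runtime_us <= period_us:
        raise ValueError("runtime must be between 0 and the period")
    return runtime_us / period_us


if __name__ == "__main__":
    print(f"{realtime_share(950000):.0%} of each period")
```

A runtime of 950000 leaves the remaining 5% of every period for non-real-time work.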

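Make sure: each container is configured individually

I'm worn out, so I'm not translating this part. It isn't used much anyway, so I'll skip it.

If you need it, please read the official documentation yourself.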

Disk I/O resource limits

Limit size: --storage-opt size=120G (passed as an option to docker run)

Runtime resource limits (I/O)

Option | Description | Note
--blkio-weight=0 | Block IO weight (relative weight) accepts a weight value between 10 and 1000. | I/O weight (priority), from 10 to 1000.
--blkio-weight-device="" | Block IO weight (relative device weight, format: DEVICE_NAME:WEIGHT) | Weight per device.
--device-read-bps="" | Limit read rate from a device (format: <device-path>:<number>[<unit>]). Number is a positive integer. Unit can be one of kb, mb, or gb. | Read throughput limit.
--device-write-bps="" | Limit write rate to a device (format: <device-path>:<number>[<unit>]). Number is a positive integer. Unit can be one of kb, mb, or gb. | Write throughput limit.
--device-read-iops="" | Limit read rate (IO per second) from a device (format: <device-path>:<number>). Number is a positive integer. | Read IOPS limit.
--device-write-iops="" | Limit write rate (IO per second) to a device (format: <device-path>:<number>). Number is a positive integer. | Write IOPS limit.
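The value formats in this table are easy to get wrong, so here is a small validator as a sketch. The function names are invented for this example, /dev/sda is just a sample device path, and the units are assumed to be binary (1mb = 1024 * 1024 bytes).

```python
import re

# <device-path>:<number>[<unit>], where unit is kb, mb, or gb.
_RATE = re.compile(r"^(?P<dev>/[^:]+):(?P<num>[1-9]\d*)(?P<unit>kb|mb|gb)?$")
_MULTIPLIER = {None: 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_device_rate(spec):
    """Split a --device-read-bps style value into (device, bytes per second)."""
    match = _RATE.match(spec)
    if match is None:
        raise ValueError(f"expected <device-path>:<number>[<unit>], got {spec!r}")
    rate = int(match.group("num")) * _MULTIPLIER[match.group("unit")]
    return match.group("dev"), rate


def valid_blkio_weight(weight):
    """Return True if the weight lies in the accepted range of 10 to 1000."""
    return 10 <= weight <= 1000


if __name__ == "__main__":
    print(parse_device_rate("/dev/sda:1mb"))
    print(valid_blkio_weight(500))
```

The same parser works for the IOPS options if the unit is left off, since those take a bare positive integer after the colon.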

References