Docker Focus Areas

Docker Engine

Component Description
Do not run dockerd on any networked socket
  • Anyone who can reach a networked socket that Docker listens on potentially has access to Docker, and therefore to the host, because dockerd runs as root
  • The default behavior, listening only on a local Unix socket, is the safest option
  • Example:
    • Not recommended: $ dockerd -H "tcp://1.2.3.4:8080"
    • Recommended: $ dockerd -H "unix:///var/run/docker.sock"
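The recommendation above can be turned into a quick audit of -H values before they reach dockerd. The following is a minimal sketch, not a Docker feature: is_networked_host is a hypothetical helper, and it makes the conservative assumption that anything other than unix:// or fd:// is reachable over the network. It only inspects the string and never contacts Docker.

```shell
#!/bin/sh
# is_networked_host is a hypothetical helper (not part of Docker).
# Returns 1 for local transports (unix://, fd://) and 0 for anything else.
# Assumption: every other value is treated as network-reachable.
is_networked_host() {
    case "$1" in
        unix://*|fd://*) return 1 ;;
        *)               return 0 ;;
    esac
}

# Classify the two -H values used in the examples above.
for host in "unix:///var/run/docker.sock" "tcp://1.2.3.4:8080"; do
    if is_networked_host "$host"; then
        echo "WARNING: networked socket: $host"
    else
        echo "OK: local socket: $host"
    fi
done
```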
Do not mount the Docker socket (/var/run/docker.sock) into containers
  • An attacker inside such a container can execute any command that the Docker service can run, which gives access to the whole host system since dockerd runs as root
  • Example (not recommended):
    • $ docker run -it -v /var/run/docker.sock:/var/run/docker.sock ubuntu /bin/bash
Monitor dangerous mountpoints
  • Do not mount: /var/run/docker.sock, /proc, /dev
  • Set container FS to RO: docker run --read-only <image>
  • Set volumes to RO: docker run -v $(pwd):/secrets:ro <image>
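Monitoring for the mountpoints listed above can be scripted. The sketch below is one possible approach, not an official tool: is_dangerous_mount and audit_container are hypothetical helpers, and treating /run/docker.sock as equivalent to /var/run/docker.sock is an assumption based on /var/run commonly being a symlink to /run.

```shell
#!/bin/sh
# is_dangerous_mount is a hypothetical helper (not part of Docker).
# Returns 0 when a host path is the Docker socket, /proc, /dev,
# or a path underneath /proc or /dev.
is_dangerous_mount() {
    case "$1" in
        /var/run/docker.sock|/run/docker.sock) return 0 ;;  # /run variant is an assumption
        /proc|/proc/*|/dev|/dev/*)             return 0 ;;
        *)                                     return 1 ;;
    esac
}

# audit_container (hypothetical) lists a container's mount sources with
# docker inspect and reports the dangerous ones.
# It needs a running Docker daemon, so it is only defined here, not called.
audit_container() {
    docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' "$1" |
    while IFS= read -r src; do
        [ -n "$src" ] || continue
        if is_dangerous_mount "$src"; then
            echo "dangerous mount in $1: $src"
        fi
    done
}
```

With a daemon available, audit_container <container> prints one line per dangerous mount source and nothing for a clean container.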

Harden Kernel

Component Description
Set cgroups
  • Control groups (cgroups) are a Linux kernel feature that lets you control how many resources a process can use
  • Can prevent DoS via system resource exhaustion
  • Examples:
    • CPU: docker run -it --rm --cpuset-cpus 0 --cpu-shares 768 ...
      • --cpu-shares sets a relative weight (default 1024) rather than a hard limit; it only matters when containers compete for CPU time
      • If one container sets a share of 768 while another sets a share of 256, the first gets 75% of the CPU time under contention and the other gets 25%
    • Memory: docker run -it --rm --memory 128m ...
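    • Storage: dockerd --storage-opt dm.basesize=5G (applies to the devicemapper storage driver)
    • Disk I/O: --device-read-iops <device>:<rate>, --device-write-iops <device>:<rate> (for example, /dev/sda:1000)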
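The CPU share arithmetic above can be checked with a few lines of shell. This is a sketch only: cpu_share_percent is a hypothetical helper, and it assumes every listed container is busy at the same time, since shares have no effect on an idle CPU.

```shell
#!/bin/sh
# cpu_share_percent is a hypothetical helper.
# Usage: cpu_share_percent <own_shares> <shares of all competing containers, own included>
# Prints the integer percentage of CPU time the container receives
# when all listed containers are busy at the same time.
cpu_share_percent() {
    own=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((100 * own / total))
}

cpu_share_percent 768 768 256   # container started with --cpu-shares 768, prints 75
cpu_share_percent 256 768 256   # container started with --cpu-shares 256, prints 25
```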
Enforce MAC (Mandatory Access Control)
  • Enforce MAC to prevent undesired operations (both on host and on containers) at the kernel level (Seccomp, AppArmor, SELinux)
  • MAC can confine processes to a limited set of system resources or privileges:
    • Enable
      • On the host: setenforce 1 (puts SELinux into enforcing mode)
      • For dockerd: --selinux-enabled
    • Policies:
      • --security-opt="label=user:USER"
      • --security-opt="label=role:ROLE"
      • --security-opt="label=type:TYPE"
      • --security-opt="label=level:LEVEL"
      • --security-opt="apparmor=PROFILE"
    • Example: docker run --security-opt label=level:s0:c100,c200 -i -t centos bash
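Before relying on these options, it can help to confirm which LSM the host kernel has active. The sketch below assumes a Linux kernel recent enough to expose the comma-separated list in /sys/kernel/security/lsm; has_lsm is a hypothetical helper. If the file is not readable, the script prints nothing.

```shell
#!/bin/sh
# has_lsm is a hypothetical helper.
# Usage: has_lsm <comma-separated LSM list> <name>
# Returns 0 when <name> appears as a whole entry in the list.
has_lsm() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumption: the kernel exposes its active LSMs in this file.
lsm_file=/sys/kernel/security/lsm
if [ -r "$lsm_file" ]; then
    active=$(cat "$lsm_file")
    for name in selinux apparmor; do
        if has_lsm "$active" "$name"; then
            echo "$name is active"
        fi
    done
fi
```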
Drop Capabilities
  • Capabilities turn the binary "root/non-root" dichotomy into a fine-grained access control system
  • Drop capabilities that are not required (Docker - Capabilities)
  • Default capabilities:
    • chown, dac_override, fowner, fsetid
    • kill, setgid, setuid, setpcap
    • net_bind_service, net_raw
    • sys_chroot, mknod, setfcap, audit_write
  • When launching a container (--cap-add=[] or --cap-drop=[]):
    • $ docker run --cap-add SYS_PTRACE ubuntu
    • $ docker run --cap-drop setuid --cap-drop setgid <image> /bin/sh
  • Do not use --privileged
    • Allows the container to access all devices on the host
    • Relaxes the container's LSM (e.g., SELinux or AppArmor) configuration so that it has nearly the same level of access as processes running directly on the host
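One common way to apply "drop what is not required" is to drop every capability and add back only the ones a workload needs. The sketch below builds those flags; cap_flags is a hypothetical helper, and which capabilities a given image actually needs is something you have to determine yourself.

```shell
#!/bin/sh
# cap_flags is a hypothetical helper (not part of Docker).
# It prints docker run flags that drop every capability and add back
# only the ones named as arguments.
cap_flags() {
    flags="--cap-drop=ALL"
    for cap in "$@"; do
        flags="$flags --cap-add=$cap"
    done
    echo "$flags"
}

# Print the flags for a service that only needs to bind a low port.
cap_flags NET_BIND_SERVICE

# Intended use (requires Docker; <image> is a placeholder):
#   docker run $(cap_flags NET_BIND_SERVICE) <image>
```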
Enforce User Namespaces
  • Namespaces limit what a container can see and do on the host
    • User namespaces map root inside a container to an unprivileged user on the host, so a process can appear to run as root within the container without holding root privileges on the host
    • Processes running within a container cannot see processes running in another container, or on the host system
    • Each container also gets its own network stack: by default, all containers on a given Docker host are attached to a bridge interface
  • Do not run containers as root users:
    • NO (runtime): $ docker run -d ubuntu sleep infinity
      $ ps aux | grep sleep
      root ... sleep infinity
    • YES (runtime): $ docker run -d -u 1000 ubuntu sleep infinity
      $ ps aux | grep sleep
      1000 ... sleep infinity
    • YES (BUILD):
      FROM ubuntu:latest
      USER 1000
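The root check shown with ps above can also be done from the container's configuration. This is a rough sketch: is_root_user and check_container are hypothetical helpers, and the check is purely string-based, so it will not notice a user name that an image maps to UID 0.

```shell
#!/bin/sh
# is_root_user is a hypothetical helper (not part of Docker).
# It takes the user string from a container's configuration
# ("", "0", "root", "1000", "1000:1000", ...) and returns 0 when
# the container would run as root.
is_root_user() {
    user=${1%%:*}                # strip an optional ":group" suffix
    case "$user" in
        ""|0|root) return 0 ;;   # empty means no user was set, so Docker defaults to root
        *)         return 1 ;;
    esac
}

# check_container (hypothetical) reads the configured user with docker inspect.
# It needs a running Docker daemon, so it is only defined here, not called.
check_container() {
    if is_root_user "$(docker inspect --format '{{.Config.User}}' "$1")"; then
        echo "$1 runs as root"
    else
        echo "$1 runs as a non-root user"
    fi
}
```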
Option Description
run -it alpine ifconfig
  • When containers are launched, a network interface is created
  • This gives the container a unique IP address and interface
run -it --net=host alpine ifconfig
  • Instead of the container's network being isolated with its own interface, the process will have access to the host machine's network interface
Option Description
run -it alpine ps aux
  • The container runs in its own process namespace
  • The only processes it can access are the ones launched in the container
run -it --pid=host alpine ps aux
  • The container can also see all the other processes running on the system
  • Providing containers access to the host namespace is regarded as bad practice
  • If it's required, use a shared namespace to provide access to only the namespaces the container requires
Option Description
run -d --name http nginx:alpine
  • The first container starts an Nginx server
  • This will define a new network and process namespace
  • The Nginx server will bind itself to port 80 of the new network interface
run --net=container:http benhall/curl curl -s localhost
  • Other containers can now reuse this namespace using the syntax container:<name>
run --pid=container:http alpine ps aux
  • The container can also see and interact with the processes in the shared container
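Whether two processes really share a namespace can be verified on a Linux host by comparing the links under /proc/<pid>/ns. The sketch below assumes /proc is mounted; same_namespace is a hypothetical helper, and reading another user's namespace links usually requires root.

```shell
#!/bin/sh
# same_namespace is a hypothetical helper.
# Usage: same_namespace <pid1> <pid2> <ns>   (ns: pid, net, mnt, ...)
# Returns 0 when both processes are in the same namespace of that type,
# 1 when they differ, and 2 when a namespace link cannot be read.
same_namespace() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# A process always shares its namespaces with itself.
if same_namespace "$$" "$$" net; then
    echo "same network namespace"
fi
```

To compare two containers, pass their host PIDs, which docker inspect --format '{{.State.Pid}}' <container> reports for a running container.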