Chris Binnie

Cloud Native Security



sent to the kernel. Syscalls are used whenever a process requests anything from the kernel, such as access to a file, memory, or another process, among many other things. The manual explains that, for the purpose of permission checks, traditional Unix-like systems divide processes into two categories. Privileged processes have an effective user ID of 0 and belong to the root user. Unprivileged processes have a nonzero effective user ID. According to the Kernel Development site (lwn.net/1999/1202/kernel.php3), kernel capabilities were introduced in 1999 via the v2.1 kernel. Using kernel capabilities, it is possible to finely tune how much system access a process can get without being the root user.
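      A quick way to see this fine-tuning in practice is to ask the kernel what it has granted a process. The following minimal bash sketch assumes a Linux system with /proc mounted. It prints the effective UID and the effective capability mask of the current shell from /proc/<pid>/status, then tests one bit of that mask. The helper name has_cap is hypothetical and is used only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumes Linux with /proc mounted): show how the kernel tracks
# privilege for the current shell. /proc/<pid>/status reports the UIDs
# (Uid: real effective saved filesystem) and the capability sets as
# 64-bit hex masks (CapInh, CapPrm, CapEff, CapBnd, CapAmb).

status_file="/proc/$$/status"
euid=$(awk '/^Uid:/ {print $3}' "$status_file")      # effective UID
cap_eff=$(awk '/^CapEff:/ {print $2}' "$status_file") # effective caps

# has_cap HEXMASK N (hypothetical helper): exit status 0 if
# capability number N is set in the hex mask
has_cap() {
    (( (16#$1 >> $2) & 1 ))
}

echo "effective UID: $euid"
echo "CapEff:        $cap_eff"

# CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>
if has_cap "$cap_eff" 21; then
    echo "this shell holds CAP_SYS_ADMIN"
else
    echo "this shell lacks CAP_SYS_ADMIN"
fi
```

      Run as an ordinary user, the CapEff mask is normally all zeros. A root shell on the host typically shows a mask with most bits set.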

      By contrast, cgroups (control groups) were designed by Google engineers, with work starting in 2006, to enforce quotas for system resources including RAM and CPU. They were merged into the mainline kernel in version 2.6.24 in 2008. Such limitations also benefit the security of a system when it is sliced into smaller pieces to run containers.
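      On a host using cgroup v2, those quotas can be inspected directly. The bash sketch below assumes a unified hierarchy mounted at /sys/fs/cgroup. It finds the cgroup of the current shell in /proc/<pid>/cgroup and reports that cgroup's memory.max quota. The function name cgroup_v2_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumes cgroup v2 mounted at /sys/fs/cgroup): find the cgroup
# of the current shell and report its RAM quota.

# cgroup_v2_path FILE (hypothetical helper): prints the path from the
# "0::<path>" line, which is the cgroup v2 entry in /proc/<pid>/cgroup
cgroup_v2_path() {
    awk -F: '$1 == "0" && $2 == "" {print $3}' "$1"
}

cg_path=$(cgroup_v2_path "/proc/$$/cgroup")
limit_file="/sys/fs/cgroup${cg_path}/memory.max"

if [ -n "$cg_path" ] && [ -r "$limit_file" ]; then
    # "max" means no quota; otherwise the limit is in bytes
    echo "cgroup ${cg_path}: memory.max = $(cat "$limit_file")"
else
    echo "no readable memory.max (cgroup v1 host, root cgroup, or controller disabled)"
fi
```

      Docker sets quotas like these on a container's behalf through docker run options such as --memory and --cpus (for example, --memory 256m --cpus 0.5). Running the sketch inside such a container would be expected to report the memory quota in bytes.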

      uid=0(root) gid=0(root) groups=0(root)

      Even with other security controls in place on a Linux system running containers, it is highly advisable never to run a container as the root user. Namespaces are one such control: they segregate access between pods in Kubernetes and OpenShift, or between containers within a runtime. A typical Dockerfile that prevents the root user from running within the container might be created as shown in Listing 1.1.

      Listing 1.1: A Simple Example Dockerfile of How to Spawn a Container as Nonroot

      FROM debian:stable
      USER root
      RUN apt-get update && apt-get install -y iftop && apt-get clean
      USER nobody
      CMD bash

      In Listing 1.1, the second line explicitly states that the root user is initially used to install the packages in the container image, and then the nobody user actually executes the final command. The USER root line isn't strictly needed, because the debian:stable base image already runs as root by default. It is added here to clearly show the change of responsibilities between each USER instruction.
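      To confirm which user a container actually runs as, the output of the id command is enough, as the uid=0(root) line earlier shows for root. Here is a small bash sketch that extracts the UID from such a line and flags UID 0. The helper names are hypothetical, and the sketch assumes the usual uid=<number>(<name>) format printed by id.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a line of `id` output, such as
# "uid=0(root) gid=0(root) groups=0(root)", belongs to root.

# uid_from_id_output TEXT (hypothetical helper): prints the numeric UID
# that follows "uid=" at the start of the text
uid_from_id_output() {
    [[ $1 =~ ^uid=([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
}

# is_root_output TEXT (hypothetical helper): exit status 0 when the UID is 0
is_root_output() {
    [ "$(uid_from_id_output "$1")" = "0" ]
}

# Applied to the current shell. Inside a container started from the
# Listing 1.1 image, this should report the nobody user rather than root.
if is_root_output "$(id)"; then
    echo "running as root: $(id)"
else
    echo "running as a non-root user: $(id)"
fi
```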

      Once an image is built from that Dockerfile and spawned as a container, it will run as the nobody user. That user has the predictable UID and GID of 65534 on Debian derivatives, or UID/GID 99 on older Red Hat Enterprise Linux derivatives. These UIDs and usernames are worth remembering so that you can check that the permissions within your containers suit your needs. You might need them to mount a storage volume with the correct permissions, for example.
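      Because these IDs differ between distributions, it can be safer to look them up than to hard-code them. This bash sketch assumes the passwd database is reachable through getent, and prints the UID and GID of root and nobody. The helpers uid_of and gid_of are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: look up the numeric UID/GID behind a username, e.g. to check
# what "nobody" maps to before setting ownership on a volume.

# uid_of NAME / gid_of NAME (hypothetical helpers): numeric IDs taken
# from fields 3 and 4 of the passwd entry
uid_of() { getent passwd "$1" | cut -d: -f3; }
gid_of() { getent passwd "$1" | cut -d: -f4; }

for user in root nobody; do
    echo "$user: uid=$(uid_of "$user") gid=$(gid_of "$user")"
done
```

      The same lookup can be run inside an image, for example with docker run --rm debian:stable getent passwd nobody. The resulting numbers can then be used when setting ownership on a host directory before mounting it as a volume.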

      Now that we have covered some of the theory, we'll move on to a more hands-on approach to demonstrate the components of how a container is constructed. In our case we will not use the dreaded --privileged option, which to all intents and purposes gives a container root permissions. Docker offers the following useful security documentation about privileges and kernel capabilities, which is worth a read to help with greater clarity in this area:

      docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

      For our example, we will choose two of the most powerful kernel capabilities to demonstrate what a container looks like from the inside out: CAP_SYS_ADMIN and CAP_NET_ADMIN. Docker refers to them without the CAP_ prefix, as SYS_ADMIN and NET_ADMIN.

      The first of these enables a container to run a number of sysadmin commands that control a system in ways a root user would. The second capability is similarly powerful and can manipulate the container's network stack. If the container shares the host's network namespace, it can manipulate the host's network stack as well. The Linux manual page (man7.org/linux/man-pages/man7/capabilities.7.html) lists the privileges each of these capabilities grants when it is added with Docker's --cap-add option.

      From that web page we can see that Network Admin (CAP_NET_ADMIN) includes the following:

       Interface configuration

       Administration of IP firewall

       Modifying routing tables

       Binding to any address for transparent proxying

       Switching on promiscuous mode

       Enabling multicasting
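      Each capability is a single bit in the masks the kernel reports in /proc/<pid>/status, numbered as in <linux/capability.h>. CAP_NET_ADMIN is bit 12 and CAP_SYS_ADMIN is bit 21. The bash sketch below turns such a mask into names. It is a hand-rolled decoder: decode_caps is a hypothetical function, and its name table only covers capabilities 0 to 40.

```shell
#!/usr/bin/env bash
# Sketch: translate a capability bitmask (as printed on the CapEff,
# CapPrm or CapBnd lines of /proc/<pid>/status) into capability names.
# Array index N holds the name of capability number N (0-40 only).
CAP_NAMES=(
    CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
    SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN
    NET_RAW IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT
    SYS_PTRACE SYS_PACCT SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE
    SYS_TIME SYS_TTY_CONFIG MKNOD LEASE AUDIT_WRITE AUDIT_CONTROL
    SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM BLOCK_SUSPEND
    AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE
)

# decode_caps HEXMASK (hypothetical helper): prints the CAP_* name of
# every set bit, in ascending bit order, separated by spaces
decode_caps() {
    local mask i
    local -a out=()
    mask=$((16#$1))
    for i in "${!CAP_NAMES[@]}"; do
        if (( (mask >> i) & 1 )); then
            out+=("CAP_${CAP_NAMES[i]}")
        fi
    done
    echo "${out[*]}"
}

# Decode the effective set of the current shell
decode_caps "$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")"
```

      Inside a container started with --cap-add NET_ADMIN or --cap-add SYS_ADMIN, the decoded CapEff line of the root process would be expected to include the corresponding name. The capsh tool from libcap offers the same translation through its --decode option.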

      We will start our look at a container's internal components by running this command:

      $ docker run -d --rm --name apache -p443:443 httpd:latest
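      The -p443:443 option publishes the container's TCP port 443 on the same port of the host. For a tool-free connectivity test from the host, the following sketch relies on bash's /dev/tcp redirection and the coreutils timeout command. Note that port_open is a hypothetical helper. A successful connection only shows that something accepted it, not that Apache answered.

```shell
#!/usr/bin/env bash
# Sketch: check from the host whether a TCP port accepts connections,
# using bash's built-in /dev/tcp pseudo-device (no extra tools needed).

# port_open HOST PORT (hypothetical helper): exit status 0 if a TCP
# connection succeeds within 2 seconds
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if port_open 127.0.0.1 443; then
    echo "TCP 443: accepting connections"
else
    echo "TCP 443: closed, filtered or timed out"
fi
```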

      We can now check that TCP port 443 is available from our Apache container (Apache is also known as httpd) and that the default port, TCP port 80, has been exposed, like so:

      Having seen the slightly redacted output from that command, we will now use a second container (running Debian Linux) to look inside our first container with the following command, which elevates permissions available to the container using the two kernel capabilities that we just looked at:

      $ docker run --rm -it --name debian --pid=container:apache \
          --net=container:apache --cap-add sys_admin debian:latest
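      The --pid=container:apache and --net=container:apache options ask Docker to start the new container inside the Apache container's PID and network namespaces rather than creating fresh ones. Whether two processes share a namespace can be checked by comparing their /proc/<pid>/ns links, as in this bash sketch. The helper same_ns is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: two processes share a namespace when their /proc/<pid>/ns/<type>
# links resolve to the same identifier, such as "pid:[4026531836]".

# same_ns TYPE PID1 PID2 (hypothetical helper):
# exit status 0 if shared, 1 if different, 2 if a link is unreadable
same_ns() {
    local a b
    a=$(readlink "/proc/$2/ns/$1" 2>/dev/null) || return 2
    b=$(readlink "/proc/$3/ns/$1" 2>/dev/null) || return 2
    [ "$a" = "$b" ]
}

# A background child inherits the shell's namespaces, so each check
# below should report "shared"
sleep 5 &
child=$!
for type in pid net mnt; do
    if same_ns "$type" "$$" "$child"; then
        echo "$type: shared with PID $child"
    else
        echo "$type: not shared (or not readable)"
    fi
done
kill "$child" 2>/dev/null
```

      The same check can be run on the Docker host, as root, against the two containers. Their host-side PIDs can be obtained with docker inspect -f '{{.State.Pid}}' apache (and likewise for debian) and passed to same_ns. The pid and net namespaces would be expected to be shared. The mnt namespace should differ, because each container keeps its own filesystem view.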

      We will come back in a moment to the contents of that command, which started a Debian container. Now that we're running a Bash shell inside our Debian container, let's see which processes the container is running by installing the procps package:

      root@0237e1ebcc85:/# apt update; apt install procps -y
      root@0237e1ebcc85:/# ps -ef
      UID        PID  PPID  C STIME TTY          TIME CMD
      root         1     0  0 15:17 ?        00:00:00 httpd -DFOREGROUND
      daemon       9     1  0 15:17 ?        00:00:00 httpd -DFOREGROUND
      daemon      10     1  0 15:17 ?        00:00:00 httpd -DFOREGROUND
      daemon      11     1  0 15:17 ?        00:00:00 httpd -DFOREGROUND
      root        93     0  0 15:45 pts/0    00:00:00 bash
      root       670    93  0 15:51 pts/0    00:00:00 ps -ef
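      Minimal images often ship without procps, which is why it had to be installed before running ps above. Where installing packages is not an option, a rough equivalent can be read straight from /proc. The bash sketch below is one such approximation, and mini_ps is a hypothetical name. It prints numeric UIDs, and its rows come out in the glob's string order rather than numeric PID order.

```shell
#!/usr/bin/env bash
# Sketch: a bare-bones stand-in for `ps -ef` in images that lack procps,
# reading /proc directly. Prints numeric real UIDs, not usernames.

mini_ps() {
    local dir pid st ppid uid cmd
    printf '%-8s %7s %7s  %s\n' UID PID PPID CMD
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process can exit between the glob and the read; skip it if so
        st=$(cat "$dir/status" 2>/dev/null) || continue
        ppid=$(awk '/^PPid:/ {print $2}' <<<"$st")
        uid=$(awk '/^Uid:/ {print $2}' <<<"$st")    # real UID
        # cmdline holds the NUL-separated argument vector
        cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")
        # Kernel threads have an empty cmdline; fall back to the Name: field
        [ -n "$cmd" ] || cmd="[$(awk '/^Name:/ {print $2}' <<<"$st")]"
        printf '%-8s %7s %7s  %s\n' "$uid" "$pid" "$ppid" "$cmd"
    done
}

mini_ps
```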

      We can see from the ps command's output that the bash and ps -ef processes are present, but several Apache web server processes, shown as httpd, are also listed. Why