Chris Binnie

Cloud Native Security



The sophisticated rkt offered what might be called hard tenancy between containers. This strict isolation enabled true protection for Customer B if Customer A was compromised; and although containers are, again, not virtual machines, rkt bridged a gap where previously few other security innovations had succeeded.

      A modern approach being actively developed, similar to that of rkt, is called Kata Containers (katacontainers.io), from the OpenStack Foundation (OSF). The marketing strapline on the website confidently declares that you can achieve the “speed of containers” and still have the “security of VMs.” In a similar vein to rkt, MicroVMs are offered via an Open Source runtime. By using hardware virtualization, the isolation of containerized workloads can be comfortably assured. This post from Red Hat about SELinux alterations for Kata Containers is informative: www.redhat.com/sysadmin/selinux-kata-containers. Users of Kata Containers apparently include internet giants such as Baidu, which runs it in production, and you are encouraged to investigate the project further.

      Finally, following a slight tangent, another interesting addition to this space is courtesy of AWS, which, in 2020, announced the general availability of an Open Source Linux distribution called Bottlerocket (aws.amazon.com/bottlerocket). This operating system is designed specifically to run containers with improved security. The premise for the operational side of Bottlerocket is that creating a distribution that contains only the minimal files required for running containers reduces the attack surface significantly. SELinux is used to increase isolation between containers and the underlying host, and the usual suspects are present too: cgroups, namespaces, and seccomp. There is also device mapper functionality from dm-verity that provides integrity checking of block devices to reduce the chances of advanced persistent threats taking hold. While time will tell if Bottlerocket proves to be popular, it is an interesting development that should be watched.

      We then looked at some hands-on examples of how a container is constructed and how containers are ultimately viewed from a system's perspective. Our approach made it easy to appreciate how any kind of privilege escalation can lead to unwanted results for other containers and critically important system resources on a host machine.

      Additionally, we saw that the USER instruction should never be set to root within a container and how a simple Dockerfile can be constructed securely if permissions are set correctly for resources, using some forethought. Finally, we noted that other technologies such as serverless also use containerization for their needs.

      In Chapter 1, “What Is A Container?,” we looked at the components that make up a container and how a system is sliced up into segments to provide isolation for the standard components that Linux usually offers.

      We also discussed the likely issues that could be caused by offering a container excessive privileges. It became clear that, having examined a container's innards, opening up as few Linux kernel capabilities as possible and stoically avoiding the use of Privileged mode was the way to run containers in the most secure fashion.

      In this chapter, we continue looking at developments in the container space that have meant it is no longer necessary to always use the root user to run the underlying container runtime(s). Consider that for a moment. In Chapter 1 we discussed how a compromised container can pose a significant threat to the underlying operating system (OS) and other containers running on the host. Additionally, we looked at how the root user on the host maps directly to the root user within a container. If the container was subject to a compromise, then any resources that the container could access were also accessible on the host; and most alarmingly, the attacker would have superuser permissions on them. For a number of years, to improve the Linux container security model, developers made great efforts to run containers without providing root user permissions. Relatively recent runtime innovations have meant that the Holy Grail is now a reality.

      Docker, beginning with v19.03 (docs.docker.com/engine/release-notes/#19030), offers a clever feature it calls rootless mode, in which Docker Engine doesn't require superuser privileges to spawn containers. Rootless mode appears to be an extension of a stable feature called user namespaces, which helped harden a container. The premise of that functionality was to effectively fool a container into thinking that it was using a host's user ID (UID) and group ID (GID) normally, when from a host's perspective the UID/GID used in the container was being run without any privileges and so was of much less consequence to the host's security.
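To make the UID translation behind user namespaces more concrete, here is a minimal sketch of the arithmetic involved. It assumes a single mapping in the three-column form the kernel exposes in /proc/<pid>/uid_map (first ID inside the namespace, first ID outside it, and the length of the range). The helper name map_uid and the values used with it are illustrative assumptions, not part of Docker.

```shell
#!/bin/sh
# map_uid INSIDE_START OUTSIDE_START LENGTH CONTAINER_UID
# Hypothetical helper: translate a UID as seen inside a user namespace
# into the UID the host sees, given one uid_map-style mapping.
map_uid() {
    inside=$1
    outside=$2
    length=$3
    uid=$4
    # The UID is only mapped if it falls inside [inside, inside + length).
    if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + length)) ]; then
        echo $((outside + uid - inside))
    else
        echo "unmapped"
        return 1
    fi
}
```

With an assumed mapping of 0 100000 65536, calling map_uid 0 100000 65536 0 shows that the container's "root" is simply UID 100000 on the host, an ordinary account with no special privileges there.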

      With rootless mode there are some prerequisites to get started; these have to do with mapping unprivileged users with kernel namespaces. On Debian derivatives, the package we need is called uidmap, but we will start (as the root user) by removing Docker Engine and its associated packages with this command (be careful only to do this on systems that are used for development, for obvious reasons):

      $ apt purge docker

      Then, continuing as the superuser, we will install the package noted earlier with this command:

      $ apt install uidmap

      Next, we need to check the following two files to make sure that a less-privileged user (named chris in this instance) has 65,536 UIDs and GIDs available for re-mapping:

      $ cat /etc/subuid
      chris:100000:65536
      $ cat /etc/subgid
      chris:100000:65536
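If you would rather script this check than read the files by eye, something along these lines might do. It is only a sketch: the function names subid_count and has_enough_subids are hypothetical, and it assumes the usual name:first_id:count layout shown in the output here.

```shell
#!/bin/sh
# subid_count USER FILE
# Hypothetical helper: print the total number of subordinate IDs
# allotted to USER in a subuid/subgid-style file (name:first_id:count).
subid_count() {
    user="$1"
    file="$2"
    # Sum the count field over every line that belongs to the user;
    # "total + 0" prints 0 when the user has no entry at all.
    awk -F: -v u="$user" '$1 == u { total += $3 } END { print total + 0 }' "$file"
}

# has_enough_subids USER FILE
# Succeeds only if USER has at least 65,536 IDs available in FILE.
has_enough_subids() {
    [ "$(subid_count "$1" "$2")" -ge 65536 ]
}
```

For the user in this chapter, running has_enough_subids chris /etc/subuid && has_enough_subids chris /etc/subgid should succeed silently if both ranges are large enough.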

      The output is what is expected, so we can continue. One caveat with this experimental functionality is that Docker Inc. encourages you to use an Ubuntu kernel. We will test this setup on a Linux Mint machine with Ubuntu 18.04 LTS under the hood.

      kernel.unprivileged_userns_clone=1 # add me to /etc/sysctl.conf to persist after a reboot
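Before editing /etc/sysctl.conf, it can be useful to see what the running kernel currently reports for this setting. The sketch below assumes the value is exposed at /proc/sys/kernel/unprivileged_userns_clone, which is only the case on kernels that carry this particular knob; the function name userns_clone_status is hypothetical.

```shell
#!/bin/sh
# userns_clone_status [PATH]
# Hypothetical helper: report whether unprivileged user namespace
# creation is switched on. PATH defaults to the procfs entry that backs
# the kernel.unprivileged_userns_clone sysctl and can be overridden.
userns_clone_status() {
    path="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$path" ]; then
        # This kernel does not expose the setting at all.
        echo "absent"
    elif [ "$(cat "$path")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

A result of "disabled" suggests the line above is needed, whereas "absent" suggests the kernel does not use this sysctl, in which case the setting would have no effect.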

      You would also be wise to use the overlay2 storage driver with this command:

      $ modprobe overlay permit_mounts_in_userns=1 # add me to /etc/modprobe.d to survive a reboot

      There are a few limitations that we will need to look at before continuing. The earlier user namespace feature had some trade-offs that meant the functionality was not suited for every application. For example, the --net=host feature was not compatible. However, that is not a problem, because the feature is a security hole; it is not recommended, because the host's network stack is opened up to a container for abuse. Similarly, we saw that the same applied when we tried to share the process table with the --pid switch in Chapter 1. It was also impossible to use --read-only containers to prevent data being saved to the