CSE 522S - Advanced Operating Systems

CSE 522S: Studio 7

Capabilities and User Namespaces

In a Linux system, a process's user and group IDs dictate the operations that the process may undertake. Processes must therefore run under the appropriate users and groups. Many processes run as the root user. However, best practices in software development encourage the doctrine of least-privileged rights, meaning that a process should execute with the minimum level of rights possible. This requirement is dynamic: if a process requires root privileges to perform an operation early in its life but does not require these extensive privileges thereafter, it should drop root privileges as soon as possible. To this end, many processes—particularly those that need root privileges to carry out certain operations—often manipulate their user or group IDs.

—Robert Love, Linux System Programming, 2nd Edition

Linux divides the privileges traditionally associated with root into distinct units known as capabilities. These allow processes to execute with only the subset of privileges that they need to perform specific administrative tasks, following the principle of least privilege. With the addition of user namespaces, capabilities become even more granular: privileges conferred by capabilities can be isolated to just the set of processes that share a user namespace.

In this studio, you will:

- Explore the Linux capabilities, and discover how they differ from the traditional root/non-root dichotomy
- Gain experience with binding privileges to executable binaries with the set-user-ID and file capability mechanisms
- Be introduced to the user namespace type
- Learn how to map user and group IDs in a user namespace
- Expand on your simple container environment to incorporate isolation with user namespaces

Please complete the required exercises below. We encourage you to work in groups of two or three people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

As the answer to the first exercise, list the names of the people who worked together on this studio.
On your Raspberry Pi, look in the directory /bin. Find an interesting utility that requires administrative privileges to perform certain functions. For example, you might choose date, which allows an administrator to set the system date and time, or hostname, which allows an administrator to set the system hostname.

Copy the binary somewhere into your home directory (e.g. to ~, ~/Desktop, or another location of your choice).

Attempt to run the utility in a way that requires administrative privileges, but without using sudo to gain elevated access. As part of the answer to this exercise, you will show any error messages produced.

Now, use the chown utility to set the ownership of your copy of the utility to root, e.g. with the command:

sudo chown root:root hostname

With ownership set to root, if the set-user-ID bit is set on the executable, it will run with administrative privileges for any user that has permission to execute it. Go ahead and set the set-user-ID bit, e.g. with the command:

sudo chmod u+s hostname

Now, run ls -l in the directory that holds your copy of the utility. As part of the answer to this exercise, you will show the output line that lists your utility.

Try running the utility again, without using sudo, in a way that would normally require administrative privileges. Be sure to run your copy of the utility, and not the copy in the shell's PATH, e.g. like:

./hostname newhostname

As the answer to this exercise, please (1) show the command you issued to run your utility without administrative privileges, (2) write the error message produced, (3) copy the ls -l entry for the executable file after setting the set-user-ID bit, (4) indicate what part of this entry shows that the set-user-ID bit is enabled, and (5) explain what you observed when you ran the utility with the set-user-ID bit enabled.
Setting ownership of an executable binary to root, and then setting the set-user-ID bit, enables the binary to execute with full administrative access to the system. Sometimes, you may want to perform certain administrative functions, and grant privileges to execute those functions, but without granting complete administrative rights. This follows what is known as "the principle of least privilege." Setting file capabilities is one way to accomplish this.

Make a second copy of the original executable from the /bin directory, into the same directory where you copied it the first time, e.g.

cp /bin/hostname ~/cse522/studio7/hostname2

Attempt to run the utility without using sudo to gain elevated access, and verify that the new copy works as expected (i.e. that you receive an error message related to insufficient privilege).

Now, you will need to figure out the capability that your chosen utility needs to execute the functionality you used in the previous exercise. There are a number of ways to determine this.

The most straightforward way is to look through the list of Linux capabilities and the privileges they confer, and find the one that matches. The list is available on pages 800-801 of LPI, in the include/uapi/linux/capability.h header of the Linux kernel source, on the online man page, or by issuing the command:

man 7 capabilities

If you cannot find the appropriate capability just by looking at the list, consult the man page for the utility. It will likely tell you which capabilities the utility requires or which system calls it uses; in the latter case, the man pages for those system calls will likely tell you which capabilities they require.

If the man pages are not useful, you could also search the Linux source code, e.g. on the Elixir Cross Referencer website, for the implementation of the relevant syscall, and find the corresponding capability check there.

If all else fails, you can trace (with strace) the execution of the utility to find out where it is failing.

Once you have determined which capability your utility needs, set it as a file capability for the utility. Make sure to add it to the file's Permitted set, and enable the file's Effective bit. You can use the setcap command-line utility. For more information, see the online man page or issue the command:

man 8 setcap

Now, run the utility without using sudo to verify the file capability has conferred the necessary privileges to the executable.

As the answer to this exercise, please (1) say which capability you added to the file, (2) show the setcap command(s) you used to add the capability, and (3) run getcap for your executable with file capabilities enabled, and show the output of this command.
The proc pseudo-filesystem provides introspection for process capabilities; a process's capability sets are listed in the /proc/PID/status file.

Write a simple program that prints its PID to the terminal, then loops indefinitely until interrupted. Compile and run the program.

In a different terminal window, open the /proc/PID/status file for the process. Write down the values given by the CapInh, CapPrm, and CapEff fields.

Now, run your program again, but this time as an administrator (i.e. with sudo). Again, write down the values of the CapInh, CapPrm, and CapEff fields in its /proc/PID/status file.

This time, add a file capability to your program's executable binary, enabling its Effective bit, similarly to the previous exercise. Run your program without sudo. Again, write down the values of the CapInh, CapPrm, and CapEff fields in its /proc/PID/status file.

As the answer to this exercise, please show the values of the CapInh, CapPrm, and CapEff fields for each run of your program. What do these values tell you about the process capabilities for each run? How do these values correspond to the file capability you applied for the third run? HINT: these values are hexadecimal bitmasks. Convert them to binary, then compare the positions of the set bits (counting from 0 at the least significant bit) to the capability numbers defined in the include/uapi/linux/capability.h header of the Linux kernel source.
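
The hint above can also be carried out in code. The following is a minimal sketch, assuming each capability line has the form of a field name, a colon, whitespace, and a hexadecimal mask; parse_cap_line and cap_is_set are hypothetical helper names, and the capability number 25 used in the note below is only an arbitrary example value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parses one line in the style of /proc/PID/status, for example
 * "CapEff:\t0000000002000000", and stores the hexadecimal mask in *mask.
 * Returns 0 on success, or -1 if the line does not start with `field`
 * followed by ':' or contains no hexadecimal digits.
 */
int parse_cap_line(const char *line, const char *field,
                   unsigned long long *mask)
{
    size_t n;
    char *end;

    assert(line != NULL && field != NULL && mask != NULL);
    n = strlen(field);
    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return -1;
    /* strtoull() skips the whitespace that follows the colon. */
    *mask = strtoull(line + n + 1, &end, 16);
    return (end == line + n + 1) ? -1 : 0;
}

/*
 * Returns 1 if capability number `cap` is set in `mask`, else 0.
 * Capability number n corresponds to bit n of the mask.
 */
int cap_is_set(unsigned long long mask, int cap)
{
    if (cap < 0 || cap >= 64)
        return 0;
    return (int)((mask >> cap) & 1ULL);
}
```

For instance, parsing the line "CapEff:\t0000000002000000" yields the mask 0x2000000, for which cap_is_set(mask, 25) returns 1 and every other capability number returns 0.
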
For this exercise, you will create a new user namespace, then look at its effective user and group IDs without establishing a user or group mapping.

First, open a terminal window, and issue the id -u and id -g commands to see your effective user and group IDs.

Then, use your ns_child_exec program from the previous studio to execute a shell (e.g. /bin/bash) in a new user namespace. Note: you do not need to run as root (e.g. with sudo) to create a new user namespace.

Next, issue the id -u and id -g commands to see the effective user and group IDs of the shell process.

As the answer to this exercise, show the output of the id commands, both from the root namespace and the new user namespace. Explain why they are different. Also, state which command-line flag is passed to ns_child_exec to specify a new user namespace.
User namespaces isolate the set of user and group IDs. Most notably, this allows a non-root user (typically the user that created the namespace) to be mapped to UID 0, giving that user root privileges within the namespace.

This mapping must be performed by a process with the CAP_SETUID (for user IDs) or CAP_SETGID (for group IDs) capability in the new user namespace. The child process of a call to clone(CLONE_NEWUSER) (i.e., the first process in the new user namespace) is automatically granted full capabilities in that namespace. However, if this process execs a new program, it loses its capabilities (unless the exec'd program has the appropriate file capabilities set). Therefore, any mapping that occurs must typically happen before the call to exec().

For this exercise, you will explore what happens if you create a new user namespace, but do not map its user and group IDs. You will make two attempts to launch your container without administrative privileges, and both should fail (though for different reasons). The goal of this exercise is to demonstrate when this occurs, and to have you think about why it fails.

Hint: this exercise closely follows parts of the article, Namespaces in operation, part 5: User namespaces.

On your Raspberry Pi, retrieve the userns_child_exec.c program. This program functions similarly to ns_child_exec, using clone() to launch a child process in new namespace(s). It also allows command line arguments that specify user and group mappings, which will be applied by the parent process for the newly-created user namespace before the child executes a shell command. As with ns_child_exec, the final command line argument(s) specify a command that is executed by the child process with execvp(). Please read through the program's code to understand how it works, then compile it.

Modify the function you created in the previous studio that, when called from the simple_init.c program, mounts several directories then establishes a new root directory for a mount namespace. In an unprivileged mount namespace (i.e., a mount namespace owned by a user namespace that was created by an unprivileged user), a bind mount operation must be performed recursively; otherwise, it can reveal the filesystem tree underneath one of the submounts of the directory being bound. Therefore, for every call to mount() that includes the flag MS_BIND, you should also include the flag MS_REC.
Run userns_child_exec with verbose output, having it launch your simple_init program in its own PID, UTS, and mount namespaces, passing it the path to the directory you created that served as the root of your container's filesystem in the previous studio. This time, do not run it with sudo.

Then, run userns_child_exec again without sudo, having it launch simple_init and again passing it the path to the directory that served as the root of your container's filesystem in the previous studio. This time, also have it create a new user namespace in addition to its own PID, UTS, and mount namespaces, but do not define any user or group ID mappings.

As the answer to this exercise, please (1) write the terminal output of both runs of userns_child_exec, and (2) explain why you observed this behavior.
To give simple_init root functionality in its namespace, without running it as an administrator (e.g. with sudo), you will need to map your user and group IDs to 0 in its namespace. Run userns_child_exec again, having it launch simple_init, in its own user, PID, UTS, and mount namespaces. This time, provide it with the appropriate command-line arguments to define a mapping from your user and group IDs to 0.

A user mapping can be supplied with the -M flag, and a group mapping with the -G flag. The mapping will be written directly into the /proc/PID/uid_map and /proc/PID/gid_map files, and should therefore be in the format:

ID-inside-ns ID-outside-ns range-length
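
As a rough illustration of this format, the following sketch builds one mapping line in a buffer. The helper name format_id_map is hypothetical, and the outside ID 1000 used in the note below is only an assumed example; use the values reported by id -u and id -g on your own system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes one mapping line of the form
 * "ID-inside-ns ID-outside-ns range-length\n" into buf.
 * Returns the number of characters written (excluding the
 * terminating null byte), or -1 if the line does not fit.
 */
int format_id_map(char *buf, size_t len, unsigned long inside,
                  unsigned long outside, unsigned long count)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%lu %lu %lu\n", inside, outside, count);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For example, format_id_map(buf, sizeof buf, 0, 1000, 1) produces the line "0 1000 1", which maps the single outside ID 1000 to ID 0 inside the namespace.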

The shell provided by simple_init has only very basic functionality, but it does allow you to launch the bash shell, i.e. by executing /bin/bash. Do so, then in your new container, run the commands id -u and id -g to see your effective user and group IDs within the user namespace.

Next, navigate into the home directory you created earlier under the shell's root directory, and create a few files there. Run the command ls -l to view the files you've created. In another terminal window, navigate to the directory you created to be the root directory of the "container," enter the home directory you created, and run ls -l to view the files in that directory. (NOTE: The home directory in your container may be owned by root. In another terminal window, run sudo chown -R to recursively set ownership of the home directory, and the files it contains, to the user that you have mapped to root in your container.)

Also, run the ps command to verify that the proc filesystem is mounted at /proc. From a new terminal window, run ps to compare the processes from the perspective of the root namespace, to the processes listed by your program.

Then, from the shell provided by simple_init, issue the command:

hostname <newhostname>

to create a new hostname for your container. Then, issue the hostname command to verify that the new hostname was applied. In another terminal window, issue the hostname command to verify that the original hostname of your Raspberry Pi was not affected in the root namespace.

As the answer to this exercise, show the output of both terminals, i.e. the ps, ls, and hostname commands run from the bash shell launched by your container's simple_init, and from the terminal window in the root namespace. Also, state which command-line flag is passed to ns_child_exec to specify a new UTS namespace.

Be sure to keep the directory that serves as the root of your container intact! You will continue to use this simple container environment in future studios.

As the answer to this exercise, show the output of both terminals, i.e. the id -u, id -g, ps, ls, and hostname commands run from your container's simple_init shell, as well as the ls command run from the root namespace. Also, please explain what mechanisms the userns_child_exec program uses to guarantee that the user and group ID mapping occurs before simple_init is exec'd.
Things to Turn In:
- Your answers to the above exercises
- Your updated code for the function that mounts directories, then establishes a new root directory, for a new mount namespace

Page updated Tuesday, January 25, 2022, by Marion Sudvarg and Chris Gill.
