Inside each container, you see a filesystem, network interfaces, disks, and other resources that all appear to be unique to the container despite sharing the kernel with all the other processes on the system. The primary network interface on the actual machine, for example, is a single shared resource. But inside your container it will look like it has an entire network interface to itself. This is a really useful abstraction: it's what makes your container feel like a machine all by itself. The way this is implemented in the kernel is with namespaces. Namespaces take a traditionally global resource and present the container with its own unique and unshared version of that resource.
—Sean P. Kane & Karl Matthias, Docker Up & Running, 2nd Edition
Namespaces allow groups of processes to share an isolated instance of a global resource. They are fundamental to the operation of containers, enabling the Linux kernel to establish isolated subsystems that, unlike virtual machines, share the same kernel but can still appear, in many ways, to exist on their own when viewed from within. This studio introduces namespaces and establishes the foundations for how containers use them.
In this studio, you will:
- Use the unshare, setns, and clone system calls
- Mount the proc pseudo-filesystem, isolating the ps command to a PID namespace
- Explore the role of a namespace's init process
- Simulate a container with chroot and mount namespaces
Please complete the required exercises below. We encourage you to work in groups of 2 or 3 people on each studio (and the groups are allowed to change from studio to studio), though if you would prefer to complete any studio by yourself, that is allowed.
As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.
As the answer to the first exercise, list the names of the people who worked together on this studio.
As a simple demonstration, to introduce the concept of namespaces, you will write a program that creates a new UTS namespace and sets its hostname, and then observe that the hostname, as seen by processes in the broader system outside of that namespace, is unaffected.
Because namespaces (besides user namespaces, which we will cover next time) require administrative privileges to create, please complete this studio on your Raspberry Pi.
Hint: this exercise closely follows parts of the article,
Namespaces in operation, part 2: the namespaces API
and you may use any code (with attribution) from the
unshare.c
program referenced in that article.
Write a program that uses the unshare()
system call to create and join a new UTS namespace.
For more information, see the online man page
or issue the command:
man 2 unshare
The program should then change the hostname of its namespace with the sethostname()
system call.
For more information, see the online man page
or issue the command:
man 2 sethostname
The program should next call gethostname()
and print the system hostname (of its namespace) to the terminal.
It should finally enter an infinite loop such that it does not terminate until interrupted,
allowing the namespace to remain active.
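One possible shape for the core of this program is sketched below. It is a minimal sketch rather than a reference solution: new_uts_with_hostname() is a hypothetical helper name, and the hostname is whatever string the caller passes in. Your main() would call it, print the hostname it reads back, and then loop forever (e.g. calling pause() in a loop).

```c
#define _GNU_SOURCE          /* exposes unshare() and CLONE_NEWUTS */
#include <sched.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create and join a new UTS namespace, set its
 * hostname to `name`, then read the hostname back into `out`.
 * Returns 0 on success, or -1 with errno set on failure. */
int new_uts_with_hostname(const char *name, char *out, size_t outlen)
{
    /* Move the calling process into a brand-new UTS namespace
     * (requires CAP_SYS_ADMIN, hence sudo). */
    if (unshare(CLONE_NEWUTS) == -1)
        return -1;

    /* This changes the hostname only inside the new namespace. */
    if (sethostname(name, strlen(name)) == -1)
        return -1;

    /* Read it back to confirm the change took effect. */
    if (gethostname(out, outlen) == -1)
        return -1;
    return 0;
}
```

Returning -1 with errno set, rather than exiting inside the helper, lets main() decide how to report the error, for example with perror().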
Compile and run the program on your Raspberry Pi
(using sudo
to gain administrative privileges).
With it still running, open a new terminal window, and issue the hostname
command.
As the answer to this exercise, please show the output of your program, and the hostname shown in the other terminal. Do you observe the expected behavior? Why or why not?
This time, you will use the setns()
system call to allow a process to enter an existing UTS namespace.
Hint: this exercise closely follows parts of the article,
Namespaces in operation, part 2: the namespaces API
and you may use any code (with attribution) from the
ns_exec.c
program referenced in that article.
Modify your program from the previous exercise (you do not need to create another copy) so that,
after the call to unshare() but before calling sethostname(),
it prints its PID to the terminal.
Write another program that takes two command-line arguments.
The first should be an integer that specifies a PID.
The program should attempt to open()
the corresponding /proc/PID/ns/uts
file (read only),
then join the corresponding namespace with the setns()
system call.
For more information, see the online man page
or issue the command:
man 2 setns
The second command-line argument should be a command,
which the program should execute with execvp()
in the namespace it has joined.
For more information, see the online man page
or issue the command:
man 3 execvp
To practice good C programming style, be sure to have your program check the number of command-line arguments and the return values from system calls, then report any errors appropriately and exit if necessary.
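The sketch below shows one way to structure the joining program, assuming two hypothetical helpers: uts_ns_path() validates the PID argument and builds the /proc/PID/ns/uts path, and exec_in_uts_ns() opens that file read-only, joins the namespace with setns(), and runs the command with execvp(). It follows the general pattern of ns_exec.c but is not taken from it.

```c
#define _GNU_SOURCE          /* exposes setns() and CLONE_NEWUTS */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: validate a PID string and build "/proc/<pid>/ns/uts".
 * Returns 0 on success, -1 if the PID is invalid or the buffer is too small. */
int uts_ns_path(const char *pid_arg, char *buf, size_t len)
{
    char *end;
    errno = 0;
    long pid = strtol(pid_arg, &end, 10);
    if (errno != 0 || end == pid_arg || *end != '\0' || pid <= 0)
        return -1;
    int n = snprintf(buf, len, "/proc/%ld/ns/uts", pid);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical helper: join the UTS namespace of process `pid_arg`, then
 * replace this process with `cmd_argv`. Returns -1 only if a step failed. */
int exec_in_uts_ns(const char *pid_arg, char *const cmd_argv[])
{
    char path[64];
    if (uts_ns_path(pid_arg, path, sizeof path) == -1) {
        fprintf(stderr, "invalid PID: %s\n", pid_arg);
        return -1;
    }
    int fd = open(path, O_RDONLY);           /* namespace file descriptor */
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (setns(fd, CLONE_NEWUTS) == -1) {     /* fd must refer to a UTS namespace */
        fprintf(stderr, "setns: %s\n", strerror(errno));
        close(fd);
        return -1;
    }
    close(fd);                               /* not needed after joining */
    execvp(cmd_argv[0], cmd_argv);           /* returns only on error */
    fprintf(stderr, "execvp %s: %s\n", cmd_argv[0], strerror(errno));
    return -1;
}
```

A main() for this exercise might check that argc is at least 3, then call exec_in_uts_ns(argv[1], &argv[2]) and exit with a failure status if it returns. Passing CLONE_NEWUTS rather than 0 as the second setns() argument makes the call fail if the file does not refer to a UTS namespace.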
Compile and run the program that creates a new namespace
(using sudo
to gain administrative privileges),
and note the PID that it prints.
In another terminal window, compile and run the program that joins an existing namespace
(using sudo
to gain administrative privileges),
passing the PID printed by the first program, and having it launch a shell in the new namespace
(e.g. by passing /bin/bash
as the second command-line argument).
In that shell, issue the hostname
command to verify that it is, indeed,
in the new UTS namespace created by the first program.
As the answer to this exercise, please show the output from both programs.
For the rest of this studio, you will take existing programs and modify them to establish a set of tools that will enable you to create a very simple container. You will continue to build on these tools in the next several studios to gain experience with the foundational mechanisms that Linux provides for container environments, before using more advanced tools like Docker.
So far in this studio, you have explored two of the three primary system calls related to namespaces.
For this exercise and most of the remaining ones, you will instead use the clone()
system call.
For more information, see the online man page
or issue the command:
man 2 clone
On your Raspberry Pi, retrieve and compile the
ns_child_exec.c program.
This program uses clone() to create a child process
in new namespace(s), according to flags provided as command-line arguments.
Subsequent argument(s) specify a binary or command that the cloned child process
executes with execvp()
in those new namespace(s).
Please read through the program's code to understand how it works,
and please see the
Namespaces in operation, part 4: more on PID namespaces
article for information about, and examples for using, the ns_child_exec
program.
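To make the description of ns_child_exec more concrete, here is a minimal sketch of the same idea. It is not the ns_child_exec.c source; run_in_new_namespaces() and the 1 MiB STACK_SIZE are assumptions for illustration. Unlike fork(), clone() starts the child in a function running on a caller-supplied stack, and the CLONE_NEW* flags choose which new namespaces the child gets.

```c
#define _GNU_SOURCE            /* exposes clone() and the CLONE_* flags */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* assumption: 1 MiB is plenty here */

/* Runs in the cloned child: replace it with the requested command. */
static int child_fn(void *arg)
{
    char **cmd = arg;
    execvp(cmd[0], cmd);
    perror("execvp");            /* only reached if execvp() failed */
    return 127;                  /* becomes the child's exit status */
}

/* Hypothetical helper: clone a child in the namespaces selected by
 * `ns_flags` (e.g. CLONE_NEWPID), exec `cmd` in it, and wait for it.
 * Returns the child's exit status, or -1 on error. */
int run_in_new_namespaces(int ns_flags, char **cmd)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top.
     * SIGCHLD is the signal the parent receives when the child terminates,
     * which lets a plain waitpid() reap it. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, ns_flags | SIGCHLD, cmd);
    if (pid == -1) {
        perror("clone");
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);   /* wait for the child to terminate */
    free(stack);
    if (w == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, a program that calls run_in_new_namespaces(CLONE_NEWPID, cmd) under sudo would start cmd as PID 1 in a new PID namespace. With ns_flags set to 0, the call behaves much like fork() followed by execvp(), which needs no privileges.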
Create a new program that uses the getpid()
and getppid()
functions
to print its PID and its parent's PID to the terminal.
Compile this program, then run ns_child_exec
with verbose output
(using sudo
to gain administrative privileges),
to execute the program you wrote in its own PID namespace.
The verbose output option will cause ns_child_exec
to print the PID of the child process it creates,
from the scope of the root PID namespace.
The child process, on the other hand, will print its (and its parent's)
PIDs from the scope of the new namespace.
As the answer to this exercise, please show the output from both processes.
State which command-line flag is passed to ns_child_exec
to specify a new PID namespace
and which flag specifies verbose output.
Please also explain, briefly, why the parent PID of the child process changes,
and explain the mechanisms used by ns_child_exec
to ensure that its child process does not become an orphan.
A process launched in a new PID namespace can also inhabit a new mount namespace,
which enables it to mount a proc pseudo-filesystem for its PID namespace in the /proc
directory,
without affecting the mount visible to processes in the root namespace.
Hint: this exercise closely follows parts of the article,
Namespaces in operation, part 3: PID namespaces
and you may use any code (with attribution) from the
pidns_init_sleep.c
program referenced in that article.
Modify the program you created in the previous exercise, which prints its own PID and its parent's PID.
After it prints these values, it should mount the proc pseudo-filesystem for its namespace into the /proc
directory.
Because the /proc
filesystem is likely marked as shared,
meaning that mount events will propagate to its peers in other mount namespaces,
you will first need to mark it as private.
To do so, you will first use the mount()
system call to change the propagation type of the /proc mount to MS_PRIVATE,
then use a second mount()
system call to mount its proc filesystem.
For more information, see the online man page
or issue the command:
man 2 mount
After mounting the proc filesystem, your program should,
with one of the exec
family of functions,
run the ps u
command.
For more information, you can look at the call to execlp
in the
pidns_init_sleep.c
program, see the online man page,
or issue the command:
man 3 exec
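Put together, these steps might look like the following minimal sketch, assuming a hypothetical helper remount_proc_and_ps() that your program calls after printing its PIDs. It assumes /proc is already a mount point (as on a typical Linux system) and that the program runs in a new mount namespace, e.g. one created by ns_child_exec.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: make /proc private in this mount namespace, mount a
 * proc filesystem for the current PID namespace over it, then run "ps u".
 * Returns -1 only if a step failed (on success, execlp() never returns). */
int remount_proc_and_ps(void)
{
    /* Stop mount events on /proc from propagating to its peers in other
     * mount namespaces; source, fstype, and data are ignored here. */
    if (mount(NULL, "/proc", NULL, MS_PRIVATE, NULL) == -1) {
        perror("mount MS_PRIVATE /proc");
        return -1;
    }

    /* Mount a fresh procfs that reflects the caller's PID namespace. */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1) {
        perror("mount proc");
        return -1;
    }

    /* execlp() searches PATH; the argument list must end with NULL. */
    execlp("ps", "ps", "u", (char *) NULL);
    perror("execlp");
    return -1;
}
```

Because execlp() replaces the process image, ps runs as the same process (PID 1 in the new PID namespace), so it reads the freshly mounted /proc.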
Compile your program, then run ns_child_exec
with verbose output
(using sudo
to gain administrative privileges)
to run it in its own PID and mount namespaces.
After it completes, call sudo ps u.
Compare the outputs of both runs of the ps u
utility
(the one you called directly from the scope of the global namespace,
and the one called by your process in the scope of the new namespace)
to verify that ps
lists processes from the perspective of the namespace from which it was called.
Note: if you did not correctly remount the proc
filesystem as private,
before remounting it in the scope of the new PID namespace,
this has the effect of (1) propagating the remount event to the global mount namespace,
such that (2) when the program that created the new namespace exits, and the PID namespace is therefore destroyed,
the /proc
mount will be unmounted (since that namespace no longer exists),
which means that (3) it will also be unmounted in the parent namespace.
If this happens, then when you call sudo ps u
directly from the shell,
you may get an error message indicating that /proc
is not mounted.
In this case, make sure that your program that mounts /proc
in the new namespace
is correctly first remounting the /proc
filesystem as private.
Then, from a terminal window (in the global namespace),
remount /proc
by issuing the following command:
sudo mount proc /proc -t proc
As the answer to this exercise, please show the output of your program,
as well as the output of your call to ps u
from the terminal.
Please explain any similarities and differences you observe.
Also, state which command-line flag is passed to ns_child_exec
to specify a new mount namespace.
The first process in a PID namespace has PID 1 in that namespace
and, in many ways, becomes the "init" process for that namespace,
meaning that it becomes the parent of, and reaps, orphaned processes in its namespace.
For this exercise, you will use an existing program that performs some of the "init" functionality.
Hint: this exercise closely follows parts of the article,
Namespaces in operation, part 4: more on PID namespaces
and you may use any code (with attribution) from the
orphan.c
program referenced in that article.
On your Raspberry Pi, retrieve and compile the
simple_init.c program.
This is a simple init-style program to be used as the "init" process in
a PID namespace. The program reaps the status of its children and
provides a simple shell facility for executing commands.
Write a new program that calls fork()
to create a child process.
For more information, see the online man page
or issue the command:
man 2 fork
The parent process should (1) print the child process's PID,
(2) print its own PID, and then (3) print its parent's PID.
It should then (4) sleep (e.g. with the sleep
function)
for one second before exiting, which ensures that it has not yet
exited when the child process becomes active.
For more information, see the online man page
or issue the command:
man 3 sleep
The child process should (1) print its PID and then (2) print its parent's PID. Make sure to have the parent and child processes print appropriate messages so you can distinguish which process is printing. Then, the child process should (3) sleep for two seconds, guaranteeing that its parent has exited, making it an orphan. The child should then (4) print its parent's PID again before exiting.
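A minimal sketch of the orphan-creating program follows, assuming a hypothetical helper run_orphan_demo() whose return value main() passes back, so that the parent exits right after its one-second sleep. The message prefixes are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for the orphan exercise. In the parent it returns 0
 * after sleeping one second (main() should then return, i.e. exit); the
 * child never returns. Returns -1 if fork() fails. */
int run_orphan_demo(void)
{
    fflush(stdout);                         /* avoid duplicating buffered output */
    pid_t child = fork();
    if (child == -1) {
        perror("fork");
        return -1;
    }

    if (child == 0) {                       /* child process */
        printf("child:  my PID = %d\n", (int) getpid());
        printf("child:  parent PID = %d\n", (int) getppid());
        fflush(stdout);
        sleep(2);                           /* parent exits meanwhile */
        printf("child:  parent PID after orphaning = %d\n", (int) getppid());
        fflush(stdout);
        _exit(EXIT_SUCCESS);                /* never return into the caller */
    }

    /* parent process */
    printf("parent: child PID = %d\n", (int) child);
    printf("parent: my PID = %d\n", (int) getpid());
    printf("parent: parent PID = %d\n", (int) getppid());
    fflush(stdout);
    sleep(1);                               /* still alive when the child starts */
    return 0;
}
```

The fflush() before fork() matters: without it, any output still buffered in the parent when fork() is called could be printed twice, once by each process.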
Compile your program, then run ns_child_exec
with verbose output
to launch simple_init
in its own PID namespace
(using sudo
to gain administrative privileges),
passing the -v
flag as a command-line argument to simple_init
to specify verbose output,
e.g. with the following command:
sudo ./ns_child_exec -pv ./simple_init -v
NOTE: Using CTRL+C will not exit the shell provided by simple_init,
but you can issue the exit
or quit
commands.
From the simple_init
shell, run your program that creates an orphaned child process.
As the answer to this exercise, please show all terminal output
(including the output of simple_init
when it is launched).
Please also explain, briefly, what mechanisms simple_init
uses to reap orphaned processes in its namespace.
A Linux container is typically given its own view of the filesystem, isolating it from the rest of the system. Besides the files specific to its functionality, and a set of directories into which it can store data, it also needs access to necessary OS files and directories that enable it to interface with the Linux OS.
For this exercise, you will simulate the creation of a container
by launching a shell in its own isolated section of the filesystem
by using the chroot
mechanism.
First, create a new directory that will serve as the root of your container's filesystem.
This directory should exist at the root of the filesystem, e.g. mkdir /containerfs
Inside this directory, use the mount
utility to bind-mount the following files and directories:
(* indicates to mount read-only)
([D] indicates to create a directory as the mount-point, [F] indicates to create an empty file as the mount-point)
For more information, see the online man page or issue the command:
man 8 mount
Now, use the chroot
command-line utility to launch a new shell,
with its root directory as the directory you just created.
For more information, see the online man page
or issue the command:
man 1 chroot
From the new shell, create a home directory in the shell's root directory,
then navigate into it and create a few files in there.
Run the command ls -l
to view the files you've created.
Also, run the ps
command to verify that the proc filesystem is mounted at /proc.
In another terminal window, navigate to the directory you created to be the root directory of the "container,"
enter the home directory you created,
and run ls -l
to view the files in that directory.
As the answer to this exercise, show the output of both ls
commands and the ps
command.
NOTE: you can issue the exit
command to exit the chroot jail environment.
This time, instead of launching a shell in a chroot jail, you will create a shell process in its own mount namespace.
First, you will create a new program that sets up the mount namespace,
including bind-mounting the necessary directories and files listed in the previous exercise,
then pivoting to a new root directory.
This program will be linked against simple_init
and will provide a function that takes a const char *
argument
indicating the path to the directory that will serve as the root of the container's filesystem.
To ensure that mount events from the new mount namespace do not propagate to the root namespace, the function should first recursively set all mounts, starting from the root, to private:
mount("","/","",MS_PRIVATE | MS_REC,NULL);
To allow the specified directory to become the root of the filesystem,
that directory needs to be a mount-point.
So, as a next step, the function should use the mount()
system call to bind-mount the specified directory to itself.
Next, the function should use the chdir()
system call to set its working directory to the supplied path.
For more information, see the online man page
or issue the command:
man 2 chdir
Next, the function should use the mount()
system call to bind-mount
all necessary directories listed in the previous exercise into the specified directory.
Be sure to use the same settings (read-only where specified) as before.
This time, however, it should mount the proc pseudo-filesystem directly:
mount("proc","proc","proc",0,NULL);
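The first half of the function might look like the sketch below. Both helper names, bind_mount() and prepare_new_root(), are hypothetical, and the actual list of bind mounts from the previous exercise is left as a placeholder comment. The read-only case needs two mount() calls because mount(2) ignores MS_RDONLY on the initial MS_BIND; the bind itself is non-recursive, like mount --bind.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: bind-mount src onto dst (which must already exist),
 * like "mount --bind"; if readonly is nonzero, also make it read-only.
 * Returns 0 on success, -1 on failure. */
int bind_mount(const char *src, const char *dst, int readonly)
{
    if (mount(src, dst, NULL, MS_BIND, NULL) == -1) {
        perror(dst);
        return -1;
    }
    /* MS_RDONLY is ignored on the initial bind, so remount the bind
     * mount itself (MS_REMOUNT | MS_BIND) to change it to read-only. */
    if (readonly &&
        mount(NULL, dst, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL) == -1) {
        perror(dst);
        return -1;
    }
    return 0;
}

/* Hypothetical first half of the studio's function: everything up to,
 * but not including, creating old-root and calling pivot_root. */
int prepare_new_root(const char *root)
{
    /* Recursively make every mount private so that nothing done below
     * propagates back to the root mount namespace. */
    if (mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL) == -1) {
        perror("make / private");
        return -1;
    }
    /* pivot_root needs the new root to be a mount point. */
    if (bind_mount(root, root, 0) == -1)
        return -1;
    if (chdir(root) == -1) {
        perror("chdir");
        return -1;
    }
    /* Bind-mount the files and directories from the previous exercise here,
     * e.g. bind_mount("/hypothetical/host/path", "hypothetical/host/path", 1);
     * relative targets resolve inside root after the chdir() above. */

    /* Mount a procfs for the current PID namespace at root/proc. */
    if (mount("proc", "proc", "proc", 0, NULL) == -1) {
        perror("mount proc");
        return -1;
    }
    return 0;
}
```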
Finally, to establish a new root directory, the function will need to use the
pivot_root
system call. This requires the new root to contain a subdirectory
into which the old root filesystem will be mounted.
Have your program create a directory called old-root
using the mkdir()
system call.
For more information, see the online man page
or issue the command:
man 2 mkdir
HINT: You can use S_IRWXU
as the mode
argument for mkdir().
Then, have your program call pivot_root()
to make the current directory (.) the new root directory,
moving the old root filesystem onto the old-root
directory.
For more information, see the online man page
or issue the command:
man 2 pivot_root
To practice good C programming style, be sure to have your function check the
return values from all system calls,
then report any errors appropriately and exit if necessary.
For the mkdir()
system call, an errno
of EEXIST
means the directory already exists, and is therefore likely not indicative of undesired behavior for your function.
Create a header file declaring the function,
and call this function from simple_init.c
before it enters its while
loop to read shell commands.
You can pass the following argument to the function:
argv[optind]
This allows you to specify the path to the container's root directory
as a command-line argument to simple_init.
Your simple_init.c
program should verify that argv[optind]
is not null, i.e., that a path to a directory was provided as a command-line argument,
before calling the function. This allows you to run simple_init
without mapping a new root directory, if desired (e.g. if you are not running it in its own mount namespace).
Compile your modified simple_init
program,
linking the other C file containing your mount namespace initialization function against it,
then run ns_child_exec
to launch simple_init
in its own PID, UTS, and mount namespaces,
supplying the path to the container's root directory to simple_init,
e.g. with the following command:
sudo ./ns_child_exec -pmuv ./simple_init /containerfs
Congratulations! You have created a simple container.
The shell provided by simple_init
has only very basic functionality,
but it does allow you to launch the bash shell by executing /bin/bash.
Do so, then in your new container,
create a home directory in the shell's root directory (if you have not done so already),
then navigate into it and create a few files in there.
Run the command ls -l
to view the files you've created.
Also, run the ps
command to verify that the proc filesystem is mounted at /proc.
In another terminal window, navigate to the directory you created to be the root directory of the "container,"
enter the home directory you created,
and run ls -l
to view the files in that directory.
From a new terminal window, run ps
to compare the processes listed from the perspective of the root namespace
with those listed inside your container.
Then, from the shell provided by simple_init,
issue the command:
hostname <newhostname>
to create a new hostname for your container.
Then, issue the hostname
command to verify that the new hostname was applied.
In another terminal window, issue the hostname
command
to verify that the original hostname of your Raspberry Pi was not affected in the root namespace.
As the answer to this exercise, show the output of both terminals,
i.e. the ps, ls, and hostname
commands
run from the bash shell launched by your container's simple_init,
and from the terminal window in the root namespace.
Also, state which command-line flag is passed to ns_child_exec
to specify a new UTS namespace.
Be sure to keep the directory that serves as the root of your container intact! You will continue to use this simple container environment in future studios.
Page updated Tuesday, January 25, 2022, by Marion Sudvarg and Chris Gill.