To be sure, many users would love more memory. On modern systems, however, the problem is not really one of sharing too little among too many, but of properly using and keeping track of the bounty.
—Robert Love, Linux System Programming, 2nd Edition, Chapter 9, pp. 293.
As one of the most important resources in a computer system, memory must be managed carefully so that it is used efficiently. Misuse of memory, whether intentional (e.g. malicious memory overallocation) or accidental (e.g. programs with significant memory leaks), can lead to unwanted system interference. Understanding how the Linux kernel provides mechanisms to constrain the memory use of a process, or group of processes, is important for minimizing interference, especially for modular or componentized systems, e.g. those using containers and Docker.
In this studio, you will:
Use the cgroups v2 memory controller to constrain the memory use of a group of processes.
Use cgroups and its memory controller to observe out-of-memory events.
Integrate cgroups into your simple container environment.
Please complete the required exercises below. We encourage you to work in groups of 2 or 3 people on each studio (groups are allowed to change from studio to studio), though you may complete any studio by yourself if you prefer.
As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.
As the answer to the first exercise, please list the names of the people who worked together on this studio.
In this studio you will use a resource management feature of the Linux kernel,
cgroups, to apply limits on the amount of memory a process (or group of processes) can acquire.
The cgroups feature consists of several subsystems (or controllers), each of which is responsible for a particular resource type (such as CPUs, memory, I/O, or networks).
It provides a pseudofilesystem through which users can get and set parameters and limits associated with a subsystem.
In particular, we will be using the
cgroups v2 memory controller.
The Raspberry Pi enables both
cgroups v1 and
v2 by default.
However, the Linux kernel does not allow both versions of the same controller to be active.
So, you will begin by configuring your Raspberry Pi's boot settings to
(1) disable the
cgroups v1 subsystem, and (2) enable the memory controller for use by cgroups v2.
The Raspberry Pi OS launches the
systemd daemon during system startup.
This utility is responsible for configuring much of the kernel and userspace functionality of the Raspberry Pi,
including mounting the appropriate
Certain commands can be issued to
systemd to change its boot-time behavior via the
Note that commands in that file are separated by spaces.
To disable the
cgroups v1 subsystem, add the following command to the end of the file:
Then, to enable memory cgroups, add the following commands:
To apply these settings, reboot your Raspberry Pi.
After the reboot, you can check that only the
cgroups v2 subsystem is mounted,
and verify that the memory controller is enabled,
by issuing the following two commands:
mount | grep cgroup
Do so, then as the answer to this exercise (1) show the output of these two commands,
(2) explain what each command does, and (3) indicate what the output tells you about
which cgroups subsystem is mounted, and which controllers are enabled.
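As a rough sketch of what these checks look for, the shell helpers below scan a mount listing for a cgroup v2 entry and scan a controllers file for a given controller name. The function names has_cgroup2_mount and has_controller are hypothetical, chosen only for illustration, and the sketch assumes the mount listing uses the usual "DEVICE on DIR type FSTYPE (OPTIONS)" layout printed by mount.

```shell
# Succeeds if the mount listing read from stdin contains a cgroup v2 mount.
# Assumed line layout: "cgroup2 on /sys/fs/cgroup type cgroup2 (rw,...)".
has_cgroup2_mount() {
    grep -q ' type cgroup2 '
}

# Succeeds if controller $2 is listed in the controllers file $1.
# The file is expected to hold one space-separated line of controller names.
has_controller() {
    tr ' ' '\n' < "$1" | grep -qx "$2"
}
```

For example, assuming the root cgroup is mounted at /sys/fs/cgroup, you might run: mount | has_cgroup2_mount && echo "v2 mounted", followed by has_controller /sys/fs/cgroup/cgroup.controllers memory && echo "memory available".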
The cgroups pseudofilesystem is arranged hierarchically.
By default, all tasks in the system are included in the root cgroup,
which in this case is located at
The cgroup.controllers file lists the controllers that are available to a cgroup. The
cgroup.subtree_control file defines
the list of controllers that are available to any children of that cgroup.
In other words, a child's
cgroup.controllers file is a read-only copy of its parent's cgroup.subtree_control file.
For this studio, you will only be using the memory controller.
First, verify that it is available by inspecting the contents of the root cgroup's cgroup.controllers file.
Then, enable it for any children by adding it to the cgroup.subtree_control file.
To do this, you will have to run the terminal in root mode, i.e. first issue either the
sudo su or
sudo bash command.
Enabling a controller involves writing a "+", followed by the controller name, to the
cgroup.subtree_control file, e.g.:
echo "+memory" > cgroup.subtree_control
Now, inspect the contents of
cgroup.subtree_control, and try to remove any controllers listed
(besides the memory controller you just added).
Note that you might not be able to remove some controllers
(you will note any error messages as part of the answer to this exercise).
Removing a controller is similar to adding it, except you write a "-" followed by the controller name, e.g.:
echo "-cpu" > cgroup.subtree_control
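The two echo commands above follow the same pattern, so they can be wrapped in one small helper. This is only a sketch: the function name set_child_controller is hypothetical, and on a real cgroup hierarchy the write needs root privileges and may be rejected by the kernel, in which case the shell reports the write error.

```shell
# Enable ("+") or disable ("-") controller $3 for the children of the
# cgroup whose directory is $1, by writing to its cgroup.subtree_control file.
set_child_controller() {
    case "$2" in
        +|-) ;;
        *) echo "usage: set_child_controller DIR +|- CONTROLLER" >&2
           return 2 ;;
    esac
    # Same effect as: echo "+memory" > DIR/cgroup.subtree_control
    printf '%s%s\n' "$2" "$3" > "$1/cgroup.subtree_control"
}
```

For example, set_child_controller /sys/fs/cgroup + memory corresponds to the first command above, assuming the root cgroup is mounted at /sys/fs/cgroup.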
For this exercise you will create a child control group, contained within the root group,
which will monitor a task that we will write in the next exercise.
To create the child, simply create a subdirectory within the root cgroup's directory.
Navigate into this subdirectory, and list its contents.
Then, try to remove the memory controller from its cgroup.subtree_control file,
and note any error messages that appear.
As the answer to this exercise, please (1) write the contents of the cgroup.controllers and cgroup.subtree_control
files in the root
cgroup, (2) write the contents of those files in the child
cgroup you created,
(3) show the list of files in the child cgroup,
(4) write any error messages printed when you tried to remove controllers from the root's cgroup.subtree_control file,
then (5) write any error messages printed when you tried to write to the child's cgroup.subtree_control file,
and explain why you think you saw those errors.
Now, you will use the child
cgroup you created to monitor the memory usage induced by a program you will write.
This concept is an important notion for cgroups:
a program can induce more memory usage than its corresponding process,
allowing it to bypass traditional resource limit mechanisms,
e.g., by forking several child processes (a classic forkbomb attack).
However, because a child process will automatically be a part of its parent's cgroup,
the constraints enforced by
cgroup controllers are applied against the total resource usage induced by a program.
Write a program (outside of the
cgroups filesystem!) that implements a forkbomb:
the program should, in a loop, request a significant amount of memory (at least a page) by calling
malloc() without freeing it,
delay for a short time (sufficiently long to allow for observability, e.g. a second),
and then
fork() a child process.
Because fork() is also in the loop body,
your program will generate an exponentially increasing number of child processes.
Compile your program.
Open another terminal window, which you will use to run your program. Before doing so, you will write the PID of this terminal into the child control group you created. Issue the following command to see its PID:
Then, in your first terminal window (which should be running as root, i.e. with
sudo su or sudo bash),
write the PID into the
cgroup.procs file in your child cgroup.
Then, print the contents of the
memory.current, and the
To verify that the shell running in your second terminal window is in the new cgroup,
you can print the contents of the file
Please do so, note the output, and check that the reported
cgroup membership matches what you expect.
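The membership steps above can be sketched with two small shell helpers. The names add_pid_to_cgroup and cgroup2_path are hypothetical, and the second helper assumes the membership file follows the /proc/PID/cgroup format, in which the cgroups v2 entry is written as "0::" followed by the cgroup's path relative to the cgroup root.

```shell
# Move the process with PID $2 into the cgroup whose directory is $1.
# On a real hierarchy this requires write permission on cgroup.procs.
add_pid_to_cgroup() {
    echo "$2" > "$1/cgroup.procs"
}

# Print the cgroups v2 path found in /proc/PID/cgroup-style text on stdin.
# The v2 entry has hierarchy ID 0 and an empty controller list: "0::/path".
cgroup2_path() {
    sed -n 's/^0:://p'
}
```

For a shell whose PID is 1234 and a child cgroup directory named child (both placeholders), add_pid_to_cgroup /sys/fs/cgroup/child 1234 followed by cgroup2_path < /proc/1234/cgroup would be expected to print /child.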
Still in your second terminal window, run the forkbomb program, wait a couple of seconds, then print the contents of those three files again, before terminating your forkbomb.
After terminating the forkbomb, print the contents of the same three files.
As the answer to this exercise, please show the contents of the three files,
before, during, and after the forkbomb program's execution.
Please explain the significance of the contents of the
and how those contents changed. Also, please pick one statistic from the
explain how it changed through the lifetime of the program, and explain its significance.
Additionally, please show the contents of the
when you inspected it from the second terminal window.
Now, in addition to using the memory controller to observe the forkbomb program,
you will constrain its memory usage using the
In your second terminal window, which is a member of your child cgroup,
rewrite and recompile your forkbomb program so it no longer delays before each call to fork().
In the first terminal window, running as root,
write a value into
memory.max that is sufficiently larger than the current value of memory.current,
such that the forkbomb will cause this limit to be exceeded relatively quickly.
Then, print the value stored in
memory.max, as indicated by printing the contents of that file.
This value may differ slightly from what you wrote into the file.
Note: If you need to reset the
memory.max value, such that there is no longer an enforced maximum,
you can do so by writing the value "max" into the file.
If you need to remove a
cgroup, you can do so if the "populated" field of its
cgroup.events file has a value 0.
It can be removed by deleting its directory using rmdir.
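The removal rule in this note can be sketched as a guarded helper. The function names cgroup_field and remove_cgroup_if_empty are hypothetical, and the sketch assumes the events file uses one "key value" pair per line. On the cgroups pseudofilesystem, rmdir removes a cgroup directory even though its interface files are still listed in it; on an ordinary filesystem the same call would fail for a directory that contains files.

```shell
# Print the value of field $2 from a flat "key value" file $1,
# e.g. cgroup.events, which contains a line such as "populated 0".
cgroup_field() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Remove the cgroup directory $1 only if cgroup.events reports populated 0.
remove_cgroup_if_empty() {
    if [ "$(cgroup_field "$1/cgroup.events" populated)" = "0" ]; then
        rmdir "$1"
    else
        echo "cgroup $1 still has member processes" >&2
        return 1
    fi
}
```

Resetting the limit first is a one-line write, e.g. echo max > memory.max from inside the cgroup's directory, as described in the note above.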
Next, run your forkbomb program in the second terminal window. Observe what happens,
and, in the first terminal window, print the contents of the
As the answer to this exercise, please (1) tell us what value you wrote into memory.max,
what value was subsequently printed from the contents of that file, and why you think those values might have been different.
Then (2) explain what happened to the forkbomb, show the contents of the
and explain the significance of those contents.
The cgroups memory controller, in addition to providing a way to enforce a hard limit on memory usage,
also supplies the memory.high file.
This allows an administrator to define a memory usage threshold beyond which
(1) a "high" event, in the
memory.events file, is triggered, which subsequently
(2) signals to the Linux kernel that it should begin aggressively reclaiming memory from the processes in the cgroup
(though those processes will not be killed).
For this exercise, you will write a monitoring program that prints a notification when a
cgroup exceeds its "high" memory threshold, or when the
cgroup changes its "populated" state.
The monitoring program should do the following:
Take exactly two command-line arguments (and print a helpful usage message if more or fewer are given)
which are the paths to the files named cgroup.events and
memory.events within the child group, respectively.
The memory.events file contains counters for when the low, high, and max memory thresholds are crossed.
The cgroup.events file contains two binary values, which indicate whether the
cgroup is populated
(i.e. it, or its children, have member processes)
and whether it is frozen (i.e. its member processes have been placed in a suspended state).
Attempt to open both files, read-only, using the
open() system call.
If either file cannot be opened, the program should print a helpful error message and exit.
Create an inotify instance, then add watches for both files.
In a loop, watch for events on these files by reading from the file descriptor returned when the inotify instance was created.
Your program should associate the watch descriptors with the file descriptors returned by the opened files, so that, if either file has been changed, your program knows which is the corresponding opened file descriptor.
If the cgroup.events file has been changed, the monitor program should print a message indicating whether the
cgroup is populated or not.
If the memory.events file has been changed, the monitor program should print the current value of its "high" field.
(NOTE: Remember that to perform subsequent reads of the entire contents of a file,
you need to use
lseek to set the file offset back to the beginning!)
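Before writing the monitor, it may help to see the parsing step in isolation. The sketch below covers only how the two files' contents are interpreted once they have been re-read. It does not use inotify, and it is a shell approximation rather than the program this exercise asks for. The function names describe_populated and high_count are hypothetical, and both files are assumed to hold one "key value" pair per line.

```shell
# Print a message describing the "populated" state found in
# cgroup.events-style text read from stdin.
describe_populated() {
    while read -r key value; do
        if [ "$key" = "populated" ]; then
            if [ "$value" = "1" ]; then
                echo "cgroup is populated"
            else
                echo "cgroup is not populated"
            fi
        fi
    done
}

# Print the "high" counter from memory.events-style text read from stdin.
high_count() {
    while read -r key value; do
        if [ "$key" = "high" ]; then
            echo "$value"
        fi
    done
}
```

Each call such as high_count < memory.events opens the file afresh and so always reads from the beginning, which is why no explicit seek is needed in the shell version.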
Compile your monitor program, and rewrite and recompile your forkbomb program
so it again delays before each call to fork().
This time, have three terminal windows open, (1) running as root,
(2) running your monitor program, and (3) which will be added to the child cgroup.
In (3), again print its PID to the terminal window with echo $$.
In (2), launch your monitor program, then in (1) write the PID of terminal window (3) into the child cgroup's cgroup.procs file.
Then, also in (1), print the contents of
Write a value into
memory.high that is sufficiently larger than the current value of memory.current,
such that the forkbomb will cause this limit to be exceeded relatively quickly.
Then, print the value stored in
memory.high, as indicated by printing the contents of that file.
Again, this value may differ slightly from what you wrote into the file.
Now, in (3), launch your forkbomb. Allow it to run until you see your monitor program begin to show output. Let the monitor program print a few messages, then kill your forkbomb, and close terminal (3).
As the answer to this exercise, please say (1) the value you wrote into
memory.high and the value it subsequently showed as being stored,
and why you think those values might have been different.
Then (2) please show the output of the monitor program, and explain what the output tells you about the behavior of the forkbomb,
and what happened when you closed the terminal.
For the final exercise, you will integrate
cgroups into the simple container environment from the previous studio, using
cgroup namespaces to define a delegation boundary,
such that you can create your container without root privileges in its own cgroup,
then allow it to create children of that
cgroup and place child processes into those children.
First, you will need to ensure that
cgroups v2 has been mounted with the nsdelegate option.
Issue the command:
mount | grep cgroup
If the list in parentheses does not include the "nsdelegate" option, you will need to remount
cgroups with that option:
sudo mount -t cgroup2 -o remount,nsdelegate /sys/fs/cgroup
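The check on the parenthesized option list can also be scripted. The following is a sketch only: the function name cgroup2_has_option is hypothetical, and it assumes each mount line ends with its options in the form "(opt1,opt2,...)".

```shell
# Succeeds if mount option $1 appears in the option list of a cgroup2
# line in the mount listing read from stdin.
cgroup2_has_option() {
    awk -v opt="$1" '
        / type cgroup2 / {
            opts = $NF                  # last field, e.g. "(rw,nsdelegate)"
            gsub(/[()]/, "", opts)      # strip the surrounding parentheses
            n = split(opts, list, ",")
            for (i = 1; i <= n; i++)
                if (list[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, mount | cgroup2_has_option nsdelegate || echo "remount needed" prints the message only when the option is missing.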
Now, you'll create a hierarchy of
cgroups, and delegate ownership to the pi user.
This will allow that user (without administrative privileges)
to create individual
cgroups to manage resource allocation for individual containers,
launch containers in those cgroups,
then allow those containers to further partition resource usage among the processes in their scope.
To begin with, open a root shell (e.g. with
then navigate to the
In the root shell, issue the following commands:
mkdir pi_containers (This creates a hierarchy of cgroups
in which the
pi user can launch containers.)
chown pi:pi pi_containers (This delegates control of the hierarchy to the pi user,
allowing it to create new children.)
chown pi:pi pi_containers/cgroup.procs (This allows the pi user
to move processes within the hierarchy.)
chown pi:pi pi_containers/cgroup.subtree_control
(This allows the
pi user to enable controllers throughout the hierarchy.
However, the root user still retains control over the controllers and interfaces at the root of the hierarchy,
allowing the administrator to allocate resources to the pi user,
which can then distribute those resources amongst its containers.)
Now, close the root shell. As the
pi user (without
navigate into the pi_containers directory.
In it, create a new directory, called default.
Recall that processes may only be in the leaf nodes of a cgroup hierarchy
(with the exception of processes in the root cgroup);
this directory allows processes to be moved into the hierarchy
without being explicitly assigned to an individual container.
Print the PID of the shell (
echo $$), then launch a root shell.
Write that PID into the
Exit the root shell, then print the contents of the
to confirm that the shell has, indeed, been added to the
Now that it is in the portion of the hierarchy controlled by the pi user,
that user can move it (and its children) freely within the hierarchy.
This means that, from within that same shell,
you can launch a new container as the pi user, using
cgroup namespaces to constrain it to a
cgroup within the hierarchy.
To do this, please download the
This program is similar to the
userns_child_exec.c program from the previous studio,
but with additional functionality to allow the cloned child process to join a new cgroup namespace.
In doing so, it takes an additional command-line argument specifying a cgroup.procs file,
and writes its PID to this file, before cloning the child into its new namespace.
This has the effect of isolating the child's view of the cgroup hierarchy
to the specified cgroup.
This means that, even if the child runs as the
pi user (which has permissions to a broader subtree of the hierarchy),
it cannot move itself out of the portion of the hierarchy into which it has been placed.
Please take a look at the
joinCgroup function, and the call to that function from
to understand how this functionality was programmed.
Compile the program, then (as the
pi user), create a new directory under
/sys/fs/cgroup/pi_containers (e.g., called container1)
that will serve as the
cgroup for your container.
Now, run the program, creating new PID and user namespaces as before,
as well as a new cgroup namespace,
and have it join the new
cgroup you created, e.g. with the command:
./cgroupns_child_exec -pCU -M '0 1000 1' -G '0 1000 1' /sys/fs/cgroup/pi_containers/container1/cgroup.procs ./simple_init
(Make sure that
simple_init is in the same directory as cgroupns_child_exec.)
Congratulations! You have created a new container in its own
init prompt, launch
Open another terminal window. You will use both terminal windows (inside and outside the container)
to compare the scoped view of the container with the system configuration that has been applied to it.
From both terminal windows, print the contents of the
As part of the answer to this exercise, please show the contents, and explain why you think they are different.
Then, from your container, print the contents of the
From the other terminal window, the contents of the
where PID is the last PID that was listed from this terminal window (i.e. outside the container)
As part of the answer to this exercise, please show the contents, describe how the two files are related,
and explain why you think the contents are different.
Now, from your container, you will try to move a process back out of the
container1 cgroup, and into the default cgroup.
Because both directories are in the same subtree of the
cgroup hierarchy that is controlled by the pi user,
and because the container is running as the pi user,
cgroup delegation rules would suggest that this should be possible.
However, the scoping provided by
cgroup namespaces should prevent this from happening.
To see this behavior, issue the following command:
echo $$ > /sys/fs/cgroup/pi_containers/default/cgroup.procs
As the remainder of the answer to this exercise, please describe what happened,
and explain how the isolation provided by
cgroup namespaces impacted this behavior.
Page updated Monday, February 7, 2022, by Marion Sudvarg and Chris Gill.