Control groups, or cgroups for short, allow you to set limits on resources for processes and their children. This is the mechanism that Docker uses to control limits on memory, swap, CPU, and storage and network I/O resources. … Every Docker container is assigned a cgroup that is unique to that container. All of the processes in the container will be in the same group. That means that it's easy to control resources for each container as a whole without worrying about what might be running. If a container is redeployed with new processes added, you can have Docker assign the same policy and it will apply to all of them.
—Sean P. Kane & Karl Matthias, Docker Up & Running, 2nd Edition
Constraining the CPU utilization available (or even the CPU cores accessible) to a group of processes is fundamental to isolation. Even under fair scheduling policies, processes that fork lots of CPU-intensive children (e.g. the Apache web server) can swamp available CPU bandwidth, and cause performance issues for other processes. CPU and CPUSet control groups provide powerful mechanisms to allocate CPU resources to groups of processes. Docker and other container environments make use of these mechanisms to isolate container resource usage, and the hierarchical nature of control groups makes it easy for containers to further apportion allocated resources to subgroups of processes within themselves.
In this studio, you will:
- use the cgroups v2 CPU controller to apply weights to, or constrain the bandwidth of, a group of processes
- observe CPU usage with the time utility, through the cpu.stat interface, and with the Function Tracer and KernelShark
Please complete the required exercises below. We encourage you to work in groups of 2 or 3 people on each studio (and the groups are allowed to change from studio to studio), though if you would prefer to complete any studio by yourself that is allowed.
As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.
As the answer to the first exercise, please list the names of the people who worked together on this studio.
In this studio you will use cgroups to apply limits to the CPU usage of a process (or group of processes), and observe that usage. In particular, we will be using the cgroups v2 CPUSet controller to limit processes to a subset of the system's CPU cores, and the CPU controller to constrain and observe those processes' usage of time on those cores.
Make sure, before proceeding, that you have enabled cgroups v2 on your Raspberry Pi according to the instructions in the previous studio.
For this exercise, you will begin by using the cpu controller. First, navigate into the /sys/fs/cgroup directory and verify that the cpu controller is available by inspecting the contents of cgroup.controllers. Then, enable it for any children by adding it to the cgroup.subtree_control file. To do this, you will have to run the terminal in root mode, i.e. first issue either the sudo su or sudo bash command. From there, issue the command:
echo "+cpu" > cgroup.subtree_control
Now, inspect the contents of cgroup.subtree_control to verify that the controller has been added.
Next, create a child control group, contained within the root group, which will monitor a CPU-intensive parallel task. To create the child, simply create a subdirectory within the root cgroup. Navigate into this subdirectory and list its contents, verifying that (1) the cgroup.controllers file lists the cpu controller, and that (2) the cpu.stat file is present. As the answer to this exercise, please list the contents of the subdirectory, and show the contents of the cpu.stat file.
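For example, a possible sequence of commands (the subdirectory name studio_cpu is only an illustration; any name will do):
mkdir studio_cpu
cd studio_cpu
ls
cat cpu.stat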
To practice using the CPU controller, you will use a program that generates heavy CPU usage on all available cores. Please download the parallel_dense_mm.c program. It takes a single command line argument, specifying the matrix size n. It creates two dense matrices of size n x n with randomly-generated values, then multiplies them, using OpenMP to run in parallel. Compile it against the OpenMP library using:
gcc -Wall -o parallel_dense_mm parallel_dense_mm.c -fopenmp
You are going to run this program in the CPU cgroup you created. To do so, write another program, called exec_time, that (1) prints its PID; (2) blocks on input from stdin; then, once it receives any input character, proceeds to exec the following command:
time ./parallel_dense_mm 500
Note that the time command is not a standalone utility, but is actually a command built into the bash shell, and therefore cannot be executed from the exec family of functions. So, you will need to install a time utility executable, which you can do with the command:
sudo apt-get install time
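A minimal sketch of one way to write exec_time is shown below. This is illustrative rather than the required structure, and it assumes the installed time utility lives at /usr/bin/time (confirm the path with which time):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* (1) Print this process's PID so it can be written into cgroup.procs. */
    printf("%d\n", (int)getpid());
    fflush(stdout);

    /* (2) Block until any input character arrives on stdin. */
    getchar();

    /* (3) Replace this process image with: time ./parallel_dense_mm 500
       The path /usr/bin/time assumes the time package installed above;
       run `which time` if the utility is installed elsewhere. */
    execl("/usr/bin/time", "time", "./parallel_dense_mm", "500", (char *)NULL);

    /* execl only returns if it failed. */
    perror("execl");
    return 1;
}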
Compile and run your program. After it prints its PID, but before pressing a key to proceed, add it to the CPU cgroup you created by writing its PID into the cgroup.procs file in that subdirectory.
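For example, from that subdirectory (substituting the PID that your program printed):
echo <PID> > cgroup.procs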
Then, allow the program to proceed. Once it completes, compare its output (i.e., the output of the time utility that measured the execution time of the matrix multiply program) to the values reported in the cpu.stat file within the cgroup. As the answer to this exercise, show the values reported by time and the contents of the cpu.stat file. Explain how these values compare. Remember, the values in cpu.stat are reported in microseconds.
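As an illustration of the unit conversion (with made-up numbers): if time reported 2.50 seconds of user time and 0.10 seconds of system time, that corresponds to roughly 2,600,000 microseconds of total CPU time, which is the scale at which the usage_usec, user_usec, and system_usec values in cpu.stat are reported.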
CPU contention can affect the execution time of programs, and interference by a process that incurs heavy CPU usage can slow down other processes running on the system. To observe this phenomenon, you will run two concurrent instances of the parallel matrix multiply, and see how the resulting contention affects its timing.
Modify your exec_time program so that it can be passed a command-line argument specifying the size of the matrices, which it will then pass as an argument to the exec function, instead of using a hard-coded value.
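One possible way to make this change, continuing the earlier sketch (again illustrative, with the same /usr/bin/time assumption):

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <matrix size>\n", argv[0]);
        return 1;
    }

    printf("%d\n", (int)getpid());
    fflush(stdout);
    getchar();

    /* Forward the matrix size argument instead of hard-coding 500. */
    execl("/usr/bin/time", "time", "./parallel_dense_mm", argv[1], (char *)NULL);
    perror("execl");
    return 1;
}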
Compile the program.
Now, you will run two concurrent instances of your program: one which you will time, and the other which will cause CPU contention. Proceed as follows:
- Launch one instance of your program with large matrices (e.g., size 1000); this instance will generate CPU contention on all cores.
- While it is still running, launch a second instance with matrices of size 500x500; this is the instance whose reported execution time you will use.
As the answer to this exercise, please copy the terminal output that shows the execution time of the second, smaller instance of the matrix multiply program. How does this compare to the measured time from the previous exercise? What does this tell you about the effect of CPU contention?
In addition to providing monitoring functionality, CPU cgroups can also control CPU access by applying weights (similar to nice values under the CFS scheduler). For this exercise, you will again create CPU contention by running two instances of your program, so that all cores have two CPU-heavy threads running concurrently. This time, however, you will add one instance to a CPU cgroup, and give it a higher weight. First, print the contents of the cpu.weight file in the CPU cgroup you created. As part of the answer to this exercise, write the reported value.
Proceed as follows:
- Add one instance of your program to the cgroup.procs file in your CPU cgroup.
- Write a higher value into the cgroup's cpu.weight file than the current configured value; recall that this value can range from 1-10000.
As the answer to this exercise, please (1) report the original (default) value in your cgroup's cpu.weight file, (2) tell us what value you then set for that file, then (3) show the execution time of the second, smaller instance of the matrix multiply program. How does this compare to the measured time from the previous exercise? What does this tell you about how CPU cgroups can affect process access to CPU resources?
CPU cgroups can also constrain CPU bandwidth for a process, or group of processes, running on contended processors or cores. For this exercise, you will use the same technique to create CPU contention, but this time you will constrain the bandwidth of one of the instances of the parallel matrix multiply program. First, reset the value of the cpu.weight controller file to its original, default value (you should have retrieved this in the previous exercise).
Next, apply a bandwidth limit by writing into the cpu.max interface file. This file takes the format:
MAX PERIOD
where MAX indicates the maximum total time (in microseconds) that processes in the cgroup can execute on CPUs for every PERIOD (also in microseconds) of elapsed time. This restricts the bandwidth of processes in that cgroup to MAX/PERIOD. Use values that are sufficiently small that you will be able to see throttling behavior. For example, if your exec_time program measured an elapsed time of t seconds to run parallel_dense_mm with matrices of size 500x500, then use a MAX of t/5 seconds (converted to microseconds) and a PERIOD at least twice the MAX value.
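To make the arithmetic concrete (with illustrative numbers only): if t were 2 seconds, then MAX = t/5 = 0.4 seconds = 400000 microseconds, and a PERIOD of 800000 microseconds satisfies the "at least twice MAX" guideline. You would then apply the limit with something like:
echo "400000 800000" > cpu.max
which constrains the cgroup to at most 400000/800000 = 0.5 CPUs' worth of execution time.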
Now, proceed to measure the execution time the same way you did in the previous exercise, running an instance of your program with large matrices, and a second instance with 500x500 matrices, which is added to the cgroup to constrain its bandwidth. As the answer to this exercise, please (1) tell us what values you set in the cpu.max file; (2) calculate the resulting bandwidth constraint, based on those numbers; and (3) show the execution time of the second, smaller instance of the matrix multiply program. How does this compare to the measured times from the previous exercises? What does this tell you about how the bandwidth constraint scales execution times?
To look more closely at how the CPU cgroup enforces bandwidth constraints, you will use ftrace (short for Function Tracer) to trace the execution of parallel_dense_mm when it has a bandwidth constraint applied, then use KernelShark to view the results of the trace. First, on your Raspberry Pi, install these utilities with the command:
sudo apt-get install trace-cmd kernelshark
The function tracer is extraordinarily powerful, and a full exploration of its capabilities is beyond the scope of this studio. However, to generate a basic trace, which will allow you to inspect the scheduler's behavior when it applies a bandwidth constraint to a cgroup, you will use the command:
sudo trace-cmd record -e sched_switch ./exec_time 500
In particular, for this exercise, proceed as follows:
- Run the trace command above; while exec_time waits for input, write its PID into the cgroup.procs file in your CPU cgroup (which should still have the bandwidth constraint applied), then allow it to proceed.
Now, use KernelShark to inspect your trace.
Note: Running KernelShark, which is required for the following questions, requires a GUI. This means that if you have a headless setup for your Raspberry Pi, you will need to connect to it with a VNC viewer (as detailed in Exercise 4 of Studio 2), or use X11 forwarding via your ssh client.
On Mac/Linux, this is done by simply passing '-X' to the ssh command line. You may also need to install an X11 server, such as XQuartz. Other clients should have similarly straightforward configuration options to enable.
On Windows, you may need to use a non-native ssh client, like PuTTY, in conjunction with an X11 server like Xming. To enable X11 forwarding in PuTTY, first ensure that Xming is running on your Windows computer. Then, in PuTTY, expand the Connection settings in the left sidebar, expand SSH, then click the X11 settings menu. Check the "Enable X11 forwarding" option. Now, you can click the Session menu in the left sidebar, and connect to your Pi via ssh.
Open the trace file that was produced in the previous exercise, with the command:
kernelshark trace.dat
By default, you will be looking at a timeline for each CPU core in the system. Each process in the system will be given a unique color so you can track individual processes as they are scheduled on and off of processors as well as when they may be migrated between cores.
Start by zooming in on the trace until you can make out discrete events. To zoom in: press and hold the left mouse button; drag the cursor to the right; and then release to define a zoom window. Zooming out is the reverse: press and hold the left mouse button; drag the cursor to the left; and then release the mouse button.
We can also enable a process-centric view rather than a CPU-centric view. In the KernelShark window, go to the Plots menu, select Tasks, and then find the process parallel_dense_mm and click on the check box to activate it. Scroll down or enlarge the viewing window until you see the timeline for that process at the bottom. This timeline only shows the activity of this one process, and different colors represent execution time on different processors (red boxes on this timeline represent time where this task was not scheduled on any processor).
You can use the CPU and task timelines to see exactly how your process executed over its lifetime. If you zoom in to where you can see discrete events, you can mouse over those events to see exactly when each thread of the process was preempted.
As the answer to this exercise, please discuss what the trace tells you about the behavior of the process when it is scheduled on contended CPUs, and how you observe the bandwidth limits being applied. Also, please take a screenshot of the complete trace, as well as a screenshot of a zoomed-in area that highlights the scheduler's behavior, and submit these with your answers.
The remaining exercises of this studio are intended to introduce the CPUSet cgroup controller, and reinforce how cgroup delegation and namespace scoping can allocate resources (in this case, CPU cores) to a container, which can then apportion those resources among its processes.
First, create a hierarchy of cgroups, and delegate ownership to the pi user. These steps are similar to those given in the previous studio, but you will also be enabling the cpuset controller:
Open a root shell (e.g. with sudo su), then navigate to the /sys/fs/cgroup directory. In the root shell, issue the following commands:
mkdir pi_containers
(This creates a hierarchy of cgroups in which the pi user can launch containers.)
chown pi:pi pi_containers
(This delegates control of the hierarchy to the pi user, allowing it to create new children.)
chown pi:pi pi_containers/cgroup.procs
(This allows the pi user to move processes within the hierarchy.)
chown pi:pi pi_containers/cgroup.subtree_control
(This allows the pi user to enable controllers throughout the hierarchy. However, the root user still retains control over the controllers and interfaces at the root of the hierarchy, allowing the administrator to allocate resources to the pi user, which can then distribute those resources amongst its containers.)
echo "+cpuset" > cgroup.subtree_control
(This ensures that the CPUSet controller is available to the pi user's cgroup hierarchy.)
Now, close the root shell. As the pi user (without sudo), navigate into the /sys/fs/cgroup/pi_containers directory. From there, first add the cpuset controller to the cgroup.subtree_control file so that the controller is available to containers created under this cgroup. Then, still from within /sys/fs/cgroup/pi_containers, create a new directory called default. Recall that processes may only be placed in leaf nodes of a cgroup hierarchy (with the exception of processes in the root cgroup); this directory allows processes to be moved into the pi_containers hierarchy without being explicitly assigned to an individual container cgroup.
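For example, from /sys/fs/cgroup/pi_containers, these two commands accomplish the steps above:
echo "+cpuset" > cgroup.subtree_control
mkdir default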
Print the PID of the shell (echo $$), then launch a root shell. Write that PID into the /sys/fs/cgroup/pi_containers/default/cgroup.procs file. (A root shell is needed for this step because the pi shell is still in a cgroup outside of the delegated pi_containers hierarchy, so only root may move it in.) Exit the root shell, then print the contents of the default/cgroup.procs file to confirm that the shell has, indeed, been added to the cgroup. Now that it is in the portion of the hierarchy controlled by the pi user, that user can move it (and its children) freely within the hierarchy.
Still as the pi user, create a new directory under /sys/fs/cgroup/pi_containers (e.g., called container1) that will serve as the cgroup for your container. List the contents of this directory, and verify that the cpuset.cpus interface file is present. You will use this interface to restrict your container to a subset of the available CPUs. Write the value 2-3 into the file, which restricts your container to CPUs 2 and 3, then print its contents to verify that the write was successful.
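For example, from /sys/fs/cgroup/pi_containers (again, container1 is just the name assumed in these instructions):
mkdir container1
echo "2-3" > container1/cpuset.cpus
cat container1/cpuset.cpus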
Now, run the cgroupns_child_exec program as in the previous studio, having it create new PID, user, and cgroup namespaces, and have it join the new cgroup you created, e.g. with the command:
./cgroupns_child_exec -pCU -M '0 1000 1' -G '0 1000 1' /sys/fs/cgroup/pi_containers/container1/cgroup.procs ./simple_init
From the init prompt in your new container, launch /bin/bash. To allow your container to further apportion its resources (i.e. the CPU cores it has been allocated), it will need to create nested cgroups under pi_containers/container1.
From the container's shell, create a new cgroup under /sys/fs/cgroup/pi_containers/container1. This cgroup will restrict any container processes that are moved into it to only execute on CPU 2. As such, in the following instructions, we assume you've called the directory cpu2. Remember, because processes can only be in leaf node cgroups, to enable controllers in cpu2 and move processes into it, other processes in the container will need to be in their own leaf node under container1. So (still from your container's shell), create another cgroup under container1 called default. Then, print the contents of container1/cgroup.procs and write all PIDs present in that file into the container1/default/cgroup.procs file. Note that the last entry in container1/cgroup.procs is likely the PID of the process printing its contents (e.g. cat) and so you may not be able to write that PID into the container1/default/cgroup.procs file.
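One way to move the PIDs is with a small shell loop, run from /sys/fs/cgroup/pi_containers/container1 inside the container (a sketch; a write may fail for a short-lived PID such as the cat used to read the file, which is expected):
for pid in $(cat cgroup.procs); do echo "$pid" > default/cgroup.procs; done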
Once you have done this, print the contents of both container1/cgroup.procs and container1/default/cgroup.procs. You should see that there is still a process with PID 0 listed in container1/cgroup.procs that you cannot move out. This is the parent of the container's init process, i.e., the cgroupns_child_exec process. Because it is outside of the container's scope, you cannot move it from inside the container.
To move it out of that cgroup, so that only leaf nodes of that cgroup contain processes, open another terminal window (as non-root, but outside of the container). From there, print the contents of /sys/fs/cgroup/pi_containers/container1/cgroup.procs. This will show you the PID of the cgroupns_child_exec process from the scope of the global PID namespace. From this terminal window, move that process into the /sys/fs/cgroup/pi_containers/default cgroup.
From the terminal running inside your container, navigate into /sys/fs/cgroup/pi_containers/container1. Print the contents of the following files:
cgroup.procs
default/cgroup.procs
cpu2/cgroup.procs
At this point, only the default cgroup should have processes.
From the other terminal window outside the container, print the contents of the /sys/fs/cgroup/pi_containers/container1/default/cgroup.procs file. As part of the answer to this exercise, please show the contents of this file when printed from inside and outside the container. Explain why the PIDs listed do not match.
Then, from inside the container, print the contents of /proc/self/cgroup. From outside the container, print the contents of /proc/PID/cgroup, where PID is one of the PIDs listed in cgroup.procs when printed from outside the container (in other words, the global namespace PID of one of the processes in the container). As the remainder of the answer to this exercise, please show the contents of both files, and explain how cgroup namespace scoping makes their contents differ. Make sure to keep your container's terminal window open, as you will continue to use it in the next exercise.
Now that your cgroup hierarchy is configured with a subtree for your container, and all container processes are in a leaf node in that subtree, your container should be able to split its resources among child processes. For this exercise, you will have your container allocate only a single CPU core to the parallel matrix multiply program, then time its execution to confirm that the restriction works correctly.
From your container, navigate into the /sys/fs/cgroup/pi_containers/container1 directory. From there, enable the CPUSet controller for child cgroups by writing +cpuset into cgroup.subtree_control. Then, list the contents of the cpu2 directory to verify that the cpuset.cpus controller file is now present. Write the value "2" into that file.
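For example, from /sys/fs/cgroup/pi_containers/container1 inside the container:
echo "+cpuset" > cgroup.subtree_control
ls cpu2
echo "2" > cpu2/cpuset.cpus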
Now, in your container's bash shell, you will launch a new shell by running /bin/bash. In other words, at this point, your container should be running the following hierarchy of processes:
simple_init → /bin/bash → /bin/bash
The nested bash shell will be used to constrain process execution to the cpu2 cgroup. Have the shell move itself into that cgroup:
echo $$ > cpu2/cgroup.procs
Then, print the contents of the cpu2/cgroup.procs file to verify that the move was successful.
Next, navigate into the directory where you compiled your parallel_dense_mm program. Run it using the time utility you installed, on matrices of size 500x500, i.e.:
/usr/bin/time ./parallel_dense_mm 500
(If you get an error message stating that /usr/bin/time does not exist, run which time to see the path of the utility.) As the answer to this exercise, please show the output produced by the time utility. What do the values show you about the parallelism of the execution in this context? In other words, do the values confirm that the program was restricted to a single core, and if so, how?
Page updated Wednesday, February 9, 2022, by Marion Sudvarg and Chris Gill.