"You see," he continued, beginning to feel better, "once there was no time at all, and people found it very inconvenient. They never knew whether they were eating lunch or dinner, and they were always missing trains. So time was invented to help them keep track of the day and get places when they should. When they began to count all the time that was available, what with 60 seconds in a minute and 60 minutes in an hour and 24 hours in a day and 365 days in a year, it seemed as if there was very much more than could ever be used."
—The Phantom Toolbooth, Norton Juster, Chapter 3
In this studio, you will:
Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.
As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.
The Intel x86 instruction set architecture provides a Time Stamp Counter (TSC) register, which can be read directly from userspace with the rdtsc instruction, as in the following code:
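static inline unsigned long long rdtsc_get(void)
{
    unsigned long high, low;

    /* rdtsc places the low 32 bits of the TSC in EAX and the high 32 bits in EDX */
    asm volatile ("rdtsc" : "=a" (low), "=d" (high));

    /* combine the two halves into a single 64-bit cycle count */
    return ((unsigned long long) low) | (((unsigned long long) high) << 32);
}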
On the Linux Lab cluster (accessed with the qlogin command from shell.cec.wustl.edu), write a program that maintains an array of 100 unsigned long long integers. In a loop, read the value of the TSC twice in a row, then write the elapsed number of cycles (the difference between the two values) into the array. Keep in mind that the TSC register can overflow, so your program should verify that the second value returned is greater than the first. After doing this 100 times, your program should calculate and print the minimum, maximum, mean, and standard deviation of the elapsed cycles between rdtsc calls.
As the answer to this exercise, please report these values.
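A minimal sketch of the sampling loop follows, assuming an x86 machine and GCC or Clang (for the inline assembly). It reads the TSC the same way as rdtsc_get above, but with explicit 32-bit halves; the function name sample_tsc_deltas is hypothetical, and computing the statistics from the filled array is left to you. A pair of reads whose second value is not larger than the first is simply retaken, which is one way to satisfy the overflow check.

```c
#include <stddef.h>
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "rdtsc is only available on x86 processors"
#endif

/* Read the 64-bit Time Stamp Counter, which rdtsc returns split across EDX:EAX. */
static inline uint64_t rdtsc_get(void)
{
    uint32_t low, high;

    __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (high));
    return ((uint64_t) high << 32) | low;
}

/*
 * Hypothetical helper: fill deltas[0..n-1] with the number of cycles between
 * two back-to-back TSC reads. A pair whose second value is not greater than
 * the first (e.g. because the counter wrapped around) is discarded and retaken.
 */
static void sample_tsc_deltas(unsigned long long *deltas, size_t n)
{
    size_t i = 0;

    while (i < n) {
        uint64_t first = rdtsc_get();
        uint64_t second = rdtsc_get();

        if (second <= first)
            continue;              /* overflow or reordering: retry this sample */
        deltas[i++] = second - first;
    }
}
```

Calling sample_tsc_deltas(deltas, 100) on an array of 100 unsigned long long values produces the samples the exercise asks you to summarize.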
Your home directory resides on networked storage,
and therefore the contents remain the same on both shell.cec.wustl.edu
and all of the Linux Lab cluster machines.
This means that, besides using terminal-based text editors (nano, vim, emacs),
you can also use Visual Studio Code to edit your programs remotely.
ARM also provides a cycle counter, the Performance Monitors Cycle Count Register (PMCCNTR), on the Performance Monitor Unit (PMU). Unlike the TSC register on x86, the PMCCNTR is not, by default, accessible from userspace. However, it can be read from kernel code, e.g. by kernel modules.
First, download this driver file and place it in the arch/arm/include/asm directory of your Linux kernel source tree on the Linux Lab cluster.
Then, still on the cluster, please create a new directory to hold your kernel modules (e.g. /project/scratch01/compile/your-username/modules) and cd into it. Save a copy of enable_ccnt_522.c in that directory.
In that directory, also create a Makefile that contains the line
obj-m := enable_ccnt_522.o
and save that file. Now, modify the kernel module so that it measures the elapsed cycles between reads from the PMCCNTR for 100 samples, and reports the minimum, maximum, mean, and standard deviation of the elapsed cycles between reads, similarly to the previous exercise.
You can use the pmccntr_get function, which takes no arguments and returns a uint64_t value, to retrieve the cycle counts. As this is a kernel module, use printk to print these values to the kernel log.
From Linux Kernel Development (LKD):
“When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating-point instructions varies by architecture, but the kernel normally catches a trap and then initiates the transition from integer to floating point mode.
“Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot easily trap itself. Using a floating point inside the kernel requires manually saving and restoring the floating point registers, among other possible chores. The short answer is: Don't do it! Except in the rare cases, no floating-point operations are in the kernel.”
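For calculating the mean and standard deviation, you will likely need 64-bit division and a square-root function. On the 32-bit ARM target, dividing 64-bit values with the ordinary / operator makes the cross-compiler emit a call to a libgcc helper routine (such as __aeabi_uldivmod) that the kernel does not provide, so the module will typically fail to build or load.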
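For reference, the quantities to compute over the n = 100 samples d_1, ..., d_n are written out below. This assumes the population standard deviation (dividing by n); the studio does not say which form to use, and dividing by n − 1 instead is also common. The division by n and the square root are exactly the two operations that need kernel helpers.

```latex
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i,
\qquad
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2}
```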
For 64-bit division, use the function
u64 div_u64(u64 dividend, u32 divisor)
declared in <linux/math64.h>. For an integer approximation of the square root, use the int_sqrt() or int_sqrt64() function declared in include/linux/kernel.h.
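The sketch below shows one way to compute these statistics with integer arithmetic only. When built as part of a module (__KERNEL__ defined), it uses div_u64() and int_sqrt64() from the headers named above (the header location may differ in other kernel versions); otherwise it defines small userspace stand-ins with the same names so the logic can be tested on an ordinary machine. The names struct cycle_stats and compute_cycle_stats are hypothetical, the mean is truncated to an integer, and the standard deviation is the population form rounded down.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>   /* int_sqrt64(), per the studio text */
#include <linux/math64.h>   /* div_u64() */
#include <linux/types.h>    /* u32, u64 */
#else
/* Userspace stand-ins (assumption: for testing only, not kernel code). */
#include <stdint.h>
typedef uint64_t u64;
typedef uint32_t u32;

static inline u64 div_u64(u64 dividend, u32 divisor)
{
    return dividend / divisor;
}

/* Integer square root: largest r with r * r <= x (digit-by-digit method). */
static inline u32 int_sqrt64(u64 x)
{
    u64 res = 0, bit = (u64) 1 << 62;

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (u32) res;
}
#endif

/* Hypothetical container for the four reported values. */
struct cycle_stats {
    u64 min, max, mean, stddev;
};

/* Compute integer statistics over samples[0..n-1]; n must be at least 1. */
static void compute_cycle_stats(const u64 *samples, u32 n, struct cycle_stats *st)
{
    u64 sum = 0, sq_sum = 0;
    u32 i;

    st->min = samples[0];
    st->max = samples[0];
    for (i = 0; i < n; i++) {
        if (samples[i] < st->min)
            st->min = samples[i];
        if (samples[i] > st->max)
            st->max = samples[i];
        sum += samples[i];
    }
    st->mean = div_u64(sum, n);          /* no plain '/' on u64 in the module */

    for (i = 0; i < n; i++) {
        /* absolute deviation from the truncated mean, kept unsigned */
        u64 dev = samples[i] > st->mean ? samples[i] - st->mean
                                        : st->mean - samples[i];
        sq_sum += dev * dev;
    }
    st->stddev = int_sqrt64(div_u64(sq_sum, n));
}
```

In the module, the fields can then be printed with printk using the %llu format specifier, since u64 is unsigned long long in the kernel.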
To build the module, first set a shell variable to the location of your kernel source tree by issuing the command
LINUX_SOURCE=<path to your Linux kernel source code>
(Note that the path above should end in something like linux_source/linux.)
(Note also that you can add the command above as a line at the end of the file ~/.bashrc so that it runs automatically whenever you log in.)
Then compile via
make -C $LINUX_SOURCE ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- M=$PWD modules
which, if successful, should produce a kernel module file named enable_ccnt_522.ko.
Boot up your Raspberry Pi, open a terminal window, create a directory to hold your kernel modules, and use sftp to retrieve the enable_ccnt_522.ko file you produced in the previous exercise.
Use the insmod utility to load your kernel module into the kernel, as in:
sudo insmod enable_ccnt_522.ko
If you received no error messages, then your module has been loaded successfully. To confirm that your module was loaded, you can also issue the command
lsmod
to see a listing of all currently loaded kernel modules.
To see the values your kernel module reported for the minimum, maximum, mean, and standard deviation of the elapsed cycles between reads, issue the dmesg command, which prints the contents of the kernel log to the terminal.
As the answer to this exercise, please show the relevant output of the kernel log.
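(Note: Since you may be building kernel modules frequently for this class, you can create a bash alias for the make command, e.g.:
alias makemodule='make -C $LINUX_SOURCE ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- M=$PWD modules'
The single quotes make the shell expand $LINUX_SOURCE and $PWD each time the alias is used, rather than once when it is defined, so M=$PWD refers to the directory you are building in. You can additionally add this as a line at the end of your ~/.bashrc file.)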
The Linux kernel provides indirect access to hardware counters, including the ARM PMU, via the perf_event_open system call.
Create a program directly on your Raspberry Pi that uses this system call to get a file descriptor that, when read, supplies the current cycle count, which the kernel's PMU driver obtains from the PMCCNTR.
HINT: Follow the instructions given under the "Performance Information from a Linux Application" header on this page, using PERF_COUNT_HW_CPU_CYCLES as the config field of the perf_event_attr argument to the system call.
Then, similarly to the previous exercises, take 100 samples of the elapsed cycles between two subsequent reads, then have your program print the minimum, maximum, mean, and standard deviation of these samples. As the answer to this exercise, report those values.
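As a starting point, here is a sketch of opening, reading, and sampling the cycle counter with perf_event_open, assuming Linux with glibc. Because glibc provides no wrapper for this system call, it is invoked through syscall(2) with SYS_perf_event_open. The helper names open_cycle_counter, read_cycle_counter, and sample_perf_deltas are hypothetical. Setting exclude_kernel (counting user-space cycles only) is an assumption rather than a studio requirement; it may be needed when the perf_event_paranoid setting restricts kernel profiling for unprivileged users.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open and start a user-space CPU cycle counter for this process; -1 on failure. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;          /* created stopped; enabled explicitly below */
    attr.exclude_kernel = 1;    /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this process, on whichever CPU it runs; no group */
    fd = (int) syscall(SYS_perf_event_open, &attr, (pid_t) 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    return fd;
}

/* With the default read_format, read() yields a single 64-bit count. */
static int read_cycle_counter(int fd, uint64_t *value)
{
    return read(fd, value, sizeof(*value)) == (ssize_t) sizeof(*value) ? 0 : -1;
}

/* Store n deltas between back-to-back counter reads; returns 0 on success. */
static int sample_perf_deltas(int fd, uint64_t *deltas, int n)
{
    for (int i = 0; i < n; i++) {
        uint64_t first, second;

        if (read_cycle_counter(fd, &first) != 0 ||
            read_cycle_counter(fd, &second) != 0)
            return -1;
        deltas[i] = second - first;
    }
    return 0;
}
```

Each delta includes the cost of a read() system call, which is worth keeping in mind when you compare these numbers with the other exercises.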
If you are using a Raspberry Pi 4 or 4B, and if you haven't done so already in Studio 1, you will need to modify the arm-pmu entry in the device tree for the Pi 4's BCM2711 chip, which will allow the Linux kernel to load the hw perfevents driver that will be used in a later studio.
To do so, open the file arch/arm/boot/dts/bcm2711.dtsi, find the entry for the arm-pmu, then set its "compatible" line as follows:
compatible = "arm,cortex-a72-pmu", "arm,cortex-a15-pmu", "arm,armv8-pmuv3";
Then, recompile the Linux kernel, and install the new build onto your Raspberry Pi 4 using the instructions in Studio 2. After rebooting your Raspberry Pi, verify that the driver has loaded by running the command:
dmesg | grep "perfevents"
You should see a message that contains the phrase "hw perfevents: enabled".
The ARM PMU has a register, the Performance Monitors User Enable Register (PMUSERENR), that, when set, allows direct access to the PMU from userspace. To set this register, save another copy of enable_ccnt_522.c under a different name, and update your Makefile to compile this module as well. Modify the module code, replacing the call to pmccntr_enable_once with a call to pmccntr_enable_once_user.
Build your module, then load it on the Pi, confirming as before (with lsmod or dmesg) that it has loaded correctly.
Next, retrieve the same driver file onto your Raspberry Pi, and create a program that #includes it. You may also need to include <stdbool.h>. Your program should call the pmccntr_get() function directly, taking 100 samples of the elapsed cycles between calls, and then print the minimum, maximum, mean, and standard deviation as before.
As the answer to this exercise, report these values. Please explain why you think they differ from (or are similar to) the values reported in the previous two exercises.
To remove a kernel module, the rmmod utility calls the underlying delete_module system call. Attempt to remove the kernel module you loaded earlier, but without elevating your privileges with sudo, as in:
rmmod enable_ccnt_522
You should receive an error message. As the answer to this exercise, please write the error message you received.
Additionally, please take a look at the source code for the delete_module and init_module system calls. As the answer to this exercise, please (1) copy the lines of code in delete_module that the kernel uses to verify that the calling process has permission to remove the module, and (2) indicate which function is called, for this same purpose, in init_module.
Please submit:
Your program that uses rdtsc.
Your modified enable_ccnt_522.c kernel module file.
Your program that uses perf_event_open from userspace.
Your program that calls pmccntr_get from userspace.
Page updated Monday, November 29, 2021, by Marion Sudvarg and Chris Gill.