CSE 422S: Lab 1

Kernel Monitoring Framework


He then told them many remarkable stories, sometimes half as if speaking to himself, sometimes looking at them suddenly with a bright blue eye under his deep brows. Often his voice would turn to song, and he would get out of his chair and dance about. He told them tales of bees and flowers, the ways of trees, and the strange creatures of the Forest, about the evil things and good things, things friendly and things unfriendly, cruel things and kind things, and secrets hidden under brambles.

As they listened, they began to understand the lives of the Forest, apart from themselves, indeed to feel themselves as the strangers where all other things were at home.

The Fellowship of the Ring, Book I, Chapter 7

An important aspect of system development and management is to monitor (and possibly adapt) the operation of the system: real-world systems usually must conform to performance requirements beyond simply being free of bugs or producing correct output. For example, for high-performance systems it may be important to know computational throughput, for interactive systems it may be important to measure responsiveness to users, and safety-critical systems may need to track and maintain their ability to handle problematic events within bounded timeframes. In such situations, the system must be able to monitor itself and take actions as necessary to ensure its correct operation. In this assignment we will explore some ways to perform such monitoring tasks, using kernel modules, kernel timers, kernel threads, and other features we've looked at in this course.

In this lab, you will:

  1. Use kernel timers to schedule recurring events (e.g., thread wakeups) in the future.
  2. Use kernel threads to perform deferrable work inside the kernel.
  3. Create a simple monitoring framework that periodically captures basic information from kernelspace, using kernel timers and kernel threads to manage the frequency and context of such information gathering.
  4. Verify and evaluate the operation of your monitoring framework through tracing techniques that we will explore in upcoming studio exercises.

In this lab, please create a readme.txt file for your project, in which you should record your observations as you go, and then refine your readme.txt file into a cohesive lab report before submitting your lab solution.

Please make sure that the name of each person who worked on this lab assignment is listed at the beginning of it, and please make sure that in addition to completing the code for this project, you also carefully document your work in that readme.txt file, as indicated in the Assignment section below.


Readings and other resources

The following sources of information are likely to be useful as you work on this project.

Please read through the grading rubric for the lab assignments in this course, and please keep the points it makes in mind as you design, implement, and test your lab solution.

Assignment

  1. At the top of your readme.txt file, list the names of the people who worked together on this lab, and a working e-mail address for each of those people. You are encouraged to work in groups of 2 or 3 people on this lab assignment, though you may work individually instead if you prefer - groups larger than 3 are not allowed for this assignment.

  2. To get started, create a simple kernel module that uses the module_param macro to define two ulong module parameters, called log_sec and log_nsec, which will govern how frequently (in seconds and nanoseconds, respectively) your module will (re-)schedule a repeating timer. Declare and provide appropriate default values for two static unsigned long variables to store the values for these module parameters, so that in the event that your module is loaded without parameters, the timer defaults to expiring once every second.

    These module parameters should not be visible in the filesystem, so (in addition to setting up the module parameters so that is the case, when you use the module_param macro for each of them) your module implementation for this lab should not contain code for filesystem readable/writeable attributes, which you used in the studios that had those features - instead, please start with something like the simple module example shown on pages 338 and 339 of LKD, and then add features to it as the assignment specifies.

    Add a section titled "Module Design" to your readme.txt file, and in it describe briefly (in a paragraph or two) how you implemented these requirements.

  3. Now you will implement a recurring timer that (1) is initialized and started at module load time, (2) is canceled when the module is unloaded, and (3) repeatedly reschedules itself a specified interval later (as defined by the module parameters) each time it expires.

    Please review the hrtimers API web page for an overview of hrtimers and how they work. Since the API has changed somewhat since that page was posted, please then review the hrtimer.h and ktime.h header files that are found in the include/linux directory under the linux source directory that you downloaded and built on the Linux Lab machines in the first studio.

    Add static variables for the timer interval and the timer itself, of types ktime_t and struct hrtimer respectively, to your module.

    In your module, write a new static function for your timer's expiration with return type enum hrtimer_restart, which takes a pointer to struct hrtimer as its only parameter. In the body of that function, use the hrtimer_forward() function (or hrtimer_forward_now() if you prefer) and the module's static timer interval variable, to reschedule the timer's next expiration one timer inverval forward into the future. The timer expiration function then should return HRTIMER_RESTART to indicate that the timer is being restarted.

    In your module's init() function, pass the static unsigned long variables for the module parameters into a call to ktime_set(), and assign that function's result to the module's static timer interval variable. Then, your module's init() function should:

    1. Call hrtimer_init() with the module's timer variable and the CLOCK_MONOTONIC and HRTIMER_MODE_REL flags; then
    2. Set the function field of the module's timer variable (struct) to point to the timer expiration function you wrote; and finally
    3. Call hrtimer_start() with the module's timer and timer interval variables and the HRTIMER_MODE_REL flag, to schedule the first expiration of the timer (which then will forward the scheduling of its next expiration etc. until the module is unloaded – see next).

    In your module's exit() function, use hrtimer_cancel() to cancel the module's timer.

    Compile your kernel module (per the instructions given in Studio 4), and use insmod to load it (and rmmod to unload it) with different parameter values (including not passing any module parameters to insmod so that the defaults are used) on your Raspberry Pi. To pass parameters to your module via the insmod utility, you will need to give the name as well as the value of each parameter, as in:

    sudo insmod monitor_mod.ko log_sec=0 log_nsec=100000000

    For each of those runs, monitor and verify the timing of when your timer wakes up, using the system log. In the studios so far, we have accessed the system log through the command dmesg. However, the system log is also accessible through the file /var/log/syslog. This makes it possible to view real-time updates to the system log with the command:

    tail -f /var/log/syslog

    You can alternatively use the following command to see real-time updates to the system log. It has the advantage of providing the same formatting as dmesg, but it does not limit its initial output (as with tail) to 10 lines:

    dmesg -wH

    Hypothetically your timer should wake up and execute regularly (e.g., at each occurrence of whatever interval was specified in the module parameters, or every second if no parameters were passed via insmod). However, fluctuations in the actual timing of execution, called variations, may occur in practice. The dmesg command gives timestamps with six significant digits (i.e. microseconds resolution). Use those timestamps to estimate the timing variations in the intervals between expirations of your timer, for different module parameter values.

    Add a section titled "Timer Design and Evaluation" to your readme.txt file, and in it describe briefly (in a paragraph or two) how you implemented the requirements above. In that same section please also paste fragments of the system log that show your timer being rescheduled several times in each of a few runs with different parameter vales, and then comment briefly on how much the timing of your timer varies from the specified interval, based on the results you observed.

  4. Please review the kernel thread interface, which is found in include/linux/kthread.h, and is very similar to the POSIX threading interface.

    Add another static variable to your module, of type struct task_struct *, which will hold a pointer to the task struct for your module's (currently single) kernel thread.

    Add another static function to your module, which returns an integer and takes a single parameter of type void *. This function will run in one or more kernel threads when they are spawned by the module. In the body of that function, first use printk() to generate a message that contains the name you gave this function, to show that it ran, and the function then should return 0 to indicate it completed successfully.

    Note that the syntax int fxn (void *) is a nicely general and widely used style for writing functions that are run in threads, and also (per page 36 of the LKD text book) is the syntax that is expected by the kthread_create and kthread_run functions in Linux. It also is used for functions that create user space threads in various operating systems, as it allows you to pass a pointer to a generically typed payload of information that the thread can use (by casting the pointer to a pointer to the appropriate struct or data type), or if you don't want to pass information to a thread you can pass it a null pointer.

    Then, modify your module's init() function so that when the module loads it calls kthread_run() to spawn a kernel thread (which in turn runs the static thread function you wrote) and stores the result of the kthread_run()call into the module's static task struct pointer variable.

    Note that calls to kthread_run() and kthread_create() potentially may fail, so it's essential that you (1) initialize your static task struct pointer variable (or variables in the multi-threaded part later) to 0, and then (2) check it (or them in the multi-threaded part later) after any call to kthread_run() or kthread_create() to determine whether or not the call succeeded.

    Watch out for subtle races that may occur between the creation of the threads and the first expiration of the timer, especially when the timer is set to short invervals (like 1 millisecond or less). Generally speaking it's best to create all threads and check the pointers (as noted above) before starting the timer.

    At this point it's a good idea to compile your module and load and unload it on your Raspberry Pi and make sure it loads and unloads correctly and that the thread function is run once per load/unload of the module, before going farther with this assignment.

    It also is a good idea to make a separate backup copy of your code once you have confirmed it is working, so you have a backup point you could backtrack to if needed, in case you run into problems with any subsequent features you add.

    Modify your module's static thread function so that it repeatedly (in a loop):

    1. Uses printk() to generate a system log message indicating another iteration of the loop has started and giving the values of current->nvcsw and current->nivcsw (for debugging purposes, it also may be useful to track how many iterations have happened and add the iteration number to each of those system log messages); then
    2. Passes TASK_INTERRUPTIBLE into a call to set_current_state() and then immediately calls schedule() to suspend its execution until another piece of code wakes it up; and finally
    3. Uses kthread_should_stop() to test whether or not it should cease execution (and if so the thread function should exit the loop, print another system log message indicating it's terminating, and return 0).

    Also modify your module's exit function so that when the module is unloaded it uses kthread_stop() to alert the thread that it should cease execution. Here too it is potentially possible to introduce races in the termination of the module's thread(s) and the cancellation of its timer. Think carefully about the order in which those actions should be taken in your module's exit function, and in your writeup please describe how you addressed that issue.

    Although it may look like a variable name, current is actually a macro for the task_struct of the currently executing process. Since this code is executing in a kernel thread, and kernel threads are effectively treated like any other process in the system, current points to the task_struct that describes the module's kernel thread. The data fields nvcsw and nivcsw stand for Number of Voluntary Context SWitches and Number of InVoluntary Context SWitches, respectively, and they record the number of occurences of these two events within the kernel thread.

    A voluntary context switch occurs when a thread's execution is suspended due to its own actions (e.g., when a user space thread makes a system call that blocks on I/O or when a thread explicitly yields its execution: a user space thread would do so by calling sched_yield while a kernel thread would do that by calling the kernel level schedule function). An involuntary context switch occurs when the kernel level scheduler chooses another thread to run instead of the current one, which also is mediated through the kernel level schedule function though via a different chain of invocation.

    Modify your module's timer expiration function so that each time it is called (but before it forwards the timer's next expiration) it passes the module's static task struct pointer variable into a call to wake_up_process(), which will wake up the kernel thread so it can make another iteration of its loop (or end if the module is being unloaded).

    This may be another good point at which to compile your module and load and unload it on your Raspberry Pi and make sure that the measurements are occurring at the right intervals and are showing reasonable values, and then once it is working also save a separate backup copy of this code, before going farther with this assignment.

    Add a section titled "Thread Design and Evaluation" to your readme.txt file, and in it please describe briefly (in a paragraph or two) how you implemented the requirements above. In that same section please also paste fragments of the system log that show your kernel thread iterating several times in each of a few representative runs with different parameter values, and printing out the number of voluntary and involuntary preemptions it has experienced as of each of those iterations.

    The numbers of voluntary and involuntary context switches between threads are good indicators of contention for cores. Voluntary context switches occur when a thread finishes the function it is performing and yields or blocks its execution, whereas involuntary context switches occur when a thread isn’t finished performing its function yet, but is preempted by another thread.

    Make sure that your evaluation runs cover different enough time scales that you see meaningful differences in the relative numbers of voluntary vs. involuntary context switches. Especially, try running with shorter (towards 1 millisecond) and longer (towards a second) timer intervals to assess this.

    In your writeup, please comment briefly on (1) how much the timing of each loop's iteration varies from the specified intervals (especially in comparison to the variations you saw in the intervals between timer expirations), and (2) why you think your module's kernel thread may have been preempted voluntarily vs. involuntarily, based on the results you observed.

  5. At this point, you should have a simple functional, single-threaded monitoring framework. However, some operations only can be done (or done efficiently) on a per-CPU basis. (e.g., accesing per-CPU data structures.) If you look at a listing of other kernel threads in top or ps, you will notice that many of them end with "/0" or a similar string. This denotes that the kernel thread is pinned to (and thus only runs on) that core. For example, the thread ksoftirq/2 only executes on core 2, while migrate/1 only runs on core 1.

    Copy your existing module's .c file to another file with an appropriate name for a multi-threaded version (e.g., if your single-threaded module was implemented in monitor_mod.c you might name the new file mt_monitor_mod.c). You will turn in separate files for your single-threaded and multi-threaded monitoring modules.

    Modify the multi-threaded version of your module so that it has static task_struct pointer variables for four kernel threads, one for each core of your Raspberry Pi.

    Also modify your module's init() function so that instead of calling kthread_run() to spawn a single kernel thread, it instead:

    1. Calls kthread_create() four times (passing the thread function into each call) and stores the result of each call in a separate static task_struct pointer variable of the module;

    2. Calls kthread_bind() four times, to bind each of those threads to a separate core; and then

    3. Calls wake_up_process() four times (passing the appropriate static task_struct pointer variable each time) so that each thread begins to execute.

    Modify your module's exit() function so that it calls kthread_stop() four times, using the appropriate static task_struct pointer variable of the module for each of the threads.

    Modify your module's timer expiration function so that each time it is called (again before it forwards its next expiration into the future) it calls wake_up_process() four times, using the appropriate static task_struct pointer variable of the module for each of the threads, so each one wakes up an runs another iteration each time the timer expires.

    Compile and load (and unload) your multi-threaded module with different module parameter values. Verify correct behavior of your multi-threaded monitoring framework through the messages that are printed in the system log. Each of the four threads should be executing on a different processor (you can also verify what CPU a kernel thread is currently executing on with the function smp_processor_id()) within a relatively short time of one another.

    When you are satisfied that your kernel threads are behaving properly, add a section titled "Multi-threading Design and Evaluation" to your readme.txt file, and in it describe briefly (in a paragraph or two) how you implemented the requirements for multi-threading, above. In that same section please also paste fragments of the system log that show each of your kernel threads iterating several times in each of a few runs with different parameter values, and printing out the number of voluntary and involuntary preemptions it has experienced as of each of those iterations. Please choose meaningfully different period lengths such as 1s, 100ms, 1ms, 100us, etc., and be sure to include multiple runs with each period so you can get more just one data point to use in your observations. Please also make sure that your evaluations cover a sufficiently wide range of orders of magnitudes of those periods that you measure a significant change in the relative numbers of voluntary vs. involuntary context switches for at least one period compared to other periods.

    Then comment briefly on (1) how much the timing of each loop's iteration varies from the specified intervals (especially in comparison to the variations you saw in the intervals for the single-threaded case), (2) the relative frequency of voluntary vs. involuntary context switches each thread experienced for each of the period settings you evaluated (again especially in comparison to what you saw in the single-threaded case), and (3) for which periods the numbers of voluntary and involuntary context switches differed meaningfully from those for other periods (especially how the numbers of the different kinds of context switches varied relative to each other, as well as in absolute terms within each kind of context switch, over the different periods).

  6. Take a system trace with trace-cmd while your multi-threaded module is loaded. You can perform system tracing in free-running mode by simply omitting a program name. For example, the command sudo trace-cmd record -e sched_switch will trace the system until you interrupt it with CTRL-C. Take at least ten seconds of trace.

    Graphically verify your system's behavior with Kernelshark. Take a screenshot and save it in a file named trace.png. You can take a screenshot of the whole screen by hitting the Print Screen button on your keyboard, or you can install a graphical screenshot utility with sudo apt-get install gnome-screenshot (which will then show up under Accessories in the system menu).

  7. Add a section titled "System Performance" to your readme.txt file. Zoom in on several of the thread wakeups, and in that section of your readme.txt file discuss briefly (1) whether (based on what you see) your threads run to completion every time, or they are sometimes preempted, and (2) given that the threads are executing in kernel mode, what might be able to preempt them.

    Estimate the total execution time of one of your thread wakeups (the time from the start of the first thread wakeup to the end of the last thread) using Kernelshark's graphical interface. You can do this easily by setting markers with left-click and shift plus left-click. Once you have set two markers, the time between those two markers will appear in the "Delta" box. Measure three groups and report their timings. Take a screenshot of one such measurement and save it in a file named total_exec.png.

    Again looking at the groups of thread wakeups, notice that each thread does not start at exactly the same time. This variability in thread release times is called jitter. Measure the difference between the first and the last thread starting times for three wakeups and report the minimum, maximum, and mean values of those measurements in the "System Performance" section of your readme.txt file. Take a screenshot of one such group and save it in a file named jitter.png.

    Rather than using the graphical interface, you can also examine kernel traces in text format. The command trace-cmd report generates this text, and you can search for relevant log entries with grep. For example, the command sudo trace-cmd report | grep log_thread/3 would return all log entries related to the process log_thread/3. Use this technique to compute the actual running time over five wakeups of one particular thread running in your multi-threaded monitoring module, and in "System Performance" section of your readme.txt file report the minimum, maximum, and mean values you saw for that thread.

  8. At the bottom of your readme.txt file please add a section titled "Development Effort" and in it please provide an estimate of the amount of time (in total person-hours) that your team (or just you if you didn't work with anyone else) spent completing this assignment. Please also let us know if there were any parts of the lab assignment or any issues with it that led to much higher effort and/or time expenditure. This will allow the instructor to calibrate and possibly refine the content of this course in future semesters.

Things to turn in

Please make sure your readme.txt file lists the names of everyone who worked on the project. If you are working in a team, if possible have just one person submit the files - however, if you must submit your solution from multiple Canvas accounts, please make sure that exactly the same files (and versions of those files) are submitted in each case.