CSE 422S: Studio 8

Kernel Races, Atomicity, and Synchronization

"When it comes down to it, the problems with sharing data between threads are all due to the consequences of modifying data."

—Anthony Williams, C++ Concurrency in Action, Section 3.1, pp. 34

Like many userspace programs, the Linux kernel is designed to run concurrently with itself. In this case, kernel code must be careful not to create race conditions and concurrency bugs by accessing shared data without protection.

In this studio, you will:

  1. Create a race condition in a kernel module
  2. Remove the race condition by switching to an atomic data type
  3. Optionally, use kernel synchronization to resolve the race condition instead

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to please work in groups of 2 or 3 people on each studio (and the groups are allowed to change from studio to studio) though if you would prefer to complete any studio by yourself that is allowed.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group with other people, only one of you should please upload the answers (and any other files requested) for the studio, and if you need to resubmit (e.g., if a studio were to be marked incomplete when we grade it) the same person who submitted the studio originally should please do that.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Please review the kernel thread interface, which is found in include/linux/kthread.h and is very similar to the POSIX threading interface.

    Write code for a kernel module that creates one thread on each core of your Raspberry Pi, each of which simply calls the same static function and returns. In your module's init() function, call kthread_create to spawn 4 kernel threads (which each in turn runs the same static thread function you wrote) and stores the result of the kthread_create() calls in an array of 4 static task struct pointers. Then, for each thread that was created, invoke kthread_bind() to bind the thread to a particular core, and wake_up_process to start the kernel thread.

    Create your module's exit function so that when the module is unloaded it uses kthread_stop() to alert each of the threads that they should cease execution.

    At the end of the kernel thread function, repeatedly call kthread_should_stop() in a loop until the function returns true. This will happen once kthread_stop() has been invoked by the module teardown function. Within the body of the loop, it is a good idea to invoke schedule() to voluntarily preempt the thread so as to prevent the thread from constantly spinning on the CPU.

    As the answer to this exercise, show code snippets to demonstrate how you implemented this functionality.

  3. Now we will create a data race in the kernel, so that we can later resolve it. Declare a global volatile int variable named shared_data, as in:

    volatile int shared_data = 0;

    The volatile qualifier tells the compiler that this variable will be modified outside of the current execution context, and has the effect of disabling certain optimizations.

    Within the (common) function that is run by each of your threads, write a for loop that increments the shared_data variable one million (1,000,000) times, as in:

    #define iters 1000000

    for( i=0; i<iters; i++){



    In your module's exit function, print out the value of shared_data to the system log. As the answer to this exercise, explain briefly (1) why the value that should be printed if there are no data races should be 4,000,000 and (2) why a data race might cause a different value to be printed (e.g., based on the single variable data race example from the kernel synchronization I slides).

  4. Compile your module and then load and unload it three times on your Raspberry Pi, and record the value that was produced by each run. If you saw 4,000,000 consistently then you did not successfully create a race condition (make sure your variable is declared as volatile and that your threads really execute simultaneously on different cores).

    As the answer to this exercise report the values that were output by your module in each of those runs.

  5. Make a new copy of your kernel module and change the type of your global integer variable to be atomic_t (instead of volatile int), a type that is defined in include/linux/types.h. This type only should be accessed through special mutators and accessors. Initialize the variable using the function atomic_set() (in your module's init method instead of intializing it at its point of declaration), increment it with the function atomic_add() (in your module's thread function), and access it's value with the function atomic_read() (in your module's exit function). The prototypes for these functions are found in the include/asm-generic/atomic.h header file.

    As the answer to this exercise please show your code that implements these features.

  6. Compile your new module and on your Raspberry Pi load and unload it three times, and record the value that was output to the system log each time. If you saw any result other than 4,000,000 then you have not successfully resolved the race condition (make sure you used the correct type and only accessed it with the functions noted above).

    As the answer to this exercise please report the values that were output in each run.

  7. Have your threads print a statement to the system log just before they start their for loop and just after they finish it. Use this as a crude timestamp to determine how long it takes your code to execute. Compile your module, and then load and unload it on your Raspberry Pi, and as the answer to this exercise report how long it took for each of the threads to complete its loop.

  8. Things to turn in

    Optional Enrichment Exercises

  9. Make another copy of your original kernel module. This time, rather than modifying your global integer to be atomic, we will use a mutex to protect it. Use the macro DEFINE_MUTEX(mutex_name) (defined in include/linux/mutex.h) to statically declare a global mutex variable, and then use the functions mutex_lock(&mutex_name) and mutex_unlock(&mutex_name) inside of the loop within the thread function, to protect access to the shared_data variable.

    As the answer to this exercise please show the code that implements these features.

  10. Again, use kernel print statements as a crude timestamp for your thread functions. Compile your module, and load and unload it on your Raspberry Pi. As the answer to this exercise, explain briefly (1) how long your mutex-protected code took to complete compared to the atomic variable version, (2) why it might be necessary in some cases to use a mutex instead of an atomic variable.