CSE 422S - Operating Systems Organization

CSE 422S: Studio 5

Process Family Tree

There were many Bagginses and Boffins, and also many Tooks and Brandybucks; there were various Grubbs (relations of Bilbo Baggins' grandmother), and various Chubbs (connexions of his Took grandfather); and a selection of Burrowses, Bolgers, Bracegirdles, Brockhouses, Goodbodies, Hornblowers and Proudfoots. Some of these were only very distantly connected with Bilbo, and some of them had hardly ever been in Hobbiton before, as they lived in remote corners of the Shire.

—The Fellowship of the Ring, Book 1, Chapter 1

Processes (and userspace threads and kernel threads) are one of the most important abstractions in any operating system. They are the basis for scheduling, memory management, accounting, and more. Even the kernel spawns a variety of kernel threads to manage kernel-level services!

In this studio, you will:

Write simple userspace programs to learn more about working with processes
Learn how to do simple kernel module I/O
Write a kernel module that explores the task_struct process data structure

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to please work in groups of 2 or 3 people on each studio (and the groups are allowed to change from studio to studio) though if you would prefer to complete any studio by yourself that is allowed.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group with other people, only one of you should please upload the answers (and any other files requested) for the studio, and if you need to resubmit (e.g., if a studio were to be marked incomplete when we grade it) the same person who submitted the studio originally should please do that.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

As the answer to the first exercise, list the names of the people who worked together on this studio.
Write a short program called simple_fork.c that uses the fork() function to spawn a child process from a parent process. The parent process should print a statement before the fork and print out the child's PID after the fork. The child process should print out a statement after it has been spawned that uses the getpid() and getppid() functions to print out its PID and its parent's PID. Compile and run your program on your Raspberry Pi, and show your program's output as the answer to this question.
Write a second program called tree_fork.c that takes an integer as a command line parameter, and up to a fixed limit of 10 produces a "binary family tree" of processes with that many generations. This will create 2ⁿ - 1 processes, where n is the number of generations. For n=1 the program will create a single process, for n=5 it will create 31, and for n=10 it will create 1023 processes, etc. We limit the number of generations to be 10 at most, since larger numbers would take a long time to run, and could even freeze your Raspberry Pi!
If the command line parameter is not provided the program should output a helpful usage message and exit. Otherwise, it should convert the command line parameter into an integer and store it in a generations variable.
If the generations variable is less than 1 or greater than 10, the program should output a helpful usage message and exit. Otherwise, it (and any child processes it spawns) should do the following (with the original program being the first generation):
1. print out a line that says what generation it belongs to and its own PID;
2. increment the current generation;
3. if the last generation has been reached (according to the generations variable) simply return 0, or otherwise spawn two child processes;
4. use wait(0) to wait for any successfully spawned child processes to complete; and then
5. return 0 if both were successfully spawned, or -1 if either spawn (or both) failed.
Hint: it is important to check the pid_t return values from each call to fork() (or whatever call you used to spawn each child process) carefully, since (1) these calls may fail as indicated by a negative return value, (2) 0 indicates that the child process is running, and (3) a positive number indicates that the parent process is running.
Hint: based on this, it's straightforward to implement the logic for this exercise using either recursion, or a while loop with a couple pid_t variables (one for each child) and judicious use of the continue statement for code branches that are running in a child process.
As the answer to this exercise, show output of your program when it is run with 4 generations (which should spawn a total of 15 processes if implemented correctly).
Now we're going to return to kernel module design using these ideas. First, from your linux source directory, find and copy the file samples/kobject/kobject-example.c into the directory where you keep your kernel module code. This is a kernel module that uses a feature called kobjects to provide an interface to exchange data between the kernel and userspace. Each data item is called an attribute, and for each attribute you provide a show and store function that is called when userspace reads and writes these values, respectively.

This particular module provides three attributes: foo, baz, and bar. Once loaded, you can find them in the sysfs filesystem under the /sys/kernel/kobject_example/ directory.
Modify this file so that (by using printk) a system log message is printed when any of foo, baz, or bar is updated, and the old value as well as the new value of each variable that was updated is shown in the message.

Now we can build your modified module as we did previously. First, update your Makefile so it contains the appropriate .o file target for your new module, and then generate the .ko file for your new module, as follows:

module add arm-rpi

LINUX_SOURCE=path to your Linux kernel source code

make -C $LINUX_SOURCE ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- M=$PWD modules

Then, use sftp to copy the generated .ko file over to your Raspberry Pi, and load the module using sudo insmod.
From a terminal with root permissions you can then read values out of these attributes with the cat command, and you can write values into these attributes with the echo command, as in:

sudo bash

and then

cd /sys/kernel/kobject_example/

and then

cat foo

which will show the current (original) value of the foo attribute and then

echo 42 > foo

which will write the value 42 into the attribute foo and then

cat foo

which will show the new value of attribute foo. Note that you must have a root terminal (which sudo bash gives you) to write into these commands (i.e. sudo echo doesn't work).
As the answer to this exercise, show output that demonstrates your module successfully changing the values of foo, bar, and baz, using these commands.
Now we're going to write a kernel module that reads a PID through the sysfs interface and prints that process' ancestry in the system log.

Make a new kernel module file called family_reader.c that is based on your modified kobject-example.c file (i.e., to start with, please make a copy of it). This module should create a single system attribute (that's integer valued as in the previous exercise) under /sys/kernel/fam_reader/. When you write an integer to this attribute, your module should try to print out the PID's ancestry (i.e., its parent and then its parent's parent, etc. all the way up to the init task). There are a few steps involved:

Side note: The modern Linux kernel makes a distinction between "real" and "virtual" PIDs for the benefit of migrating processes across different virtual hosts. The virtual PID is the PID that a process sees from userspace.
1. You will need to convert your integer input into a proper kernel PID. Use the function find_vpid() (please see include/linux/pid.h and kernel/pid.c), which returns a pid*. This function can fail, so be sure to check its return value before dereferencing the pointer.
2. Next you can convert your pid* to a task_struct* by passing it into the function get_pid_task() (please see include/linux/pid.h and kernel/pid.c) along with the flag PIDTYPE_PID. This function can fail, so be sure to check its return value before dereferencing the pointer.
3. Once you have a task_struct*, you can access any of the data it stores. In particular, the real_parent field stores a task_struct* pointer for the process that spawned it, and the comm field is a string that gives the command name. Note: there is a separate field called parent, which is not what we want for this exercise. The parent is the logical parent that shares process group signals and allows for waiting between parent and child.
4. Work your way back up the family tree, printing out each task's PID and command name, all the way back to the init task with a PID of one.
Compile your new kernel module as you did for the previous one, and then use sftp to copy the resulting .ko file over to your Raspberry Pi. On your Raspberry Pi, install the module, and then use sudo bash to give your terminal session root access. In the directory for that module (which will be under /sys/kernel/), use cat and echo to read and write the value of that module's attribute, and then use dmesg to confirm that the system log shows your module is working correctly.
Use the ps command to find the PID of the sudo process that is running in your current terminal window, use echo to set the attribute of your module to that PID, and then use dmesg to see the ancestry of that process. As the answer to this exercise please show the messages from the system log that show the ancestry of the sudo process.
Optional Enrichment Exercises
The task struct includes a lot of interesting process data and process accounting. Try printing out other fields into the system log, and as the answer to this exercise please describe briefly what you printed out and show some of the system log messages from doing that.
The files in kernel/sched and include/linux/sched include a lot of facilities for working with tasks, including the ability to modify specific tasks or iterate over every task in the system. For example, include/linux/sched/signal.h defines macros such as for_each_process(). Try out some of those capabilities, and as the answer to this exercise please describe briefly what you did and what you observed when you did that.

Things to turn in

The answers to the above exercises
simple_fork.c
tree_fork.c
Your modified kobject_example.c
family_reader.c

CSE 422S: Studio 5

Process Family Tree

Required Exercises

Optional Enrichment Exercises

Things to turn in