CSE 422S: Studio 7

Tracing the Linux Kernel

"Yes," he said, "they are quite plain: a hobbit's footprints. Pippin's I think. He is smaller than the other. And look at this!" He held up a thing that glittered in the sunlight. It looked like the new-opened leaf of a beech-tree, fair and strange in that treeless plain.

"The brooch of an elven-cloak!" cried Legolas and Gimli together.

"Not idly do the leaves of Lórien fall," said Aragorn. "This did not drop by chance: it was cast away as a token to any that might follow."

The Two Towers, Book III, Chapter 2

One reality of working with the Linux kernel is that traditional troubleshooting tools such as debuggers are much less accessible. Even the time honored printk function may be too heavyweight to be used in time sensitive code paths such as interrupt handlers and scheduler code. However, there are very robust tracing utilities available to the kernel developer that allow you to record system behavior and review it after execution.

In this studio, you will:

  1. Use the Linux system call tracer strace
  2. Use the Linux kernel tracer (ftrace) via two utilities called trace-cmd and Kernelshark

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to please work in groups of 2 or 3 people on each studio (and the groups are allowed to change from studio to studio) though if you would prefer to complete any studio by yourself that is allowed.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group with other people, only one of you should please upload the answers (and any other files requested) for the studio, and if you need to resubmit (e.g., if a studio were to be marked incomplete when we grade it) the same person who submitted the studio originally should please do that.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. First, write a C program that only returns zero (i.e. does nothing). We'll also be using the test programs we used in previous studios. You can download these programs here. Make sure the program compiles on your Raspberry Pi, and as the answer to this exercise, please show the code for this program.

  3. On your Raspberry Pi, maximize a terminal window (to full screen size) and in it invoke the system call tracer to run your executable do_nothing program, as in:

    strace ./do_nothing

    This will print the output of the strace program to standard error. To re-direct this output and make it easier to view, re-direct it to a file:

    strace ./do_nothing 2> syscalls.out
    cat syscalls.out | less

    This shows you all of the system calls that are made in the process of executing the program (in this case, all the effort it takes to call main and return 0 from it). Scroll through that file to see all those calls, and then when you are finished hit q to quit. You can get information about interpreting the output of strace on the man 1 strace manual page.

    Every system call listed in that output file has a manual page entry that will help you understand what it does. In this case, you should specify the second section of the man pages, which covers system calls, since otherwise it's possible you might get the wrong page. For example, the command

    man write

    will bring up the documentation for the program called write, while the command

    man 2 write

    instead gives you the documentation for the system call named write. See the documentation about the manual pages themselves (i.e. "man man") for more details about their layout.

    Pick out two of the system calls you see in the output from running your do_nothing program in the system call tracer, and as the answer to this exercise please describe what you think they are accomplishing in this context (based on the man page descriptions of those system calls).

  4. The system call tracer also lets us see how our program's commands are eventually implemented by intermediate libraries such as the C standard library. Use strace to trace the sing program from the previous studio, as in:

    strace ./sing 1

    As the answer to this exercise, please say what system call is invoked by the call to printf (hint: it's the one that is sending strings to the standard output stream towards the end of the trace).

  5. The system call tracer also can help you understand implementation details of libraries (as well as program binaries for which you lack the source code). Look at the trace results of running

    strace ./dense_mm 100 &> dense100trace.txt

    and compare that to the results of running

    strace ./dense_mm 300 &> dense300trace.txt

    Note that the &> operation redirects both the standard error stream and the standard output stream into a file - if you want to see the output while also sending it all to a file you can add the tee command to your command line.

    Once you have done that, you can observe the differences between the runs by comparing the output files, as in

    diff dense100trace.txt dense300trace.txt | less

    which lets you you scroll up and down to see all the places the output differs - most of the output is similar except for the relative memory addresses (since the programs likely were loaded at different base addresses), until you get to the point where the system calls that are being invoked differ.

    As the answer to this exercise please say which system calls are used by each program, in the place (or places) where the sets of system calls being used are different.

  6. Write and compile a short program that uses the C standard library function malloc to allocate ten memory regions of 100 bytes each. Profile this program with strace, and save the results to a file.

    Modify your program to allocate ten memory regions of 1,000,000 (one million) bytes each. Again profile it with strace, and save the results to a file.

    Compare the results files for these two different runs, and as the answer to this exercise please say which different system calls are used (in the places where the system calls that are being used are different).

  7. Now we're going to switch to a different tracing utility. The tracer itself is called ftrace (short for Function Tracer) and is a part of the Linux kernel. We will be using two utilities called trace-cmd and Kernelshark to access the function tracer.

    On your Raspberry Pi, install the trace-cmd utility with the command

    sudo apt-get install trace-cmd

    and as the answer to this question please say approximately how long it took for the trace-cmd utility to install on your Raspberry Pi.

  8. The function tracer is extraordinarily powerful, and a full exploration of its capabilities is beyond the scope of this studio. However, to generate your first trace, and to get an idea of what it can do, execute the command

    sudo trace-cmd record -e sched_switch ./dense_mm 100

    The record option specifies that we want to record a new trace. The -e sched_switch option specifies that the event we want to trace is invocations of the sched_switch kernel function (which responsible for switching processes). A file called trace.dat is generated. For more information, see the man 1 trace-cmd and man 1 trace-cmd-record man pages.

    Now, use the

    trace-cmd report

    utility to inspect the trace.dat file. The second column gives the CPU id where each sched_switch event occurred. The last column shows the processes that were scheduled in/out during the event.

    As the answer to this question, please indicate how many sched_switch events occurred on each CPU core (there are 4 cores on your Raspberry Pi).

  9. Now we will use Kernelshark to inspect our trace.

    Unfortunately, on recent versions of the Raspberry Pi OS there appears to be an incompatibility between QT5 and the OpenGL direct rendering infrastructure that prevents Kernelshark from rendering the event time line correctly, and so we will run Kernelshark on the Linux Lab machines instead, via the Academic Linux Desktop web interface. Please (1) sftp your trace.dat file from the previous exercise onto one of the Linux Lab machines, (2) open up a browser and go to https://linuxlab.engr.wustl.edu/, and then open up the Academic Linux Desktop (found on the Interactive Apps menu at the top) and cd into the directory where you put the trace.dat file. Then, run Kernelshark on that file in the Academic Linux Desktop by issuing the command

    kernelshark trace.dat

    By default, you will be looking at a timeline for each CPU core in the system. Each process in the system will be given a unique color so you can track individual processes as they are scheduled on and off of processors as well as when they may be migrated between cores.

    Start by zooming in on the trace until you can make out discrete events. To zoom in: press and hold the left mouse button; drag the cursor to the right; and then release to define a zoom window. Zooming out is the reverse: press and hold the left mouse button; drag the cursor to the left; and then release the mouse button.

    As the answer to this question, please give the names of at least two different processes that were running on the same core according to this trace.

  10. We can also enable a process-centric view rather than a CPU-centric view. In the Kernelshark window, go to the Plots menu, select Tasks, and then find the process dense_mm and click on the check box to activate it. Scroll down or enlarge the viewing window until you see the timeline for that process at the bottom. This timeline only shows the activity of this one process, and different colors represent execution time on different processors (red boxes on this timeline represent time where this task was not scheduled on any processor).

    You can use the CPU and task timelines to see exactly how your process executed over its lifetime. If you zoom in to where you can see discrete events, you can mouse over those events to see exactly which processes are switched. As the answer to this exercise, please say which other tasks preempted the dense_mm process.

  11. One of the CPUs is likely to have spent most of its time executing the dense_mm process. Look at the other three CPU cores, and as the answer to this exercise please answer the following questions. What did the other cores spend their time doing? How does the frequency of sched_switch events differ, either between processors or on a given processor throughout the trace?

  12. Odds are that dense_mm doesn't create a really interesting trace because it can only use one processor at a time. Generate a new (parallel) program trace with the command

    sudo trace-cmd record -e sched_switch ./parallel_dense_mm 200

    and open it with Kernelshark. Identify a spot in the timeline where you feel the system is not using CPU resources as effectively as it is in other areas. As the answer to this exercise, please explain briefly what you think is happening in that place, and why you think it's happening.

  13. Things to Turn In:

    For this studio, please turn in the following:

    Optional Enrichment Exercises

  14. The system call tracer strace can be used on any binary. Try using it on common programs such as ls, touch, or su. You can even attach it to running programs such as the ssh authentication daemon with the -p option. This is a great first step towards reverse engineering a binary or looking for security vulnerabilities, though we don't cover that in this course. As the answer to this exercise please explain briefly what you did and what you observed when you did that.

  15. The list option can be used with trace-cmd to query ftrace about what kinds of tracers are currently available on the system. The command "sudo trace-cmd list -e" will print out all available events. You can narrow the possible events with simple pattern matching: the command "sudo trace-cmd list -e sched" will list all events with "sched" in their name, which includes every event in the scheduler subsystem as well as a few others. See the man 1 trace-cmd-list man page for more info. Try this, and as the answer to this exercise please explain briefly what you saw and what you learned when doing so.

  16. Ftrace contains many different tracers that let you measure different aspects of your system. For example, you can measure total kernel stack size, how long it takes interrupts to run, how long interrupts and kernel preemptions delay your processes, and more. The combination of Ftrace, trace-cmd, and Kernelshark is extremely powerful. Try searching the documentation (or online) to find examples for doing all of the above, as well as other neat things people have done. As the answer to this exercise, please describe briefly what you found.