CSE 422S: Studio 14

Program Execution and Debugging


"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

—Brian Kernighan, The Elements of Programming Style

This studio has two goals. First, it should make you familiar with the compilation, linking, and loading processes. By the end, you should be comfortable with the differences between statically and dynamically linked programs, including binary size and execution time, and you should better understand how the runtime linker works. Optionally, you will also learn how to create your own dynamic library.

Second, this studio will introduce you to several important utilities for analyzing and debugging programs. If you took CSE 361, you should already be familiar with GDB, but this studio should help refresh your skills with that tool. Additionally, it will guide you through the use of several other (possibly new) utilities, such as pmap, nm, readelf, and ldd.

In this studio, you will:

  1. Compile a program that has several library function calls, with both static and dynamic linking.
  2. Investigate the differences in binary size and execution time.
  3. Learn how to use the pmap, nm, readelf, and ldd utilities to compare the binaries.
  4. Debug program execution with GDB.

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group with other people, only one of you should please upload the answers (and any other files requested) for the studio, and if you need to resubmit (e.g., if a studio were to be marked incomplete when we grade it) the same person who submitted the studio originally should please do that.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.


Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.
  2. First, you will need a program to compile, link, and debug. Please download the code package test_programs.tgz, which includes five programs.

    Use sftp to transfer that file over to shell.cec.wustl.edu.

    Alternatively, you may download the package directly onto shell.cec.wustl.edu by issuing the command:

    wget https://classes.engineering.wustl.edu/cse422/studios/test_programs.tgz

    Once you have done that, log in to shell.cec.wustl.edu, then connect to the Linux Lab cluster by issuing the command:

    qlogin -q all.q

    Issue the command gcc -v and make sure that the compiler version is GCC 8.3.0. If it is not, follow the instructions from the first studio, which had you add the command module add gcc-8.3.0 to your ~/.bashrc file so that it runs whenever you log in.

    Now, extract the package (for example, with tar -xzf test_programs.tgz). For this studio, you will be using the program arr_search.c. As the answer to this exercise, briefly describe what this program does.

  3. Now, you will compile two different binaries for this program, using no compiler optimizations.

    Issue the following commands:

    gcc arr_search.c -o arr_search_dynamic -O0 -lm
    gcc arr_search.c -o arr_search_static -O0 -static -lm

    Now, issue the command ls -lh.

    This will tell you the size of each compiled binary. As the answer to this exercise, please list the size of each one, and explain briefly why they are (or are not) different.

  4. The time utility allows you to measure a program's execution time.

    Use this utility to measure the execution time of your dynamically linked program first. Please use a number of iterations that allows it to run for more than 0.1s. We recommend a command like the following:

    time ./arr_search_dynamic 100000

    Once you have an appropriate number of iterations, write down this value, as well as the "real" value reported by the time utility.

    Now, measure the execution time, with the same number of iterations, for your arr_search_static binary.

    As the answer to this exercise, please tell us:

  5. The ldd utility, which stands for "List Dynamic Dependencies," can be used to print the shared objects (shared libraries) required by a given binary. You can read about its syntax and usage with man 1 ldd.

    You'll now use ldd to compare your two binaries. Issue the commands:

    ldd arr_search_static
    ldd arr_search_dynamic

    As the answer to this exercise, please copy the output of each command, and explain why they are (or are not) different.

  6. The nm utility, short for "names," prints the symbol table (name list) of the provided binary. You can read about its syntax and usage with man 1 nm.

    Now, use nm to compare your two binaries. Issue the commands:

    nm arr_search_static
    nm arr_search_dynamic

    Compare the outputs, and as the answer to this exercise, please tell us what you noticed.

    Additionally, please tell us how many lines of output resulted from each command. You can use the wc (word count) utility to count lines of output, like so:

    nm arr_search_dynamic | wc

    This will list the respective line, word, and character counts of the output.

  7. The readelf utility displays information about ELF (Executable and Linkable Format) binary files. While similar to nm, it displays information differently. You can use man 1 readelf to learn more.

    Please use readelf to examine your dynamically linked binary, like so:

    readelf -a arr_search_dynamic

    As the answer to this exercise, please tell us how readelf differs from nm, and when you might prefer to use one utility over the other.

  8. The pmap utility, which stands for "Process Map," reports the memory map of a running process. You can read more about it with man 1 pmap.

    Make a copy of arr_search.c, then modify the copy so it attempts to read a value from stdin before returning from main(), for example with the scanf function. This will allow you to use pmap to see its memory map before it exits.

    Compile your new program with dynamic linking (GCC's default), naming the binary arr_search_copy.

    Now, run it in the background, like so:

    ./arr_search_copy 1 &

    When you run a program in the background in this way, your shell should report the PID of the program, something like:

    [1] 28394

    Where 28394 is the PID. You can look at its memory map with the command:

    pmap <pid>

    Replace <pid> with the reported PID.

    As the answer to this exercise, please copy the output from pmap.

    You can terminate your background job by issuing the command:

    kill -9 <pid>

  9. The GNU Debugger (GDB) is a utility that should be in every Linux programmer's toolkit. Every programmer makes mistakes, and debugging is just as important as writing code. You should be familiar with GDB from CSE 361, but if not (or if you need a refresher), this studio examines some of its functionality.

    In this studio, we will use breakpoints and watchpoints. Breakpoints allow us to interrupt execution of the program when a specific instruction address is reached, while watchpoints allow us to interrupt execution when a certain memory address is modified. Breakpoints and watchpoints are both extremely powerful, and in this studio we are just scratching the surface of what can be done with them.

    First, you will need to recompile the original arr_search.c with extra debugging information:

    gcc arr_search.c -o arr_search_debugging -O0 -g -lm

    Next, have GDB load the compiled binary:

    gdb ./arr_search_debugging

    As the answer to this exercise, give the output that was produced by this command.

  10. Now, we will add a breakpoint to our program. This will cause program execution to pause when the breakpoint is reached. Add a breakpoint at the function library_calls with the command:

    b library_calls

    You can have GDB run your program with the command r. Any values provided after r are passed as arguments to your program. Run your program for two iterations by issuing the command:

    r 2

    As the answer to this exercise, show the output from this command.

  11. At this point, GDB should have stopped at the breakpoint you specified.

    The c command allows you to continue to the next breakpoint. Issue this command, and as the answer to this exercise, please describe what happened.

  12. The n command allows you to proceed to the next line of code. Use it to step past the malloc call, so that the pointer variable values points to a region of dynamically allocated memory.

    The p command allows you to print the value of a variable. Use this to get the address that is stored in the values variable by issuing the following command:

    p values

    As the answer to this exercise, please show the output from this command.

  13. Now, you will set a watchpoint on the first element of the values array. Whenever this location is modified, GDB will pause execution. Set a watchpoint at the address you printed in the previous exercise using the command:

    watch *(float *) <address>

    For example, if p values printed the address 0x602010, use the command:

    watch *(float *) 0x602010

    Now, we will also set a watchpoint on the next element of the values array. You can do this with the command:

    watch *(float *) (values + 1)

    Once you have set these watchpoints, enter c to continue execution. Keep entering c each time GDB interrupts program execution until the program exits.

    As the answer to this exercise, please say what values the program assigned to the watched addresses.

  14. Things to Turn In:

    For this studio, please turn in the following:


Optional Enrichment Exercises

    As you continue your computer science career, you may find yourself writing functions that you reuse in many programs. These exercises will teach you how to compile such functions into your own dynamic library. As the answer to any of these optional exercises that you choose to try, please briefly describe what you learned (and answer any questions the exercise contains).

  15. Every library consists of header files and source files. Source files will be compiled and linked into a shared object library, but header files will need to be included in any program making use of your library. In your home directory, create separate directories for your header files and source files for your library.

    Navigate to your header file directory. In it, create a ".h" file to declare two functions.

    In our examples, we will use the following functions:

    extern int lib_max(int * arr, unsigned cnt);

    extern int lib_min(int * arr, unsigned cnt);

    These functions return the maximum (and minimum) values from an integer array arr with cnt elements.

    Note that we do not use the names max and min, to avoid conflicts with identically named functions or macros that other code or headers may already define.

    Navigate to your source file directory. In it, create a source file for each of the functions you declared, e.g., "max.c" and "min.c".

    Write the functions using the same signatures declared in your header file (but without the extern keyword).

    As the answer to this exercise, please show the full paths of your header and source directories.

  16. Now, you will compile these into a shared object library.

    First, compile your files using position independent code with the following command:

    gcc -c -fpic max.c min.c

    The -c compiler flag instructs GCC to compile each source file without linking.

    The -fpic flag generates Position Independent Code. Position Independent Code can be loaded at any virtual address and still execute correctly. In contrast, position-dependent code may contain branch, jump, and data-access instructions that refer to absolute addresses, so it runs correctly only at the address it was linked for. Position Independent Code is needed in shared libraries, whose functions are linked dynamically into calling programs and which may be loaded at a different address in each process that uses them.

    Next, link these into a shared library with the following command:

    gcc -shared -o libraryname.so max.o min.o

    Be sure to replace libraryname.so with the library name of your choice. The name should start with "lib" and end with ".so."

    You should now see the shared object library in your source directory.

    As the answer to this exercise, please write the name of your shared object library file.

  17. Now, in your programs directory, create a program, "sotest.c", that uses the two library functions you created. For example, you might create an array of integers, use the library functions to find its minimum and maximum values, and print them to STDOUT with printf().

    Make sure to include your library's header file in your program.

    As the answer to this exercise, please show the complete code of the program you wrote.

  18. Now, it's time to compile and run your program.

    Issue the following command:

    gcc sotest.c -o sotest -I<Header Directory Path> -L<.so Directory Path> -l<Library Name>

    The -I flag instructs GCC to look for additional headers in the specified directory.

    The -L flag instructs GCC to look for additional libraries in the specified directory.

    The -l flag instructs GCC to link a specific shared object library starting with "lib" and ending with ".so". For example, specifying -ltest would link the file "libtest.so".

    Now, try to run your program. As the answer to this exercise, please show the error message that you see.

  19. The dynamic linker needs to know where to look for shared object files at runtime. Issue the following command to add your library file's directory to the list of directories it searches:

    export LD_LIBRARY_PATH=<Source Directory Path>:$LD_LIBRARY_PATH

    Now, try to run your program again. As the answer to this exercise, please show your program's output.