CSE 522S: Studio 4

Observing File System Events


Linux provides an interface, inotify, for monitoring files -- for example, to see when they are moved, read from, written to, or deleted. [...] With inotify, the kernel can push the event to the application the moment it happens.

-- Robert Love, Linux System Programming, 2nd Edition, Chapter 8, p. 283.

Applying scientific analysis to system behavior (often called system forensics, even if such "detective work" is being used for purposes beyond the scope of legal investigation suggested by the term's dictionary definition) begins with detailed observation. For example, it may not be enough to know that a file is no longer found in a certain directory: was it moved to another directory, or was it deleted from the file system? Also, what was the file's "history"? Were its contents ever updated since it was created? Were its metadata ever changed?

Further, such observation requires a careful understanding of the nature of the file and the filesystem on which it resides. What does a directory's path tell us about the physical filesystem (if any) on which the directory resides? Can a filesystem be entirely virtual -- i.e. backed only by structures in memory, and not on disk? What does it mean that a file "is found" within a directory? Does the existence of the file depend on the existence of the directory, or does the directory merely contain a pointer to the file? What are the implications? Can a file exist in two directories at once (or none at all)?

Answering even such basic questions requires us to track (1) the backing data structure corresponding to a file, (2) the filesystem on which this data structure resides, and (3) what happened to that file over time. The Linux df and ls utilities, among others, can help us address (1) and (2). The Linux inotify interface provides a means to accomplish (3), event by event, for the events of interest.

In this studio, you will:

  1. Learn to use the df, ls, and mount utilities,
  2. Write a C program that emulates some of the behavior of the ln utility,
  3. Establish multiple links and bind mounts to the same file,
  4. Initialize an inotify instance, add watches to it, read events from it, report what those events were, and add new watches in response to some of them, and
  5. Optionally, as an enrichment exercise, write a C program that emulates some of the behavior of the mount utility.

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.


Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Log into the linuxlab cluster (where you built your kernel in the first studio exercise), and cd into the individual directory you created under /project/scratch01/compile/<username>/ to hold your code.

    Now run the command df -h

    The df utility is part of the GNU coreutils. Its primary function is to report file system disk space, but it is also useful for showing which filesystems are mounted, and where the corresponding mountpoints reside in the VFS hierarchy. For more information, see the online man page or issue the command:

    man 1 df

    As the answer to this exercise, please find the filesystems that correspond to (1) your home directory, (2) the /project/scratch01 directory, and (3) the cgroups directory (/sys/fs/cgroup). Write the values reported by the df utility for these three filesystems. What do these values tell you about where these filesystems reside and how they are attached? What do these values tell you about the size of the filesystems, and why this size may be important? What does the percentage of used space suggest?

  3. Now, on your Raspberry Pi, create a directory somewhere convenient (e.g. in your home directory or on the desktop). This directory will contain three subdirectories, each containing files and links to those files. Create those three subdirectories. Inside the first of those three subdirectories, create a file using the touch command, e.g. touch myfile.

    When you have done this, you should have a directory structure that looks something like this:

        .
        |--studio3
           |--subdir0
           |  |--myfile
           |--subdir1
           |--subdir2
        

    Now, you will create hard links to the new file you have created, one using the ln utility, and the other with the link() system call.

    Navigate into the second subdirectory you created, and create a hard link here to the file in the first subdirectory, e.g. with the command:

    ln ../subdir0/myfile

    For more information, see the online man page or issue the command:

    man 1 ln

    Next, write a user-space C program that emulates (some of) the behavior of the ln utility.

    It should take, as a command-line argument, a path to the file to which the link will be created. A second (optional) command-line argument can be supplied with a path to the link itself. If the second argument is not provided, the program should attempt to create a link in its working directory, with the link's name matching the file name.

    The program should then use the link() system call to establish the link.

    For more information, see the online man page or issue the command:

    man 2 link

    To practice good C programming style, be sure to have your program check the number of command-line arguments and the return values from system calls, then report any errors appropriately and exit if necessary.

    Compile and run your program, creating a link to the same file, this time in the third subdirectory you created.

    Now that you have three hard links to the same file, each in a separate subdirectory, navigate to the first directory you created (i.e. the directory that contains the three subdirectories) and run the following command:

    ls -i *

    As the answer to this exercise, please copy the resulting output of this command. What do you notice about the inode number reported for each file?

  4. Bind mounts allow an existing directory or file to be mounted at another location in the filesystem. Unlike a hard link, establishing a bind mount does not write to the backing filesystem, and so does not persist across reboots.

    In the same directory as the previous exercise, create two more subdirectories. In each of these, use the touch command to create a file with the same name as the file in the first subdirectory. When you have done this, you should have a directory structure that looks something like this:

        .
        |--studio3
           |--subdir0
           |  |--myfile
           |--subdir1
              |--myfile (link to subdir0/myfile)
           |--subdir2
              |--myfile (link to subdir0/myfile)
           |--subdir3
              |--myfile (new file just created)
           |--subdir4
              |--myfile (new file just created)
        

    Navigate into one of the subdirectories you created for this exercise (e.g. subdir3), and bind mount the file in the first subdirectory you created in the previous exercise to the file in the working directory, e.g. with the command:

    mount --bind ../subdir0/myfile myfile

    For more information, see the online man page or issue the command:

    man 8 mount

    Now, navigate back up a level in the directory hierarchy (e.g. with cd ..) to the first directory you created (i.e. the directory that now contains the five subdirectories), and bind mount the first subdirectory you created in the previous exercise to the other directory you created for this exercise, e.g. with the command:

    mount --bind subdir0 subdir4

    You should, by now, have established three hard links to the same file, a bind mount to this file, and a bind mount to its enclosing directory. As in the last exercise, run the following command:

    ls -i *

    As the answer to this exercise, please copy the resulting output of this command. What do you notice about the inode number reported for each file?

  5. You will now explore the inotify interface for monitoring files.

    On your Raspberry Pi, write a user-space C program that takes a single command line argument and if it is run with fewer or more command line arguments generates a helpful usage message showing how to run it correctly. In your program, first initialize an inotify instance via the inotify_init1() system call. Store the value it returns (a file descriptor if successful or -1 on failure) in an int variable, and if it was successful print out a message to the standard output stream indicating that it was initialized successfully and giving the file descriptor value.

    Using the command line argument that was passed to your program as a path to a file or directory, use inotify_add_watch() to add a watch for all events (IN_ALL_EVENTS) involving that filesystem object to the inotify instance. That system call will return a watch descriptor if successful or -1 on failure: print out a message giving the watch descriptor and the command line argument if it succeeded, or if not, say that a watch could not be added for the command line argument that was given.

    Your program should use one of the I/O multiplexing mechanisms (i.e. select, poll, or epoll) introduced in CSE 422S to watch for new events from the inotify instance's file descriptor (not the watch descriptor). This should be contained in a never-ending loop that keeps going until the program is terminated with Ctrl-C.

    Whenever new events are available, it should (1) read the events from the file descriptor, and then (2) iterate through all the inotify_event structures obtained by each read and print out each event's watch descriptor, mask, cookie, name field length, and (if that length is non-zero) name. For each event, the program should also print out a descriptive message for each inotify event type that is set in the event's mask.

    Hint: you can use a function macro with stringification to quickly check the event mask against a flag, then print the string representation of that flag, e.g. with code like the following:

    #define CHECKFLAG(event, flag) do { if ((event)->mask & (flag)) printf("%s ", #flag); } while (0)

    CHECKFLAG(event, IN_CREATE);

    Be careful with the read buffer's length and alignment, with handling bitmasks, with C99 flexible array members (the "zero-length" array at the end of the inotify_event struct), and with correctly advancing a pointer to the next event position in the read buffer. LSP Chapter 8 pp. 288-289 has a good code example that illustrates many of these issues, though you may want to improve even further on it, adding your own messages for events not mentioned there, and replacing the hard-coded 4 in the read buffer's declaration with the more portable __alignof__(struct inotify_event) expression.

    Open up another terminal window on your Pi, and in it create a new empty directory and cd into it. Compile and run your program in the original terminal window, passing it a path to the directory you just created. In the new terminal window, use the ls command and see what events your program sees it generate. Use touch to create a new file in that directory, use mv to move that file out of the directory and then to move the file back into that directory, and then again use touch on that file (which will update its timestamp). As the answer to this exercise, please show what appeared in both terminal windows, and please explain briefly what events were generated by each use of ls, touch, and mv.

  6. For this exercise, you will explore how various filesystem operations interact with links and bind mounts, by observing the behavior of these operations directly, any error messages they may produce, and the corresponding events generated via the inotify interface.

    Run your program again, but this time, as its command-line argument, pass it the path to the first subdirectory you created in exercise (3), i.e. the one containing the file to which you established two hard links.

    Open another terminal window on your Pi, and attempt to perform the following actions, noting any commands that generate error messages:

    As the answer to this exercise, please show the output of your program, and please explain briefly what events were generated by each command (including both inotify events and command-line error messages).

  7. Modify your program so that it multiplexes for new data available on standard input, in addition to the inotify instance's file descriptor. Standard input will be used to provide a second path for your program's inotify instance to watch. When new data is available, it should (1) read from standard input, then (2) attempt to add a new watch for the path supplied on standard input. Again, check the return value from inotify_add_watch; on failure, your program should print an appropriate error message (including the value of errno), then try again when new data is available on standard input. On success, the program can stop monitoring standard input for data.

    Run your program, again passing it the path to the first subdirectory you created in exercise (3), then provide it the path to the second or third subdirectory through standard input, so that it monitors both subdirectories. Again, experiment with using touch and mv on the files in both directories.

    As the answer to this exercise, please show the output of your program, and briefly explain its behavior.

  8. You will now apply kernel-enforced limits to the inotify interface, and see how this affects your program's behavior.

    On your Raspberry Pi, write a user-space C program that opens the file /proc/sys/fs/inotify/max_user_watches

    It should read from this file, print its value to the terminal, and store the value to a variable so the original value can be restored when the program terminates. It should then write the value 1 to the file, then loop indefinitely.

    Establish a signal handler for SIGINT (as introduced in CSE 422S) so that when the program is interrupted by Ctrl-C, it writes the original max_user_watches value back into the associated file.

    Compile and run this program in one terminal window, then run your inotify program in another terminal window. Supply the inotify program with a second directory to watch using standard input, and observe what happens.

    As the answer to this exercise, please show the output of your inotify program. What errno value was reported? What does this error tell you?


  9. Things to Turn In:


Optional Enrichment Exercises

  10. Next, write a user-space C program that emulates (some of) the behavior of the mount utility, in particular, for creating bind mounts.

    It should take, as a command-line argument, a path to the file (or directory) which will be mounted, and as a second command-line argument, the path to the target file (or directory) that will serve as the mount point.

    The program should then use the mount() system call, with the MS_BIND flag, to establish the bind mount.

    For more information, see the online man page or issue the command:

    man 2 mount

    To practice good C programming style, be sure to have your program check the number of command-line arguments and the return values from system calls, then report any errors appropriately and exit if necessary. In particular, pay careful attention to, and report appropriately, the following scenarios:

    Compile and run your program to test it, replicating exercise (4) above, but using your program instead of the mount utility.

    As in that exercise, run the following command:

    ls -i *

    As the answer to this exercise, please copy the resulting output of this command.

  11. Modify your C program from the previous optional enrichment exercise, so that it takes, as a third command-line argument, an option that allows the mount to be established read-only.

    If this argument is "ro", it should additionally pass the MS_RDONLY flag to the mount() system call. If the argument is "rw", or is not provided, it should establish the mount as writable. If any other argument is passed, the program should exit with an appropriate error message.

    Compile and run your program to test it, establishing a read-only bind mount to a directory that already contains a file.

    Attempt to create a new file in the directory, and attempt to write to the file already in the directory, both through the bind mount. As the answer to this exercise, please report any errors reported when you attempt these writes.


Page updated Tuesday, December 21, 2021, by Marion Sudvarg and Chris Gill.