CSE 422S: Studio 13

VFS Layer


All filesystems rely on the VFS to enable them not only to coexist, but also to interoperate.

— Robert Love, Linux Kernel Development, 3rd Edition, Chapter 13, pp. 261.

The virtual filesystem (VFS) layer allows a wide range of filesystems to be used within Linux, even if their implementation details vary significantly. Each filesystem is required to implement a common set of abstractions, which in turn allows Linux to handle them in a uniform manner. It also standardizes how a process views a filesystem, allowing even kernel threads and other specialized processes to interact with different filesystems in a common and portable (at least within Linux) manner.
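
To make the idea of a "common set of abstractions" concrete, here is a minimal userspace sketch of dispatch through a table of function pointers, which is the general pattern behind structures such as struct file_operations. Every name in it (demo_file, demo_file_ops, plain_write, upper_write, demo_vfs_write) is hypothetical and chosen only for illustration; this is not the kernel's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Hypothetical, simplified stand-in for a table like struct file_operations. */
struct demo_file_ops {
    /* Each toy "filesystem" supplies its own write implementation. */
    long (*write)(struct demo_file *f, const char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *f_op; /* which implementation to dispatch to */
    char data[64];                    /* toy backing store */
    size_t used;                      /* bytes of data[] already filled */
};

/* Toy "filesystem" A: stores the bytes unchanged. */
static long plain_write(struct demo_file *f, const char *buf, size_t len)
{
    if (len > sizeof(f->data) - f->used)
        return -1;
    memcpy(f->data + f->used, buf, len);
    f->used += len;
    return (long)len;
}

/* Toy "filesystem" B: stores the bytes upper-cased. */
static long upper_write(struct demo_file *f, const char *buf, size_t len)
{
    size_t i;

    if (len > sizeof(f->data) - f->used)
        return -1;
    for (i = 0; i < len; i++)
        f->data[f->used + i] = (char)toupper((unsigned char)buf[i]);
    f->used += len;
    return (long)len;
}

static const struct demo_file_ops plain_ops = { .write = plain_write };
static const struct demo_file_ops upper_ops = { .write = upper_write };

/* The common entry point: callers never need to know which
 * implementation sits behind the file they were handed. */
long demo_vfs_write(struct demo_file *f, const char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL && f->f_op->write != NULL);
    return f->f_op->write(f, buf, len);
}
```

A caller would set f_op to &plain_ops or &upper_ops when creating a demo_file and then call demo_vfs_write() in exactly the same way for either one.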

In this studio, you will:

  1. Write a simple kernel module that accesses the filesystem mounted on your Raspberry Pi, via a kernel thread's process descriptor (task_struct).
  2. Extend that kernel module to explore some of the VFS data structures, including directory entries for the current working directory and the root directory among others.
  3. Extend your kernel module further to do the same for a userspace task.

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers. When you finish, each person who worked on them should log into Canvas, select this course in this semester, upload a file containing the answers along with any other files the assignment asks for, and submit those files for this studio assignment (there should be a separate submission from each person who worked together on them).

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.


Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Write a kernel module that spawns a single kernel thread. That thread should use the current macro to access its own process descriptor (struct task_struct declared in include/linux/sched.h) and print out the values (i.e., the addresses they contain) of three of its task_struct's fields to the system log: fs, files, and nsproxy.

    These fields give a process direct access into the virtual filesystem. Respectively, these fields are pointers to the process's filesystem structure (struct fs_struct, declared in include/linux/fs_struct.h), its open file table structure (struct files_struct, declared in include/linux/fdtable.h), and its namespace proxy structure (struct nsproxy, a struct declared in include/linux/nsproxy.h that wraps the pointer to the mnt_namespace struct described in the textbook).

    Compile your module and load it on your Raspberry Pi, examine the system log to see your module's output, and then unload it. As the answer to this exercise, please show the lines of the system log that contain your module's output, including the values of the three pointers.

  3. Modify your code so that the kernel thread uses the fs field of its process descriptor to access two fields of the process's filesystem structure (struct fs_struct): pwd and root.

    These fields are path structures (struct path, declared in include/linux/path.h) for the process's current working directory and the root directory, respectively. Each of these path structures contains two fields: mnt, which points to a VFS mount structure (struct vfsmount, declared in include/linux/mount.h), and dentry, which points to a directory entry structure (struct dentry, declared in include/linux/dcache.h).

    Modify your module so that its kernel thread prints out the values (the addresses of the locations they point to) of both of those path structures' dentry fields. Your module should also check if the values of those pointers differ; if so, print out the strings in the d_iname fields of the directory entry structures to which they each point. Otherwise, if they point to the same directory entry structure, print out the string for its d_iname field.

    Compile your module and load it on your Raspberry Pi, examine the system log to see your module's output, and then unload it. As the answer to this exercise, please explain, based on your module's output to the system log, whether or not (and why or why not) you think the process's current working directory is the same as its root directory.
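
The pointer chain and the comparison described in this exercise can be tried out in userspace first. The sketch below is only a model: the struct names and field names follow the kernel's, but the layouts are drastically simplified, and describe_pwd_root() is a hypothetical helper that formats into a buffer instead of printing to the system log. Real kernel code would also need to respect the locking rules for struct fs_struct, which this model ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified userspace stand-ins: names match the kernel, layouts do not. */
struct dentry   { char d_iname[32]; };
struct vfsmount { int unused; };
struct path     { struct vfsmount *mnt; struct dentry *dentry; };
struct fs_struct { struct path root; struct path pwd; };

/*
 * Hypothetical helper: writes a one-line report into buf.
 * Returns 1 if pwd and root refer to the same dentry, 0 otherwise.
 */
int describe_pwd_root(const struct fs_struct *fs, char *buf, size_t len)
{
    const struct dentry *pwd = fs->pwd.dentry;
    const struct dentry *root = fs->root.dentry;

    assert(pwd != NULL && root != NULL);

    /* Comparing the pointers is enough to tell whether both paths
     * resolve to one and the same directory entry structure. */
    if (pwd == root) {
        snprintf(buf, len, "same dentry: %s", root->d_iname);
        return 1;
    }
    snprintf(buf, len, "pwd: %s, root: %s", pwd->d_iname, root->d_iname);
    return 0;
}
```

In a kernel module the same comparison would start from current->fs and report through printk() rather than a buffer.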

  4. You will now modify your module so that it prints the names of all of the files and directories that are within the root directory. To do so, its kernel thread will have to traverse the list of directory entries whose head is the d_subdirs field of the directory entry structure to which the path struct for the root directory points. The d_subdirs field is the head of the list of children; each child directory entry is linked into that list through its own d_child field, so traversing the list visits the first child and then each of its siblings.

    The Linux kernel provides special macros and functions for traversing its data structures, as described in Chapter 6 of the LKD course textbook. Review the discussion and examples in that chapter, and use the appropriate ones in your module's kernel thread to iterate over each child entry. To do so, provide the address of the d_subdirs field as the pointer to the list head, and give d_child as the name of the list member within each child entry. For each child entry, your kernel thread should print the value of its d_iname field to the kernel log.

    Compile your module and load it on your Raspberry Pi, examine the system log to see your module's output, and then unload it. Then, in a terminal window on your Raspberry Pi, list the contents of the root directory using the command:

    ls -l /

    As the answer to this exercise, please show the output from your module that contains the names of the entries in the root directory. Does this differ from the output of ls?
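
If the relationship between d_subdirs and d_child is hard to picture, it may help to experiment with a userspace model before writing the kernel version. The sketch below re-creates a small part of the kernel's circular list in plain C; list_init(), dentry_init(), and list_children() are hypothetical helpers, and struct dentry here is a stripped-down stand-in. The loop is written out by hand so each step is visible; in a module, the kernel's list_for_each_entry(child, &parent->d_subdirs, d_child) macro performs the same walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace re-creation of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct dentry. */
struct dentry {
    char d_iname[32];
    struct list_head d_subdirs; /* head of this entry's list of children */
    struct list_head d_child;   /* this entry's node in its parent's list */
};

void dentry_init(struct dentry *d, const char *name, struct dentry *parent)
{
    snprintf(d->d_iname, sizeof(d->d_iname), "%s", name);
    list_init(&d->d_subdirs);
    list_init(&d->d_child);
    if (parent)
        list_add_tail(&d->d_child, &parent->d_subdirs);
}

/* Writes each child's name followed by a space into out;
 * returns the number of children visited. */
int list_children(struct dentry *parent, char *out, size_t len)
{
    struct list_head *pos;
    int count = 0;

    assert(parent != NULL && out != NULL && len > 0);
    out[0] = '\0';

    /* The walk ends when it arrives back at the head, which is embedded
     * in the parent and is not itself a child. */
    for (pos = parent->d_subdirs.next; pos != &parent->d_subdirs;
         pos = pos->next) {
        struct dentry *child = container_of(pos, struct dentry, d_child);

        strncat(out, child->d_iname, len - strlen(out) - 1);
        strncat(out, " ", len - strlen(out) - 1);
        count++;
    }
    return count;
}
```

Note that container_of() is applied with d_child, not d_subdirs: every node on a parent's d_subdirs list is the d_child member of one of its children.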

  5. Now, modify your module so that, as its kernel thread traverses the list of directory entries in the d_subdirs field of the root directory entry, it only prints the value of an entry's d_iname field if that entry's d_subdirs list is non-empty.

    Compile your module and load it on your Raspberry Pi, examine the system log to see your module's output, and then unload it. Again, compare the output to the output of the command:

    ls -l /

    As the answer to this exercise, please show the output from your module. This time, does it differ from the output of ls? Please explain why you think there is, or is not, a difference.

  6. Modify your module so that it takes an optional PID parameter. If a valid process ID is passed in, the module should instead launch a kernel thread that writes to that process's standard output. The thread function should:
    1. Declare a pointer to struct pid, then set it with a call to find_get_pid(), which is declared in include/linux/pid.h. If the pointer is NULL (i.e., the PID was invalid), the thread function should return.
    2. Declare a pointer to struct task_struct, then set it with a call to get_pid_task(), which is declared in include/linux/pid.h, using PIDTYPE_PID for the pid_type. If the pointer is NULL, the thread function should return.
    3. Declare a pointer to struct files_struct, then set it to the corresponding member of the task_struct.
    4. Declare a pointer to struct file, which will be used to point to the underlying file structure for the process's standard output. Set the pointer using the function fcheck_files(), which is declared in include/linux/fdtable.h. This function uses appropriate RCU semantics to access the underlying file structure for a given file descriptor belonging to a given files_struct. Remember that standard output should have file descriptor 1.
    5. Declare a pointer to struct file_operations, which is declared in include/linux/fs.h, then set it to the corresponding member of the struct file.
    6. Declare a variable of type loff_t, and set it to 0, e.g., loff_t offset = 0;
    7. Use a const char * variable to define a message to write to the target process's standard output, then use strlen() to get the string's length (being sure to add 1 for the null terminator character).
    8. Call the write member function pointer of your file_operations pointer, passing your file pointer, your string, the string's length, and the address of your loff_t variable.

    Now, write an additional program that simply prints its PID to the terminal, ending the output with a "\n" character so that it is flushed, and then uses a while loop to spin indefinitely.

    Compile your module and the separate program. Run the program on the Raspberry Pi, then load the module in a separate terminal window, passing in the PID reported by the program. Examine the output from your program. Was the module able to successfully write its message to the target process's standard output?

    As the answer to this exercise, please copy all output from your target program.


  7. Things to turn in

Optional Enrichment Exercises

  8. Make a copy of your kernel module and modify that module's kernel thread so that it does a full (recursive) depth-first traversal of the mounted filesystem, starting at the root directory. When it reaches the directory entry structure for a non-directory file, it should simply print out a line to the system log with the file's name. When it reaches the directory entry structure for a directory, it should print out that directory's name and then recursively explore that directory before visiting any of the other directory entry structures within its parent directory. As the answer to this exercise, please show a fragment from the system log that demonstrates depth-first traversal of the filesystem.
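
The visiting order asked for here can be checked on a toy tree in userspace. The following is a minimal sketch under simplifying assumptions: struct node and dfs() are hypothetical, and children are linked with plain first_child and next_sibling pointers, whereas the kernel links them through the d_subdirs and d_child list members. Keep in mind that kernel stacks are small, so deep recursion inside a module deserves more care than it does in this model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified tree node (not the kernel's struct dentry). */
struct node {
    const char *name;
    struct node *first_child;  /* NULL for a file or an empty directory */
    struct node *next_sibling; /* NULL for the last entry in a directory */
};

/*
 * Pre-order depth-first walk: record a node, then explore its entire
 * subtree, and only then move on to its next sibling.
 * Appends "name " to out for every node visited.
 */
void dfs(const struct node *n, char *out, size_t len)
{
    const struct node *child;

    assert(out != NULL && len > 0);
    if (n == NULL)
        return;

    strncat(out, n->name, len - strlen(out) - 1);
    strncat(out, " ", len - strlen(out) - 1);

    for (child = n->first_child; child != NULL; child = child->next_sibling)
        dfs(child, out, len);
}
```

The caller passes a buffer that already holds an empty string, since dfs() only appends to it.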

  9. Make a copy of your kernel module and modify that module's kernel thread so that it does a full (recursive) breadth-first traversal of the mounted filesystem, starting at the root directory. The system log messages for this version should print out all of the directory entries within the root directory, then all the directory entries for the first subdirectory of the root directory, then all the directory entries for the second subdirectory of the root directory, etc. As the answer to this exercise, please show a fragment from the system log that demonstrates breadth-first traversal of the filesystem.
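
One common way to get this level-by-level order is a first-in, first-out queue of entries still to be visited. The sketch below shows that approach on a toy userspace tree. It is an illustration under stated assumptions, not the required design: struct node, bfs(), and the MAX_QUEUE bound are all hypothetical, and a kernel version would need its own queue (for example, one built on list_head) and its own memory management.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_QUEUE 64 /* arbitrary bound chosen for this demo */

/* Hypothetical simplified tree node (not the kernel's struct dentry). */
struct node {
    const char *name;
    struct node *first_child;  /* NULL for a file or an empty directory */
    struct node *next_sibling; /* NULL for the last entry in a directory */
};

/*
 * Breadth-first walk using a fixed-size FIFO queue.
 * Appends "name " to out for every node visited.
 * Returns the number of nodes visited, or -1 if the queue overflowed.
 */
int bfs(const struct node *root, char *out, size_t len)
{
    const struct node *queue[MAX_QUEUE];
    const struct node *child;
    size_t head = 0, tail = 0;
    int visited = 0;

    assert(out != NULL && len > 0);
    out[0] = '\0';
    if (root == NULL)
        return 0;

    queue[tail++] = root;
    while (head < tail) {
        const struct node *n = queue[head++];

        strncat(out, n->name, len - strlen(out) - 1);
        strncat(out, " ", len - strlen(out) - 1);
        visited++;

        /* Queue all children; they are visited only after every
         * entry that was queued before them. */
        for (child = n->first_child; child != NULL;
             child = child->next_sibling) {
            if (tail == MAX_QUEUE)
                return -1;
            queue[tail++] = child;
        }
    }
    return visited;
}
```

Compared with the depth-first order, all entries of the root directory appear before any entry of a subdirectory.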