CSE 422S: Studio 16

Buffer and File I/O


All system calls for performing I/O refer to open files using a file descriptor, a (usually small) nonnegative integer.

-- Michael Kerrisk, The Linux Programming Interface, Chapter 4, p. 69.

User space programs running on Linux can interact with files through a combination of Linux syscalls and standard library functions that support a wide range of input and output operations. Today's material explores the different ways in which those calls let user space programs read data from files into variables and character buffers, and write data from variables and character buffers into files.

In this studio, you will:

  1. Write simple user space programs that can output different kinds of data to a file, and input different kinds of data from a file.
  2. Invoke low-level Linux syscalls to perform basic operations such as opening and closing a file, and reading data from it and writing data to it.
  3. Call portable functions provided by the standard input/output library to do more sophisticated things like formatted input and output with both files and character buffers.

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group, only one of you should upload the answers (and any other files requested) for the studio. If you need to resubmit (e.g., if a studio is marked incomplete when we grade it), the same person who submitted the studio originally should do so.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.


Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Write a simple user space program that checks the number of arguments that were passed to it (which is given by the expression argc - 1).

    If the program was passed at least two command line arguments it should (1) use fopen to open the file that is named in the first argument (which is found in argv[1]) for writing (discarding any previous contents if the file already exists), (2) use fprintf with the appropriate formatting string to output each of the subsequent program arguments (found in argv[2] through argv[argc - 1] inclusive) to the file on its own line, (3) use fclose to flush the file stream buffer and close the file, and then (4) return 0 to indicate success.

    If it was not passed at least two arguments the program should use printf to output a helpful usage message that uses its name (which is found in argv[0]) and illustrates how to invoke it correctly, and then should return a non-zero value to indicate that it did not complete successfully.

    Compile and run your program with different command lines, and for any well formed command line that did not generate the usage message, use the cat shell command to show the contents of the file that was produced. As the answer to this exercise please show the output from one unsuccessful run and one successful run of your program, as well as what the file contained after the successful run.
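
    The fopen, fprintf, and fclose sequence described above might look roughly like the following. This is only a sketch under stated assumptions, not the expected solution: the helper name write_lines and its parameters are hypothetical, and error handling is kept minimal.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the studio text): writes each of the
 * `count` strings in `args` to the file `path`, one per line.
 * Returns 0 on success and 1 on failure. */
int write_lines(const char *path, int count, char *const args[])
{
    assert(path != NULL && args != NULL);

    /* Mode "w" creates the file, or truncates it if it already exists. */
    FILE *out = fopen(path, "w");
    if (out == NULL) {
        perror("fopen");
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        /* "%s\n" prints one string followed by a line break. */
        if (fprintf(out, "%s\n", args[i]) < 0) {
            perror("fprintf");
            fclose(out);
            return 1;
        }
    }

    /* fclose flushes the stream buffer before closing the file. */
    if (fclose(out) != 0) {
        perror("fclose");
        return 1;
    }
    return 0;
}
```

    From main, once argc - 1 is at least 2, the call could be write_lines(argv[1], argc - 2, argv + 2); otherwise something like printf("usage: %s <file> <string> [<string> ...]\n", argv[0]) followed by a non-zero return would cover the usage case.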

  3. Make a new copy of your program and modify that new version so that instead of using fprintf to print the additional arguments to the file it uses fputs instead, to produce the same file formatting as your original program. Note that this will require you to use an additional character string besides the program arguments, to add line breaks.

    Compile your new program and run it to confirm that it can produce the same output format as the original program. As the answer to this exercise please show the code for the new program.
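
    Because fputs does no formatting and does not append a line break of its own, the line break has to come from a second string. A minimal sketch, assuming a hypothetical helper named put_line that is called once per argument on an already opened stream:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `text` and then a line break to `out`.
 * Returns 0 on success and -1 if either fputs call fails. */
int put_line(FILE *out, const char *text)
{
    /* The additional string that supplies the line break. */
    static const char line_break[] = "\n";

    assert(out != NULL && text != NULL);

    /* fputs returns EOF on error and a non-negative value otherwise. */
    if (fputs(text, out) == EOF)
        return -1;
    if (fputs(line_break, out) == EOF)
        return -1;
    return 0;
}
```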

  4. Make a new copy of your program from the previous exercise, and modify that new version so that instead of using the fopen, fclose, and fputs library functions to print the additional arguments to the file, it instead calls open, close, and write to produce the same file formatting as your original program. Note that this will require you to use an integer variable to hold a file descriptor, instead of a file stream pointer variable, and you will need to specify the appropriate flags in the call to open per the table given on p. 74 in section 4.3.1 of the LPI textbook. You will still need to use a separate string to produce line breaks, and each call to write will need to pass in the number of bytes that should be written from each string (which the strlen library function can provide).

    Compile your new program and run it to confirm that it can produce the same output format as the original program. As the answer to this exercise please show the code for the new program that uses those lower level syscalls.
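
    One way the syscall version could be arranged is sketched below. The helper names write_all and write_lines_fd are hypothetical, and the permission bits 0644 are an assumption (the exercise does not specify a mode). The loop in write_all is there because write may transfer fewer bytes than requested.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write until all `len` bytes are out. */
static int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0)
            return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: writes each string in `args` to `path`, one per line.
 * Returns 0 on success and 1 on failure. */
int write_lines_fd(const char *path, int count, char *const args[])
{
    assert(path != NULL && args != NULL);

    /* O_WRONLY: open for writing only; O_CREAT: create the file if needed;
     * O_TRUNC: discard previous contents. Mode 0644 is an assumption. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return 1;

    int status = 0;
    for (int i = 0; i < count && status == 0; ++i) {
        /* strlen supplies the byte count that write requires. */
        if (write_all(fd, args[i], strlen(args[i])) != 0 ||
            write_all(fd, "\n", 1) != 0)
            status = 1;
    }

    if (close(fd) == -1)
        status = 1;
    return status;
}
```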

  5. Make a new copy of your original program (the one that used fprintf) and modify that new version so that instead of replacing the file's contents if they already exist, it appends to the end of the file. Also modify the program so that it outputs a space instead of a line break between each of the subsequent program argument strings, and outputs a line break only after the last one, so that each time it runs it adds a single new line to the file.

    Compile your new program and run it a few times with the same file name but with different subsequent arguments, and after that cat the contents of the file to confirm that the arguments were correctly concatenated into a separate line that was appended for each run of the program. As the answer to this exercise please show the output of those runs to confirm that.
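
    The two changes in this exercise (append mode, and a space or line break chosen per argument) can be sketched as follows. The helper name append_line is hypothetical; only fopen's "a" mode and the fprintf format string come from the standard library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: appends the strings in `args` to `path` as a single
 * space-separated line. Returns 0 on success and 1 on failure. */
int append_line(const char *path, int count, char *const args[])
{
    assert(path != NULL && args != NULL);

    /* Mode "a" creates the file if needed and sends every write to its end. */
    FILE *out = fopen(path, "a");
    if (out == NULL)
        return 1;

    for (int i = 0; i < count; ++i) {
        /* A space follows every string except the last, which gets "\n". */
        const char *separator = (i + 1 < count) ? " " : "\n";
        if (fprintf(out, "%s%s", args[i], separator) < 0) {
            fclose(out);
            return 1;
        }
    }

    return fclose(out) == 0 ? 0 : 1;
}
```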

  6. Make a new copy of your program from the previous exercise, and modify that new version so that before writing the arguments out to the line it counts how many bytes (characters) long the line will be and uses fprintf to output to the file (all on the same line): (1) an unsigned long integer value for the length of the line that follows that number, then (2) a space character, then (3) the program argument strings with spaces in between them, and then (4) the line break.

    Compile your new program and run it a few times with the same file name but with different subsequent arguments, and after that cat the contents of the file to confirm that each line's length value and formatting are correct. As the answer to this exercise please show the output of those runs to confirm that.
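
    A possible shape for the length calculation and the length-prefixed output is sketched below. It assumes that the length counts the argument strings, the spaces between them, and the final line break (which is what lets the next exercise read the line break as part of the line); if you interpret the length differently, adjust the arithmetic. The helper names line_length and append_counted_line are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the line that follows the length
 * value, assuming the final line break is counted. */
unsigned long line_length(int count, char *const args[])
{
    unsigned long total = 0;
    for (int i = 0; i < count; ++i)
        total += strlen(args[i]) + 1;  /* +1: following space or line break */
    return total;
}

/* Hypothetical helper: appends "<length> <arg> <arg> ...\n" to `path`.
 * Returns 0 on success and 1 on failure. */
int append_counted_line(const char *path, int count, char *const args[])
{
    assert(path != NULL && args != NULL);

    FILE *out = fopen(path, "a");
    if (out == NULL)
        return 1;

    /* "%lu" is the conversion for an unsigned long value. */
    int ok = fprintf(out, "%lu ", line_length(count, args)) >= 0;
    for (int i = 0; ok && i < count; ++i) {
        const char *separator = (i + 1 < count) ? " " : "\n";
        ok = fprintf(out, "%s%s", args[i], separator) >= 0;
    }

    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : 1;
}
```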

  7. Write a new user space program that takes a single command line argument that it treats as the name of a file that is formatted according to the previous exercise.

    The program should declare two unsigned integer variables, both initialized to 0, and a pointer variable that is also initialized to 0, with which to manage data of previously unknown length that it will read from the file.

    One of the unsigned integer variables should be used to remember the largest length value it has seen so far (which also should be the size of the dynamically allocated memory to which the pointer variable points), and the other variable should be used to read in each new length value from the file.

    The program should open the file for reading and until it reaches the end of the file it should repeatedly:

    1. read in the newest length value, as an unsigned integer;
    2. if (and only if) the newly read length value is greater than the largest length value that the program has seen so far, then (a) if (and only if) the largest previously seen length value is non-zero call free to free the dynamically allocated memory to which the pointer variable points, then in any case (b) store the newly read length as the new largest length that has been seen, and then (c) dynamically allocate memory that can hold that new largest length and store its address in the pointer variable;
    3. read in the rest of the line, including the line break, into the memory pointed to by the pointer variable; and
    4. print out (to the standard output stream) the contents of the memory that was just read in, using the line length value to avoid printing out extraneous data if for example a shorter line was read in after a longer one.

    Note that instead of using free and then malloc to resize the dynamically allocated memory in step 2 of this exercise, after the first call to malloc you could alternatively use realloc to accomplish the same thing.

    Note that if the program ever uses malloc to allocate memory dynamically, it should call free to release that memory before it exits.

    Compile your new program and run it with the name of a file that was produced in the previous exercise, to confirm that data from each line of the file is printed out on its own line, and that no extraneous characters are printed. As the answer to this exercise please show the output of the program to confirm that.
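
    The read loop in steps 1 through 4 could be sketched as below. Several things here are assumptions rather than requirements: the helper name print_counted_lines, the extra dest parameter (a real program would simply pass stdout), the choice of fread and fwrite for the fixed-length transfer, and the premise that each stored length includes the line break.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads "<length> <text>" records from `path` and
 * copies each text to `dest`. Returns the number of lines printed, or -1
 * if the file cannot be opened or memory cannot be allocated. */
int print_counted_lines(const char *path, FILE *dest)
{
    unsigned int largest = 0;  /* largest length seen so far */
    unsigned int length = 0;   /* most recently read length */
    char *buffer = NULL;       /* memory able to hold `largest` bytes */
    int lines = 0;

    assert(path != NULL && dest != NULL);

    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;

    /* fscanf returns the number of conversions made; 1 means a length. */
    while (fscanf(in, "%u", &length) == 1) {
        /* Consume the single space between the length and the text. */
        if (length == 0 || fgetc(in) != ' ')
            break;

        if (length > largest) {
            if (largest != 0)
                free(buffer);
            largest = length;
            buffer = malloc(largest);
            if (buffer == NULL) {
                lines = -1;
                break;
            }
        }

        /* fread copies exactly `length` bytes and adds no terminator. */
        if (fread(buffer, 1, length, in) != length)
            break;

        /* fwrite prints only `length` bytes, so leftovers from an earlier,
         * longer line are never shown. */
        fwrite(buffer, 1, length, dest);
        ++lines;
    }

    free(buffer);  /* free(NULL) is a harmless no-op */
    fclose(in);
    return lines;
}
```

    Printing with printf("%.*s", (int)length, buffer) would be another way to limit the output to the bytes that were just read.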


  8. Things to turn in

Optional Enrichment Exercises

  9. Repeat any of the required exercises using scatter/gather I/O syscalls. As the answer to this exercise, please say which ones you tried, and for any that you were able to repeat successfully using scatter/gather I/O please explain briefly how you did that, and if you ran into any complications doing that please describe those briefly as well.
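
    As a starting point, a gather write of one string plus its line break might look like the sketch below, which uses the writev syscall from <sys/uio.h>. The helper name write_line_gather is hypothetical, and the sketch does not retry if writev transfers fewer bytes than requested.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` and a line break to `fd` with a single
 * writev call. Returns the number of bytes written, or -1 on error. */
ssize_t write_line_gather(int fd, const char *text)
{
    char line_break[] = "\n";
    struct iovec parts[2];

    assert(text != NULL);

    /* Each iovec entry names one buffer and its length in bytes. */
    parts[0].iov_base = (void *)text;  /* writev only reads from the buffer */
    parts[0].iov_len = strlen(text);
    parts[1].iov_base = line_break;
    parts[1].iov_len = 1;

    /* The buffers are written in array order as one operation. */
    return writev(fd, parts, 2);
}
```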

  10. Repeat any of the required exercises using I/O operations to move data into and/or out of a character buffer instead of a file. As the answer to this exercise, please say which ones you tried, and for any that you were able to repeat successfully using a character buffer as the repository for the data, please explain briefly how you did that, and if you ran into any complications doing that please describe those briefly as well.
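
    For character buffers, snprintf and sscanf play the roles that fprintf and fscanf play for files. The following sketch formats and then parses a length-prefixed line in memory; the helper names format_counted and parse_counted are hypothetical, and the line layout is the one assumed from exercise 6.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "<length> <text>\n" into `buffer`.
 * Returns 0 on success, or -1 if the result did not fit. */
int format_counted(char *buffer, size_t size, unsigned long length,
                   const char *text)
{
    assert(buffer != NULL && text != NULL);

    /* snprintf writes at most `size` bytes (including the terminator) and
     * returns the number of characters the full result needs. */
    int needed = snprintf(buffer, size, "%lu %s\n", length, text);
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}

/* Hypothetical helper: reads the length from `buffer` and reports the
 * offset at which the text begins. Returns 0 on success, -1 otherwise. */
int parse_counted(const char *buffer, unsigned long *length, int *text_offset)
{
    assert(buffer != NULL && length != NULL && text_offset != NULL);

    /* "%n" stores how many characters have been consumed so far; it is not
     * counted in the return value of sscanf. */
    *text_offset = 0;
    if (sscanf(buffer, "%lu %n", length, text_offset) != 1)
        return -1;
    return 0;
}
```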