CSE 422S - Operating Systems Organization

CSE 422S: Studio 13

Shared Memory

Shared memory is a low-level, fast, and powerful method for sharing data between concurrently executing processes. This technique allows two separate processes to read and write the same physical memory, providing an inter-process communication (IPC) mechanism that facilitates the development of efficient multi-process/multi-threaded programs.

In this studio, you will:

Create fixed-size shared memory regions across processes
Implement a basic but robust concurrency protocol to manage concurrent reads and writes
Clean up the shared memory regions safely
Benchmark shared memory speed

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.

As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group, only one of you should upload the answers (and any other files requested) for the studio, and if you need to resubmit (e.g., if a studio is marked incomplete when we grade it), the same person who submitted the studio originally should do so.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

As the answer to the first exercise, list the names of the people who worked together on this studio.
Once constructed, the basic interface to a shared memory region is just a void* (pointer to an unspecified type). As we've seen in past exercises, void* is the generic, type-agnostic interface to a contiguous region of virtual memory. As we've done in the past, we will start by declaring a structure that we will use to impose order, and cast the void* regions that we allocate to pointers to this structure. Our basic shared data structure will be a constant-size array.

Create a header file, and in it do the following.
- Define (e.g., using a #define statement) a constant string that will serve as the name for your shared memory region. By convention the first character of that string should be a forward slash and there shouldn't be any other slashes in that string. Specifically, quoting from the man 3 shm_open page:
  
  "For portable use, a shared memory object should be identified by a name of the form /somename; that is, a null-terminated string of up to NAME_MAX (i.e., 255) characters consisting of an initial slash, followed by one or more characters, none of which are slashes."
- Define (e.g., using a #define statement) a symbolic value for the size of the shared array: for testing, a size of around 10 should be sufficient.
- Declare a structure that will organize your shared data, and in it declare four fields:
  - volatile int write_guard
  - volatile int read_guard
  - volatile int delete_guard
  - volatile int data[shared_mem_size]
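One way the header described above might look is sketched below. The file name shared_mem.h, the macro name SHARED_MEM_NAME, the region name "/studio13_shm", and the struct tag shared_data are all hypothetical choices made for illustration; only shared_mem_size and the four field names come from the list above.

```c
/* shared_mem.h (hypothetical file name for the shared header) */
#ifndef SHARED_MEM_H
#define SHARED_MEM_H

/* Name of the shared memory region: one leading slash, no other slashes.
 * The specific name is an assumption made for this sketch. */
#define SHARED_MEM_NAME "/studio13_shm"

/* Number of ints in the shared array; small for testing. */
#define shared_mem_size 10

/* Layout of the shared region: three guard flags, then the data array. */
struct shared_data {
    volatile int write_guard;
    volatile int read_guard;
    volatile int delete_guard;
    volatile int data[shared_mem_size];
};

#endif /* SHARED_MEM_H */
```

The include guard lets both leader.c and follower.c include the same file safely.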
As the answer to this exercise, explain briefly why it is important that both of the processes that will be sharing the memory region see the same layout for this structure (which they can achieve trivially by including the same header file).
Our concurrency approach in these exercises will implement a classic leader/follower solution. We will assume that one process, called the leader, is the one that is always executed first and creates the shared memory region. Otherwise, we would have to account for a concurrency race for which process creates and sets up the region.

Create a new file called leader.c. This file should create the shared memory region with the following steps. General documentation for Linux shared memory regions is found in man 7 shm_overview as well as in the individual man page for each library function (e.g., man 3 shm_open).
1. Create a shared memory file descriptor with shm_open(). The first parameter should be the name string defined in the header file. You should use the flags O_RDWR | O_CREAT, which specify that this shared memory is both readable and writable and that this call should create the region if it does not already exist. It is sufficient to use S_IRWXU for the third parameter (which defines user permissions to the shared region).
2. The previous step creates the shared memory region, but it is created with a capacity of zero bytes. Use the function ftruncate() to resize the memory region. Since we are organizing our shared memory via the struct declared in your header file, set the size to be the sizeof this struct.
3. Now we need a way to read and write to the newly created region. Use the mmap() function to map the shared memory into your program's address space. The addr and offset parameters should be NULL and 0, respectively. The length parameter should be the size of your struct, and the fd parameter should be the file descriptor returned by shm_open(). The prot parameter should specify both PROT_READ and PROT_WRITE, and the flags parameter should specify that this is a shared mapping with MAP_SHARED.
4. Finally, we want to treat our shared memory region as though it were a struct of the type declared in our header file. Declare a pointer to your struct type and assign it the return value of mmap(), cast to that pointer type. Now, you can read and write your shared structure via this pointer. Make sure to check for errors in the mmap() call, and note that error return values are not NULL as you might expect (see the man 2 mmap page).
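The four steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function names create_shared_region() and destroy_shared_region(), the region name, and the inlined struct definition (standing in for your header file) are assumptions made so that the block is self-contained.

```c
#include <fcntl.h>      /* O_RDWR, O_CREAT */
#include <stdio.h>      /* perror */
#include <sys/mman.h>   /* shm_open, mmap, munmap, shm_unlink */
#include <sys/stat.h>   /* S_IRWXU */
#include <unistd.h>     /* ftruncate, close */

/* Assumed stand-ins for the contents of the shared header file. */
#define SHARED_MEM_NAME "/studio13_setup_demo"
#define shared_mem_size 10

struct shared_data {
    volatile int write_guard;
    volatile int read_guard;
    volatile int delete_guard;
    volatile int data[shared_mem_size];
};

/* Steps 1-4: create, size, and map the region. Returns NULL on failure. */
struct shared_data *create_shared_region(void)
{
    /* Step 1: create (or open) the named shared memory object. */
    int fd = shm_open(SHARED_MEM_NAME, O_RDWR | O_CREAT, S_IRWXU);
    if (fd == -1) {
        perror("shm_open");
        return NULL;
    }

    /* Step 2: a new object has size zero, so grow it to hold the struct. */
    if (ftruncate(fd, sizeof(struct shared_data)) == -1) {
        perror("ftruncate");
        close(fd);
        shm_unlink(SHARED_MEM_NAME);
        return NULL;
    }

    /* Step 3: map the object; mmap() reports failure as MAP_FAILED. */
    void *addr = mmap(NULL, sizeof(struct shared_data),
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        perror("mmap");
        shm_unlink(SHARED_MEM_NAME);
        return NULL;
    }

    /* Step 4: view the raw bytes through the struct type. */
    return (struct shared_data *)addr;
}

/* Undo the setup: unmap the region and remove its name. */
int destroy_shared_region(struct shared_data *shared_ptr)
{
    if (munmap((void *)shared_ptr, sizeof(struct shared_data)) == -1) {
        perror("munmap");
        return -1;
    }
    return shm_unlink(SHARED_MEM_NAME);
}
```

On some older systems, programs that call shm_open() must be linked with -lrt (see man 3 shm_open).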
Define an array the same size as the data[] array in your shared struct. In your program, use the srand() and rand() functions to populate this local array. Then, copy this array into the shared struct, either with the memcpy() function or through element-wise assignment. Have your program print out the local array. If you use memcpy(), be careful to copy only the data array portion of the struct, and not the entire struct.
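As a rough sketch of the memcpy() variant, the helpers below fill a local array and copy it into only the data[] member. The function names fill_random(), publish_data(), and print_array() are hypothetical, and the struct is repeated here (standing in for your header file) only to keep the example self-contained.

```c
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* srand, rand */
#include <string.h>   /* memcpy */

#define shared_mem_size 10

struct shared_data {
    volatile int write_guard;
    volatile int read_guard;
    volatile int delete_guard;
    volatile int data[shared_mem_size];
};

/* Fill a local array with pseudo-random values from the given seed. */
void fill_random(int *local, size_t count, unsigned int seed)
{
    srand(seed);
    for (size_t i = 0; i < count; i++) {
        local[i] = rand();
    }
}

/* Copy only the data[] portion of the struct; the guards are untouched.
 * sizeof(shared_ptr->data) is the size of the array member alone, not
 * sizeof(struct shared_data). The cast drops the volatile qualifier,
 * which memcpy() does not accept. */
void publish_data(struct shared_data *shared_ptr, const int *local)
{
    memcpy((void *)shared_ptr->data, local, sizeof(shared_ptr->data));
}

/* Print the array on one line, separated by spaces. */
void print_array(const int *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        printf("%d ", values[i]);
    }
    printf("\n");
}
```

Passing something like (unsigned int)time(NULL) as the seed gives different values on each run; a fixed seed makes runs repeatable while debugging.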
As the answer to this exercise, please explain whether you think element-wise assignment or using memcpy() is likely to be more efficient, and why.
Make a copy of your leader program named follower.c. This program will again access the shared memory region in nearly the same way, but with a few modifications. First, the call to shm_open() should not specify O_CREAT. Second, the call to ftruncate() is unnecessary and should be removed (as long as you don't change the size of the region it doesn't hurt anything, but it potentially could be a source of inconsistency if its value were changed). Third, instead of generating random numbers, inserting them into the program's local array, and copying those over into the shared memory array, just use a single memcpy() to move the data from the shared memory array into the follower program's local array (and then print out the contents of the local array as before).

Build both programs, and then execute the leader and follower programs, in that order. Verify that the program output is identical. Copy and paste the output of both programs as the answer to this exercise.
As the answer to this exercise, please explain briefly why it was necessary to remove both O_CREAT and the call to ftruncate() from the follower program.
Right now the processes are effectively interacting as though they were reading and writing a shared file, but we would like them to interact more dynamically. In particular, we want each of our processes to react to events that occur in the other. The desired execution is as follows:
1. The leader creates the shared memory region, initializes the guard variables in it to 0, and waits for the follower to be created.
2. The follower is created, notifies the leader to start writing, and waits for the data to be written to the shared struct.
3. The leader writes the data to the struct, notifies the follower to start reading, and waits for the follower to finish reading.
4. The follower prints the data to the console, notifies the leader that it is finished, and unlinks itself.
5. The leader destroys the shared memory region.
The purpose of the other non-data fields in our shared struct is to facilitate the waiting and notification of these events between processes. For example, in the sequence above the leader must wait for the follower to be created before it starts writing data into the shared region. The leader can wait on the value of the write_guard variable by spinning,
while( shared_ptr->write_guard == 0 ){}
and the follower can notify the leader it is safe to proceed by modifying the value,
shared_ptr->write_guard = 1;
Modify your programs to reflect the sequence of events given above, using the write_guard, read_guard, and delete_guard variables. Also, once it is safe to do so, the leader should remove the shared region with the function shm_unlink().
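To see the whole handshake in one place, the sketch below runs both roles from a single program. It departs from the studio setup in three labeled ways: fork() stands in for launching the follower separately, so the child inherits the mapping instead of calling shm_open() itself; the function and region names are hypothetical; and the __sync_synchronize() calls (a GCC/Clang builtin) are an added precaution, not part of the studio text, because volatile alone does not order the data writes before the guard write on weakly ordered CPUs.

```c
#include <fcntl.h>      /* O_RDWR, O_CREAT */
#include <sys/mman.h>   /* shm_open, mmap, munmap, shm_unlink */
#include <sys/stat.h>   /* S_IRWXU */
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, ftruncate, close, _exit */

#define SHARED_MEM_NAME "/studio13_handshake_demo"  /* hypothetical name */
#define shared_mem_size 10

struct shared_data {
    volatile int write_guard;
    volatile int read_guard;
    volatile int delete_guard;
    volatile int data[shared_mem_size];
};

/* Leader side: wait for the follower, write, notify, wait again. */
static void leader_steps(struct shared_data *shared_ptr)
{
    while (shared_ptr->write_guard == 0) {}   /* wait for the follower */
    for (int i = 0; i < shared_mem_size; i++) {
        shared_ptr->data[i] = i * i;          /* placeholder payload */
    }
    __sync_synchronize();                     /* added: data before flag */
    shared_ptr->read_guard = 1;               /* notify: safe to read */
    while (shared_ptr->delete_guard == 0) {}  /* wait until reading is done */
}

/* Follower side: announce arrival, wait for data, copy it, notify. */
static void follower_steps(struct shared_data *shared_ptr, int *local)
{
    shared_ptr->write_guard = 1;              /* notify: leader may write */
    while (shared_ptr->read_guard == 0) {}    /* wait for the data */
    __sync_synchronize();                     /* added: flag before data */
    for (int i = 0; i < shared_mem_size; i++) {
        local[i] = shared_ptr->data[i];
    }
    __sync_synchronize();
    shared_ptr->delete_guard = 1;             /* notify: finished reading */
}

/* Runs the full sequence; returns 0 if the follower saw the right data. */
int run_handshake_demo(void)
{
    shm_unlink(SHARED_MEM_NAME);  /* drop any stale region; errors ignored */
    int fd = shm_open(SHARED_MEM_NAME, O_RDWR | O_CREAT, S_IRWXU);
    if (fd == -1) {
        return -1;
    }
    if (ftruncate(fd, sizeof(struct shared_data)) == -1) {
        close(fd);
        shm_unlink(SHARED_MEM_NAME);
        return -1;
    }
    struct shared_data *shared_ptr = mmap(NULL, sizeof(struct shared_data),
                                          PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd, 0);
    close(fd);
    if (shared_ptr == MAP_FAILED) {
        shm_unlink(SHARED_MEM_NAME);
        return -1;
    }
    shared_ptr->write_guard = 0;
    shared_ptr->read_guard = 0;
    shared_ptr->delete_guard = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared_ptr, sizeof(struct shared_data));
        shm_unlink(SHARED_MEM_NAME);
        return -1;
    }
    if (pid == 0) {                           /* child plays the follower */
        int local[shared_mem_size];
        int ok = 1;
        follower_steps(shared_ptr, local);
        for (int i = 0; i < shared_mem_size; i++) {
            if (local[i] != i * i) {
                ok = 0;
            }
        }
        _exit(ok ? 0 : 1);
    }

    leader_steps(shared_ptr);                 /* parent plays the leader */
    int status = 0;
    waitpid(pid, &status, 0);
    munmap(shared_ptr, sizeof(struct shared_data));
    shm_unlink(SHARED_MEM_NAME);              /* leader destroys the region */
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Each guard is written by exactly one process and only ever changes from 0 to 1, which is what lets a plain spin loop serve as the wait. In your own leader.c and follower.c the same steps are split across two programs, and the follower opens the region by name.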

Note that the shared memory region we created lives outside of either process and persists when neither program is running (existing shared memory regions can be found under the directory /dev/shm/). The above synchronization code relies on the fact that the guard variables are initialized to zero, which may not be true if your shared memory region is not properly destroyed after being used. If you have inexplicable program bugs, you can verify that this is not the issue by manually checking and deleting your shared memory region in the above directory.
As the answer to this exercise, please explain briefly why you believe your concurrency protocol correctly avoids data races, deadlocks, and other hazards.
Once you are convinced that your concurrency protocol is correct, modify the follower so that, rather than printing the contents of the shared structure to the console, you simply copy the shared data into a local array. With the protocol described above, the follower process lives just long enough for the leader to write data into the shared array and for the follower to copy data out of the shared array. That is, the lifetime of the follower process is approximately one complete transfer of data through shared memory.

Use the time command with the follower program to obtain a rough estimate of the bandwidth through shared memory, in bytes per second. Take measurements where the shared array size is one million integers and two million integers. Repeat these measurements both on shell.cec.wustl.edu and on your Raspberry Pi.
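The bandwidth estimate is simply the number of bytes transferred divided by the elapsed time. A small helper such as the following (the name bandwidth_bytes_per_sec() is hypothetical) makes the arithmetic explicit; the elapsed time is whatever the time command reports for the follower.

```c
#include <stddef.h>   /* size_t */

/* Bytes moved through shared memory divided by elapsed wall-clock time.
 * num_ints is the shared array size; elapsed_seconds comes from `time`.
 * Returns 0.0 for a non-positive elapsed time to avoid dividing by zero. */
double bandwidth_bytes_per_sec(size_t num_ints, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0) {
        return 0.0;
    }
    return (double)(num_ints * sizeof(int)) / elapsed_seconds;
}
```

Because time measures the follower's whole lifetime, including process startup and the wait for the leader, the result is only a rough estimate.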

As the answer to this exercise, please report your recorded values.

Things to Turn In:

The exercises above
leader.c
follower.c
Your shared header file