"Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the very wise cannot see all ends."
—Gandalf, The Fellowship of the Ring by J. R. R. Tolkien, Book 1, Chapter 2
Memory management is one of the fundamental operations required of any modern operating system kernel. In nearly all real world computer systems today, a layer of abstraction called virtual memory exists above the raw physical memory resources in the machine. Virtual memory provides a convenient abstraction that makes it both easier to develop portable applications, and easier for the kernel to isolate workloads from each other. However, these benefits are counter-balanced by some challenges, and designing a high performance virtual memory management system is a complex task that requires the kernel to make tradeoffs between different objectives.
In this lab, you will investigate two such objectives: (1) making memory allocation fast, and (2) ensuring high performance for workloads that require significant memory resources. You will study the tradeoff between these two objectives by implementing a kernel module that provides two different techniques for virtual memory management: (1) demand paging, and (2) allocation time paging (herein refered to as "pre-paging").
In this lab, you will:
mmap()system call to perform virtual memory mappings for a process. Processes will interact with your module via a special device file, in this case
/dev/paging. When a process issues the
mmap()call to this file, your module code will be invoked.
mmapcall, and the runtime of a matrix multiplication application that uses the memory you map for it.
For this lab you are encouraged to work with one or two other people as a team, though you may complete this assignment individually if you prefer. Teams of more than 3 people are not allowed for this assignment.
In this lab assignment, you will submit a report that unifies your observations and thoughts, and since part of the assignment will be to present graphs of results obtained from trials you will run to assess performance, you may choose to use a non-text file format (e.g., .docx or .pdf) for your report if you'd like. For more information please see "What to turn in" below.
Please complete the exercises described below. As you work through them, please take notes regarding your observations, and when you are finished please write-up a cohesive report and upload it along with your code files, graph files, etc. that are requested in the lab assignment to the appropriate assignment page on Canvas.
mmap()system call by executing
man 2 mmapon linuxlab.
include/linux/mm_types.hfor relevant kernel data structures related to memory management.
You will be implementing a kernel module to
study kernel behaviors and design tradeoffs at the kernel level. The user level
component will interact with your kernel module by issuing system calls to a
special device file,
The ways in which your user user level and kernel level components interact are diagrammed in the image below.
As shown, your module will create the
/dev/paging device file,
which a user process can access via the
open() system call. Once
the user has a valid file descriptor with which to access your module, it can
then issue the
mmap() system call in order to request the
allocation of new virtual memory segments. As the image above shows, when a
mmap(), the kernel's system call handler first
inspects the arguments to the call, and if everything is valid it creates a
new virtual memory segment and stores the information in a data structure called
struct vm_area_struct (or just "
vma" in Linux kernel
developer terminology). Because one of the arguments to the system call is a
file descriptor referencing the
/dev/paging file, the kernel then
invokes a special function called a callback that allows your module to
perform some custom processing on the new
vma before returning
the new virtual address to the user process.
In your module's
mmap() callback, you will be implementing some
logic to map the new virtual address range specified by the
actual physical memory. In the default mode, your callback will simply
register a custom page fault handler so that when the user process eventually
tries to access the new virtual address segment, a page fault will be triggered
by the hardware and the kernel will allow your handler to run. At that point, your
page fault handler will be responsible for allocating new physical memory, and
updating the process' page tables to map the
vma to the new physical
memory that you allocated. This interaction is illustrated further via the image
This mode of virtual memory management mirrors what Linux does for regular memory allocations, and is called demand paging. That is, physical memory is allocated asynchronously as new virtual memory segments are accessed by the process. However, in addition to this mode of operation, your module will also implement an alternative to demand paging (which we refer to as "pre-paging") where physical memory is allocated and page tables are updated directly during your module's mmap() callback. The code for these two modes of operation will be very similar, except that with pre-paging the page fault handler will never be invoked by the hardware because the process will always be accessing virtual addresses that already have valid virtual-to-physical address translations in the page tables.
The instructor has provided a
framework for your kernel module as well as a user and header file designed
to make it easy to get a basic version of your project up and running. In its initial
configuration, the kernel module creates the
/dev/paging device file, and registers an
callback that simply installs a no-op page fault handler when processes try to
allocate memory via the device file. This page fault handler does not perform
memory allocation or update page tables, but rather tells the kernel to abort
You should first download the code, compile the kernel module and the
dense_mm program, load the module and run the program.
Add a section to your project report titled "Initial Configuration"
and in that section please give the kernel dmesg output as well as the
output of the
dense_mm program when executed on your
Raspberry Pi, and what you observed when you built and ran both of
those pieces of code.
mmap()callback and the eventual invocation of our page fault handler. Our approach will be to use a special pointer embedded in the
vm_private_data; i.e., given a pointer "
vma" of type
struct vm_area_struct *, this data structure can be accessed as
vma->vm_private_data. This field is of type
void *and allows us to allocate a pointer to an arbitrary data type in the
mmap()callback, store it in
vma->vm_private_data, and access it from the page fault handler.
For example, assume
fn2 are invoked on the
same vma, with
fn1 being called before fn2. Then,
void fn1(struct vm_area_struct * vma)
void * ptr = x; // x being some dynamically allocated state
vma->vm_private_data = ptr;
void fn2(struct vm_area_struct * vma)
void * ptr = vma->vm_private_data;
// ptr now has the value 'x' assigned above
The state stored in this private data field is needed in order to remember what
physical memory has been allocated for a process. You are responsible for
designing a new
struct data structure to track this information.
This data structure should be dynamically allocated in the
callback, stored in the
vm_private_data field of the VMA, and
eventually freed when the VMA is deleted. It is up to you to determine
precisely what information should be stored in this data structure. The only
constraints are the following:
atomic_t) in it.
struct page *) allocated in your page fault handler must ultimately be freed.
Finally, we need to update the two remaining callbacks that the kernel can
invoke on our
vma: when a new
vma is created as a
copy of an existing
vma, the kernel invokes the
callback, and when a vma is being deleted, the kernel invokes the
open callback should simply retrieve a pointer to our new
data structure via
vma->vm_private_data and increment the reference count
of its atomic variable. Note: the vmas you use in this assignment are not (always)
created by your page fault handler, so the
open callback may not
be invoked. So, you will need to initialize the value of your reference count to 1 in
your mmap call. The
close callback should retrieve your data
structure, decrement its reference count, and if it becomes zero, free the dynamically
Add a new section to your report titled "Datatype Design" and in it describe how you designed this datatype, and explain what information it tracks. Note that you may not be able to complete this explanation until you finish parts 3 and 4 of this lab assignment.
remap_pfn_range(struct vm_area_struct * vma, unsigned long vaddr, unsigned long pfn, unsigned long length, vm_flags_t pg_prot);
This function updates the page tables to map the virtual address (starting at
vaddr) to the physical address (starting at page frame
length bytes with access bits as specified in
pg_prot. The first, fourth, and fifth values will be the same for every invocation of the
vma, PAGE_SIZE, and
vma->vm_page_prot, while the second and third
arguments will vary based on the faulting address and the page frame number that you allocated.
For this step, you may find the following functions/macros valuable:
PAGE_ALIGN(addr): get the page-aligned address for a virtual address (i.e., round an address down to the nearest 4KB boundary)
page_to_pfn(struct page *): determine the page frame number for a
struct page *
VM_FAULT_NOPAGE(this sounds like an error return, but it tells the kernel that your callback has internalized memory allocation and page table management)
Once these steps are successfully completed, you should be able to run the
dense_mm program to completion! However, note that in
its current state, our module has a serious memory leak, because we never free
the physical pages that are allocated via our page fault handler. However, at
this point it is a good idea to see if your program will run to completion.
Add a new section to your report titled "Page Fault Handler" and in it please
describe how you implemented the page fault
handler, including how you calculated the values for
pfn to pass to remap_pfn_range. In that same section
of your report please also show the output given by your user-level
closecallback function to appropriately:
This function should perform the above steps if and only if there are no remaining references
vma. We can do this by decrementing the reference count and checking whether or not
it is equal to 0.
Finally, add two new static
atomic_t variables to your kernel
module (not associated with any vma or tracker structure, just static globals).
The first variable should be incremented by 1 every time you allocate a new
struct page, and the second variable should be incremented by 1
every time you free a
struct page. In your module's exit function,
read these two values and print out their values. If your module is functioning
correctly, they should be equal.
Add a new section to your report titled "Close Callback" and in that section
please describe how you implemented the
After running the
dense_mm program with an input of 256,
unload your module and print the number of pages that you allocated and freed.
mmapcallback. In this case, the page fault handler should never be invoked.
You should add a new module parameter to your kernel module called
demand_paging as follows:
static unsigned int demand_paging = 1;
module_param(demand_paging, uint, 0644);
In this way, you can disable demand paging and enable "pre-paging" by
passing a value of 0 to
demand_paging when you insert the module via
The implementation of pre-paging should be straightforward, because all of
the operations you need to do have already been performed in your page fault
handler. The main difference is that you will be allocating and mapping memory
for every page of virtual address space in the
vma, rather than just
one page as you've done in the page fault handler.
There are two possible approaches to memory allocation for pre-paging:
alloc_pages()that gets all of the pages you need in one call, and then invokes
remap_pfn_rangeone time to map the entire range, or
remap_pfn_rangewith each iteration handling one page at a time.
As with demand paging, the behavior of the mmap call should be as follows (which will require your module to remember what happened during pre-paging):
VM_FAULT_NOPAGEand the mmap call in userspace will automatically return the address of the new mapping
alloc_pages) failed, your handler should return
VM_FAULT_OOMand the mmap call in userspace will then return
errnowill automatically set to
VM_FAULT_SIGBUSand the mmap call from userspace will return
errnoautomatically set to
Add a new section to your report titled "Pre-Paging" and in that section please describe how you implemented pre-paging, focusing especially on how you performed physical memory allocation.
dense_mmprogram to perform two timing measurements: the sum of the amount of time taken to execute all three calls to
mmap, and the amount of time taken to perform the matrix multiplication. Insert calls to
gettimeofdayto perform these measurements.
On your Raspberry Pi run representative trials that explore how the two different module modes perform: demand paging, and pre-paging. You should run the following sets of experiments in each mode:
dense_mmwith a value of 64
dense_mmwith a value of 128
dense_mmwith a value of 256
dense_mmwith a value of 512
Run each experiment 10 times to collect the minimum, maximum, mean, median, and standard deviation of the 10 runs, for both the mmap system call time and the matrix multiplication time.
Now, create three line graphs (e.g., using Excel or gnuplot), each with matrix size on the x-axis and (i) one with just system call performance on y-axis, (ii) one with just matrix multiplication time on the y-axis, and (iii) one with system call + matrix multiplication time on the y-axis. You may find it useful to format the axes in log-scale to make the graphs more readable, as runtime will increase exponentially in the matrix multiplication case with each separate value. Each data point on the plot should represent the mean and standard deviation (as error bars) of the 10 associated experiments -- if you can't figure out how to plot error bars, you can put standard deviation values in a separate table.
Please add a section titled "Experiments" to your report, and in it please give the minimum, maximum, mean, median, and standard deviation for the 10 runs in each experiment. In that same section please summarize your findings, including what the results you obtained say about the relative merits of demand paging and pre-paging (i.e., did one of them consistently outperform the other, or did they perform similarly, and were there any special conditions under which the performance differed?). In that section please give the names of the graph files you are turning in with those plots, and if you would like to use a non-text format please also feel free to include pictures of the plots within your discussion.
Your report file should include:
For an additional 5% on this lab, select another user-level program of
interest to you, configure it to perform memory allocation via
/dev/paging, and perform the same analysis steps as above to
compare the system call latency with the performance of the application. Please
add another section at the end of your report titled "Extra Credit" and in it please
analyze whether the results differ across workloads, and if so please suggest reasons
why this is the case -- that is, what about the application's behavior or resource
requirements would explain the differences in performance results?