"This is not to say that, from a programmer's point of view, kernel memory allocations are difficult -- just different."
-- Robert Love, Linux Kernel Development, 3rd Ed., Chapter 12, p. 231
The Linux kernel offers several approaches to memory management, including a variety of memory allocation and deallocation facilities for use in kernel space, which offer different trade-offs between memory utilization and performance, as well as different guarantees on physical contiguity. In today's studio you will explore physical memory allocation via the page allocator, and will modify the contents of these pages via the kernel's virtual memory mapping.
In this studio, you will:
Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete. We encourage you to work in groups of 2 or 3 people on each studio (groups may change from studio to studio), though you may complete any studio by yourself if you prefer.
As you work through these exercises, please record your answers, and when finished upload them along with the relevant source code to the appropriate spot on Canvas. If you work in a group, only one of you should upload the answers (and any other files requested) for the studio. If you need to resubmit (e.g., if a studio is marked incomplete when we grade it), the same person who submitted the studio originally should do so.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and number each of your responses so that it is easy to match your responses with each exercise.
The module takes a single unsigned integer parameter, which can be specified as an argument to insmod:
sudo insmod kernel_memory.ko nr_structs=<some value>
The value defaults to 2,000 if no value is provided for it. See the discussion of module parameters on p. 346 of LKD, chapter 17, for more information. The module's initialization function also creates a single kernel thread that simply outputs a message to the system log with the value of the module parameter.
Compile your module (don't forget to run module add arm-rpi), choose any positive value for nr_structs, load the module on your Raspberry Pi, and check that the message appears in the system log. Note that this initial compilation will likely produce a warning about a function that is not used yet. You can ignore that warning for now, since the function will be used later in this studio.
Now, inside your module, declare a struct type that contains an array of 8 unsigned integers, as in:
#define ARR_SIZE 8
typedef struct datatype_t {
    unsigned int array[ARR_SIZE];
} datatype;
In your module's thread function, print a second message to the system log that prints:
- The kernel's page size (use the PAGE_SIZE macro)
- The size of the datatype struct (use sizeof(datatype) to do this)
- The number of datatype structs that will fit in a single page of memory
Recompile your module and load it on your Raspberry Pi, and as the answer to this exercise, show 'dmesg' traces that print out these values.
struct page * alloc_pages(gfp_t gfp_mask, unsigned int order);
This function allocates 2^order contiguous pages of physical memory. As discussed in the text, the kernel does not allow you to allocate arbitrary numbers of contiguous pages, but restricts you to allocations based on a construct called the order, which is the base 2 logarithm of the number of pages you wish to allocate. So, if you want to allocate 16 pages, the order of the allocation would be log2(16) = 4.
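The relationship between a page count and an order can be sketched in ordinary user-space C. This is only an illustration: pages_for_order and order_for_pages are hypothetical names, not kernel functions, and this is not the module's my_get_order. It assumes small page counts, so the shift cannot overflow.

```c
#include <assert.h>

/* Number of pages in an allocation of the given order: 2^order. */
unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/* Smallest order whose allocation covers at least nr_pages pages.
 * Hypothetical helper for illustration; assumes nr_pages is small. */
unsigned int order_for_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    while (pages_for_order(order) < nr_pages)
        order++;
    return order;
}
```

Under these assumptions, a request for 16 pages maps to order 4, while a request for 5 pages must be rounded up to order 3 (8 pages), because only power-of-2 page counts can be allocated.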
Inside your kernel thread function, you are going to allocate memory that can hold the user-configurable number (nr_structs) of datatype structs. To do so, you will need to use the alloc_pages function (passing GFP_KERNEL as the first argument), and so you must calculate the following information:
- The number of datatype structs that fit in a single page (use the PAGE_SIZE kernel macro)
- The number of pages needed to hold nr_structs structs (recall how integer division works; you will need to use the modulo operator)
- The order that you should pass to the page allocator
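The first two calculations can be tried out in user space before moving them into the module. The sketch below is a model under stated assumptions: MODEL_PAGE_SIZE is a hypothetical stand-in fixed at 4096 bytes (in the kernel you would use PAGE_SIZE), and structs_per_page and pages_needed are illustrative names, not part of the provided module.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages; the kernel provides PAGE_SIZE */
#define ARR_SIZE 8

typedef struct datatype_t {
    unsigned int array[ARR_SIZE];
} datatype;

/* How many whole structs fit in one page (integer division truncates). */
unsigned int structs_per_page(void)
{
    return (unsigned int)(MODEL_PAGE_SIZE / sizeof(datatype));
}

/* Pages needed for nr_structs structs: the quotient, plus one more
 * page if the modulo operator reports leftover structs. */
unsigned int pages_needed(unsigned int nr_structs)
{
    unsigned int per_page = structs_per_page();
    unsigned int pages = nr_structs / per_page;

    assert(per_page > 0);
    if (nr_structs % per_page != 0)
        pages++;
    return pages;
}
```

The modulo check is what turns truncating integer division into rounding up: exactly one page's worth of structs needs one page, and a single extra struct pushes the count to two.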
Note that the module already implements a function my_get_order that takes a single integer argument, nr_pages, and returns the order, which is the base 2 logarithm of the smallest power of 2 that is greater than or equal to nr_pages. You do not need to modify this function, but you should understand how it works. Consider the following points:
- What happens when nr_pages is a power of 2? E.g., how do the decimal and binary representations of those powers relate?
- What does a loop of the following form do?
while (value > 0) {
    value >>= 1;
}
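One way to build intuition for the shift loop is to count how many times its body runs. The user-space sketch below does only that. count_shifts is a hypothetical helper, and the sketch makes no claim about how my_get_order itself uses such a loop.

```c
#include <assert.h>

/* Counts how many right shifts it takes to reduce value to 0,
 * i.e. the position of its highest set bit, counting from 1. */
unsigned int count_shifts(unsigned int value)
{
    unsigned int shifts = 0;

    while (value > 0) {
        value >>= 1;
        shifts++;
    }
    return shifts;
}

/* Worked examples; each assertion holds. */
void count_shifts_examples(void)
{
    assert(count_shifts(8) == 4);   /* binary 1000: an exact power of 2 */
    assert(count_shifts(7) == 3);   /* binary 0111: one less than a power of 2 */
}
```

Comparing a power of 2 with the value just below it shows why the exact-power case deserves separate thought: the two differ by one shift even though both fit in the same power-of-2 allocation.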
Once you understand how this function operates, modify your module to include three new static unsigned int variables, nr_pages, order, and nr_structs_per_page, that your kernel thread function should calculate based on the user-specified nr_structs. Modify your kernel thread to print out the values of these three variables.
Recompile your module and load it on your Raspberry Pi, and as the answer to this exercise, explain what the function my_get_order does, and show the relevant 'dmesg' output when the module is loaded with 1000, 2000, and 4000 as nr_structs.
Now use the alloc_pages function to allocate enough memory pages to contain the number of structs given in the module parameter, i.e., the smallest power-of-2 number of pages that would be sufficient based on your calculation of the number of objects per page in the previous exercise.
Recall that the return value from alloc_pages is a pointer to the first struct page of the allocation, which covers 2^order pages in total. We are now going to iterate through this memory and use address translation and pointer arithmetic to initialize the array field of each datatype struct stored in it. First, declare another global static variable,
static struct page * pages;
and in the thread function set pages to the return value of alloc_pages. Make sure this function succeeds by checking the return value for NULL.
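Because code must always access memory via its virtual address, and we currently have only a struct page * to identify the memory, we must perform the following operations before accessing the memory:
- Convert the struct page * to a page frame number
- Convert the page frame number to a physical address
- Convert the physical address to a virtual address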
You will need to use the following translation functions/macros to accomplish this:
unsigned long page_to_pfn(struct page * page);
unsigned long PFN_PHYS(unsigned long pfn);
void * __va(unsigned long physical_address);
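The arithmetic behind the last two translations can be pictured with a toy user-space model. Everything in it is an assumption made for illustration: the constants are not read from any kernel, the real macros are architecture-specific, and model_pfn_phys and model_va are hypothetical names. The model only captures the idea that a page frame number is a physical address divided by the page size, and that in the kernel's directly mapped region a virtual address differs from its physical address by a fixed offset.

```c
#include <assert.h>

/* Illustrative assumptions only, not values taken from a real kernel. */
#define MODEL_PAGE_SHIFT  12              /* 4 KiB pages */
#define MODEL_PAGE_OFFSET 0xC0000000UL    /* start of the direct mapping */

/* Models PFN_PHYS(): page frame number -> physical address. */
unsigned long model_pfn_phys(unsigned long pfn)
{
    return pfn << MODEL_PAGE_SHIFT;
}

/* Models __va(): physical address -> directly mapped virtual address. */
unsigned long model_va(unsigned long phys)
{
    assert(phys <= ~0UL - MODEL_PAGE_OFFSET);   /* no wraparound */
    return phys + MODEL_PAGE_OFFSET;
}
```

In this model, page frame 3 starts at physical address 0x3000, which maps to virtual address 0xC0003000. In the module itself, rely on the real page_to_pfn, PFN_PHYS, and __va rather than on any hand-written arithmetic like this.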
Once you have a pointer via __va(), you are ready to access the memory that you allocated. For example, to access the first datatype struct in the first page of the allocated memory, declare the following pointer variable in your module:
static struct datatype_t * dt;
and then say:
dt = (struct datatype_t *)__va(....);
At this point it's a good idea to build your module, and fix any warnings or errors before proceeding.
Now, we are going to have your module's thread function iterate through each struct in each physical memory page and set each integer value to a specific number, according to the following pseudo-code:
for (i = 0; i < nr_pages; i++) {
    unsigned long cur_page =
        /* physical address of the allocated memory + the offset to the ith page */;
    for (j = 0; j < nr_structs_per_page; j++) {
        unsigned long cur_struct = cur_page +
            /* offset to jth struct */;
        struct datatype_t *this_struct =
            /* convert physical address in cur_struct to a virtual address */;
        for (k = 0; k < ARR_SIZE; k++) {
            this_struct->array[k] = i*nr_structs_per_page*ARR_SIZE + j*ARR_SIZE + k;
        }
    }
}
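The offset arithmetic in the pseudo-code can be rehearsed in user space, where an ordinary buffer stands in for the allocated pages and no address translation is needed. This is a sketch under assumptions: MODEL_PAGE_SIZE is a hypothetical 4096-byte page size, and fill_region and expected_value are illustrative names. The kernel version must still go through the physical-to-virtual conversion described above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define ARR_SIZE 8

typedef struct datatype_t {
    unsigned int array[ARR_SIZE];
} datatype;

/* Value the pseudo-code assigns to element k of struct j in page i. */
unsigned int expected_value(unsigned int i, unsigned int j, unsigned int k,
                            unsigned int nr_structs_per_page)
{
    return i * nr_structs_per_page * ARR_SIZE + j * ARR_SIZE + k;
}

/* base stands in for the start of the allocation; offsets are in bytes. */
void fill_region(unsigned char *base, unsigned int nr_pages,
                 unsigned int nr_structs_per_page)
{
    unsigned int i, j, k;

    assert(base != NULL);
    for (i = 0; i < nr_pages; i++) {
        /* start of the ith page */
        unsigned char *cur_page = base + (size_t)i * MODEL_PAGE_SIZE;

        for (j = 0; j < nr_structs_per_page; j++) {
            /* start of the jth struct within that page */
            datatype *this_struct =
                (datatype *)(void *)(cur_page + (size_t)j * sizeof(datatype));

            for (k = 0; k < ARR_SIZE; k++)
                this_struct->array[k] =
                    expected_value(i, j, k, nr_structs_per_page);
        }
    }
}
```

Because each page offset is a multiple of the page size and each struct offset is a multiple of sizeof(datatype), the stored values simply count upward across the whole region whenever the structs fill each page exactly.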
Finally, also modify the inner k loop to print out the value of every element for which both j=0 and k=0. Compile your module, then load and unload it on your Raspberry Pi with nr_structs set to 200, and as the answer to this exercise please show the values printed in dmesg.
The final step of this studio is to perform the same operations as in the triply nested loop above, but in the module teardown function. After invoking kthread_stop() to break the kernel thread out of its schedule() loop, perform the same address translation steps as in the thread function. In the body of the inner k loop, simply ensure that each value matches its expected value with a check like:
if (this_struct->array[k] != (i*nr_structs_per_page*ARR_SIZE + j*ARR_SIZE + k)) {
    /* print error message */
}
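The verification pass can likewise be modeled in user space. The sketch below counts mismatches instead of printing them, so that a single success message can be issued when the count is zero. count_mismatches is a hypothetical helper, and the model assumes the structs are packed back to back, which holds only when sizeof(datatype) divides the page size evenly.

```c
#include <assert.h>
#include <stddef.h>

#define ARR_SIZE 8

typedef struct datatype_t {
    unsigned int array[ARR_SIZE];
} datatype;

/* Returns how many elements differ from their expected value.
 * Assumes structs are contiguous across page boundaries. */
unsigned int count_mismatches(const datatype *structs, unsigned int nr_pages,
                              unsigned int nr_structs_per_page)
{
    unsigned int i, j, k, mismatches = 0;

    assert(structs != NULL);
    for (i = 0; i < nr_pages; i++) {
        for (j = 0; j < nr_structs_per_page; j++) {
            const datatype *this_struct =
                &structs[(size_t)i * nr_structs_per_page + j];

            for (k = 0; k < ARR_SIZE; k++) {
                unsigned int expected =
                    i * nr_structs_per_page * ARR_SIZE + j * ARR_SIZE + k;

                if (this_struct->array[k] != expected)
                    mismatches++;
            }
        }
    }
    return mismatches;
}
```

Counting rather than returning at the first mismatch keeps the loop structure identical to the initialization pass, and the teardown code can report success only when the count is zero.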
Now, after the loop, invoke __free_pages(struct page *page, unsigned int order) to free the pages allocated by the kernel thread. If all values match their expected values, print out a success message.
As the answer to this exercise, load and unload your module with nr_structs set to 1,000, 10,000, and 50,000, and show the output in 'dmesg', which should consist only of statements showing the values of the global variables and a success statement on module unload.