Building a Linux Kernel Module

Last night a friend of mine asked for help for her homework on operating systems. It’s about building a simple Linux kernel module and linked list operations. I hadn’t worked on a kernel module before but somehow knew the basics are simple to grasp. This is the transcript of my experience.
The Basics
Kernel modules have two entry points: init and exit. The init function runs when you run insmod <module> and exit function runs when you run rmmod <module>.
#include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("egegunes"); MODULE_DESCRIPTION("Hello"); static int __init hello_init(void) { printk(KERN_INFO "Hello World"); // If your init function returns a non-zero value, // kernel won't load your module. return 0; } static void __exit hello_exit(void) { printk(KERN_INFO "Goodbye, cruel world, I'm leaving you today"); } module_init(hello_init); module_exit(hello_exit);
That’s all the code you need to create a kernel module. To build it, you also need a Makefile
:
obj-m := hello.o KDIR := /lib/modules/$(shell uname -r)/build build: $(MAKE) -C $(KDIR) M=$(PWD) clean: $(MAKE) -C $(KDIR) M=$(PWD) clean
This Makefile
is for using kbuild
build system used by the Linux kernel. For more information, see kernel documentation.
I built and loaded the module in fedora/31-cloud-base v31.20191023.0
Vagrant box without installing any additional packages.
$ make make -C /lib/modules/5.3.7-301.fc31.x86_64/build M=/home/vagrant/hello make[1]: Entering directory '/usr/src/kernels/5.3.7-301.fc31.x86_64' Building modules, stage 2. MODPOST 1 modules make[1]: Leaving directory '/usr/src/kernels/5.3.7-301.fc31.x86_64' $ sudo insmod ./hello.ko $ sudo dmesg ... [ 519.920518] Hello World $ sudo rmmod ./hello.ko ... [ 519.920518] Hello World [ 523.976396] Goodbye, cruel world, I'm leaving you today
Oops
The actual kernel module we’ve worked on was not that simple, though. It was processing items of a linked list on init and exit. When I tried to remove the module, the kernel replied with a cold message:
$ sudo rmmod module Killed
Seeing Killed
message automatically triggers the “out of memory?” question for me, but no, it was a different case. I looked at kernel logs and encountered the first-ever kernel panic that directly caused by my code:
$ sudo dmesg [ 100.472826] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 100.472992] #PF: supervisor write access in kernel mode [ 100.473118] #PF: error_code(0x0002) - not-present page [ 100.473241] PGD 0 P4D 0 [ 100.473306] Oops: 0002 [#1] SMP NOPTI [ 100.473397] CPU: 0 PID: 2394 Comm: rmmod Tainted: G OE 5.3.7-301.fc31.x86_64 #1 [ 100.473623] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 100.473818] RIP: 0010:hello_exit+0xc/0x1000 [hello] [ 100.473938] Code: Bad RIP value. [ 100.474018] RSP: 0018:ffffbd14406cbed8 EFLAGS: 00010246 [ 100.474143] RAX: 000000000000002b RBX: 00000000000000b0 RCX: 0000000000000000 [ 100.474312] RDX: 0000000000000000 RSI: ffff94b8bda17908 RDI: ffff94b8bda17908 [ 100.474496] RBP: ffffffffc0345000 R08: ffff94b8bda17908 R09: 00000000000001b9 [ 100.474664] R10: 0000000000000001 R11: ffffffff92ee37c0 R12: 0000000000000000 [ 100.474833] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 100.475002] FS: 00007f37d424a740(0000) GS:ffff94b8bda00000(0000) knlGS:0000000000000000 [ 100.475192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.475329] CR2: ffffffffc0342fe2 CR3: 000000007b780000 CR4: 00000000000406f0 [ 100.475503] Call Trace: [ 100.475575] __x64_sys_delete_module+0x13d/0x280 [ 100.475698] ? exit_to_usermode_loop+0xa7/0x100 [ 100.475811] do_syscall_64+0x5f/0x1a0 [ 100.475912] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 100.476037] RIP: 0033:0x7f37d4379fab [ 100.476127] Code: 73 01 c3 48 8b 0d dd fe 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad fe 0b 00 f7 d8 64 89 01 48 [ 100.476569] RSP: 002b:00007fff8489a3f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 100.476748] RAX: ffffffffffffffda RBX: 000055e696d84780 RCX: 00007f37d4379fab [ 100.477316] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055e696d847e8 [ 100.477860] RBP: 00007fff8489a458 R08: 0000000000000000 R09: 0000000000000000 [ 100.478397] R10: 00007f37d43edac0 R11: 0000000000000206 R12: 00007fff8489a620 [ 100.478948] R13: 00007fff8489c7a8 R14: 000055e696d842a0 R15: 000055e696d84780 [ 100.479486] Modules linked in: hello(OE-) e1000 joydev i2c_piix4 video ip_tables vboxvideo(OE) drm_kms_helper ttm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw vboxguest(OE) ata_generic pata_acpi [ 100.481065] CR2: 0000000000000000 [ 100.481567] ---[ end trace ad23453eb9e8484f ]--- [ 100.482050] RIP: 0010:hello_exit+0xc/0x1000 [hello] [ 100.482562] Code: Bad RIP value. [ 100.483004] RSP: 0018:ffffbd14406cbed8 EFLAGS: 00010246 [ 100.483509] RAX: 000000000000002b RBX: 00000000000000b0 RCX: 0000000000000000 [ 100.484062] RDX: 0000000000000000 RSI: ffff94b8bda17908 RDI: ffff94b8bda17908 [ 100.484602] RBP: ffffffffc0345000 R08: ffff94b8bda17908 R09: 00000000000001b9 [ 100.485122] R10: 0000000000000001 R11: ffffffff92ee37c0 R12: 0000000000000000 [ 100.485661] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 100.486178] FS: 00007f37d424a740(0000) GS:ffff94b8bda00000(0000) knlGS:0000000000000000 [ 100.486734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.487224] CR2: ffffffffc0342fe2 CR3: 000000007b780000 CR4: 00000000000406f0
Look how messy it is! I was so freaked out when I saw this that I didn’t even try to understand what it means at first. I tried debugging by trial and error until I gave up.
To be clear, it is not the actual panic. I injected an error to our example to demonstrate:
static void __exit hello_exit(void) { int *p; printk(KERN_INFO "Goodbye, cruel world, I'm leaving you today"); p = NULL; *p = 0; }
After some research, I understood some of the messages enough to track down the bug. “Oops
” means the severity of the issue is high.
BUG: kernel NULL pointer dereference, address: 0000000000000000
shows the actual error,RIP: 0010:hello_exit+0xc/0x1000 [hello]
shows the CPU register of the instruction that error happened.
Using this information, I could find the problematic code with gdb
:
$ gdb hello.ko (gdb) list *(hello_exit+0xc) 0x70 is in hello_exit (/home/vagrant/hello/hello.c:22). 17 static void __exit hello_exit(void) { 18 int *p; 19 printk(KERN_INFO "Goodbye, cruel world, I'm leaving you today\n"); 20 21 p = NULL; 22 *p = 0; 23 } 24 25 module_init(hello_init); 26 module_exit(hello_exit);
Author: Ege Güneş
Date Published: Jan 29, 2021

Resources
- https://www.tldp.org/LDP/lkmpg/2.6/html/c38.html
- https://jvns.ca/blog/2014/09/18/you-can-be-a-kernel-hacker/
- https://opensourceforu.com/2011/01/understanding-a-kernel-oops/