Introduction to Linux Kernel debugging with drgn

This article describes how to use the drgn debugger for Linux Kernel debugging.

Debugging Linux kernel code can be challenging. Existing solutions include GDB for live kernel debugging and crash for analyzing core dumps. While these tools have been very helpful in the past, they are not flexible enough for some types of problems. This is where drgn can be very useful. drgn is a newer debugger introduced by Omar Sandoval, a former colleague at Meta.

Existing debuggers can be extended with scripts. drgn takes a different approach: it integrates directly with the Python interpreter by providing additional APIs to Python.

The Python interpreter is the command shell for drgn. This makes it possible to use the interpreter for small interactive introspection, to write scripts, and to integrate drgn into analysis and debugging tools. The typical approach is to write small scripts that report or query the information you are interested in. The interpreter is extended so that core dumps can be loaded and inspected.

Most Linux distributions already provide packages, so drgn is quick and easy to install. The software is hosted on GitHub and can also be built from source. drgn also needs kernel debugging symbols, so install the debugging symbols for your Linux distribution. The documentation has a good description of the different approaches.

On Arch Linux, run:

sudo pacman -S drgn
sudo pacman -S --needed libelf
source /etc/profile.d/debuginfod.sh

The first step in analyzing a running kernel or a kernel dump is to start drgn with the corresponding parameters. The default is to debug the running kernel.

The running kernel can be loaded with:

drgn

drgn 0.0.31 (using Python 3.13.3, elfutils 0.192, with debuginfod (dlopen), with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import FaultError, NULL, Object, alignof, cast, container_of, execscript, implicit_convert, offsetof, reinterpret, sizeof, stack_trace
>>> from drgn.helpers.common import *
>>> from drgn.helpers.linux import *
>>>

or

drgn -k

A core dump can be loaded with:

drgn -c <core-dump-file>

Drgn can also debug user processes. With the following command an individual process can be loaded:

drgn -p <pid>
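
The three invocation modes can be wrapped in a small launcher when drgn is started from another tool. The following is only a minimal sketch: drgn_argv is a hypothetical helper name that is not part of drgn, the core dump path is a made-up example, and the function assembles nothing beyond the -k, -c and -p options shown above.

```python
import shlex


def drgn_argv(core_dump=None, pid=None):
    """Build the drgn command line for one of the three targets shown above.

    drgn_argv is a hypothetical helper for illustration; it is not part of drgn.
    """
    if core_dump is not None and pid is not None:
        raise ValueError("choose either a core dump or a pid, not both")
    if core_dump is not None:
        return ["drgn", "-c", str(core_dump)]  # debug a core dump
    if pid is not None:
        return ["drgn", "-p", str(int(pid))]   # debug a user process
    return ["drgn", "-k"]                      # debug the running kernel


# The path and the pid below are made-up example values.
print(shlex.join(drgn_argv()))
print(shlex.join(drgn_argv(core_dump="/var/crash/vmcore")))
print(shlex.join(drgn_argv(pid=1234)))
```

The resulting list can be passed to subprocess.run() unchanged.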

drgn has very good documentation. I will not repeat its contents here, but will highlight a few key points. drgn introduces a drgn.Program object, which the interactive shell makes available as prog. This object is used to access the variables, constants and types of the Linux kernel. Lookups return drgn objects, which carry a type in addition to their contents. Interactively these objects can be inspected directly; to obtain a plain Python value, the .value_() function needs to be called.

To get information for the definition of a type the following can be used:

>>> prog.type("struct task_struct")
struct task_struct {
struct thread_info thread_info;
unsigned int __state;
unsigned int saved_state;
void *stack;
...
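
A type returned by prog.type() can also be inspected from a script. The sketch below lists the named members of a structure together with their byte offsets. It assumes the type object exposes a members sequence whose entries have name and bit_offset attributes, as drgn's type objects do; member_layout is a hypothetical helper name.

```python
def member_layout(struct_type):
    """Return (byte_offset, name) pairs for the named members of a struct type.

    struct_type is expected to behave like a drgn type: a .members sequence
    whose items have .name and .bit_offset attributes.
    """
    layout = []
    for member in struct_type.members:
        if member.name is None:  # anonymous struct or union member
            continue
        # Bit fields are reported at the byte that contains their first bit.
        layout.append((member.bit_offset // 8, member.name))
    return layout


# Inside a drgn session (prog is provided by the drgn shell):
#     for offset, name in member_layout(prog.type("struct task_struct")):
#         print(f"{offset:#6x}  {name}")
```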

The value of variables can be accessed with:

>>> prog["init_task"]
(struct task_struct){
    .thread_info = (struct thread_info){
        .flags = (unsigned long)16384,
        .syscall_work = (unsigned long)0,
        .status = (u32)0,
        .cpu = (u32)0,
    },
    .__state = (unsigned int)0,
    ...

Structure members are accessed with the . notation:

>>> prog["init_task"].mm
(struct mm_struct *)0x0
>>> prog["init_task"].active_mm
    *(struct mm_struct *)0xffff88810533b9c0 = {
        .mm_count = (atomic_t){
            .counter = (int)7,
    },
    .mm_mt = (struct maple_tree){
        .ma_lock = (spinlock_t){
            .rlock = (struct raw_spinlock){
                .raw_lock = (arch_spinlock_t){
                    .val = (atomic_t){
                    ...

To get only the value of an object, the .value_() function needs to be used. For interactive usage this is not that important, but when writing scripts it might be required to get only the value and not the data type as well:

>>> prog["init_task"].active_mm.mm_count
(atomic_t){
    .counter = (int)7,
}
>>> prog["init_task"].active_mm.mm_count.counter
(int)7
>>> prog["init_task"].active_mm.mm_count.counter.value_()
7

Sometimes it is necessary to get the address of an object. The address is returned by the address_of_() function:

>>> prog["init_task"].active_mm.mm_count.address_of_()
*(atomic_t *)0xffff88810533b9c0 = {
.counter = (int)7,
}

>>> prog["init_task"].active_mm.mm_count.address_of_().value_()
18446612686452275648

>>> hex(prog["init_task"].active_mm.mm_count.address_of_().value_())
'0xffff88810533b9c0'

If drgn is used interactively, it is generally enough to evaluate an expression on the prog object to display its contents. When writing scripts, however, values have to be printed explicitly with the print() function.

To print only the value, without the type, it is necessary to use the value_() function.

>>> print(prog["init_task"].min_flt)
(unsigned long)0
>>> print(prog["init_task"].min_flt.value_())
0

To print string values, the character array first has to be converted to a bytes object with string_() and then decoded:

>>> print(prog["init_task"].comm)
(char [16])"swapper/0"
>>> print(prog["init_task"].comm.string_())
b'swapper/0'
>>> print(prog["init_task"].comm.string_().decode())
swapper/0
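
In a script, value_() and string_().decode() are what turn drgn objects into plain Python data that other libraries can consume. The following sketch converts a few task fields into a dictionary that can be serialized as JSON. task_record is a hypothetical helper name; the fields pid, comm and min_flt are the ones used in the examples in this article.

```python
import json


def task_record(task):
    """Convert a few task_struct fields into plain Python values."""
    return {
        "pid": task.pid.value_(),              # drgn object -> int
        "comm": task.comm.string_().decode(),  # char array -> bytes -> str
        "min_flt": task.min_flt.value_(),      # drgn object -> int
    }


# Inside a drgn session (prog is provided by the drgn shell):
#     print(json.dumps(task_record(prog["init_task"])))
```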

Sometimes it is necessary to construct an object at a specific address. This can be achieved by creating a drgn Object with the desired type and address. The following example creates a struct zone pointer from an address held as a string in zone_address:

my_zone = Object(prog, "struct zone *", int(zone_address, 0))
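
The int(zone_address, 0) call deserves a short note: with base 0, Python derives the base from the prefix of the string, so both hexadecimal addresses with a 0x prefix and plain decimal numbers are accepted. The sketch below isolates that step; parse_address is a hypothetical helper name, and it assumes the address arrives as text, for example from a command-line argument or a log line.

```python
def parse_address(text):
    """Parse an address given as text.

    Base 0 lets int() pick the base from the prefix, so "0x..." is read as
    hexadecimal and a bare number as decimal.
    """
    address = int(text, 0)
    if address < 0:
        raise ValueError(f"negative address: {text!r}")
    return address


print(hex(parse_address("0xffff88810533b9c0")))
print(parse_address("4096"))

# Inside a drgn session the result can be handed to Object as shown above:
#     my_zone = Object(prog, "struct zone *", parse_address(zone_address))
```

Note that int() with an explicit base only accepts strings. If the address is already an integer, pass it to Object directly.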

drgn provides so-called helper APIs to access and iterate over key Linux kernel data structures. Among them are:

  • Doubly linked lists
  • Linked lists
  • Lockless lists
  • Maple Trees
  • Priority-Sorted lists
  • Radix trees
  • Red-black trees
  • XArrays

The API documentation covers all the helper functions. There are also additional helpers for other kernel structures and subsystems.

The following two paragraphs show two examples of how to use these helpers. They are short, but they show how easy it is to query and iterate over data structures. The helper documentation is available here.

The following shows an example of how to iterate over the task list:

>>> for p in for_each_task(prog):
...     print(p.pid)
...
(pid_t)1
(pid_t)2
...
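
The same loop can be turned into a small query. The sketch below filters tasks by their command name; tasks_named is a hypothetical helper name, and it assumes each task offers the pid and comm members used elsewhere in this article.

```python
def tasks_named(tasks, name):
    """Yield (pid, comm) for every task whose command name equals name."""
    for task in tasks:
        # comm is a char array: convert to bytes, then to str.
        comm = task.comm.string_().decode(errors="replace")
        if comm == name:
            yield task.pid.value_(), comm


# Inside a drgn session (prog and for_each_task are provided by the drgn shell):
#     for pid, comm in tasks_named(for_each_task(prog), "kthreadd"):
#         print(pid, comm)
```

Passing the task iterator in as an argument keeps the function independent of where the tasks come from, which also makes it easy to try out without a kernel at hand.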

To iterate over all the memory nodes, the for_each_node() function can be used:

>>> for i in for_each_node():
...     node = prog['node_data'][i]
...     print("NID: {0}, #Zones: {1}".format(node.node_id.value_(), node.nr_zones.value_()))
...
NID: 0, #Zones: 3

The call stack of a process can be obtained with the stack_trace() function. The pid of the process has to be specified as the first parameter:

>>> stack_trace(304)
#0  context_switch (kernel/sched/core.c:5369:2)
#1  __schedule (kernel/sched/core.c:6756:8)
#2  __schedule_loop (kernel/sched/core.c:6833:3)
#3  schedule (kernel/sched/core.c:6848:2)
#4  worker_thread (kernel/workqueue.c:3406:2)
#5  kthread (kernel/kthread.c:389:9)
#6  ret_from_fork (arch/x86/kernel/process.c:147:3)
#7  ret_from_fork_asm+0x1a/0x1f (arch/x86/entry/entry_64.S:244)
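
A stack trace is not only printable, it can also be iterated frame by frame. The sketch below extracts the function names and checks whether a trace passes through a given function. It assumes that each frame exposes a name attribute, which may be None when no name could be determined; function_names and passes_through are hypothetical helper names.

```python
def function_names(trace):
    """Return the function name of every frame, innermost frame first."""
    return [frame.name or "???" for frame in trace]


def passes_through(trace, function):
    """Return True if any frame of the trace belongs to the given function."""
    return function in function_names(trace)


# Inside a drgn session, using the trace shown above:
#     passes_through(stack_trace(304), "worker_thread")
```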

It is also possible to print a so-called annotated stack trace:

>>> print_annotated_stack(stack_trace(304))
STACK POINTER     VALUE
[stack frame #0 at 0xffffffff820ca31a (__schedule+0x52a/0x1082) in context_switch at kernel/sched/core.c:5369:2 (inlined)]
[stack frame #1 at 0xffffffff820ca31a (__schedule+0x52a/0x1082) in __schedule at kernel/sched/core.c:6756:8]
ffffc90000427e20: ffff888100078c00 [slab object: kmalloc-1k+0x0]
ffffc90000427e28: ffff888106a23300 [slab object: task_struct+0x0]
ffffc90000427e30: 0100000000000282
ffffc90000427e38: 0000040200000000
ffffc90000427e40: 0000000000000000
ffffc90000427e48: 0000000000000004
ffffc90000427e50: ffffffff8242fad0 [object symbol: cpu_bit_bitmap+0x30]
ffffc90000427e58: 0000000000000000
ffffc90000427e60: 0000000000000002
ffffc90000427e68: 39d5cfeb65f0c600
ffffc90000427e70: 0000000000000000
ffffc90000427e78: ffff888104667ac0 [slab object: kmalloc-192+0x40]
ffffc90000427e80: ffff888100078c28 [slab object: kmalloc-1k+0x28]
ffffc90000427e88: ffff888106a23300 [slab object: task_struct+0x0]
ffffc90000427e90: 0000000000000000
ffffc90000427e98: ffff888106a23300 [slab object: task_struct+0x0]
ffffc90000427ea0: ffffffff820caebb [function symbol: schedule+0x2b]
[stack frame #2 at 0xffffffff820caebb (schedule+0x2b/0x1c9) in __schedule_loop at kernel/sched/core.c:6833:3 (inlined)]
[stack frame #3 at 0xffffffff820caebb (schedule+0x2b/0x1c9) in schedule at kernel/sched/core.c:6848:2]
ffffc90000427ea8: ffff888104667a80 [slab object: kmalloc-192+0x0]
ffffc90000427eb0: ffff888100078c00 [slab object: kmalloc-1k+0x0]
ffffc90000427eb8: ffffffff8115e190 [function symbol: worker_thread+0x90]
[stack frame #4 at 0xffffffff8115e190 (worker_thread+0x90/0x31b) in worker_thread at kernel/workqueue.c:3406:2]
ffffc90000427ec0: ffff888104667ac0 [slab object: kmalloc-192+0x40]
ffffc90000427ec8: 0000000080000000
ffffc90000427ed0: ffff888100ecb100 [slab object: kmalloc-128+0x0]
ffffc90000427ed8: 0000000000000000
ffffc90000427ee0: ffff888106a23300 [slab object: task_struct+0x0]
ffffc90000427ee8: ffffffff8115e100 [function symbol: worker_thread+0x0]
ffffc90000427ef0: ffff888104667a80 [slab object: kmalloc-192+0x0]
ffffc90000427ef8: ffffffff81167de5 [function symbol: kthread+0x105]
[stack frame #5 at 0xffffffff81167de5 (kthread+0x105/0x12a) in kthread at kernel/kthread.c:389:9]
ffffc90000427f00: ffffffff81167ce0 [function symbol: kthread+0x0]
ffffc90000427f08: ffffc90000427f58 [vmap stack: 304 (kworker/u32:4) +0x3f58]
ffffc90000427f10: ffff888102131b40 [free slab object: kmalloc-64+0x0]
ffffc90000427f18: 0000000000000000
ffffc90000427f20: 0000000000000000
ffffc90000427f28: 0000000000000000
ffffc90000427f30: ffffffff810be056 [function symbol: ret_from_fork+0x46]
[stack frame #6 at 0xffffffff810be056 (ret_from_fork+0x46/0x5d) in ret_from_fork at arch/x86/kernel/process.c:147:3]
ffffc90000427f38: ffffffff81167ce0 [function symbol: kthread+0x0]
ffffc90000427f40: 0000000000000000
ffffc90000427f48: ffff888102131b40 [free slab object: kmalloc-64+0x0]
ffffc90000427f50: ffffffff8107d3fa [symbol: ret_from_fork_asm+0x1a]
[stack frame #7 at 0xffffffff8107d3fa (ret_from_fork_asm+0x1a/0x1f) at arch/x86/entry/entry_64.S:244]
ffffc90000427f58: 0000000000000000

The annotated stack trace contains not only the frame information, but also the contents of each stack slot, annotated with what the value refers to, for example a slab object or a function symbol.

This is where drgn shines. Based on the earlier information, we use the task helper to iterate over all the tasks, obtain the pid of each task and then report the stack trace for each task:

>>> for p in for_each_task(prog):
...     pid = p.pid.value_()
...     print("\nCall stack for PID {0}".format(pid))
...     try:
...         stack_trace(pid)
...     except ValueError:
...         print("\tCannot unwind current task")
...

Call stack for PID 1
#0  context_switch (kernel/sched/core.c:5369:2)
#1  __schedule (kernel/sched/core.c:6756:8)
#2  __schedule_loop (kernel/sched/core.c:6833:3)
#3  schedule (kernel/sched/core.c:6848:2)
#4  do_wait (kernel/exit.c:1696:3)
#5  kernel_wait4 (kernel/exit.c:1850:8)
#6  do_syscall_x64 (arch/x86/entry/common.c:52:14)
#7  do_syscall_64 (arch/x86/entry/common.c:83:7)
#8  entry_SYSCALL_64+0xab/0x148 (arch/x86/entry/entry_64.S:121)
#9  0x7f2ddcb89be2

Call stack for PID 2
#0  context_switch (kernel/sched/core.c:5369:2)
#1  __schedule (kernel/sched/core.c:6756:8)
#2  __schedule_loop (kernel/sched/core.c:6833:3)
#3  schedule (kernel/sched/core.c:6848:2)
#4  kthreadd (kernel/kthread.c:755:4)
#5  ret_from_fork (arch/x86/kernel/process.c:147:3)
#6  ret_from_fork_asm+0x1a/0x1f (arch/x86/entry/entry_64.S:244)
...

The code contains a try-except block. It guards against an exception that can occur when debugging the live kernel: the stack of a task that is currently running cannot be unwound.
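
When the loop moves from the interactive prompt into a script, one detail changes: the prompt echoes the result of a bare expression such as stack_trace(pid), while a script has to convert the result to text and print it explicitly. The following sketch collects the traces as text and records the pids that could not be unwound. collect_stacks is a hypothetical helper name, and the unwind function is passed in as a parameter so that the sketch does not depend on a particular drgn call.

```python
def collect_stacks(pids, unwind):
    """Return ({pid: stack trace text}, [pids that could not be unwound])."""
    stacks, failed = {}, []
    for pid in pids:
        try:
            stacks[pid] = str(unwind(pid))  # formatted trace as plain text
        except ValueError:
            failed.append(pid)              # e.g. a task that is currently running
    return stacks, failed


# Inside a drgn session (prog, for_each_task and stack_trace come from the drgn shell):
#     pids = (task.pid.value_() for task in for_each_task(prog))
#     stacks, failed = collect_stacks(pids, stack_trace)
#     for pid, text in stacks.items():
#         print(f"\nCall stack for PID {pid}\n{text}")
```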

This article has given a short overview of the drgn debugger and how to get started with it. It should help you get over some of the initial challenges of using drgn.

If you want to experiment with drgn, a virtual machine set up with QEMU and virtme-ng is a great way to do this.

Future articles will describe some utilities that use the drgn API.