Kernel Debugging With Qemu and virtme

This article describes how to debug the Linux kernel with qemu, virtme and gdb.

Being able to debug the Linux kernel helps the developer to gain a better understanding of the code, data structures and also defects. Using the kernel debugger directly with the Linux debugger has its own challenges. A better solution is to start the kernel in a virtual machine and attach the debugger to this virtual machine.

If you followed the earlier series about how to setup your Linux Kernel development environment, then you already setup qemu and virtme. If you haven’t done so, now would be a good time to catch up. Once qemu and virtme have been setup and are working, the preconditions are met.

In most Linux distributions gdb is installed by default. If gdb hasn’t been installed yet, this can be done with the following command (assuming an arch-based distribution).

1
sudo pacman -S -y gdb

It is a good idea to clone the kernel-gdb github repository. The repository contains two scripts to invoke the debugger kernel.gdb and ‘blk.gdb’. As a starting point kernel.gdb can be used. The add-auto-load-safe-path for the vmlinux-gdb.py script needs to be updated for your kernel build environment.

It makes sense to define an alias to start virtme with the necessary parameters so gdb can be attached.

1
2
alias vdbg 'virtme-run --kimg arch/x86_64/boot/bzImage  -a "nokaslr" 
  --qemu-opts -m 16384 -smp 4 -qmp tcp:localhost:4444,server,nowait -s -S'

The command disables kaslr (kernel address space randomization), uses 16GB of memory, 4 CPU’s, and defines the port on how to communicate with qemu. Disabling kaslr (nokaslr) can be helpful when debugging. It also specifies to wait for the debugger to attach (-S). The option -s is a shorthand for specifying -gdb tcp::1234. The various qemu options are described in the qemu documentation.

To be able to see the symbols the kernel needs to be built with symbols. The easiest is to clone the kernel-configs repo and use the kernel config from the debug directory. Then rebuild your kernel.

To debug the kernel it is the easiest to use two windows: one to start the virtual machine with vdbg and the other one to start the debugger. It is assumed that the following is executed from the Linux source kernel base directory.

In window 1:

1
vdbg

This will only show an empty prompt.

In the second window, the following command is entered:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
gdb -command=kernel.dbg

╭─shr@stefan in repo: linux on  ksm.v8:mm-unstable [?⇡3] via  v3.10.10 took 25s
╰─λ gdb --command=~/repo/shr/kernel-gdb/kernel.gdb
GNU gdb (GDB) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
0x000000000000fff0 in exception_stacks ()
Hardware assisted breakpoint 1 at 0xffffffff83046940: file init/main.c, line 881.

Thread 1 hit Breakpoint 1, start_kernel () at init/main.c:881
881     {

The kernel.gdb script defines a breakpoint at the start_kernel() function and stops the execution of the kernel at that function. This is very early in the boot sequence of the kernel and allows to define additional breakpoints.

With the c command (continue), the startup sequence can be resumed. In the first window one can see how the startup log statements are printed until the command prompt is available.

If the vmlinux-gdb.py script has been successfully loaded with the kernel.gdb script, additional gdb commands and functions get defined. The list of additional commands can be queried with the apropos command:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
apropos lx
function lx_clk_core_lookup -- Find struct clk_core by name
function lx_current -- Return current task.
function lx_device_find_by_bus_name -- Find struct device by bus and name (both strings)
function lx_device_find_by_class_name -- Find struct device by class and name (both strings)
function lx_module -- Find module by name and return the module variable.
function lx_per_cpu -- Return per-cpu variable.
function lx_rb_first -- Lookup and return a node from an RBTree
function lx_rb_last -- Lookup and return a node from an RBTree.
function lx_rb_next -- Lookup and return a node from an RBTree.
function lx_rb_prev -- Lookup and return a node from an RBTree.
function lx_task_by_pid -- Find Linux task by PID and return the task_struct variable.
function lx_thread_info -- Calculate Linux thread_info from task variable.
function lx_thread_info_by_pid -- Calculate Linux thread_info from task variable found by pid
lx-clk-summary -- Print clk tree summary
lx-cmdline -- Report the Linux Commandline used in the current kernel.
lx-configdump -- Output kernel config to the filename specified as the command
lx-cpus -- List CPU status arrays
lx-device-list-bus -- Print devices on a bus (or all buses if not specified)
lx-device-list-class -- Print devices in a class (or all classes if not specified)
lx-device-list-tree -- Print a device and its children recursively
lx-dmesg -- Print Linux kernel log buffer.
lx-fdtdump -- Output Flattened Device Tree header and dump FDT blob to the filename
lx-genpd-summary -- Print genpd summary
lx-iomem -- Identify the IO memory resource locations defined by the kernel
lx-ioports -- Identify the IO port resource locations defined by the kernel
lx-list-check -- Verify a list consistency
lx-lsmod -- List currently loaded modules.
lx-mounts -- Report the VFS mounts of the current process namespace.
lx-ps -- Dump Linux tasks.
lx-symbols -- (Re-)load symbols of Linux kernel and currently loaded modules.
lx-timerlist -- Print /proc/timer_list
lx-version -- Report the Linux Version of the current kernel.

The commands itself should be pretty self explanatory. The lx-ps shows the current list of processes. The lx-dmesg command prints the current contents of the kernel log buffer. For debugging it is important to have symbol information. With the lx-symbols command, the symbol information can be loaded. Additional documentation to the scripts and functions is available on kernel.org.

This only provides the standard gdb experience. Most of the time you are also interested to see the source code. In the same gdb repository there is also a gdbinit script. This script needs to be renamed to .gdbinit. This script gets automatically loaded when gdb gets started.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
─── Output/messages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Thread 1 hit Breakpoint 1, start_kernel () at init/main.c:881
881     {
─── Source ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
861      unknown_options = memblock_alloc(len, SMP_CACHE_BYTES);
862      if (!unknown_options) {
863          pr_err("%s: Failed to allocate %zu bytes\n",
864              __func__, len);
865          return;
866      }
867      end = unknown_options;
868
869      for (p = &argv_init[1]; *p; p++)
870          end += sprintf(end, " %s", *p);
871      for (p = &envp_init[2]; *p; p++)
872          end += sprintf(end, " %s", *p);
873
874      /* Start at unknown_options[1] to skip the initial space */
875      pr_notice("Unknown kernel command line parameters \"%s\", will be passed to user space.\n",
876          &unknown_options[1]);
877      memblock_free(unknown_options, len);
878  }
879
880  asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
881  {
882      char *command_line;
883      char *after_dashes;
884
885      set_task_stack_end_magic(&init_task);
886      smp_setup_processor_id();
887      debug_objects_early_init();
888      init_vmlinux_build_id();
889
890      cgroup_init_early();
891
892      local_irq_disable();
893      early_boot_irqs_disabled = true;
894
895      /*
896       * Interrupts are still disabled. Do necessary setups, then
897       * enable them.
898       */
899      boot_cpu_init();
900      page_address_init();
─── Variables ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
loc after_dashes = <optimized out>
loc command_line = 0x13cd0 <exception_stacks+31952> <error: Cannot access memory at address 0x13cd0>: Can…
─── Stack ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[0] from 0xffffffff83046940 in start_kernel+0 at init/main.c:881
[1] from 0xffffffff81000145 in secondary_startup_64 at arch/x86/kernel/head_64.S:358
[2] from 0x0000000000000000
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>>>

This shows the thread, source, variables and the stack. This makes debugging easier. There are several gdb extensions which provide an interface like this. The provided gdb script is a modified version of one these.