
Generic driver crash in hrtimer on invalid memory access to 0000000000000001 #982

@maddogwg

Description

The BUG report and stack trace, as collected from the console logs, are included below.

The netmap generic driver was being exercised on top of the AWS ENA driver, using pkt-gen in receive mode, invoked as follows:

sudo pkt-gen -i ens7 -f rx -c 1

The crash occurred regardless of whether any packets were received.

Environment
Netmap was built and run on an AWS EC2 Ubuntu 22.04.4 LTS instance, with Linux kernel version 6.8.0-1033-aws.

Fix?
The stack trace shows the kernel attempting to execute code at address 0000000000000001 (an instruction fetch, RIP: 0010:0x1) from within hrtimer_interrupt. This suggests that the hrtimer core called a timer callback whose function pointer was 1. The combination of hrtimer execution and 0x1 as the faulting address, which is the value of the CLOCK_MONOTONIC clock ID used by the netmap generic driver, seems to point to the nm_hrtimer_setup macro in LINUX/bsd_glue.h.

Specifically, the macro appears to assign its clock ID argument (c_) to the timer's function field, where it should instead assign the callback argument (f_).

The main hint came from the following compiler warning:

In file included from -/LINUX/netmap_linux.c:26:
-/LINUX/netmap_linux.c: In function ‘nm_os_mitigation_init’:
-/LINUX/bsd_glue.h:86:24: warning: assignment to ‘enum hrtimer_restart (*)(struct hrtimer *)’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   86 |         (t_)->function = (c_);                  \
      |                        ^
-/LINUX/netmap_linux.c:513:9: note: in expansion of macro ‘nm_hrtimer_setup’
  513 |         nm_hrtimer_setup(&mit->mit_timer, &generic_timer_handler,
      |         ^~~~~~~~~~~~~~~~
-/LINUX/netmap_linux.c: At top level:
-/LINUX/netmap_linux.c:483:1: warning: ‘generic_timer_handler’

After the speculative patch below was applied locally, the crash no longer occurred.

diff --git a/LINUX/bsd_glue.h b/LINUX/bsd_glue.h
index 9c42bdcb..5aefb98b 100644
--- a/LINUX/bsd_glue.h
+++ b/LINUX/bsd_glue.h
@@ -83,7 +83,7 @@
 #else
 #define nm_hrtimer_setup(t_, f_, c_, m_) do {  \
        hrtimer_init(t_, c_, m_);               \
-       (t_)->function = (c_);                  \
+       (t_)->function = (f_);                  \
 } while (0)
 #endif

Stack Trace
Stack trace from the console logs:

[  918.086771] BUG: kernel NULL pointer dereference, address: 0000000000000001
[  918.087640] #PF: supervisor instruction fetch in kernel mode
[  918.088338] #PF: error_code(0x0010) - not-present page
[  918.088967] PGD 0 P4D 0 
[  918.089310] Oops: 0010 [#1] SMP NOPTI
[  918.089782] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE      6.8.0-1033-aws #35~22.04.1-Ubuntu
[  918.090899] Hardware name: Amazon EC2 m5a.large/, BIOS 1.0 10/16/2017
[  918.091665] RIP: 0010:0x1
[  918.092033] Code: Unable to access opcode bytes at 0xffffffffffffffd7.
[  918.092808] RSP: 0018:ffffa697000f4f08 EFLAGS: 00010046
[  918.093446] RAX: 0000000000000000 RBX: ffff893e91cb0e40 RCX: 0000000000000000
[  918.094298] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff893e91cb0e40
[  918.095153] RBP: ffffa697000f4f68 R08: 0000000000000000 R09: 0000000000000000
[  918.096010] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  918.096861] R13: ffff893eedf247c0 R14: ffff893eedf247c0 R15: ffff893eedf24800
[  918.097718] FS:  0000000000000000(0000) GS:ffff893eedf00000(0000) knlGS:0000000000000000
[  918.098688] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  918.099383] CR2: ffffffffffffffd7 CR3: 00000001924ec000 CR4: 00000000003506f0
[  918.100254] Call Trace:
[  918.100588]  <IRQ>
[  918.100872]  ? show_regs+0x6d/0x80
[  918.101307]  ? __die+0x24/0x80
[  918.101716]  ? page_fault_oops+0x99/0x1b0
[  918.102217]  ? do_user_addr_fault+0x2ee/0x670
[  918.102760]  ? exc_page_fault+0x83/0x190
[  918.103254]  ? asm_exc_page_fault+0x27/0x30
[  918.103783]  ? __hrtimer_run_queues+0x112/0x250
[  918.104349]  ? srso_return_thunk+0x5/0x5f
[  918.104854]  hrtimer_interrupt+0xf6/0x250
[  918.105361]  __sysvec_apic_timer_interrupt+0x4e/0xf0
[  918.105973]  sysvec_apic_timer_interrupt+0x8d/0xd0
[  918.106567]  </IRQ>
[  918.106858]  <TASK>
[  918.107696]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  918.108818] RIP: 0010:pv_native_safe_halt+0xb/0x10
[  918.109895] Code: 22 d7 31 ff e9 b6 28 01 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 59 e3 3e 00 fb f4 <e9> 90 28 01 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83
[  918.113509] RSP: 0018:ffffa697000b7db0 EFLAGS: 00000246
[  918.114646] RAX: 0000000000004000 RBX: ffff893dc0e33064 RCX: 0000000000000000
[  918.116006] RDX: 0000000000000001 RSI: ffff893dc0e33000 RDI: 0000000000000001
[  918.117361] RBP: ffffa697000b7db8 R08: 0000000000000000 R09: 0000000000000000
[  918.118701] R10: 0000000000000000 R11: 0000000000000000 R12: ffff893dc0e33064
[  918.120043] R13: 0000000000000001 R14: ffffffff91af9240 R15: ffff893eedf00000
[  918.121387]  ? acpi_safe_halt+0x19/0x60
[  918.122357]  acpi_idle_do_entry+0x40/0x80
[  918.123334]  acpi_idle_enter+0xb6/0x180
[  918.124302]  cpuidle_enter_state+0x91/0x6f0
[  918.125293]  ? srso_return_thunk+0x5/0x5f
[  918.126264]  ? finish_task_switch.isra.0+0x89/0x2f0
[  918.127332]  cpuidle_enter+0x2e/0x50
[  918.128277]  call_cpuidle+0x23/0x60
[  918.129179]  cpuidle_idle_call+0x10f/0x150
[  918.130136]  do_idle+0x87/0xf0
[  918.130966]  cpu_startup_entry+0x2a/0x30
[  918.131874]  start_secondary+0x129/0x160
[  918.132798]  secondary_startup_64_no_verify+0x184/0x18b
[  918.133862]  </TASK>
[  918.134588] Modules linked in: netmap(OE) tls binfmt_misc nls_iso8859_1 ppdev crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd parport_pc cryptd input_leds parport psmouse serio_raw ena dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore ip_tables x_tables autofs4 [last unloaded: netmap(OE)]
[  918.142596] CR2: 0000000000000001
[  918.143535] ---[ end trace 0000000000000000 ]---
[  918.274181] RIP: 0010:0x1
[  918.275426] Code: Unable to access opcode bytes at 0xffffffffffffffd7.
[  918.276786] RSP: 0018:ffffa697000f4f08 EFLAGS: 00010046
[  918.277978] RAX: 0000000000000000 RBX: ffff893e91cb0e40 RCX: 0000000000000000
[  918.279383] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff893e91cb0e40
[  918.280802] RBP: ffffa697000f4f68 R08: 0000000000000000 R09: 0000000000000000
[  918.282209] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  918.283590] R13: ffff893eedf247c0 R14: ffff893eedf247c0 R15: ffff893eedf24800
[  918.284963] FS:  0000000000000000(0000) GS:ffff893eedf00000(0000) knlGS:0000000000000000
[  918.286432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  918.287663] CR2: ffffffffffffffd7 CR3: 00000001924ec000 CR4: 00000000003506f0
[  918.289064] Kernel panic - not syncing: Fatal exception in interrupt
[  918.290690] Kernel Offset: 0xe400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
