Description
The kernel BUG report and stack trace, as collected from the console logs, are included below.
The generic driver was being exercised with an AWS ENA driver through the use of pkt-gen in receive mode, invoked thus:
sudo pkt-gen -i ens7 -f rx -c 1
The crash occurred regardless of whether any packets were received.
Environment
Netmap was built and run on an AWS EC2 Ubuntu 22.04.4 LTS instance, with Linux kernel version 6.8.0-1033-aws.
Fix?
The stack trace shows an invalid instruction fetch at address 0000000000000001 from within hrtimer_interrupt. That address equals the value of CLOCK_MONOTONIC (1), the clock ID used by the netmap generic driver. A faulting address equal to the clock ID, reached during hrtimer execution, seems to point to the nm_hrtimer_setup macro in LINUX/bsd_glue.h.
Specifically, it looks like that macro is erroneously assigning the clock ID (c_) argument as the timer function, where it should instead be using the f_ argument.
The big hint for this actually came from the compilation warning below:
In file included from -/LINUX/netmap_linux.c:26:
-/LINUX/netmap_linux.c: In function ‘nm_os_mitigation_init’:
-/LINUX/bsd_glue.h:86:24: warning: assignment to ‘enum hrtimer_restart (*)(struct hrtimer *)’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
86 | (t_)->function = (c_); \
| ^
-/LINUX/netmap_linux.c:513:9: note: in expansion of macro ‘nm_hrtimer_setup’
513 | nm_hrtimer_setup(&mit->mit_timer, &generic_timer_handler,
| ^~~~~~~~~~~~~~~~
-/LINUX/netmap_linux.c: At top level:
-/LINUX/netmap_linux.c:483:1: warning: ‘generic_timer_handler’
After the speculative patch below was applied locally, the crash no longer occurred.
diff --git a/LINUX/bsd_glue.h b/LINUX/bsd_glue.h
index 9c42bdcb..5aefb98b 100644
--- a/LINUX/bsd_glue.h
+++ b/LINUX/bsd_glue.h
@@ -83,7 +83,7 @@
 #else
 #define nm_hrtimer_setup(t_, f_, c_, m_) do { \
 	hrtimer_init(t_, c_, m_); \
-	(t_)->function = (c_); \
+	(t_)->function = (f_); \
 } while (0)
 #endif
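
The effect of the one-character difference can be illustrated outside the kernel. The following is a minimal userspace sketch, not netmap or kernel code: struct hrtimer, mock_hrtimer_init, the MOCK_* constants, timer_fn and demo_handler are all hypothetical stand-ins invented for the illustration. The explicit cast in the "buggy" variant is an assumption added only so the snippet compiles cleanly; the original macro relied on an implicit int-to-pointer conversion, which is what the compiler warned about.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins, NOT the kernel's real definitions. */
enum hrtimer_restart { HRTIMER_NORESTART, HRTIMER_RESTART };

struct hrtimer {
	enum hrtimer_restart (*function)(struct hrtimer *);
};

typedef enum hrtimer_restart (*timer_fn)(struct hrtimer *);

#define MOCK_CLOCK_MONOTONIC 1	/* CLOCK_MONOTONIC is 1 on Linux */
#define MOCK_MODE_REL        1	/* placeholder mode value for the sketch */

/* Hypothetical mock of hrtimer_init(): it only clears the callback. */
static void mock_hrtimer_init(struct hrtimer *t, int clock_id, int mode)
{
	(void)clock_id;
	(void)mode;
	t->function = NULL;
}

/* Same shape as the macro before the patch: the clock ID lands in
 * ->function. The explicit cast is added here only to keep the
 * sketch compiling without the int-conversion diagnostic. */
#define nm_hrtimer_setup_buggy(t_, f_, c_, m_) do {		\
	mock_hrtimer_init(t_, c_, m_);				\
	(void)(f_);						\
	(t_)->function = (timer_fn)(uintptr_t)(c_);		\
} while (0)

/* Same shape as the macro after the patch: the handler is stored. */
#define nm_hrtimer_setup_fixed(t_, f_, c_, m_) do {		\
	mock_hrtimer_init(t_, c_, m_);				\
	(t_)->function = (f_);					\
} while (0)

static enum hrtimer_restart demo_handler(struct hrtimer *t)
{
	(void)t;
	return HRTIMER_NORESTART;
}

/* Returns the callback "address" stored by the buggy macro shape. */
uintptr_t callback_address_buggy(void)
{
	struct hrtimer t;

	nm_hrtimer_setup_buggy(&t, &demo_handler,
			       MOCK_CLOCK_MONOTONIC, MOCK_MODE_REL);
	return (uintptr_t)t.function;
}

/* Returns nonzero when the fixed macro shape stored the handler. */
int callback_is_handler_fixed(void)
{
	struct hrtimer t;

	nm_hrtimer_setup_fixed(&t, &demo_handler,
			       MOCK_CLOCK_MONOTONIC, MOCK_MODE_REL);
	return t.function == demo_handler;
}
```

With the buggy shape, the stored callback pointer carries the clock ID value rather than the handler address, which is consistent with the timer code jumping to address 1 when the timer fires.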
Stack Trace
Stack trace, from console logs:
[ 918.086771] BUG: kernel NULL pointer dereference, address: 0000000000000001
[ 918.087640] #PF: supervisor instruction fetch in kernel mode
[ 918.088338] #PF: error_code(0x0010) - not-present page
[ 918.088967] PGD 0 P4D 0
[ 918.089310] Oops: 0010 [#1] SMP NOPTI
[ 918.089782] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G OE 6.8.0-1033-aws #35~22.04.1-Ubuntu
[ 918.090899] Hardware name: Amazon EC2 m5a.large/, BIOS 1.0 10/16/2017
[ 918.091665] RIP: 0010:0x1
[ 918.092033] Code: Unable to access opcode bytes at 0xffffffffffffffd7.
[ 918.092808] RSP: 0018:ffffa697000f4f08 EFLAGS: 00010046
[ 918.093446] RAX: 0000000000000000 RBX: ffff893e91cb0e40 RCX: 0000000000000000
[ 918.094298] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff893e91cb0e40
[ 918.095153] RBP: ffffa697000f4f68 R08: 0000000000000000 R09: 0000000000000000
[ 918.096010] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 918.096861] R13: ffff893eedf247c0 R14: ffff893eedf247c0 R15: ffff893eedf24800
[ 918.097718] FS: 0000000000000000(0000) GS:ffff893eedf00000(0000) knlGS:0000000000000000
[ 918.098688] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 918.099383] CR2: ffffffffffffffd7 CR3: 00000001924ec000 CR4: 00000000003506f0
[ 918.100254] Call Trace:
[ 918.100588] <IRQ>
[ 918.100872] ? show_regs+0x6d/0x80
[ 918.101307] ? __die+0x24/0x80
[ 918.101716] ? page_fault_oops+0x99/0x1b0
[ 918.102217] ? do_user_addr_fault+0x2ee/0x670
[ 918.102760] ? exc_page_fault+0x83/0x190
[ 918.103254] ? asm_exc_page_fault+0x27/0x30
[ 918.103783] ? __hrtimer_run_queues+0x112/0x250
[ 918.104349] ? srso_return_thunk+0x5/0x5f
[ 918.104854] hrtimer_interrupt+0xf6/0x250
[ 918.105361] __sysvec_apic_timer_interrupt+0x4e/0xf0
[ 918.105973] sysvec_apic_timer_interrupt+0x8d/0xd0
[ 918.106567] </IRQ>
[ 918.106858] <TASK>
[ 918.107696] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 918.108818] RIP: 0010:pv_native_safe_halt+0xb/0x10
[ 918.109895] Code: 22 d7 31 ff e9 b6 28 01 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 59 e3 3e 00 fb f4 <e9> 90 28 01 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83
[ 918.113509] RSP: 0018:ffffa697000b7db0 EFLAGS: 00000246
[ 918.114646] RAX: 0000000000004000 RBX: ffff893dc0e33064 RCX: 0000000000000000
[ 918.116006] RDX: 0000000000000001 RSI: ffff893dc0e33000 RDI: 0000000000000001
[ 918.117361] RBP: ffffa697000b7db8 R08: 0000000000000000 R09: 0000000000000000
[ 918.118701] R10: 0000000000000000 R11: 0000000000000000 R12: ffff893dc0e33064
[ 918.120043] R13: 0000000000000001 R14: ffffffff91af9240 R15: ffff893eedf00000
[ 918.121387] ? acpi_safe_halt+0x19/0x60
[ 918.122357] acpi_idle_do_entry+0x40/0x80
[ 918.123334] acpi_idle_enter+0xb6/0x180
[ 918.124302] cpuidle_enter_state+0x91/0x6f0
[ 918.125293] ? srso_return_thunk+0x5/0x5f
[ 918.126264] ? finish_task_switch.isra.0+0x89/0x2f0
[ 918.127332] cpuidle_enter+0x2e/0x50
[ 918.128277] call_cpuidle+0x23/0x60
[ 918.129179] cpuidle_idle_call+0x10f/0x150
[ 918.130136] do_idle+0x87/0xf0
[ 918.130966] cpu_startup_entry+0x2a/0x30
[ 918.131874] start_secondary+0x129/0x160
[ 918.132798] secondary_startup_64_no_verify+0x184/0x18b
[ 918.133862] </TASK>
[ 918.134588] Modules linked in: netmap(OE) tls binfmt_misc nls_iso8859_1 ppdev crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd parport_pc cryptd input_leds parport psmouse serio_raw ena dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore ip_tables x_tables autofs4 [last unloaded: netmap(OE)]
[ 918.142596] CR2: 0000000000000001
[ 918.143535] ---[ end trace 0000000000000000 ]---
[ 918.274181] RIP: 0010:0x1
[ 918.275426] Code: Unable to access opcode bytes at 0xffffffffffffffd7.
[ 918.276786] RSP: 0018:ffffa697000f4f08 EFLAGS: 00010046
[ 918.277978] RAX: 0000000000000000 RBX: ffff893e91cb0e40 RCX: 0000000000000000
[ 918.279383] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff893e91cb0e40
[ 918.280802] RBP: ffffa697000f4f68 R08: 0000000000000000 R09: 0000000000000000
[ 918.282209] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 918.283590] R13: ffff893eedf247c0 R14: ffff893eedf247c0 R15: ffff893eedf24800
[ 918.284963] FS: 0000000000000000(0000) GS:ffff893eedf00000(0000) knlGS:0000000000000000
[ 918.286432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 918.287663] CR2: ffffffffffffffd7 CR3: 00000001924ec000 CR4: 00000000003506f0
[ 918.289064] Kernel panic - not syncing: Fatal exception in interrupt
[ 918.290690] Kernel Offset: 0xe400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
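
The page-fault error code at the top of the trace can be cross-checked against the "instruction fetch" reading. The sketch below decodes the x86 page-fault error code bits; the struct pf_info type and the decode_pf_error_code name are hypothetical helpers written for this report, not kernel APIs.

```c
#include <stdbool.h>

/* x86 page-fault error code bits (architectural meaning). */
#define PF_PROT  (1UL << 0)	/* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1)	/* 0: read access, 1: write access */
#define PF_USER  (1UL << 2)	/* 0: supervisor mode, 1: user mode */
#define PF_RSVD  (1UL << 3)	/* reserved bit set in a paging entry */
#define PF_INSTR (1UL << 4)	/* fault was caused by an instruction fetch */

/* Hypothetical helper type holding the decoded flags. */
struct pf_info {
	bool not_present;
	bool write;
	bool user_mode;
	bool reserved_bit;
	bool instruction_fetch;
};

/* Decode a raw error code such as the 0x0010 seen in the trace. */
struct pf_info decode_pf_error_code(unsigned long code)
{
	struct pf_info info;

	info.not_present       = !(code & PF_PROT);
	info.write             = (code & PF_WRITE) != 0;
	info.user_mode         = (code & PF_USER) != 0;
	info.reserved_bit      = (code & PF_RSVD) != 0;
	info.instruction_fetch = (code & PF_INSTR) != 0;
	return info;
}
```

For the value 0x0010 this yields a supervisor-mode instruction fetch from a not-present page, matching the two #PF lines in the log and the RIP value of 0x1.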