Kexec bpf v5 #1

pfliu · 2025-09-24T02:22:44Z

No description provided.

In later patches, the PE format parser will extract the Linux kernel inside and try its real format parser. So make kexec_image_load_default global.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: kexec@lists.infradead.org

The kexec PE format parser needs the kernel's built-in decompressor to decompress the kernel image. So move the decompressor out of the __init sections.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-kernel@vger.kernel.org

In the secure kexec_file_load case, the buffer that holds the kernel image should not be accessible from user space.

Typically, BPF data flows between user space and kernel space in either direction. However, kexec_file_load presents a unique case where user-originated data must be parsed and then forwarded to the kernel for subsequent parsing stages. This necessitates a mechanism to channel the intermediate data from the BPF program directly to the kernel. bpf_kexec_carrier() is introduced to serve that purpose.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
To: bpf@vger.kernel.org

This commit bridges the gap between the bpf-prog and the kernel decompression routines. At present, only a global memory allocator is used for decompression. Later, if needed, the decompress_fn prototype can be changed to pass in a task-related allocator. The allocator hands out 2MB at a time, each at a transient virtual address, up to a 1GB limit. After decompression finishes, it presents all of the decompressed data in a new unified virtual address space.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
To: bpf@vger.kernel.org
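To make the allocation scheme above concrete, here is a minimal userspace sketch of a chunked output allocator: output is appended into fixed 2MB chunks until a 1GB cap, then presented as one contiguous buffer. All names (struct chunk_alloc, chunk_write(), chunk_flatten()) are hypothetical. The kernel side presumably maps the chunk pages into one virtual range rather than copying, and this sketch copies only to stay self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes mirroring the commit message: 2MB chunks, 1GB cap. */
#define CHUNK_SIZE (2UL << 20)
#define MAX_TOTAL  (1UL << 30)
#define MAX_CHUNKS (MAX_TOTAL / CHUNK_SIZE)

struct chunk_alloc {
	void *chunks[MAX_CHUNKS];
	size_t nr_chunks;
	size_t used_in_last;   /* bytes written into the last chunk */
};

/* Append output, grabbing a fresh chunk whenever the current one is full. */
static int chunk_write(struct chunk_alloc *a, const void *data, size_t len)
{
	const char *p = data;

	while (len) {
		if (a->nr_chunks == 0 || a->used_in_last == CHUNK_SIZE) {
			if (a->nr_chunks == MAX_CHUNKS)
				return -1;          /* 1GB limit reached */
			a->chunks[a->nr_chunks] = malloc(CHUNK_SIZE);
			if (!a->chunks[a->nr_chunks])
				return -1;
			a->nr_chunks++;
			a->used_in_last = 0;
		}
		size_t room = CHUNK_SIZE - a->used_in_last;
		size_t n = len < room ? len : room;

		memcpy((char *)a->chunks[a->nr_chunks - 1] + a->used_in_last, p, n);
		a->used_in_last += n;
		p += n;
		len -= n;
	}
	return 0;
}

static size_t chunk_total(const struct chunk_alloc *a)
{
	return a->nr_chunks ? (a->nr_chunks - 1) * CHUNK_SIZE + a->used_in_last : 0;
}

/* Present all chunks as one contiguous buffer (copied here, not remapped). */
static void *chunk_flatten(const struct chunk_alloc *a)
{
	size_t total = chunk_total(a);
	char *out = malloc(total ? total : 1);

	if (!out)
		return NULL;
	for (size_t i = 0; i < a->nr_chunks; i++) {
		size_t n = (i + 1 == a->nr_chunks) ? a->used_in_last : CHUNK_SIZE;

		memcpy(out + i * CHUNK_SIZE, a->chunks[i], n);
	}
	return out;
}

static void chunk_free(struct chunk_alloc *a)
{
	for (size_t i = 0; i < a->nr_chunks; i++)
		free(a->chunks[i]);
	a->nr_chunks = 0;
	a->used_in_last = 0;
}
```

Writing 2MB + 100 bytes, for example, spills into a second chunk, and chunk_flatten() returns the original bytes in order.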

As UEFI becomes popular, a few architectures support booting a PE format kernel image directly. But the internals of PE files vary, which means each format needs its own parser. This patch (together with the rest of this series) introduces a common skeleton for all parsers and leaves the format parsing to the bpf-prog, so the kernel code can stay relatively stable. A new kexec_file_ops implementation, named pe_image_ops, is introduced. There are some placeholder functions in this patch. (They will take effect after the introduction of the kexec bpf light skeleton and the bpf helpers.)

Overall, the parsing process is a pipeline. The current bpf-prog parser is attached to bpf_handle_pefile() and detached at the end of the current stage by disarm_bpf_prog(). The result parsed by the current bpf-prog is buffered in the kernel by prepare_nested_pe() and delivered to the next stage. For each stage, the bpf bytecode is extracted from the '.bpf' section in the PE file.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
To: kexec@lists.infradead.org
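The staged pipeline described above can be modeled in a few lines: arm a parser, run it, disarm it, buffer its output, and feed that output to the next stage. This is a hypothetical userspace sketch. Plain function pointers stand in for the per-stage bpf-progs, and arm_prog()/disarm_prog()/run_pipeline() are illustrative names, not the kernel's functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one stage's bpf-prog. *out_len is the output
 * capacity on entry and the produced length on return. */
typedef int (*stage_parser_fn)(const char *in, size_t in_len,
			       char *out, size_t *out_len);

/* Models the prog currently attached to bpf_handle_pefile(). */
static stage_parser_fn armed;

static void arm_prog(stage_parser_fn fn) { armed = fn; }
static void disarm_prog(void) { armed = NULL; }   /* cf. disarm_bpf_prog() */

/* Run the stages in order. Each stage's result is buffered and becomes the
 * next stage's input (cf. prepare_nested_pe()). */
static int run_pipeline(const stage_parser_fn *stages, size_t nr_stages,
			char *buf, size_t *len, size_t cap)
{
	char tmp[256];

	for (size_t i = 0; i < nr_stages; i++) {
		size_t out_len = sizeof(tmp);
		int ret;

		arm_prog(stages[i]);
		ret = armed(buf, *len, tmp, &out_len);
		disarm_prog();          /* detach at the end of every stage */
		if (ret)
			return ret;
		if (out_len > cap)
			return -1;
		memcpy(buf, tmp, out_len);
		*len = out_len;
	}
	return 0;
}
```

With two stages that each peel off one layer of wrapping, an input like "OUT:INN:vmlinux" ends as "vmlinux", and no parser stays armed afterwards.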

This patch does two things:
First, it registers as a listener on bpf_copy_to_kernel().
Second, so that the hooked bpf-prog can call the sleepable kfuncs, bpf_handle_pefile and bpf_post_handle_pefile are marked as KF_SLEEPABLE.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org
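The listener idea can be illustrated with a tiny name-keyed registry: a consumer registers a callback under a name, and each copy call is routed to the callback with the matching name. This is only a sketch under assumptions. The real carrier_listener structure and the bpf_copy_to_kernel() signature are not shown in this thread, so struct demo_listener, register_listener() and copy_to_listener() are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback invoked when data arrives for a named carrier. */
typedef int (*carrier_cb)(const void *buf, size_t size, void *priv);

struct demo_listener {
	const char *name;
	carrier_cb cb;
	void *priv;
};

#define MAX_LISTENERS 8
static struct demo_listener listeners[MAX_LISTENERS];
static size_t nr_listeners;

/* Register one listener per name. */
static int register_listener(const char *name, carrier_cb cb, void *priv)
{
	if (nr_listeners == MAX_LISTENERS)
		return -1;
	for (size_t i = 0; i < nr_listeners; i++)
		if (!strcmp(listeners[i].name, name))
			return -2;    /* name already taken */
	listeners[nr_listeners++] = (struct demo_listener){ name, cb, priv };
	return 0;
}

/* Stand-in for the BPF-side copy call: dispatch to the matching listener. */
static int copy_to_listener(const char *name, const void *buf, size_t size)
{
	for (size_t i = 0; i < nr_listeners; i++)
		if (!strcmp(listeners[i].name, name))
			return listeners[i].cb(buf, size, listeners[i].priv);
	return -3;            /* nobody is listening on this name */
}
```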

Analogous to kernel/bpf/preload/iterators/Makefile, this Makefile is not invoked by the Kbuild system. It needs to be invoked manually whenever kexec_pe_parser_bpf.c is changed, so that kexec_pe_parser_bpf.lskel.h can be regenerated by the command "bpftool gen skeleton -L kexec_pe_parser_bpf.o".

kexec_pe_parser_bpf.lskel.h is used directly by the kernel kexec code in a later patch. In this patch, the bpf bytecode is contained in opts_data[] and opts_insn[] in kexec_pe_parser_bpf.lskel.h. In the following patch, they will be removed and only the function API in kexec_pe_parser_bpf.lskel.h will be left.

As exposed in kexec_pe_parser_bpf.lskel.h, the interface between the bpf-prog and the kernel consists of:

four maps:
	struct bpf_map_desc ringbuf_1;
	struct bpf_map_desc ringbuf_2;
	struct bpf_map_desc ringbuf_3;
	struct bpf_map_desc ringbuf_4;
four sections:
	struct bpf_map_desc rodata;
	struct bpf_map_desc data;
	struct bpf_map_desc bss;
	struct bpf_map_desc rodata_str1_1;
two progs:
	SEC("fentry.s/bpf_handle_pefile")
	SEC("fentry.s/bpf_post_handle_pefile")

They are fixed and provided for all kinds of bpf-prog that interact with the kexec kernel component.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org

The routine to search for a symbol in ELF can be shared, so split it out.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
To: kexec@lists.infradead.org

All kexec PE bpf-progs should align with the interface exposed by the light skeleton:

four maps:
	struct bpf_map_desc ringbuf_1;
	struct bpf_map_desc ringbuf_2;
	struct bpf_map_desc ringbuf_3;
	struct bpf_map_desc ringbuf_4;
four sections:
	struct bpf_map_desc rodata;
	struct bpf_map_desc data;
	struct bpf_map_desc bss;
	struct bpf_map_desc rodata_str1_1;
two progs:
	SEC("fentry.s/bpf_handle_pefile")
	SEC("fentry.s/bpf_post_handle_pefile")

With the above presumption, the integration consists of two parts:
1. Call the API exposed by the light skeleton from kexec.
2. The opts_insn[] and opts_data[] are bpf-prog dependent and can be extracted and passed in from user space. In the kexec_file_load design, a PE file has a .bpf section whose data content is an ELF, and the ELF contains opts_insn[] and opts_data[]. As a bonus, the BPF bytecode is placed under the protection of the entire PE signature.

(Note: since opts_insn[] contains the ringbuf size information, the bpf-prog writer can choose a proper size according to the kernel image size without modifying the kernel code.)

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org
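Since the bytecode travels in a '.bpf' section of the PE file, the first step on the loader side is locating that section. Below is a minimal, standalone sketch that walks the PE/COFF section table (DOS header e_lfanew at 0x3c, "PE\0\0" signature, 20-byte COFF header, then 40-byte section headers after the optional header). pe_find_section() is a hypothetical helper, not the kernel's implementation, and it does not verify signatures or parse the embedded ELF.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
	return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Locate a section by name; returns 0 and fills off/size on success. */
static int pe_find_section(const uint8_t *img, size_t len, const char *name,
			   uint32_t *off, uint32_t *size)
{
	if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
		return -1;
	uint32_t pe = rd32(img + 0x3c);                 /* e_lfanew */
	if ((size_t)pe + 24 > len || memcmp(img + pe, "PE\0\0", 4))
		return -1;
	const uint8_t *coff = img + pe + 4;
	uint16_t nsec = rd16(coff + 2);                 /* NumberOfSections */
	uint16_t optsz = rd16(coff + 16);               /* SizeOfOptionalHeader */
	size_t sh = (size_t)pe + 24 + optsz;            /* first section header */

	for (uint16_t i = 0; i < nsec; i++, sh += 40) {
		char sname[9] = { 0 };

		if (sh + 40 > len)
			return -1;
		memcpy(sname, img + sh, 8);             /* names are not NUL-terminated at 8 */
		if (strcmp(sname, name))
			continue;
		*size = rd32(img + sh + 16);            /* SizeOfRawData */
		*off = rd32(img + sh + 20);             /* PointerToRawData */
		return ((size_t)*off + *size > len) ? -1 : 0;
	}
	return -1;
}
```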

Now everything is ready for the kexec PE image parser. Select it on arm64 for zboot and UKI image support.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
To: linux-arm-kernel@lists.infradead.org

This BPF program aligns with the convention defined in the kernel file kexec_pe_parser_bpf.lskel.h, where the interface between the BPF program and the kernel is established. The interface is composed of:

four maps:
	struct bpf_map_desc ringbuf_1;
	struct bpf_map_desc ringbuf_2;
	struct bpf_map_desc ringbuf_3;
	struct bpf_map_desc ringbuf_4;
four sections:
	struct bpf_map_desc rodata;
	struct bpf_map_desc data;
	struct bpf_map_desc bss;
	struct bpf_map_desc rodata_str1_1;
two progs:
	SEC("fentry.s/bpf_handle_pefile")
	SEC("fentry.s/bpf_post_handle_pefile")

This BPF program only uses ringbuf_1, so it minimizes the size of the other three ringbufs to one byte. The size of ringbuf_1 is deduced from the size of the uncompressed file 'vmlinux.bin', which is usually less than 64MB. With the help of a group of bpf kfuncs (bpf_decompress(), bpf_copy_to_kernel(), bpf_mem_range_result_put()), this bpf-prog stores the uncompressed kernel image inside kernel space.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org

The objcopy binary can append a section to a PE file, but it disregards the DOS header. However, the zboot format carries important information in the DOS header: payload offset and size. To track this information and append a new PE section, a dedicated binary tool is introduced here to build zboot images.

The payload's relative offset within the .data section remains unchanged. Therefore, the .data section offset in the new PE file, plus the payload offset within that section, yields the payload offset within the new PE file.

Finally, the new PE file 'zboot.efi' can be obtained with the command:
make -C tools/kexec zboot

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org
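The offset rule above boils down to a single subtraction and addition. In this sketch, new_payload_offset() is a hypothetical helper, and the example offsets are made up. It computes the payload's position relative to .data in the old file and re-bases it onto the new .data offset.

```c
#include <stdint.h>

/* Hypothetical helper: recompute the zboot payload offset after the .data
 * section has been moved in the rebuilt PE file. */
static uint32_t new_payload_offset(uint32_t old_payload_off,
				   uint32_t old_data_off, uint32_t new_data_off)
{
	/* The payload's position relative to the start of .data does not change. */
	uint32_t rel = old_payload_off - old_data_off;

	return new_data_off + rel;
}
```

For instance, a payload at 0x10200 inside a .data section at 0x10000 lands at 0x20200 once .data moves to 0x20000. The payload size field in the DOS header would carry over unchanged.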

pfliu · 2025-09-24T02:23:00Z

/gemini review

gemini-code-assist · 2025-09-24T02:23:21Z

Summary of Changes

Hello @pfliu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Linux kernel's kexec functionality by introducing a BPF-driven mechanism for loading and parsing UEFI Portable Executable (PE) images. This new approach allows for flexible and dynamic handling of complex boot image formats, such as zboot, where various components like the kernel, initrd, and command line might be compressed or nested within the PE file. By integrating BPF, the kernel can delegate the intricate parsing logic to user-defined programs, making the kexec process more adaptable to diverse boot environments and future format evolutions without requiring kernel recompilation for each new format.

Highlights

BPF-assisted PE Image Loading: Introduces a new kexec file loader (kexec_pe_image_ops) that leverages BPF programs to parse UEFI Portable Executable (PE) files, enabling dynamic and extensible handling of boot images.
Dynamic Parsing with Embedded BPF: BPF programs, embedded within a dedicated .bpf section of a PE image, can now dynamically extract and decompress kernel, initrd, and command line components during the kexec loading process.
New BPF Kernel Functions and Infrastructure: Adds essential BPF kernel functions (bpf_decompress, bpf_copy_to_kernel) and supporting data structures (mem_range_result, carrier_listener) to allow BPF programs to manage kernel memory, perform decompression, and pass extracted resources back to the kexec loader.
Zboot Image Support and Tooling: Provides a userspace tool (zboot_image_builder) to inject BPF parsing logic into zboot EFI images, ensuring proper handling of compressed kernel payloads while maintaining PE header integrity. A corresponding BPF program (zboot_parser_bpf.c) is also introduced for this purpose.
ARM64 Architecture Integration: Enables the KEXEC_PE_IMAGE configuration option for ARM64 architectures, allowing this new PE image loading mechanism to be utilized on ARM64 systems.


gemini-code-assist

Code Review

This pull request introduces a significant new feature for kexec, allowing PE files to be loaded and parsed with the help of BPF programs. This is a complex change that adds a new BPF "carrier" framework for passing data from BPF to the kernel, a PE image loader, and BPF programs for parsing specific PE formats like zboot. While the overall architecture is sound, my review has identified several critical issues, including a mutex deadlock and use-after-free vulnerability in the BPF decompression helper, a memory leak in the PE image loader's error path, and other high-severity bugs related to incorrect resource management and logic. These critical issues must be addressed before this change can be merged.

gemini-code-assist · 2025-09-24T02:25:58Z

kernel/bpf/helpers.c

+	if (!decompressor) {
+		kfree(range);
+		vfree(input_buf);
+		pr_err("Can not find decompress method\n");
+		goto error;
+	}


There are critical issues in this error handling path:

If decompress_method() returns NULL, the code jumps to the error: label while output_buf_mutex is still held. This will cause the function to return without releasing the mutex, leading to a deadlock on subsequent calls.

In the same error path, range is freed via kfree(range), but the function proceeds to return this now-dangling range pointer, which is a use-after-free vulnerability.

The error handling logic must be refactored to ensure the mutex is always unlocked and resources are cleaned up correctly on all paths. Failures after acquiring the mutex should jump to a cleanup path that unlocks the mutex before returning.
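The single-exit pattern recommended here can be sketched in userspace with a pthread mutex. After the lock is taken, every failure jumps to one label that unlocks, frees what was allocated, and returns NULL instead of a freed pointer. do_decompress(), struct result and output_lock are hypothetical stand-ins for the kernel code under review.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

struct result { char *buf; size_t len; };

/* *out_len: capacity on entry, produced length on success. */
typedef int (*decompress_fn)(const char *in, size_t len, char *out, size_t *out_len);

static struct result *do_decompress(decompress_fn fn, const char *in, size_t len)
{
	struct result *range = calloc(1, sizeof(*range));

	if (!range)
		return NULL;            /* lock not taken yet: plain return */

	pthread_mutex_lock(&output_lock);
	if (!fn)                        /* "Can not find decompress method" */
		goto err_unlock;
	range->buf = malloc(4096);
	if (!range->buf)
		goto err_unlock;
	range->len = 4096;
	if (fn(in, len, range->buf, &range->len))
		goto err_unlock;
	pthread_mutex_unlock(&output_lock);
	return range;

err_unlock:
	pthread_mutex_unlock(&output_lock);   /* released on every failure path */
	free(range->buf);                     /* NULL-safe: calloc zeroed it */
	free(range);
	return NULL;                          /* never hand back a freed pointer */
}
```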

gemini-code-assist · 2025-09-24T02:25:58Z

kernel/kexec_pe_image.c

+	for (int i = 0; i < 3; i++) {
+		r = parsed_resource[i].r;
+		if (!r)
+			continue;
+		parsed_resource[i].r = NULL;
+		/*
+		 * The release of buffer defers to
+		 * kimage_file_post_load_cleanup()
+		 */
+		r->buf = NULL;
+		r->buf_sz = 0;
+		mem_range_result_put(r);
+	}


There is a critical memory leak in the error handling path. The cleanup logic at the err: label is also used for the success path. It unconditionally sets r->buf = NULL before calling mem_range_result_put(r).

On a failure path, this prevents __mem_range_result_free from freeing the buffer pointed to by r->buf, causing a memory leak. The cleanup logic should differentiate between success and failure. On failure, mem_range_result_put(r) should be called without modifying r to ensure the buffer is freed.

err:
	/* On success, ldata is not an error pointer. */
	bool success = !IS_ERR(ldata);

	for (int i = 0; i < 3; i++) {
		r = parsed_resource[i].r;
		if (!r)
			continue;
		parsed_resource[i].r = NULL;
		/*
		 * On success, buffer ownership is transferred to kimage.
		 * Prevent freeing it here. On failure, the buffer should be
		 * freed by mem_range_result_put().
		 */
		if (success) {
			r->buf = NULL;
			r->buf_sz = 0;
		}
		mem_range_result_put(r);
	}

gemini-code-assist · 2025-09-24T02:25:58Z

kernel/bpf/helpers_carrier.c

+		if (range->alloc_type == TYPE_KALLOC)
+			kfree(kbuf);
+		else
+			vfree(kbuf);


There is a bug in this error handling block. You are checking range->alloc_type to determine which deallocation function to use for kbuf. However, range->alloc_type is not initialized until line 157. At this point, it holds an uninitialized value, which will lead to undefined behavior. You should use the local alloc_type variable, which holds the correct value.

if (alloc_type == TYPE_KALLOC)
	kfree(kbuf);
else
	vfree(kbuf);

gemini-code-assist · 2025-09-24T02:25:59Z

kernel/kexec_pe_image.c

+		image->cmdline_buf_len = cmdline_len;
+	}
+
+	if (kernel == NULL || initrd == NULL || cmdline == NULL) {


This check is too strict. A valid kexec image might not have an initrd, and the command line can be empty. The kexec_file_load syscall supports these cases. This check will cause pe_image_load to fail for valid scenarios. It should be relaxed to only require that the kernel is present.

if (kernel == NULL) {

gemini-code-assist · 2025-09-24T02:25:59Z

kernel/kexec_pe_image.c

+			strcpy(c, "cmdline ");
+			c += strlen("cmdline ");
+		}
+		c = '\0';


There is a bug in the error message construction. c = '\0'; assigns the null character's value (0) to the pointer c, which is incorrect and will likely cause a crash. To null-terminate the string, you should write to the memory location pointed to by c.

*c = '\0';

gemini-code-assist · 2025-09-24T02:25:59Z

kernel/Kconfig.kexec

+	  This option makes the kexec_file_load() syscall cooperates with bpf-prog
+	  to parse PE format file


The help text has some minor grammatical errors and could be clearer. Consider rephrasing for better readability.

This option enables the kexec_file_load() syscall to cooperate with a BPF program to parse a PE format file.

gemini-code-assist · 2025-09-24T02:25:59Z

kernel/kexec_bpf/kexec_pe_parser_bpf.c

+	return __builtin_memcmp(local_name, res_kernel, 4);
+}
+
+SEC("fentry.s/bpf_handle_pefile")


The section name fentry.s/bpf_handle_pefile appears to be incorrect. The standard format for fentry/fexit programs is fentry/FUNCTION_NAME. The .s is likely a typo and should be removed.

SEC("fentry/bpf_handle_pefile")

gemini-code-assist · 2025-09-24T02:25:59Z

kernel/kexec_bpf/kexec_pe_parser_bpf.c

+	return 0;
+}
+
+SEC("fentry.s/bpf_post_handle_pefile")


Similar to the other fentry program, the section name fentry.s/bpf_post_handle_pefile appears to be incorrect. The .s is likely a typo and should be removed.

SEC("fentry/bpf_post_handle_pefile")

gemini-code-assist · 2025-09-24T02:25:59Z

tools/kexec/zboot_image_builder.c

+	close(base_fd);
+	close(out_fd);
+	close(bpf_fd);


This function has resource leaks. The file pointer bin_fp is opened with fopen but never closed. Additionally, bpf_fd is closed but was never opened, which will attempt to close an arbitrary file descriptor.

close(base_fd);
close(out_fd);
fclose(bin_fp);
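A common way to avoid both problems (an unclosed FILE * and closing a descriptor that was never opened) is to initialize every handle to a sentinel and release only those that were actually opened at a single exit label. This is a hypothetical outline of the builder's main path; build_image() and its parameters are illustrative, and the real mmap/append logic is elided.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Every handle starts at a sentinel, and only opened ones are released. */
static int build_image(const char *base_path, const char *bin_path,
		       const char *out_path)
{
	int ret = -1;
	int base_fd = -1, out_fd = -1;
	FILE *bin_fp = NULL;

	base_fd = open(base_path, O_RDONLY);
	if (base_fd < 0)
		goto out;
	out_fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out_fd < 0)
		goto out;
	bin_fp = fopen(bin_path, "rb");
	if (!bin_fp)
		goto out;

	/* ... mmap the base image, append the new section, write out_fd ... */
	ret = 0;
out:
	if (bin_fp)
		fclose(bin_fp);
	if (out_fd >= 0)
		close(out_fd);
	if (base_fd >= 0)
		close(base_fd);
	return ret;
}
```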

gemini-code-assist · 2025-09-24T02:25:59Z

tools/kexec/zboot_parser_bpf.c

+	/* Overwrite buf */
+	bpf_probe_read((void *)buf, payload_size, context->image + payload_offset);
+	bpf_printk("Calling bpf_kexec_decompress()\n");
+	struct mem_range_result *r = bpf_decompress(buf, payload_size - 4);


The expression payload_size - 4 uses a magic number 4. Please add a comment to explain why 4 bytes are being subtracted from the payload size before decompression. This will improve code readability and maintainability.
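One plausible reading of the 4, stated here only as an assumption the author should confirm, is that the payload ends with a 4-byte little-endian uncompressed-size field that should not be handed to the decompressor. Kbuild's *_with_size compression rules append such a field to several compressed kernel formats. The hypothetical helper below splits that trailer off explicitly, which would make the intent self-documenting.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the payload ends with a 4-byte little-endian uncompressed-size
 * field that the decompressor should not consume. */
struct payload_view {
	size_t compressed_len;     /* bytes to pass to the decompressor */
	uint32_t uncompressed_len; /* value read from the trailer */
};

static int split_size_trailer(const uint8_t *payload, size_t payload_size,
			      struct payload_view *v)
{
	if (payload_size < 4)
		return -1;
	const uint8_t *t = payload + payload_size - 4;

	v->uncompressed_len = (uint32_t)t[0] | ((uint32_t)t[1] << 8) |
			      ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
	v->compressed_len = payload_size - 4;
	return 0;
}
```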

gemini-code-assist

Code Review

This pull request introduces a significant new feature enabling kexec to load and parse PE files using BPF programs. This is an innovative approach that involves adding new BPF helpers, a generic "carrier" mechanism for BPF-to-kernel data transfer, and new tooling for embedding BPF programs into PE files. The changes are extensive and touch multiple kernel subsystems. My review has identified a few critical and high-severity bugs in the implementation of the new BPF helpers and the PE image loading logic, along with some medium-severity issues in the supporting tools and the BPF program itself. These should be addressed to ensure the stability and correctness of this new functionality.

gemini-code-assist · 2025-09-24T02:28:49Z

kernel/bpf/helpers_carrier.c

+	ret = copy_from_kernel_nofault(kbuf, buf, size);
+	if (unlikely(ret < 0)) {
+		if (range->alloc_type == TYPE_KALLOC)
+			kfree(kbuf);
+		else
+			vfree(kbuf);
+		kfree(range);
+		ret = -EINVAL;
+		goto err;
+	}


This error handling block has a couple of issues:

copy_from_kernel_nofault() returns an unsigned long indicating the number of bytes not copied. A non-zero value means failure. The current check ret < 0 is incorrect because the return value is never negative. Also, storing the result in an int can cause truncation.

The free logic uses range->alloc_type, but this field is not initialized until line 157. The local alloc_type variable should be used instead.

Here's a suggested fix that addresses both points. I've also changed the error code to -EFAULT which is more appropriate for a copy failure.

if (unlikely(copy_from_kernel_nofault(kbuf, buf, size))) {
	if (alloc_type == TYPE_KALLOC)
		kfree(kbuf);
	else
		vfree(kbuf);
	kfree(range);
	ret = -EFAULT;
	goto err;
}

gemini-code-assist · 2025-09-24T02:28:49Z

kernel/kexec_pe_image.c

+	if (kernel == NULL || initrd == NULL || cmdline == NULL) {
+		char *c, buf[64];
+
+		c = buf;
+		if (kernel == NULL) {
+			strcpy(c, "kernel ");
+			c += strlen("kernel ");
+		}
+		if (initrd == NULL) {
+			strcpy(c, "initrd ");
+			c += strlen("initrd ");
+		}
+		if (cmdline == NULL) {
+			strcpy(c, "cmdline ");
+			c += strlen("cmdline ");
+		}
+		c = '\0';
+		pr_err("Can not extract data for %s", buf);
+		ldata = ERR_PTR(-EINVAL);
+		goto err;
+	}


This error handling logic has several flaws:

It treats a missing initrd or cmdline as a fatal error by checking them with ||. These are often optional, and only a missing kernel should be a fatal error.

The error message construction is buggy. It uses strcpy in a loop, which overwrites the buffer on each iteration instead of appending. For example, if both kernel and initrd are NULL, the buffer will only contain "initrd ".

c = '\0'; is incorrect for null-terminating a string. It should be *c = '\0';.

I suggest refactoring this to only fail on a missing kernel and issue warnings for other missing components.

if (!kernel) {
	pr_err("Can not extract kernel from PE container\n");
	ldata = ERR_PTR(-EINVAL);
	goto err;
}
if (!initrd)
	pr_warn("Could not find initrd in PE container\n");
if (!cmdline)
	pr_warn("Could not find cmdline in PE container\n");
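If a combined "missing parts" message is still wanted, it can be built without manual pointer arithmetic, which sidesteps the `c = '\0'` mistake entirely. Each snprintf() call appends after the text already written, and output stops cleanly if the buffer runs out. missing_parts() is a hypothetical helper for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* List the missing parts ("kernel initrd cmdline"-style) in buf. */
static void missing_parts(char *buf, size_t cap, const void *kernel,
			  const void *initrd, const void *cmdline)
{
	const char *names[] = { "kernel", "initrd", "cmdline" };
	const void *parts[] = { kernel, initrd, cmdline };
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < 3; i++) {
		if (parts[i])
			continue;
		int n = snprintf(buf + used, cap - used, "%s%s",
				 used ? " " : "", names[i]);
		if (n < 0 || (size_t)n >= cap - used)
			break;            /* out of space: keep what fits */
		used += (size_t)n;
	}
}
```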

gemini-code-assist · 2025-09-24T02:28:49Z

kernel/bpf/helpers.c

+/*
+ * At present, one global allocator for decompression. Later if needed, changing the
+ * prototype of decompress_fn to introduce each task's allocator.
+ */
+static DEFINE_MUTEX(output_buf_mutex);


The use of a global mutex (output_buf_mutex) and a single global allocator instance (dcmpr_allocator) makes the bpf_decompress kfunc non-reentrant. While this might be acceptable for the current kexec use case where concurrent calls are unlikely, it limits the general applicability of this helper for other potential use cases in the future. As noted in the comment on line 3724, making this mechanism re-entrant would be a valuable improvement for wider adoption.

gemini-code-assist · 2025-09-24T02:28:50Z

tools/kexec/zboot_image_builder.c

+	munmap(base_start_addr, sb.st_size);
+	close(base_fd);
+	close(out_fd);
+	close(bpf_fd);


The file descriptor bpf_fd is closed here, but it was never initialized or opened. Meanwhile, the file pointer bin_fp is opened with fopen but never closed, leading to a resource leak. You should close bin_fp using fclose.

fclose(bin_fp);

gemini-code-assist · 2025-09-24T02:28:50Z

tools/kexec/zboot_parser_bpf.c

+	/* Overwrite buf */
+	bpf_probe_read((void *)buf, payload_size, context->image + payload_offset);
+	bpf_printk("Calling bpf_kexec_decompress()\n");
+	struct mem_range_result *r = bpf_decompress(buf, payload_size - 4);


The subtraction of 4 from payload_size appears to be a magic number. To improve code clarity and maintainability, please add a comment explaining why these 4 bytes are being excluded from the decompression payload. For instance, it might be a checksum or size field that isn't part of the compressed stream.

The kernel forbids the creation of non-FDB nexthop groups with FDB nexthops:

 # ip nexthop add id 1 via 192.0.2.1 fdb
 # ip nexthop add id 2 group 1
 Error: Non FDB nexthop group cannot have fdb nexthops.

And vice versa:

 # ip nexthop add id 3 via 192.0.2.2 dev dummy1
 # ip nexthop add id 4 group 3 fdb
 Error: FDB nexthop group can only have fdb nexthops.

However, as long as no routes are pointing to a non-FDB nexthop group, the kernel allows changing the type of a nexthop from FDB to non-FDB and vice versa:

 # ip nexthop add id 5 via 192.0.2.2 dev dummy1
 # ip nexthop add id 6 group 5
 # ip nexthop replace id 5 via 192.0.2.2 fdb
 # echo $?
 0

This configuration is invalid and can result in a NPD [1] since FDB nexthops are not associated with a nexthop device:

 # ip route add 198.51.100.1/32 nhid 6
 # ping 198.51.100.1

Fix by preventing nexthop FDB status change while the nexthop is in a group:

 # ip nexthop add id 7 via 192.0.2.2 dev dummy1
 # ip nexthop add id 8 group 7
 # ip nexthop replace id 7 via 192.0.2.2 fdb
 Error: Cannot change nexthop FDB status while in a group.

[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
 <TASK>
 fib_table_lookup+0x541/0x650
 ip_route_output_key_hash_rcu+0x2ea/0x970
 ip_route_output_key_hash+0x55/0x80
 __ip4_datagram_connect+0x250/0x330
 udp_connect+0x2b/0x60
 __sys_connect+0x9c/0xd0
 __x64_sys_connect+0x18/0x20
 do_syscall_64+0xa4/0x2a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 38428d6 ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Ido Schimmel says:

====================
nexthop: Various fixes

Patch #1 fixes a NPD that was recently reported by syzbot.

Patch #2 fixes an issue in the existing FIB nexthop selftest.

Patch #3 extends the selftest with test cases for the bug that was fixed in the first patch.
====================

Link: https://patch.msgid.link/20250921150824.149157-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Add 0x29 as the accelerometer address for the Dell Latitude E6530 to lis3lv02d_devices[].

The address was verified as below:

$ cd /sys/bus/pci/drivers/i801_smbus/0000:00:1f.3
$ ls -d i2c-*
i2c-20
$ sudo modprobe i2c-dev
$ sudo i2cdetect 20
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-20.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- UU -- 2b -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- 44 -- -- -- -- -- -- -- -- -- -- --
50: UU -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux-cachyos-bore root=UUID=<redacted> rw loglevel=3 quiet dell_lis3lv02d.probe_i2c_addr=1
$ sudo dmesg
[ 0.000000] Linux version 6.16.6-2-cachyos-bore (linux-cachyos-bore@cachyos) (gcc (GCC) 15.2.1 20250813, GNU ld (GNU Binutils) 2.45.0) #1 SMP PREEMPT_DYNAMIC Thu, 11 Sep 2025 16:01:12 +0000
[…]
[ 0.000000] DMI: Dell Inc. Latitude E6530/07Y85M, BIOS A22 11/30/2018
[…]
[ 5.166442] i2c i2c-20: Probing for lis3lv02d on address 0x29
[ 5.167854] i2c i2c-20: Detected lis3lv02d on address 0x29, please report this upstream to platform-driver-x86@vger.kernel.org so that a quirk can be added

Signed-off-by: Nickolay Goppen <setotau@mainlining.org>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://patch.msgid.link/20250917-dell-lis3lv02d-latitude-e6530-v1-1-8a6dec4e51e9@mainlining.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Running sha224_kunit on a KMSAN-enabled kernel results in a crash in kmsan_internal_set_shadow_origin():

 BUG: unable to handle page fault for address: ffffbc3840291000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
 Tainted: [N]=TEST
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
 RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
 [...]
 Call Trace:
 <TASK>
 __msan_memset+0xee/0x1a0
 sha224_final+0x9e/0x350
 test_hash_buffer_overruns+0x46f/0x5f0
 ? kmsan_get_shadow_origin_ptr+0x46/0xa0
 ? __pfx_test_hash_buffer_overruns+0x10/0x10
 kunit_try_run_case+0x198/0xa00

This occurs when memset() is called on a buffer that is not 4-byte aligned and extends to the end of a guard page, i.e. the next page is unmapped.

The bug is that the loop at the end of kmsan_internal_set_shadow_origin() accesses the wrong shadow memory bytes when the address is not 4-byte aligned. Since each 4 bytes are associated with an origin, it rounds the address and size so that it can access all the origins that contain the buffer. However, when it checks the corresponding shadow bytes for a particular origin, it incorrectly uses the original unrounded shadow address. This results in reads from shadow memory beyond the end of the buffer's shadow memory, which crashes when that memory is not mapped.

To fix this, correctly align the shadow address before accessing the 4 shadow bytes corresponding to each origin.

Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 and reaches pagemap_scan_backout_range(), the kernel panics with a null-ptr-deref:

    [ 44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
    [ 44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    [ 44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
    [ 44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
    [ 44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80
    <snip registers, unreliable trace>
    [ 44.946828] Call Trace:
    [ 44.947030]  <TASK>
    [ 44.949219]  pagemap_scan_pmd_entry+0xec/0xfa0
    [ 44.952593]  walk_pmd_range.isra.0+0x302/0x910
    [ 44.954069]  walk_pud_range.isra.0+0x419/0x790
    [ 44.954427]  walk_p4d_range+0x41e/0x620
    [ 44.954743]  walk_pgd_range+0x31e/0x630
    [ 44.955057]  __walk_page_range+0x160/0x670
    [ 44.956883]  walk_page_range_mm+0x408/0x980
    [ 44.958677]  walk_page_range+0x66/0x90
    [ 44.958984]  do_pagemap_scan+0x28d/0x9c0
    [ 44.961833]  do_pagemap_cmd+0x59/0x80
    [ 44.962484]  __x64_sys_ioctl+0x18d/0x210
    [ 44.962804]  do_syscall_64+0x5b/0x290
    [ 44.963111]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are allocated and p->vec_buf remains set to NULL. This breaks an assumption made later in pagemap_scan_backout_range(): that a page_region is always allocated for p->vec_buf_index.

Fix it by explicitly checking p->vec_buf for NULL before dereferencing it.

Other sites that might run into the same deref issue are already (directly or transitively) protected by checks on p->vec_buf.

Note: According to the PAGEMAP_SCAN man page, vec_len = 0 appears to be valid when no output is requested and the caller is interested only in the side effects; hence it passes the check in pagemap_scan_get_args().

This issue was found by syzkaller.

Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Pingfan Liu and others added 13 commits August 8, 2025 20:00

Merge branch 'torvalds:master' into kexec_bpf_v5

b43dc73

gemini-code-assist bot reviewed Sep 24, 2025


		This option makes the kexec_file_load() syscall cooperate with a
		bpf-prog to parse PE format files.

pfliu commented Sep 24, 2025

gemini-code-assist bot commented Sep 24, 2025

Summary of Changes

Highlights

Footnotes

gemini-code-assist bot left a comment

Code Review

gemini-code-assist bot Sep 24, 2025

gemini-code-assist bot left a comment

Code Review

gemini-code-assist bot Sep 24, 2025
