Conversation
|
Just some background on this submission. I created this backend for some work on the PortMaster project, which runs classic PC games on cheap ARM Linux handhelds. It was developed entirely with Claude Code, so if you have an aversion to LLM development, this is severely tainted. It's been tested on the Haxe unit.hl tests as well as running Dead Cells on my M2 Asahi Linux machine. It's not super fancy; the register allocation scheme is pretty basic. We do try to utilize callee- and caller-saved registers when we can. Still, it runs well enough for my needs. |
tobil4sk
left a comment
Hi, this looks quite exciting! :) Just a few general comments, which will hopefully help clean things up a bit. I haven't looked at the actual jit implementation, hopefully someone else might be able to.
|
Interesting, but a lot of changes. One question: does your implementation pass the Haxe compiler unit tests? That would be a requirement before any merge. |
Yes, all but three, related to stack trace formatting. |
Force-pushed from 4a2a412 to 15cd29d
tobil4sk
left a comment
There is also this place in CI that should be updated for arm64 support: hashlink/.github/workflows/build.yml, line 329 in e83fc71.
Also, I wonder if jit_common.h and jit_shared.c would make more sense named the same thing?
|
Since call stacks were mentioned and we're currently investigating a platform-specific HL problem related to that, could you try running https://github.com/HaxeFoundation/hxcoro/blob/master/callstack-tests/build-hl.hxml and let me know how that goes? |
|
|
Nice! That means #892 really is a problem with the x86 jit. |
|
There are some conflicts that need to be resolved (probably mainly in the Makefile; I can help with those if needed). The Haxe test suite is also now part of CI, so if this branch is updated and enables JIT tests for arm64, then we can verify that the Haxe test suite passes on arm64 mac/linux. |
Enable HashLink VM on AArch64 (Apple Silicon, ARM Linux servers, etc.) by adding a new JIT backend alongside the existing x86/x64 one.
- Rename jit.c to jit_x86.c, extract shared code into jit_common.h/jit_shared.c
- Add jit_aarch64.c/jit_aarch64_emit.c for ARM64 instruction selection and encoding
- Add jit_elf.c for GDB JIT debug interface
- Architecture-aware JIT selection in Makefile and CMakeLists.txt
- Add aarch64 support in hl.h, profile.c, hlmodule.h, module.c
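The architecture-aware JIT selection could look roughly like this in the Makefile (a sketch under the file names listed above; the variable names and exact layout are assumptions, not the actual diff):

```make
# Sketch: pick the JIT backend source files by target architecture.
# Variable names here are illustrative, not the real Makefile's.
UNAME_M := $(shell uname -m)
ifeq ($(UNAME_M),$(filter $(UNAME_M),aarch64 arm64))
JIT_SRC = src/jit_aarch64.c src/jit_aarch64_emit.c src/jit_shared.c src/jit_elf.c
else
JIT_SRC = src/jit_x86.c src/jit_shared.c
endif
```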
- Exception type filtering: OTrap now looks ahead at catch handler opcodes to set tcheck for typed exception catches, matching x86
- hl_jit_free: properly clean up all allocator state and support can_reset for hot reload, fixing memory leaks
- OAssert: use the correct LDR+BLR+B+literal pool pattern instead of the broken literal+BL sequence that was never patched
- OSwitch: replace the O(n) linear CMP/B.EQ scan with an O(1) branch table using ADR+ADD+BR
- Size encoding: large-offset paths in op_get_mem/op_set_mem now correctly handle 1-byte and 2-byte access sizes

Inspired by review of HaxeFoundation#857.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hared
- Remove --export-dynamic from Makefile LFLAGS (was accidentally reintroduced during merge; removed upstream in fec624c)
- Restore BOM and no-trailing-newline in hl.vcxproj.filters for Visual Studio compatibility
- Rename jit_shared.c to jit_common.c to match jit_common.h

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rebased and pushed. I also enabled the CI test on ARM, so hopefully it'll pass |
|
Looks like there is some |
|
There's a real bug in the SHA code tests, so I'm working on a fix |
X16 and X17 were included in RCPU_SCRATCH_REGS, making them allocatable for vreg storage. However:
1. X17 is used for opcode debug markers (MOV W17, #marker) emitted before every opcode, clobbering any vreg value in W17
2. X16 is used as RTMP throughout the JIT for multi-instruction sequences (large stack offsets, address calculations, etc.), clobbering any vreg value in W16

Under low register pressure, the allocator would pick X0-X15 first and never use X16/X17. But with 30+ vregs (as in SHA256's computation), the allocator would spill into X16/X17, causing silent data corruption.

Fix: removed X16 and X17 from RCPU_SCRATCH_REGS and reduced RCPU_SCRATCH_COUNT from 18 to 16. These registers are now reserved for their intended scratch/temporary purposes.
|
Looks good! The last thing to check that I have is the hxcoro tests: https://github.com/HaxeFoundation/hxcoro/blob/master/tests/build-hl.hxml (Which we should probably also run as part of the CI here because Haxe itself doesn't do that for HL.) |
|
|
Thank you for contributing this, but right now I cannot merge it. Not because it isn't stable enough, but because we recently identified an important way to further optimize the HL VM. It requires a fairly extensive rewrite of the JIT, introducing an intermediate language (IL) that will do most of the work of the current JIT but in a more abstract manner (SSA), then performing proper register allocation using LSRA. I think once we have this done for x86_64, it will be much easier to add ARM support, and much easier to maintain and develop over the long term. The HLVM JIT was my first, and it went from 32 bits initially to 64 bits without much refactoring; I think it's due for a complete refactor for the next big HLVM version. I'm keeping the PR open as some might want to try it for their own usage. |
|
I have another backend that converts HashLink to LLVM-IR, which I use for AOT compilation. If you're inventing your own IR, you might consider just using LLVM-IR and then doing some really-shoddy-but-fast AOT compilation of LLVM-IR for your JIT. |
|
Or at the very least you could use my LLVM-IR backend as a comparative example for your SSA phi functions. But also, don't you want to just commit this to take the pressure off your whole IR rewrite? Surely its existence presents a design requirement for you to support aarch64 in any rewrite. |
|
I'm curious about your LLVM-IR backend, if you can share the sources. I think I will still come up with my own IR, as I prefer the VM to be self-contained, and there are some advanced topics that need to be handled, such as debugger support with respect to register allocation. |
|
Here's the LLVM backend. I also have a patch for HashLink GDB ELF integration that produces proper GDB stack traces, as well as heaptrack integration for JIT memory profiling. I'm not going to do the work of submitting it if you are mid-rewrite, though, so let me know beforehand if any of it is usable to your efforts. https://github.com/bmdhacks/hashlink/tree/bmd-aarch64/src/llvm |