
Android: Don't register LLVM signal handlers#5070

Draft
kinke wants to merge 2 commits into ldc-developers:master from kinke:android_test

Conversation

@kinke
Member

kinke commented Feb 22, 2026

A little test wrt. #4383, based on termux/termux-packages#28586 (comment).

@robertkirkman

I was able to successfully compile this in GitHub Actions by changing the Ubuntu version to 24.04 and trying until it worked:

https://github.com/robertkirkman/ldc/actions/runs/22288993543/job/64472748363

When I test this build, it gets slightly further than the 1.41.0 release: it prints the messages "Error: No source files" and "LDC - the LLVM D compiler", whereas the 1.41.0 release doesn't.

Unfortunately, when I try to compile anything with it, it still prints "stack corruption detected (-fstack-protector) Aborted", so the same problem might still be happening, and the extra messages might be just a coincidence.

~/ldc2-8005ae55-android-aarch64/bin $ ./ldc2 --version
LDC - the LLVM D compiler (1.42.0-git-8005ae5):
  based on DMD v2.112.1 and LLVM 21.1.8
  built with LDC - the LLVM D compiler (1.42.0-git-8005ae5)
  Default target: aarch64-none-linux-android29
stack corruption detected (-fstack-protector)
Aborted                    ./ldc2 --version
~/ldc2-8005ae55-android-aarch64/bin $ cat > test.d << 'EOF'
> void main()
> {
>     import std.stdio : writefln;
>     "Hello".writefln!"%s, World!";
> }
> EOF
~/ldc2-8005ae55-android-aarch64/bin $ ./ldc2 test.d
stack corruption detected (-fstack-protector)
Aborted                    ./ldc2 test.d
~/ldc2-8005ae55-android-aarch64/bin $ ./ldc2
Error: No source files
~/ldc2-8005ae55-android-aarch64/bin $ gdb --args ./ldc2 test.d
GNU gdb (GDB) 16.3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/
...
stack corruption detected (-fstack-protector)

Program received signal SIGABRT, Aborted.
0x0000007fbb5a7994 in abort () from /apex/com.android.runtime/lib64/bionic/libc.so
(gdb) bt
#0  0x0000007fbb5a7994 in abort () from /apex/com.android.runtime/lib64/bionic/libc.so
#1  0x0000007fbb5bc5ec in __stack_chk_fail () from /apex/com.android.runtime/lib64/bionic/libc.so
#2  0x000000555a1c4a18 in RegisterHandlers() ()
#3  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

@kinke
Member Author

kinke commented Feb 23, 2026

Heh wow, thx a lot for the effort! - Okay, then this a) doesn't seem to prevent the RegisterHandlers() call as I had hoped, and b) seems to be non-deterministic based on your wild results (incl. crashing after the --version output, which is way after the RegisterHandlers() call).

I'm wondering if this is just some -fstack-protector-strong incompatibility for LLVM itself, or some genuine Android-specific bug on our end. AFAIK, the prebuilt D libraries work fine, as well as cross-compiled+linked binaries using the NDK. So it doesn't seem too likely to me that only LDC itself would be miscompiled somehow, causing a genuine stack corruption.

@kinke
Member Author

kinke commented Feb 23, 2026

Btw, is the ulimit -s stack size 8 MB as on regular Linux?

@robertkirkman

ulimit -s shows 8192 in both Termux and ADB shell on my device, yes.

@kinke
Member Author

kinke commented Feb 23, 2026

FWIW, I've tried -fstack-protector-strong for LLVM and the LDC C[++] parts, on Linux x86_64, using gcc 13 (couldn't get clang to work) - seems to work fine.

I'll try an Android LLVM build without -fstack-protector-strong next (via CI).

@kinke
Member Author

kinke commented Feb 24, 2026

I've installed Termux on my phone and played around a bit. I can reproduce the wild behavior, incl. this signal-handling-skipping here definitely getting LDC much further. The other bundled D binaries seem to run fine btw.

Using this CI artifact here, compiling a minimal program with -o- (no codegen, just semantic analysis) works fine, consistently, while beta3 crashes right away. No luck with gdb and lldb debugging so far at all. The stack corruption happens consistently while trying to write the object file to disk (checked via -vv output) - which seems to involve signal handling again (causing a RegisterHandlers() call), as we use an llvm::ToolOutputFile for these temp files, and they are removed on any signal.
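The "removed on any signal" behavior mentioned above can be illustrated with a small shell analogy. This is only a sketch of the idea, not LLVM's implementation (llvm::ToolOutputFile is C++); the helper name emit_object and the self-sent SIGTERM are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: write some "object data" to the path in $1.
# If $2 is "interrupt", the child shell signals itself mid-write to
# simulate the compiler being killed while the output is incomplete.
emit_object() {
    sh -c '
        out=$1
        # On INT/TERM, delete the partial output and exit non-zero.
        trap "rm -f \"\$out\"; exit 1" INT TERM
        printf "partial object data\n" > "$out"
        if [ "$2" = interrupt ]; then
            kill -TERM "$$"
        fi
        exit 0
    ' emit_object "$1" "${2:-}"
}
```

Calling emit_object out.o leaves out.o in place, while emit_object out.o interrupt returns a non-zero status and leaves no file behind. The point of the analogy is that merely creating such an output file requires signal handlers to be installed, which would explain why writing the object file reaches RegisterHandlers().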

So it looks as if the signal handling stuff in our Android LLVM builds is seriously broken. The CMake build might need explicit options to work properly on Android; checking similar builds could help.

@robertkirkman

There's a libllvm package in Termux at version 21.1.8 that we try to use as much as possible with LLVM-based compilers, so I might try compiling ldc with non-cross-compilation (inside Termux) and the prebuilt Termux libllvm package by attempting to follow these steps: https://wiki.dlang.org/Building_LDC_from_source#Building_LDC_from_source to check if it's possible for it to work that way. If it does, then it would suggest the problem is just in the LLVM build settings of the CI in github.com/ldc-developers/ldc , but if it doesn't, then something else might be going on.

@kinke
Member Author

kinke commented Feb 25, 2026

Cool - that work won't be in vain, since that's exactly how the Termux package should be built nowadays. AFAICT, you shouldn't need any extra CMake flags on Android, it hopefully just builds and works fine.

After the latest results, I'm pretty positive that the problem lies in our prebuilt LLVM binaries. But if it's not the alternate-stack thing either, I'd be out of ideas. Maybe trying once more with the latest NDK.

@robertkirkman

I have good news, which is that I was able to compile the newest LDC inside Termux, and also that it can now compile and run hello world when I do that!

There is possibly a remaining problem that when I run the hello world program, it crashes when ending instead of ending cleanly, but overall this is good. I used 64-bit ARM Android 13.

Commands used:
# dependencies:
# approximately 'pkg install build-essential libllvm-static ldc cmake ninja termux-elf-cleaner'
git clone --recursive https://github.com/ldc-developers/ldc.git
cd ldc
mkdir build
cd build
export LDFLAGS="-lzstd"
cmake -G Ninja .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PREFIX
ninja
rm /data/data/com.termux/files/usr/etc/ldc2.conf
ninja install
ldc2 --version
cat > hello.d << 'EOF'
import std.stdio;

void main() {
    writeln("Hello, World!");
}
EOF
ldc2 hello.d
termux-elf-cleaner hello
./hello
Result:
~/.../ldc/build $ ldc2 --version
cat > hello.d << 'EOF'
import std.stdio;

void main() {
    writeln("Hello, World!");
}
EOF
ldc2 hello.d
LDC - the LLVM D compiler (1.42.0-git-2e33c9c):
  based on DMD v2.112.1 and LLVM 21.1.8
  built with LDC - the LLVM D compiler (1.30.0)
  Default target: aarch64-unknown-linux-android24
  Host CPU: cortex-a76
  https://dlang.org - https://wiki.dlang.org/LDC


  Registered Targets:
    aarch64     - AArch64 (little endian)
    aarch64_32  - AArch64 (little endian ILP32)
    aarch64_be  - AArch64 (big endian)
    amdgcn      - AMD GCN GPUs
    arc         - ARC
    arm         - ARM
    arm64       - ARM64 (little endian)
    arm64_32    - ARM64 (little endian ILP32)
    armeb       - ARM (big endian)
    avr         - Atmel AVR Microcontroller
    bpf         - BPF (host endian)
    bpfeb       - BPF (big endian)
    bpfel       - BPF (little endian)
    csky        - C-SKY
    hexagon     - Hexagon
    lanai       - Lanai
    loongarch32 - 32-bit LoongArch
    loongarch64 - 64-bit LoongArch
    m68k        - Motorola 68000 family
    mips        - MIPS (32-bit big endian)
    mips64      - MIPS (64-bit big endian)
    mips64el    - MIPS (64-bit little endian)
    mipsel      - MIPS (32-bit little endian)
    msp430      - MSP430 [experimental]
    nvptx       - NVIDIA PTX 32-bit
    nvptx64     - NVIDIA PTX 64-bit
    ppc32       - PowerPC 32
    ppc32le     - PowerPC 32 LE
    ppc64       - PowerPC 64
    ppc64le     - PowerPC 64 LE
    r600        - AMD GPUs HD2XXX-HD6XXX
    riscv32     - 32-bit RISC-V
    riscv64     - 64-bit RISC-V
    sparc       - Sparc
    sparcel     - Sparc LE
    sparcv9     - Sparc V9
    spirv       - SPIR-V Logical
    spirv32     - SPIR-V 32-bit
    spirv64     - SPIR-V 64-bit
    systemz     - SystemZ
    thumb       - Thumb
    thumbeb     - Thumb (big endian)
    ve          - VE
    wasm32      - WebAssembly 32-bit
    wasm64      - WebAssembly 64-bit
    x86         - 32-bit X86: Pentium-Pro and above
    x86-64      - 64-bit X86: EM64T and AMD64
    xcore       - XCore
~/.../ldc/build $ ./hello
error: "./hello": executable's TLS segment is underaligned: alignment is 16, needs to be at least 64 for ARM64 Bionic
Aborted                    ./hello
~/.../ldc/build $ termux-elf-cleaner hello
./hello
termux-elf-cleaner: Changing TLS alignment for 'hello' to 64, instead of 16
termux-elf-cleaner: Replacing unsupported DF_1_* flags 134217729 with 1 in 'hello'
Hello, World!
Segmentation fault         ./hello
~/.../ldc/build $ gdb ./hello 
...
Hello, World!

Program received signal SIGSEGV, Segmentation fault.
0x00000055555ba998 in object.keys!(bool, std.concurrency.Tid).keys(inout(bool[std.concurrency.Tid])) ()
(gdb) bt
#0  0x00000055555ba998 in object.keys!(bool, std.concurrency.Tid).keys(inout(bool[std.concurrency.Tid])) ()
#1  0x00000055555bcb2c in std.concurrency.ThreadInfo.cleanup() ()
#2  0x000000555560e184 in rt.minfo.rt_moduleTlsDtor().__foreachbody_L625_C5(ref rt.sections_elf_shared.DSO) ()
#3  0x000000555560f67c in rt.sections_elf_shared.DSO.opApplyReverse(scope int(ref rt.sections_elf_shared.DSO) delegate) ()
#4  0x000000555560b260 in rt_term ()
#5  0x000000555560b718 in rt.dmain2._d_run_main2(char[][], ulong, extern(C) int(char[][]) function).runAll() ()
#6  0x000000555560b5a4 in _d_run_main2 ()
#7  0x000000555560b424 in _d_run_main ()
#8  0x00000055555b7de8 in main ()
#9  0x0000007fba33e1f8 in __libc_init () from /apex/com.android.runtime/lib64/bionic/libc.so
(gdb) 

On 64-bit ARM Android 10 I see this:

~/.../build/bin $ ldc2 hello.d
ld.lld: error: undefined symbol: __tls_get_addr
>>> referenced by sections_elf_shared.d
>>>               sections_elf_shared.o:(_D2rt19sections_elf_shared13initTLSRangesFNbNiZPS4core8internal9container5array__T5ArrayTAvZQk) in archive /data/data/com.termux/files/usr/lib/libdruntime-ldc.a
>>> referenced by sections_elf_shared.d
>>>               sections_elf_shared.o:(_d_dso_registry) in archive /data/data/com.termux/files/usr/lib/libdruntime-ldc.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Error: /data/data/com.termux/files/usr/bin/cc failed with status: 1

I assume that this would be the API level 30 restriction for the thread-local storage that you mentioned. Since this error only occurs on Android 10 and not Android 13, I assume that the other problem, with hello world crashing when ending on Android 13, must be something different?

Next, I will try to cross-compile the newest LDC using a similar setup, but the cross-compilation equivalent, to update the Termux ldc package, and check if I can make that work.

@kinke
Member Author

kinke commented Feb 26, 2026

Oh, finally some good news indeed!

Default target: aarch64-unknown-linux-android24

I guess you'll have to specify -mtriple=aarch64-linux-android30 explicitly to target the correct min API level. Then hopefully the error wrt. underaligned TLS segment vanishes (I cannot imagine that we'd need to handle that in LDC), and the segfault at runtime.
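To see which minimum API level a given ldc2 targets by default, the "Default target" line of ldc2 --version can be parsed. A minimal sketch, assuming the output format shown in the transcripts above; the function names are hypothetical, and the threshold of 30 is taken from this discussion.

```shell
#!/bin/sh
# Read `ldc2 --version` output on stdin and print the Android API level
# encoded in the "Default target" triple (prints nothing for non-Android triples).
default_target_api() {
    sed -n 's/^ *Default target: .*-android\([0-9][0-9]*\)$/\1/p'
}

# Succeed if the default API level is below the wanted minimum ($1, default 30),
# i.e. if an explicit -mtriple=...-android<N> would be needed.
needs_explicit_triple() {
    api=$(default_target_api)
    [ -n "$api" ] && [ "$api" -lt "${1:-30}" ]
}

# Usage (requires ldc2 on PATH):
#   ldc2 --version | default_target_api
#   ldc2 --version | needs_explicit_triple 30 && echo "pass -mtriple explicitly"
```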

ld.lld: error: undefined symbol: __tls_get_addr

It's bizarre - the static libc.a contains that symbol for API level 29 (where the native ELF TLS was officially introduced), but the libc.so only contains it since API level 30. This affects only aarch64; the armv7a .so has contained it since level 29. Initially I thought this was an NDK oversight, but I checked the r29 NDK yesterday and it's still the same.
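A check like this can be scripted by filtering nm output for the symbol. The following is a sketch only: the function name is hypothetical, and the NDK sysroot path in the usage comment is an assumption about the usual NDK layout (with $NDK as a placeholder), not something stated in this thread.

```shell
#!/bin/sh
# Read `nm -D --defined-only <lib>` output on stdin and succeed if the
# symbol named in $1 is defined. An optional @VERSION suffix, which nm
# may append to versioned symbols, is stripped before comparing.
defines_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage (assumed NDK layout; adjust the host directory and $NDK as needed):
#   for api in 29 30; do
#       lib="$NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android/$api/libc.so"
#       if llvm-nm -D --defined-only "$lib" | defines_symbol __tls_get_addr; then
#           echo "API $api: present"
#       else
#           echo "API $api: missing"
#       fi
#   done
```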

@robertkirkman

I guess you'll have to specify -mtriple=aarch64-linux-android30 explicitly to target the correct min API level. Then hopefully the error wrt. underaligned TLS segment vanishes (I cannot imagine that we'd need to handle that in LDC), and the segfault at runtime.

I tried that in this way (on Android 13):

~/.../ldc/build $ ldc2 hello.d -mtriple=aarch64-linux-android30
~/.../ldc/build $ ./hello 
error: "./hello": executable's TLS segment is underaligned: alignment is 16, needs to be at least 64 for ARM64 Bionic
Aborted                    ./hello
~/.../ldc/build $ termux-elf-cleaner ./hello
termux-elf-cleaner: Changing TLS alignment for './hello' to 64, instead of 16
termux-elf-cleaner: Replacing unsupported DF_1_* flags 134217729 with 1 in './hello'
~/.../ldc/build $ ./hello 
Hello, World!
Segmentation fault         ./hello
~/.../ldc/build $ 

but I seem to see the same result still.

@kinke
Member Author

kinke commented Feb 26, 2026

Hmm. This post is from 2018, so might be outdated, but shows how stack-protection and TLS might trip over each other on Android: https://reviews.llvm.org/D53906#1281612

@kinke
Member Author

kinke commented Feb 26, 2026

I tried that in this way (on Android 13): […] but I seem to see the same result still.

Note that the druntime and Phobos libraries (linked into the hello-world, and also the location of the std.concurrency TLS module destructor which segfaults) are still targeting API level 24 (built during the LDC build). You can set the CMake var D_EXTRA_FLAGS=-mtriple=aarch64-linux-android30 to build these libs for API level 30.
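As a sketch, the configure step could look as follows. Only D_EXTRA_FLAGS and the -mtriple value come from this comment; the generator and build type mirror the commands posted earlier in the thread, and the helper name is hypothetical. The function prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Print (not run) a CMake configure command that builds druntime and
# Phobos for the given Android API level ($1, default 30).
ldc_configure_cmd() {
    api=${1:-30}
    printf 'cmake -G Ninja .. -DCMAKE_BUILD_TYPE=Release -DD_EXTRA_FLAGS=-mtriple=aarch64-linux-android%s\n' "$api"
}

# Usage, from an empty build directory inside the LDC checkout:
#   ldc_configure_cmd 30        # inspect the command
#   eval "$(ldc_configure_cmd 30)" && ninja
```

Note that this only affects the D libraries built as part of the LDC build; programs compiled afterwards would presumably still need -mtriple on the ldc2 command line unless the default target changes.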

@robertkirkman

Hmm, I've tried redoing the same steps in the commands I sent earlier a few more times, and unfortunately the results seem to vary, because I'm not able to reproduce the working build of LDC 1.42.0 anymore. Now, what keeps happening whenever I build LDC and try to compile and run the hello world program, regardless of whether I use -mtriple=aarch64-linux-android30, -DD_EXTRA_FLAGS=-mtriple=aarch64-linux-android30 and termux-elf-cleaner or not, is error: "./hello": executable's TLS segment is underaligned: alignment is 64 (skew 48), needs to be at least 64 for ARM64 Bionic. Either the first time I tried this I did something different by accident that I can't reproduce anymore, or the results are variable and only successful rarely.

Termux packages are all built targeting API level 24, and among them are several important packages that I think could be having an effect on LDC, including libllvm and ndk-sysroot. I might try setting up custom builds of those packages targeting API level 30 instead of 24 and try using them to build LDC, to check whether they are part of the problem.

@kinke
Member Author

kinke commented Feb 28, 2026

Hmm - I've checked the TLS segment alignment of the cross-compiled CI dlang binaries here, and they are 64-byte aligned, incl. the ldc2 executable. So my assumption is that when targeting API level 29/30 (for all D parts linked into the binary), LLVM takes care of that automatically (and possibly more stuff wrt. TLS).
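One way to repeat such a check is to read the Align column of the TLS program header from readelf -lW. A minimal sketch with hypothetical function names; the minimum of 64 is taken from the Bionic error message quoted earlier, and the "skew" value from that message is not examined here.

```shell
#!/bin/sh
# Read `readelf -lW <binary>` output on stdin and print the alignment of
# the TLS program header in decimal (prints nothing if there is no TLS segment).
tls_align() {
    awk '$1 == "TLS" { print $NF }' | while read -r hex; do
        printf '%d\n' "$hex"
    done
}

# Succeed if there is no TLS segment, or if its alignment is at least
# the minimum given in $1 (default 64).
tls_ok() {
    min=${1:-64}
    align=$(tls_align)
    [ -z "$align" ] || [ "$align" -ge "$min" ]
}

# Usage (readelf or llvm-readelf):
#   readelf -lW ./hello | tls_align
#   readelf -lW ./hello | tls_ok 64 || echo "TLS segment underaligned"
```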

For LLVM and its C[++] dependencies, I guess they might not use any TLS stuff at all, so might not need to be compiled for higher API levels. We never compiled LLVM itself with our former custom TLS emulation (e.g., for Termux' LDC v1.30).

Wrt. stack corruption for the CI binaries here (well, #5072), I'm really out of ideas. I was never able to see a meaningful stacktrace, e.g., never even saw RegisterHandlers() in there, which your stack trace included.

@kinke
Member Author

kinke commented Feb 28, 2026

For LLVM and its C[++] dependencies, I guess they might not use any TLS stuff at all

Oh well, LLVM seems to have a few TLS vars - incl. for backtraces support, which can be disabled (LLVM_ENABLE_BACKTRACES=OFF) to avoid that according to https://github.com/llvm/llvm-project/blob/2078da43e25a4623cab2d0d60decddf709aaea28/llvm/lib/Support/PrettyStackTrace.cpp#L44-L47:

// If backtrace support is not enabled, compile out support for pretty stack
// traces.  This has the secondary effect of not requiring thread local storage
// when backtrace support is disabled.
#if ENABLE_BACKTRACES
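Whether an LLVM build configured with -DLLVM_ENABLE_BACKTRACES=OFF still contains thread-local variables could be checked by listing symbols of type TLS in its objects or libraries. A sketch, assuming the usual readelf -sW column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the function name and the library path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Read `readelf -sW <object-or-library>` output on stdin and print the
# names of all symbols whose type column is TLS (thread-local storage).
tls_symbols() {
    awk '$4 == "TLS" { print $8 }'
}

# Usage (hypothetical path to a static LLVM library):
#   readelf -sW path/to/libLLVMSupport.a | tls_symbols | sort -u
```

An empty result would suggest that no thread-local variables remain in that library; any names printed are candidates for the TLS-related problems discussed above.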
