@wfouche
Contributor

@wfouche wfouche commented Dec 5, 2025

jbang --help on Windows does not display Unicode characters.

image

but now it does.

image

Fixes #2350

@quintesse
Contributor

Although this definitely works, the issue I have with it is that the encoding will persist after JBang exits. So we have basically changed the console's encoding, which affects any commands run afterwards.

Now in itself that might not be bad, in many (most?) cases this might actually improve things for the user. But it would be somewhat weird to see (some) apps behaving differently before JBang was run vs after JBang was run.

Of course we could try resetting the code page back again before exiting. Or point users to documentation on how to change codepages on the system level. (Although I definitely like being able to show correct output regardless of the user's system settings)

@wfouche
Contributor Author

wfouche commented Dec 5, 2025

100% agree with what you are saying. However, I think this PR is a better approach than asking users to globally enable

image

because

  1. Most will never do this.

  2. If it is set globally then the risk of it breaking something is higher than localizing this setting to the specific Windows shell that JBang is running in.

I think it is better to enable this setting as the PR proposes, than not to have it enabled at all. It might break an ancient Windows command-line program that requires the default Windows code page. I don't think many of those are still in use by JBang command-line users.

@wfouche
Contributor Author

wfouche commented Dec 5, 2025

Alternative CMD implementation.

REM Step 1 - save current code page
REM (English output is "Active code page: 437", so the number is the 4th token)
for /f "tokens=4" %%c in ('chcp') do set "ORIGINAL_CP=%%c"

REM Step 2 - set code page to UTF-8
chcp 65001 > NUL


REM Step n - restore original code page
chcp %ORIGINAL_CP% > nul

Something similar could be done for PowerShell. But I think this might be overkill.

CMD script code to only set the code page to 65001 if required:

setlocal
for /F "tokens=4" %%c in ('chcp') do (
    if "%%c" NEQ "65001" (
        chcp 65001 > nul
    )
)
endlocal

Best practice for modern, cross-platform compatibility.
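
Both CMD snippets above depend on which whitespace-separated token of the chcp output carries the number, which ties them to the exact wording of that output. The following is a minimal Java sketch of a more tolerant parse. It assumes the output ends with the code page number, optionally followed by a period (the trailing-period case is an assumption, not something reported in this thread). ChcpOutputParser and parseCodePage are hypothetical names and not part of JBang.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChcpOutputParser {
    // Matches the last run of up to five digits on the line, optionally
    // followed by a period and trailing whitespace or line terminators.
    private static final Pattern TRAILING_NUMBER = Pattern.compile("(\\d{1,5})\\.?\\s*$");

    /** Returns the code page found at the end of the chcp output, if any. */
    static OptionalInt parseCodePage(String chcpOutput) {
        if (chcpOutput == null) {
            return OptionalInt.empty();
        }
        Matcher m = TRAILING_NUMBER.matcher(chcpOutput);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Sample input taken from the English wording of the chcp output.
        System.out.println(parseCodePage("Active code page: 437"));
    }
}
```

Matching the trailing digits instead of counting tokens avoids the tokens=3 versus tokens=4 question entirely.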
@quintesse
Contributor

Something similar could be done for PowerShell. But I think this might be overkill.

You might be right, I'll let @maxandersen decide

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes Unicode character display issues in JBang's Windows launcher scripts by enabling UTF-8 encoding. The fix ensures that help text and other output containing Unicode characters (like special symbols) display correctly on Windows console environments instead of showing garbled characters.

Key Changes:

  • Enabled UTF-8 encoding in PowerShell launcher by setting console encoding properties
  • Enabled UTF-8 code page (65001) in CMD launcher script

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
src/main/scripts/jbang.ps1 | Added UTF-8 encoding configuration for PowerShell console input/output at script initialization
src/main/scripts/jbang.cmd | Added UTF-8 code page activation (chcp 65001) at script initialization

@maxandersen
Collaborator

I do think we should clean up - it's not good behaviour to modify the user's environment. We do the env cleanup for other variables, especially on Windows.

My main concerns are:

  • does it affect the execution speed? I doubt it, but Windows can be surprising :)
  • impact on the user's execution. Should we run the user's app with this on or off? I'm leaning towards keeping it on, as UTF-8 is just easier :) but I could imagine we would need to offer a flag to NOT apply it - so should we do it just for the jbang exec part?
  • should we do something similar on other OSes for consistent behaviour?

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

does it affect the execution speed? I doubt it but windows can be surprising :)

Adds 7ms overhead on my 8-core dev machine

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

impact on the user's execution. Should we run the user's app with this on or off?

It should be on, we want to have nice things on Windows too! :-)

set JBANG_APP_JAVA_OPTIONS=-Xmx1g
jbang run JvmRuntimeOpts.java

--- 🚀 JVM Runtime Options (VM Arguments) ---
Option 1: -Xmx1g

--- 📝 Application Arguments ---
No Application Arguments were passed to the main method.

--- ✅ Execution Complete ---

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

should we do similar on other OS for consistent behaviour ?

Not needed on Linux or macOS.

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

Fixed jbang.ps1 - it now restores the code page to its original value.

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

jbang.cmd recursively calling itself ...... oh no.

if "!binaryPath!"=="" if "!jarPath!"=="" (
  if not exist "%JBDIR%\bin\jbang.jar" (
    powershell -NoProfile -ExecutionPolicy Bypass -NonInteractive -Command "%~dp0jbang.ps1 version" > nul
    if !ERRORLEVEL! NEQ 0 ( exit /b %ERRORLEVEL% )
  )
  call "%JBDIR%\bin\jbang.cmd" %*
  exit /b %ERRORLEVEL%
)

I'm going to rename jbang.cmd to _jbang.cmd and then create a new jbang.cmd with the following code:

@echo off

rem Save current code page
for /f "tokens=2 delims=:" %%a in ('chcp') do set "_OriginalCP=%%a"
set "_OriginalCP=%_OriginalCP: =%"

rem Enable UTF-8 code page
chcp 65001 > nul

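call _jbang.cmd %*
rem Remember the exit code before chcp overwrites ERRORLEVEL
set "_ExitCode=%ERRORLEVEL%"

rem Restore original code page
chcp %_OriginalCP% > nul

exit /b %_ExitCode%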

exit /b <n> to the rescue.

@wfouche
Contributor Author

wfouche commented Dec 6, 2025

Only one test case fails on Windows:

	@Test
	public void shouldHandleSpecialCharacters() {
		assertThat(shell("jbang echo.java \" ~!@#$%^&*()-+\\:;\'`<>?/,.{}[]\"")).outIsExactly(
				"0: ~!@#$%^&*()-+\\:;'`<>?/,.{}[]" + lineSeparator());
	}

@wfouche wfouche marked this pull request as draft December 11, 2025 16:31
@wfouche wfouche marked this pull request as ready for review December 11, 2025 19:22
@wfouche
Contributor Author

wfouche commented Dec 11, 2025

@maxandersen, the original code page is now restored when using CMD, PowerShell or Bash (on Cygwin or Git-Bash).

@maxandersen
Collaborator

Given that Java in JEP 400 has standardized on UTF-8, JBang should ensure a seamless UTF-8 experience across platforms including Windows. It does not, at the moment.

I get what you are after. And I applaud it. But unfortunately JEP 400 does NOT ensure a seamless UTF-8 experience when running Java - it ensures UTF-8 will mostly work IF you are running in a UTF-8 environment... subtle but important difference. It's still up to the user to set things up properly.
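
The distinction can be inspected from inside a JVM. The sketch below assumes a recent JDK: JEP 400 (JDK 18) makes UTF-8 the default charset for the standard APIs, while the encoding used for System.out is reported separately through the stdout.encoding system property (available from JDK 19, as far as I know) and follows the terminal or console the process is attached to. EncodingReport is a hypothetical helper name, not part of JBang.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

final class EncodingReport {
    /** Collects the charset-related settings that can differ from each other. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        // Default charset used by APIs such as FileReader; UTF-8 under JEP 400.
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        report.put("file.encoding", System.getProperty("file.encoding", "(unset)"));
        // Encoding derived from the host environment (locale or code page).
        report.put("native.encoding", System.getProperty("native.encoding", "(unset)"));
        // Encodings used for System.out and System.err; unset on older JDKs.
        report.put("stdout.encoding", System.getProperty("stdout.encoding", "(unset)"));
        report.put("stderr.encoding", System.getProperty("stderr.encoding", "(unset)"));
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it in a console before and after chcp 65001 is one way to see whether the JVM picks up the code page change; the values depend on the JDK version and the shell.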

if [ $err -eq 255 ]; then
eval "exec $output"
if [[ "$os" == "windows" ]] && [[ "$JBANG_WIN_UTF8" != "false" ]]; then
bash -c "$output"
Collaborator

why bash not exec here?

Contributor Author

We need the command to return to the current shell, to be able to restore the code page setting at the end of the script.

eval "exec ..." does not return.

@wfouche
Contributor Author

wfouche commented Dec 12, 2025

The alternative is to document what users should do to enable UTF-8 for each shell.

CMD - add the following registry key:

reg add "HKLM\Software\Microsoft\Command Processor" /v "Autorun" /t REG_SZ /d "@chcp 65001>nul" /f

Bash (Git-Bash or Cygwin) - add the following line to ~/.bashrc

chcp.com 65001 > /dev/null

PowerShell - add the following line to file $PROFILE

$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)

This has the added benefit that programs run using java will also benefit from UTF-8 console support:

java -jar JvmRuntimeOpts-fatjar.jar

--- 🚀 JVM Runtime Options (VM Arguments) ---
No explicit JVM Runtime Options found (default settings are in use).

--- 📝 Application Arguments ---
No Application Arguments were passed to the main method.

--- ✅ Execution Complete ---
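
If these per-shell instructions were ever surfaced by a tool instead of documentation, a lookup like the following could hold them. This is only a sketch: Utf8SetupHints and hintFor are hypothetical names, the shell identifiers are my own choice, and the command strings are the ones quoted in this thread (for CMD, the plain chcp call rather than the registry Autorun entry).

```java
import java.util.Locale;
import java.util.Optional;

final class Utf8SetupHints {
    /** Returns the command quoted in this thread for the given shell, if known. */
    static Optional<String> hintFor(String shell) {
        if (shell == null) {
            return Optional.empty();
        }
        switch (shell.toLowerCase(Locale.ROOT)) {
            case "cmd":
                return Optional.of("chcp 65001 > nul");
            case "bash": // Git-Bash or Cygwin, e.g. in ~/.bashrc
                return Optional.of("chcp.com 65001 > /dev/null");
            case "powershell": // e.g. in the file referenced by $PROFILE
                return Optional.of("$OutputEncoding = [Console]::InputEncoding = "
                        + "[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)");
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(hintFor("powershell").orElse("no hint available"));
    }
}
```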

@maxandersen
Collaborator

ok - first, I'm really appreciative of your efforts here @wfouche, it's been very educational! But starting to do different execution flows and jump through hoops that potentially break user flows is just not worth the hassle.

Here is my suggestion:

  1. document the options you found - it would actually be nice to have that documented in one place rather than scattered all over the internet :)

  2. (optionally, if you want) add JBANG_WIN_UTF8, which, if set to true on Windows, makes JBang just do the right thing.

That at least gives item 1 an option that is simply: set JBANG_WIN_UTF8=true and be done with it.

But I'm also fine with just leaving it at item 1, as it is probably the best for users to just set those values or registry key anyways.
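
A minimal Java sketch of the opt-in check described in option 2, assuming the variable is read as a case-insensitive "true" and ignored on every OS other than Windows. WinUtf8Policy and shouldEnable are hypothetical names. Note that the Bash fragment reviewed earlier in this thread treated the variable as opt-out (anything other than "false" enabled it), so the final semantics are still open.

```java
import java.util.Locale;

final class WinUtf8Policy {
    /**
     * Opt-in policy: switch to UTF-8 only on Windows and only when the
     * JBANG_WIN_UTF8 value is "true" (case-insensitive, surrounding spaces ignored).
     */
    static boolean shouldEnable(String osName, String envValue) {
        boolean windows = osName != null
                && osName.toLowerCase(Locale.ROOT).contains("windows");
        String normalized = envValue == null ? null : envValue.trim();
        return windows && "true".equalsIgnoreCase(normalized);
    }

    public static void main(String[] args) {
        boolean enable = shouldEnable(System.getProperty("os.name"),
                System.getenv("JBANG_WIN_UTF8"));
        System.out.println("Switch console to UTF-8: " + enable);
    }
}
```

Passing the OS name and the variable value as parameters keeps the decision testable without touching the real environment.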

@wfouche wfouche marked this pull request as draft December 12, 2025 09:24
@quintesse
Contributor

as it is probably the best for users to just set those values or registry key anyways.

Not sure I agree here, the actual thing they should do is set that beta feature in Windows:

image

Not messing about with registry keys because undoing those can be a nightmare.

@quintesse
Contributor

Another option would be to use something like JNA and set the codepage from within JBang itself: https://github.com/java-native-access/jna/blob/master/contrib/platform/src/com/sun/jna/platform/win32/Wincon.java#L106

@maxandersen
Collaborator

as it is probably the best for users to just set those values or registry key anyways.

Not sure I agree here, the actual thing they should do is set that beta feature in Windows:

image

Not messing about with registry keys because undoing those can be a nightmare.

Any indication setting those registry keys are anything different than toggling that registry key?

@wfouche
Contributor Author

wfouche commented Dec 12, 2025

Any indication setting those registry keys are anything different than toggling that registry key?

image

Enabling this option sets three registry values permanently to 65001:

  • OEMCP - code page for console applications
  • ACP - code page for GUI applications
  • MACCP - the default Macintosh code page

They are all under registry key

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

Running chcp 65001 only changes the code page that is active for the current console session; it does not modify the OEMCP registry value itself.

@wfouche
Contributor Author

wfouche commented Dec 12, 2025

New PR created to document how to enable console UTF-8 support on Windows.

@wfouche
Contributor Author

wfouche commented Dec 13, 2025

Another option would be to use something like JNA and set the codepage from within JBang itself

Currently evaluating this option.

@wfouche
Contributor Author

wfouche commented Dec 13, 2025

This program uses JNA to save the current Windows code page, changes it to 65001, and restores the original code page again before exiting. System.out has to be reinitialized for Unicode output to work.

///usr/bin/env jbang "$0" "$@" ; exit $?

//JAVA 25+

//DEPS net.java.dev.jna:jna:5.18.1
//DEPS net.java.dev.jna:jna-platform:5.18.1

import com.sun.jna.platform.win32.Kernel32;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

void main(String... args) throws Exception {

    int CODEPAGE_UTF8 = 65001;

    if (!System.getProperty("os.name").toLowerCase().contains("windows")) {
        return;
    }

    // check if env var JBANG_WIN_UTF8 is true, otherwise return
    // Not yet implemented.
    
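    // save current input and output code pages (both are restored at the end)
    int currentInputCP = Kernel32.INSTANCE.GetConsoleCP();
    int currentOutputCP = Kernel32.INSTANCE.GetConsoleOutputCP();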
    
    // set code page to 65001
    boolean setOutputOk = Kernel32.INSTANCE.SetConsoleOutputCP(CODEPAGE_UTF8);

    // remap stdout
    if (setOutputOk) {
        FileOutputStream fos = new FileOutputStream(FileDescriptor.out);
                
        PrintStream utf8Out = new PrintStream(
            fos, 
            true, 
            StandardCharsets.UTF_8.name()
        );
        
        // Replace the default System.out stream with the new UTF-8 stream
        System.setOut(utf8Out);
    }

    System.out.println("\nTesting UTF-8 output: \u2764");

    // Restore original code page
    Kernel32.INSTANCE.SetConsoleCP(currentInputCP);
    Kernel32.INSTANCE.SetConsoleOutputCP(currentOutputCP);

}

@wfouche
Contributor Author

wfouche commented Dec 13, 2025

In the future if there ever is a need to move forward with this proposal, then the approach suggested by @quintesse to use JNA is the best way to implement the code page switching functionality.
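
If this is picked up later, one point worth considering is that the restore step should also run when the wrapped code throws. Below is a minimal sketch of that idea using try-with-resources. CodePageScope, ConsoleApi and InMemoryConsole are hypothetical names; ConsoleApi stands in for the two JNA calls (GetConsoleOutputCP and SetConsoleOutputCP) so the example runs without JNA and on any OS.

```java
final class CodePageScope implements AutoCloseable {
    /** Hypothetical abstraction over the console calls; a real version would delegate to JNA. */
    interface ConsoleApi {
        int getOutputCodePage();
        boolean setOutputCodePage(int codePage);
    }

    /** In-memory stand-in used only to demonstrate the pattern. */
    static final class InMemoryConsole implements ConsoleApi {
        private int codePage;
        InMemoryConsole(int initial) { this.codePage = initial; }
        @Override public int getOutputCodePage() { return codePage; }
        @Override public boolean setOutputCodePage(int cp) { this.codePage = cp; return true; }
    }

    private final ConsoleApi console;
    private final int original;

    /** Saves the current code page, then switches to the temporary one. */
    CodePageScope(ConsoleApi console, int temporary) {
        this.console = console;
        this.original = console.getOutputCodePage();
        console.setOutputCodePage(temporary);
    }

    /** Restores the saved code page; runs even if the guarded block throws. */
    @Override
    public void close() {
        console.setOutputCodePage(original);
    }

    public static void main(String[] args) {
        InMemoryConsole console = new InMemoryConsole(437);
        try (CodePageScope scope = new CodePageScope(console, 65001)) {
            System.out.println("inside scope: " + console.getOutputCodePage());
        }
        System.out.println("after scope: " + console.getOutputCodePage());
    }
}
```

This does not cover the JVM being killed abruptly, in which case the console would keep the temporary code page.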

@maxandersen
Collaborator

So I'm trying UTF-8 on a Windows VM now, and for some reason I'm NOT seeing any effect from changing the code page.

Still getting ?? in the output.
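
One common way question marks end up in Java output is that the PrintStream encodes text with a charset that cannot represent the character, in which case the character is replaced with '?' before the bytes ever reach the console. The sketch below illustrates that mechanism only; it uses US-ASCII as a stand-in for a legacy console code page, and it is an assumption that this is what happened in the VM described here. StdoutEncodingDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StdoutEncodingDemo {
    /** Prints the text through a PrintStream using the given charset and returns the raw bytes. */
    static byte[] printWith(Charset charset, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String heart = "\u2764";
        // The unmappable character becomes a single '?' byte.
        System.out.println("US-ASCII: " + hex(printWith(StandardCharsets.US_ASCII, heart))); // 3F
        // UTF-8 keeps the character as a three-byte sequence.
        System.out.println("UTF-8:    " + hex(printWith(StandardCharsets.UTF_8, heart)));    // E2 9D A4
    }
}
```

So a console switched to code page 65001 only helps if the JVM's output stream is also encoding as UTF-8.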

@maxandersen
Collaborator

...and I can for the life of me not find that Beta setting anymore... I'm on Windows 11.

@maxandersen
Collaborator

...and I can for the life of me not find that Beta setting anymore... I'm on Windows 11.

The internet lied to me. It's not under Settings but under Control Panel... stupid.

@maxandersen
Collaborator

anyhow - just calling chcp did NOT work for me.

@maxandersen
Collaborator

ok, I'm just dumb - or rather Windows is stupid :)

chcp 65001 works everywhere but powershell, on powershell you need [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false) ffs.

@maxandersen
Collaborator

that jbang --utf8 idea is starting to grow on me :)

Development

Successfully merging this pull request may close these issues.

jbang --help console output on Windows

3 participants