Opened 8 years ago

Closed 6 years ago

Last modified 6 years ago

#5599 closed bug (fixed)

msys has bad Unicode support

Reported by: simonpj
Owned by:
Priority: normal
Milestone: 7.4.1
Component: Compiler
Version: 7.2.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case: llibraries/tests/IO/T3307, environment001
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Tests 3307 and environment001 pass on Cygwin and Linux, but fail on msys:

>    lib/IO                        3307 [bad exit code] (normal)
>    lib/IO                        environment001 [bad stdout] (normal)

Here is Max's diagnosis:

Basically, msys has kind of bad Unicode support. If you write a program "len.c" like this:

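#include <windows.h>
#include <shellapi.h>	/* CommandLineToArgvW */
#include <stdio.h>
#include <wchar.h>	/* wcslen */

int main(int _argc, char **_argv) {
	/* Ignore the narrow argv; ask Windows for the UTF-16 command line. */
	LPWSTR cmdLine = GetCommandLineW();

	int argc;
	LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
	if (argv == NULL || argc < 2) {
		fprintf(stderr, "expected one argument\n");
		return 1;
	}

	/* wcslen returns size_t, so cast to match the %d conversion. */
	printf("%d args, %d wide chars in first arg\n", argc, (int)wcslen(argv[1]));
	LocalFree(argv);
	return 0;
}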

Create a UTF-8 encoded file called "utf8" containing two characters:

不好

And then execute it like so:

gcc len.c && ./a.exe $(cat utf8)

(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)

You get different results on msys and Cygwin:

  • On Cygwin, you get 2 wide characters in the first argument; i.e. the UTF-16 encoded Chinese text
  • On msys, you get 6 wide characters in the first argument; i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text

IMHO the msys behaviour is broken because the command line arguments supplied via the Windows API are meant to be UTF-16. It does match the behaviour of Windows cmd if you do this:

set /p myvar= < utf8
a.exe %myvar%

(You get "6 wide characters" printed)

Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.

I'm not sure what to do here because I don't think our code actually has a problem, and the test does pass (and check something useful) in Linux, OS X and Cygwin. But still, something is not working quite right here. Perhaps just mark it as expect-fail in msys?

Change History (9)

comment:1 Changed 8 years ago by igloo

Milestone: 7.4.1
Owner: set to igloo

comment:2 Changed 8 years ago by igloo

Resolution: fixed
Status: new → closed

Fixed by:

commit f6f20381c8064f7f98f9b9ab082e5ad65c132be9
Author: Ian Lynagh <igloo@earth.li>
Date:   Sun Nov 27 16:36:55 2011 +0000

    Expect 3307 and environment001 to fail on msys; fixes trac #5599

    Unicode support on MSYS seems to be broken.

comment:3 Changed 6 years ago by ezyang

difficulty: Unknown
Owner: igloo deleted
Resolution: fixed
Status: closed → new

I was running the validate testsuite on MSYS2, and it looks like in this case, these two tests pass. So maybe we should have a further test for MSYS versus MSYS2.

comment:4 Changed 6 years ago by ezyang

It also looks like T4006 is in a similar boat.

comment:5 Changed 6 years ago by simonpj

Resolution: fixed
Status: new → closed
Test Case: llibraries/tests/IO/T3307, environment001

MSYS2 is way better than MSYS, so I think we should just adopt it. I'm making it an expected pass.

Simon

comment:6 in reply to:  5 ; Changed 6 years ago by schyler

Replying to simonpj:

> MSYS2 is way better than MSYS, so I think we should just adopt it. I'm making it an expected pass.
>
> Simon

#8842 would be a good thing to check before making it default.

comment:7 in reply to:  6 ; Changed 6 years ago by awson

Replying to schyler:

> #8842 would be a good thing to check before making it default.

You must *never* install the msys2 runtime headers, libraries and development tools. You should only install the msys2-base distribution and the mingw(32|64)-* headers, libraries and development tools.

You should also start the msys2 shell with the mingw(32|64)_shell.bat script included in the distribution.

This way you get a mingw-w64 runtime based development environment (native 32- or 64-bit Windows) with no remnants of the Msys2 runtime.

Last edited 6 years ago by awson

comment:8 in reply to:  7 Changed 6 years ago by simonpj

Replying to awson:

> You must *never* install the msys2 runtime headers, libraries and development tools. You should only install the msys2-base distribution and the mingw(32|64)-* headers, libraries and development tools.
>
> You should also start the msys2 shell with the mingw(32|64)_shell.bat script included in the distribution.
>
> This way you get a mingw-w64 runtime based development environment (native 32- or 64-bit Windows) with no remnants of the Msys2 runtime.

You clearly know something important here! Our Windows wiki page says only "install msys2". Might you elaborate it to be a bit more specific, and explain why various things are important? Perhaps a section on that page about msys2?

Thanks

Simon

comment:9 Changed 6 years ago by ezyang

It's a bit of a kerfuffle. I think the current contents of that wiki page should be archived (with a big banner saying they are out of date), and wiki:Building/Preparation/Windows/MSYS2 moved to replace it.
