Ticket #1284 (closed defect: fixed)

Opened 17 months ago

Last modified 16 months ago

UTF-8 marshaling fails on characters outside of the basic multilingual plane

Reported by: guest
Owned by: dmwit
Priority: normal
Milestone: 0.13.0
Component: general (Gtk+, Glib)
Version: 0.12.4
Keywords: UTF-8
Cc: leech@…

Description

It appears that characters that require more than three bytes to encode in UTF-8 are not supported.

I'm trying to use Pango to render music notation (symbols from the block U+1D100-U+1D1FF in plane 1). They are incorrectly converted to some other plane 0 character. Also, attempting to read back the text from a layout that was created from an improperly converted string results in an error in fromUTF.
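For reference, here is a minimal sketch of the standard UTF-8 encoding rule that is at issue: code points at or above U+10000 need a four-byte sequence. The names encodeChar and encodeUtf8 are made up for illustration; this is not the gtk2hs code, and it does not treat surrogate code points specially.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single character as UTF-8 bytes (illustrative helper,
-- not part of gtk2hs).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                          -- 1 byte: ASCII
  | n < 0x800   = [lead 0xC0 6, cont 0]                     -- 2 bytes
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]            -- 3 bytes: rest of plane 0
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]   -- 4 bytes: planes 1 to 16
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Encode a whole string (illustrative helper).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

With this sketch, encodeChar '\x1D11E' (MUSICAL SYMBOL G CLEF) evaluates to [240,157,132,158], that is, the bytes F0 9D 84 9E.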

Change History

Changed 17 months ago by dmwit

  • owner changed from somebody to dmwit
  • status changed from new to accepted

Well, I've fixed System.Glib.UTFString to handle four-byte characters, which fixes some things (notably, labels with four-byte characters in them display correctly on my machine now), but it seems there's still some other bug to find somewhere. For example, showText (and possibly other API calls) still doesn't work right. Still looking into this.
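To illustrate what handling four-byte characters means on the decoding side, here is a rough sketch of a decoder that accepts sequences of one to four bytes. The name decodeUtf8 is hypothetical and this is not the actual System.Glib.UTFString change; as a simplification, it does not reject overlong encodings or surrogate code points.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode UTF-8 bytes, returning Nothing on malformed input
-- (illustrative helper, not part of gtk2hs).
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)   -- 110xxxxx: one continuation byte
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)   -- 1110xxxx: two continuation bytes
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)   -- 11110xxx: three continuation bytes
  | otherwise          = Nothing                -- stray continuation or invalid lead
  where
    -- Consume k continuation bytes and rebuild the code point.
    multi k hi =
      let (cs, rest) = splitAt k bs
      in if length cs == k && all isCont cs
           then let n = foldl step (fromIntegral hi) cs
                in if n > 0x10FFFF
                     then Nothing
                     else (chr n :) <$> decodeUtf8 rest
           else Nothing
    isCont x = x .&. 0xC0 == 0x80
    step :: Int -> Word8 -> Int
    step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
```

Here decodeUtf8 [0xF0, 0x9D, 0x84, 0x9E] gives Just "\x1D11E", while a truncated sequence such as [0xF0, 0x9D] gives Nothing.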

Changed 17 months ago by dmwit

Okay, this is very strange. showText is essentially just a call to withUTFString. I've confirmed (by writing a little C function to print out a block of memory) that withUTFString sends the right bytes in a small test program, but when I set a breakpoint on cairo_show_text in gdb it's getting some garbage instead. I'm tempted to say it's withCAString doing something sneaky with my locale (since the string we're handing it isn't technically ASCII -- it has some high bits set), but that doesn't jibe with the right bytes coming out of my little test program.

Really weird!
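A similar check can be sketched from the Haskell side, independently of the locale. This is only an illustration using GHC.Foreign and GHC.IO.Encoding from base; it is neither the C helper mentioned above nor the withUTFString implementation, and marshaledBytes is a made-up name.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray0)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (utf8)

-- | Marshal a String to a NUL-terminated buffer using an explicit UTF-8
-- encoding (so the current locale plays no part), then read back the raw
-- bytes up to, but not including, the terminator. Illustrative helper.
marshaledBytes :: String -> IO [Word8]
marshaledBytes s =
  GHC.withCString utf8 s $ \p -> peekArray0 0 (castPtr p)
```

Running marshaledBytes "\x1D11E" should yield [240,157,132,158] (F0 9D 84 9E), which is the byte sequence one would expect to see arrive at cairo_show_text.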

Changed 16 months ago by dmwit

  • status changed from accepted to closed
  • resolution set to fixed

Oh, for !$#@ sake. Somebody had copied the source of withUTFString from glib to cairo. Now both glib and cairo ship encoding and decoding out to utf8-string, so this should settle things once and for all.
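As a rough illustration of what delegating to utf8-string looks like, the sketch below round-trips a plane 1 character through the encode and decode functions of Codec.Binary.UTF8.String. It assumes the utf8-string package is installed; roundTrip and clefBytes are hypothetical names, and this is not the actual glib or cairo binding code.

```haskell
import Codec.Binary.UTF8.String (decode, encode)
import Data.Word (Word8)

-- | Encode to UTF-8 bytes and decode again; for well-formed text this
-- should be the identity. Illustrative helper.
roundTrip :: String -> String
roundTrip = decode . encode

-- | UTF-8 bytes of U+1D11E (MUSICAL SYMBOL G CLEF) as produced by
-- utf8-string. Illustrative value.
clefBytes :: [Word8]
clefBytes = encode "\x1D11E"
```

Keeping a single encoder and decoder in one shared dependency avoids the situation described above, where a copied implementation missed the fix applied to the original.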
