Opened 6 years ago

Last modified 4 years ago

#8730 new bug

Invalid Unicode Codepoints in Char

Reported by: mdmenzel
Owned by:
Priority: low
Milestone:
Component: Core Libraries
Version: 7.6.3
Keywords: unicode
Cc: batterseapower, core-libraries-committee@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

As of Unicode 2.0 (1996), the surrogate range (U+D800 to U+DFFF) is supposed to be a range of invalid code points, yet Data.Char allows the use of values in this range. In fact, it even gives them their own GeneralCategory.
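A minimal sketch of the reported behaviour, assuming a GHC whose base library provides Data.Char as described. The helper isScalarValue is hypothetical (it is not part of base) and only illustrates how a caller might filter such values out.

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory, ord)

-- Hypothetical helper, not part of base: True for Unicode scalar values,
-- i.e. code points outside the surrogate range U+D800..U+DFFF.
isScalarValue :: Char -> Bool
isScalarValue c = generalCategory c /= Surrogate

main :: IO ()
main = do
  -- chr accepts a surrogate code point without complaint.
  let c = chr 0xD800
  print (ord c)              -- 55296
  print (generalCategory c)  -- Surrogate
  print (isScalarValue c)    -- False
  print (isScalarValue 'a')  -- True
```

Callers that need well-formed Unicode text would have to apply a check like this themselves, since the Char type does not enforce it.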

Change History (2)

comment:1 Changed 5 years ago by thomie

Cc: batterseapower core-libraries-committee@… added
Component: Compiler → Core Libraries
Owner: set to ekmett

Thank you for the report. I am just adding some references.

Prelude Data.Char> all ((==) Surrogate . generalCategory) ['\xdc80' .. '\xdfff']
True
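The GHCi check above covers only part of the surrogate block. As a sketch, the same test can be run over the full range U+D800 to U+DFFF, together with the code points on either side of it. The name surrogates is introduced here purely for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- All code points in the surrogate block (name chosen for this example).
surrogates :: [Char]
surrogates = ['\xD800' .. '\xDFFF']

main :: IO ()
main = do
  -- Every code point in the block is classified as Surrogate.
  print (all ((== Surrogate) . generalCategory) surrogates)
  -- The block holds 2048 code points.
  print (length surrogates)
  -- The immediate neighbours of the block are not classified as Surrogate.
  print (generalCategory '\xD7FF' /= Surrogate)
  print (generalCategory '\xE000' /= Surrogate)
```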

In commit dc58b7398910a433259a6c0f58a0d05a48555191:

Author: Max Bolingbroke <>
Date:   Sat May 14 22:50:46 2011 +0100

    Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this
    patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.

This commit adds checks of the following form to GHC/IO/Encoding/UTF{8|16|32}.hs:

    ... if isSurrogate c then done InvalidSequence ir ow else do ...
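To illustrate the idea behind that check, here is a simplified sketch of a decoder that rejects surrogates. It is not GHC's code: the real codecs work on buffers, whereas this version works on a plain list of 32-bit units. The names isSurrogateCP, DecodeError, decodeUnit and decodeAll are all hypothetical.

```haskell
import Data.Char (chr)
import Data.Word (Word32)

-- Hypothetical stand-in for the surrogate test, applied to a raw code point.
isSurrogateCP :: Word32 -> Bool
isSurrogateCP w = w >= 0xD800 && w <= 0xDFFF

-- Hypothetical error type for this sketch; it carries the offending unit.
data DecodeError = InvalidSequence Word32
  deriving (Eq, Show)

-- Decode one 32-bit unit, rejecting out-of-range values and surrogates.
decodeUnit :: Word32 -> Either DecodeError Char
decodeUnit w
  | w > 0x10FFFF    = Left (InvalidSequence w)
  | isSurrogateCP w = Left (InvalidSequence w)
  | otherwise       = Right (chr (fromIntegral w))

-- Decode a whole input, stopping at the first invalid unit.
decodeAll :: [Word32] -> Either DecodeError String
decodeAll = traverse decodeUnit

main :: IO ()
main = do
  print (decodeAll [0x48, 0x69])    -- Right "Hi"
  print (decodeAll [0x48, 0xD800])  -- Left (InvalidSequence 55296)
```

A check of this kind guards the encoding boundary only. Values built directly with chr or character literals bypass it, which is the gap this ticket describes.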

comment:2 Changed 4 years ago by thomie

Owner: ekmett deleted