Opened 4 years ago

Last modified 4 years ago

#11609 new task

Document unicode report deviations

Reported by: thomie
Owned by:
Priority: normal
Milestone:
Component: Documentation
Version: 7.10.3
Keywords: unicode, report-impact
Cc: nomeata
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #1103, #4373, #7650, #10196, #11012
Differential Rev(s):
Wiki Page:

Description (last modified by thomie)

@nomeata mentions in #10196:

The report specifies “Haskell compilers are expected to make use of new versions of Unicode as they are made available.” So if we deviate from that, we should make sure that

  • the user’s guide explicitly lists all deviations from the report in this section, and
  • the Haskell Prime committee is made aware of these (sensible) deviations, so that they can become official.

Known deviations (there might be more):

  • OtherLetter characters are treated as lowercase (#1103), and are thus allowed in identifiers.
  • ModifierLetter (#10196), OtherNumber (#4373) and NonSpacingMark (#7650) characters are allowed in identifiers, but not as the first character.
  • Decimal digits are currently ASCII-only. compiler/parser/Lexer.x has: $decdigit = $ascdigit -- for now, should really be $digit (ToDo)
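
To make the categories above concrete, here is a minimal Haskell sketch that classifies a character using Data.Char.generalCategory. It is not GHC's lexer: the Deviation type and the helper names listedDeviation, isAsciiDecDigit and isUnicodeDecDigit are hypothetical, and reading $digit as "any character of category DecimalNumber" is an assumption rather than something taken from Lexer.x.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit)

-- | How this ticket says a general category deviates from the report.
-- The type and constructor names are hypothetical, for illustration only.
data Deviation
  = TreatedAsLowercase     -- OtherLetter (#1103)
  | AllowedAfterFirstChar  -- ModifierLetter, OtherNumber, NonSpacingMark
  | NoListedDeviation      -- categories this ticket does not mention
  deriving (Eq, Show)

-- | Map a character to the deviation listed for its general category.
listedDeviation :: Char -> Deviation
listedDeviation c = case generalCategory c of
  OtherLetter    -> TreatedAsLowercase
  ModifierLetter -> AllowedAfterFirstChar
  OtherNumber    -> AllowedAfterFirstChar
  NonSpacingMark -> AllowedAfterFirstChar
  _              -> NoListedDeviation

-- | ASCII-only decimal digit, in the spirit of $ascdigit.
-- Data.Char.isDigit accepts only '0'..'9'.
isAsciiDecDigit :: Char -> Bool
isAsciiDecDigit = isDigit

-- | Any Unicode decimal digit (general category Nd).
-- Assumption: this is roughly what $digit would cover.
isUnicodeDecDigit :: Char -> Bool
isUnicodeDecDigit c = generalCategory c == DecimalNumber
```

For example, listedDeviation '\x3042' (HIRAGANA LETTER A, an OtherLetter) gives TreatedAsLowercase, while '\x0663' (ARABIC-INDIC DIGIT THREE) satisfies isUnicodeDecDigit but not isAsciiDecDigit.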

Change History (2)

comment:1 Changed 4 years ago by thomie

Description: modified (diff)

comment:2 Changed 4 years ago by rwbarton

Oh, I was going to comment on the subject of documentation, so I'll do it here. We should have a changelog entry about allowing combining characters in identifiers, and we should be clear about what kind of normalization we do to decide when a sequence involving combining characters is considered the same as the corresponding precomposed character. (I assume the answer is currently "none", but it would probably be nice to change that for 8.2. NFC normalization seems to be a popular choice for programming languages.)
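
As an illustration of why the normalization question matters, the following Haskell sketch compares two spellings of "é" as plain String values. It says nothing about what GHC's lexer does with identifiers; it only shows that, with no normalization applied, the precomposed and the combining-character forms are different code point sequences. The names precomposed, decomposed and sameWithoutNormalization are hypothetical.

```haskell
-- "é" as a single precomposed code point, U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: String
decomposed = "e\x0301"

-- Plain String equality compares code points one by one,
-- so the two spellings are not equal unless something normalizes them first.
sameWithoutNormalization :: Bool
sameWithoutNormalization = precomposed == decomposed
```

Here sameWithoutNormalization is False, and the two strings have lengths 1 and 2. Under NFC, both spellings would be mapped to the single code point U+00E9 before comparison.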
