Opened 9 years ago

Closed 9 years ago

#4373 closed feature request (fixed)

Lexer does not handle unicode numeric subscripts

Reported by: liamoc
Owned by: simonmar
Priority: normal
Milestone: 7.4.1
Component: Compiler (Parser)
Version:
Keywords: lexer, unicode, tiny
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description (last modified by igloo)

Hi all,

I would fix this myself but the GHC Lexer looks rather fragile and I'd be afraid of breaking something. I can have a crack at it and write a patch if you like.

Currently GHC rejects perfectly good Unicode identifier characters (numeric subscripts).

For example, the following expression:

let v₂ = (+) in v₂ 1 3

gives:

lexical error at character '\8322'

The subscripts are in the "OtherNumber" Unicode general category, so I'm pretty sure the main change is to Lexer.x, changing:

   OtherNumber           -> other_graphic 

To some other category (in the definition of alexGetChar).
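To double-check the category claim, here is a minimal sketch using only Data.Char from base. It is independent of GHC's Lexer.x; it just inspects the character from the error message ('\8322', i.e. U+2082 SUBSCRIPT TWO).

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory, isDigit, isNumber)

main :: IO ()
main = do
  -- '\8322' is the decimal escape shown in the lexical error; 0x2082 is the same code point.
  let sub2 = '\8322'
  print (sub2 == chr 0x2082)        -- True
  print (generalCategory sub2)      -- OtherNumber
  -- isDigit only selects ASCII '0'..'9', so the subscript is not a digit in that sense.
  print (isDigit sub2)              -- False
  -- isNumber covers DecimalNumber, LetterNumber and OtherNumber.
  print (isNumber sub2)             -- True
```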

The main issue I see here is that we can't just change "other_graphic" to "digit": the subscripts would have to be treated like ' or _ rather than as digits, or it would become acceptable to use them as real numeric digits, which I don't think we want.

Seeing as I am not confident enough in GHC's lexer/parser structure to make these changes, I was wondering if anyone who is more experienced who has the time could do it.

Change History (6)

comment:1 Changed 9 years ago by simonmar

The change you suggest sounds reasonable. You want to make these legal characters in an identifier, but not legal in a numeric constant, which is exactly what happens if you categorise them as "digit". Numeric constants are already restricted to only contain decimal digits. Could you make a patch and attach it to this ticket?
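The distinction can be illustrated with a toy scanner. This is only a sketch under assumptions: isIdentChar, lexVarId and lexDecimal are hypothetical names invented for this example and are not taken from Lexer.x, whose real character classes differ in detail. It shows identifier characters accepting OtherNumber while decimal literals stay restricted to ASCII digits.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit, isLower)

-- Hypothetical predicate: may this character continue an identifier?
isIdentChar :: Char -> Bool
isIdentChar c =
  c == '_' || c == '\'' || case generalCategory c of
    UppercaseLetter -> True
    LowercaseLetter -> True
    DecimalNumber   -> True
    OtherNumber     -> True   -- the behaviour requested in this ticket
    _               -> False

-- Hypothetical scanner for a variable identifier: returns the identifier and the rest.
lexVarId :: String -> Maybe (String, String)
lexVarId (c : cs)
  | isLower c || c == '_' =
      let (rest, remaining) = span isIdentChar cs
      in Just (c : rest, remaining)
lexVarId _ = Nothing

-- Hypothetical scanner for a decimal literal: ASCII digits only (isDigit is ASCII-only).
lexDecimal :: String -> Maybe (String, String)
lexDecimal s = case span isDigit s of
  ([], _)    -> Nothing
  (ds, rest) -> Just (ds, rest)

main :: IO ()
main = do
  print (lexVarId "v\8322 = (+)")  -- Just ("v\8322"," = (+)")
  print (lexDecimal "\8322")       -- Nothing
  print (lexDecimal "13")          -- Just ("13","")
```

With this split, v₂ scans as a single identifier, but ₂ on its own is never accepted as a numeric constant.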

comment:2 Changed 9 years ago by simonmar

Type: bug → feature request
Type of failure: GHC rejects valid program → None/Unknown

comment:3 Changed 9 years ago by igloo

Description: modified (diff)

comment:4 Changed 9 years ago by igloo

Milestone: 7.2.1

comment:5 Changed 9 years ago by simonmar

Owner: set to simonmar

comment:6 Changed 9 years ago by simonmar

Resolution: fixed
Status: new → closed

Fixed:

Mon Nov 15 09:54:44 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Unicide OtherNumber category should be allowed in identifiers (#4373)