Ticket #191 (new defect)

Opened 3 years ago

Last modified 2 years ago

Incorrect handling of character references

Reported by: selinger
Owned by:
Priority: minor
Milestone:
Version: 2.9.4
Keywords:
Cc:

Description

In Haddock, a numeric character reference such as &#252; is used to represent a non-ASCII character, in this case the German u-umlaut (ü).

However, this does not work in the following situations:

* if the character appears in italics,
* if the character appears in a code block with ">",
* if the character appears in a URL.

Moreover, if such a character reference appears in a Haskell identifier between single quotes, the character is rendered correctly, but the word is not recognized as a Haskell identifier (so the surrounding quotes are copied to the output and the identifier is not linked).

See the attached file for examples.
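
For readers without the attachment, the four situations above might look roughly like the following. This is a hypothetical sketch, not the contents of Test.hs; the identifier grün and the example.com URL are invented for illustration.

```haskell
-- | Hypothetical illustration of the contexts named in this ticket.
--
-- Plain text, where the reference is expected to work: f&#252;r
--
-- Italics: /f&#252;r/
--
-- Code block:
--
-- > putStrLn "f&#252;r"
--
-- URL: <http://example.com/f&#252;r>
--
-- Identifier between single quotes: 'gr&#252;n'
grün :: Int
grün = 0
```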

Here are some comments on how I think it could be fixed. In my opinion, the best way to handle the &#252; syntax would be to treat it as an input encoding, i.e., handle it at the I/O level, before any lexing and parsing is done by Haddock proper. In other words, the sequence &#252; should be treated as if it were a single character literally present in the input file.
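
A minimal sketch of such a pre-lexing pass, assuming it runs on the raw comment text as a String. The names decodeCharRefs, parseRef and numeric are made up for this illustration and are not part of Haddock; only decimal (&#DDD;) and hexadecimal (&#xHHH;) forms are handled.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readHex)

-- | Replace numeric character references with the character they denote.
-- Anything that is not a well-formed reference is copied through
-- unchanged. Hypothetical helper, not Haddock's actual code.
decodeCharRefs :: String -> String
decodeCharRefs ('&' : '#' : rest)
  | Just (c, rest') <- parseRef rest = c : decodeCharRefs rest'
decodeCharRefs (c : cs) = c : decodeCharRefs cs
decodeCharRefs [] = []

-- | Parse the part after "&#": hex digits after 'x' or 'X', else decimal.
parseRef :: String -> Maybe (Char, String)
parseRef (x : rest)
  | x `elem` "xX" = numeric isHexDigit (fst . head . readHex) rest
parseRef s = numeric isDigit read s

-- | Require at least one digit followed by ';' and a code point in range.
numeric :: (Char -> Bool) -> (String -> Integer) -> String -> Maybe (Char, String)
numeric isDig toInt s =
  case span isDig s of
    (ds@(_ : _), ';' : rest)
      | let n = toInt ds, n <= 0x10FFFF -> Just (chr (fromInteger n), rest)
    _ -> Nothing
```

Because the pass is a single left-to-right scan, the output of one reference is never decoded again: a reference to "&" followed by #252; yields the six literal characters rather than ü. The escaped form with a backslash after the ampersand is not a well-formed reference, so it is left for the later lexer to handle.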

If it were done this way, one could use &#252; in *every* context, and one could even use references to represent ASCII characters, for example &#38; to represent a literal "&". Thus, if the six-character sequence &#252; had to appear literally in a comment, one could type it as &#38;#252;, although &\#252; would achieve the same result more simply.

Attachments

Test.hs (0.9 kB) - added by selinger 3 years ago.
Sample module

Change History

Changed 3 years ago by selinger

Sample module

Changed 2 years ago by anonymous

  • milestone 2.10.0 deleted

