Ticket #191 (new defect)
Incorrect handling of character references
| Reported by: | selinger | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Version: | 2.9.4 | Keywords: | |
| Cc: |
Description
In Haddock, a character reference such as &#252; is used to represent non-ASCII characters, such as the German umlaut ü.
However, this does not work in the following situations:
* if the character appears in italics,
* if the character appears in a code block with ">",
* if the character appears in a URL.
Moreover, if such a character appears in a Haskell identifier between single quotes, the character is rendered correctly, but the word is not recognized as a Haskell identifier (so the surrounding quotes are copied to the output and the identifier is not linked).
See the attached file for examples.
Here are some comments on how I think it could be fixed. In my opinion, the best way to handle the &#252; syntax would be to treat it as an input encoding, i.e., handle it at the I/O level, before any lexing and parsing is done by Haddock proper. In other words, the sequence &#252; should be treated as if it were a single character literally present in the input file.
If it were done this way, then one could use &#252; in *every* context, and one could even use escapes to represent actual ASCII characters, for example, &#38; to represent a literal "&". Thus, if the sequence of 6 characters &#252; had to appear literally in a comment, one could type it as &#38;#252;, although &\#252; would achieve the same result in a simpler way.
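To make the suggestion concrete, here is a minimal sketch of such a decoding pass. It is not taken from the Haddock sources: the names `decodeCharRefs` and `parseRef` are hypothetical, and it assumes that only the numeric forms (decimal &#D; and hexadecimal &#xH;) need to be handled. The pass would run over the comment text once, before Haddock's own lexer sees it.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readDec, readHex)

-- | Replace every numeric character reference in the input with the
-- character it denotes. Hypothetical helper, not part of Haddock.
-- The decoded output is never rescanned, so "&#38;#252;" yields the
-- six literal characters "&#252;" rather than a u-umlaut.
decodeCharRefs :: String -> String
decodeCharRefs ('&' : '#' : rest)
  | Just (c, rest') <- parseRef rest = c : decodeCharRefs rest'
decodeCharRefs (c : rest) = c : decodeCharRefs rest
decodeCharRefs [] = []

-- | Parse the part of a reference that follows "&#": either decimal
-- digits or 'x'/'X' plus hex digits, terminated by ';'. Returns the
-- decoded character and the remaining input, or Nothing if the text
-- is not a well-formed reference (it is then left untouched).
parseRef :: String -> Maybe (Char, String)
parseRef s = case s of
  (x : hs) | x `elem` "xX" -> go isHexDigit readHex hs
  _                        -> go isDigit readDec s
  where
    go :: (Char -> Bool) -> ReadS Integer -> String -> Maybe (Char, String)
    go isOk reader str =
      case span isOk str of
        (ds, ';' : rest')
          | [(n, "")] <- reader ds
          , n <= 0x10FFFF -> Just (chr (fromInteger n), rest')
        _ -> Nothing
```

With a pass like this, `/M&#252;ller/` would reach the parser as `/Müller/`, so italics, code blocks, URLs and quoted identifiers would all see an ordinary character. The input `&\#252;` passes through unchanged, because `&\` does not start a reference, and is then left to Haddock's usual backslash escaping.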
