Ticket #57 (closed defect: fixed)

Opened 6 years ago

Last modified 2 years ago

haddock doubles ^Ms

Reported by: igloo Owned by:
Priority: major Milestone:
Version: Keywords:
Cc: ndmitchell@…

Description

Originally reported here: http://www.haskell.org/pipermail/cvs-ghc/2008-August/044392.html by Neil Mitchell.

We can skip the CPP part from his report if we start with this Foo.hs:

^M
module Foo where^M
^M
-- | > test a^M
test :: a^M
test = undefined^M

(CPP in Neil's message just ensures that there are ^Ms in the file)

Then running

haddock.exe --hoogle Foo

gives us a main.txt containing

-- Hoogle documentation, generated by Haddock^M
-- See Hoogle, http://www.haskell.org/hoogle/^M
^M
@package main^M
^M
module Foo^M
^M
-- | <pre>^M
--   test a^M^M
--   </pre>^M
test :: a^M

Note the double ^M.

I suspect that this happens because the file is read with something like lines (which splits on \n only and so leaves the ^M in each string) and then written with something like hPutStrLn (which, on Windows, ends the line with ^M\n). I don't know exactly which lines or hPutStrLn calls are involved, but I think they're more likely to be in Haddock than in GHC, so I'm filing the bug here.
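
The suspected mechanism can be reproduced in pure code. The following is only a sketch, not Haddock's actual code: toWindowsText is a hypothetical helper that imitates the LF to CRLF translation a Windows text-mode handle applies on output, and roundTrip is an assumed name for the read-then-write path.

```haskell
-- Hypothetical stand-in for Windows text-mode output, where each '\n'
-- is written as "\r\n". Not part of Haddock or GHC.
toWindowsText :: String -> String
toWindowsText = concatMap expand
  where
    expand '\n' = "\r\n"
    expand c    = [c]

-- Split CRLF input with 'lines' (which only removes '\n', so the '\r'
-- stays at the end of each line), rejoin with 'unlines', and pass the
-- result through the simulated text-mode handle.
roundTrip :: String -> String
roundTrip = toWindowsText . unlines . lines
```

Under these assumptions, roundTrip "test a\r\n" evaluates to "test a\r\r\n", which matches the doubled ^M in main.txt.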

Change History

Changed 6 years ago by anonymous

  • cc ndmitchell@…, neil.mitchell.2@… added

Hoogle currently hacks around this bug after the fact, but it's quite annoying to do, and it makes it harder for other people to generate Hoogle databases. Following Ian's more accurate diagnosis, I think the Hoogle backend in Haddock should be able to work around the issue directly, although the bug itself probably lies earlier in the Haddock code.

I think if the line in ppHoogle was changed from:

    writeFile (odir </> filename) (unlines contents)

to:

    writeFile (odir </> filename) (filter (/= '\r') $ unlines contents)

Then it should fix Hoogle, although the underlying bug would still be present.

Changed 6 years ago by waern

  • cc neil.mitchell.2@… removed

Changed 6 years ago by waern

  • milestone set to 2.4.2

Changed 5 years ago by waern

  • status changed from new to closed
  • resolution set to fixed

The problem is that doc strings can contain CRLF line endings. Both the HTML and Hoogle backends are written with the assumption that line endings are represented by LF only (since this is the assumption made by Haskell String functions such as lines and unlines, and even by IO commands such as hPutStr, which inserts CRs before LFs?). So we should let LF be our internal format and convert to it in the doc string lexer.

This patch for GHC should do it:

Sat Feb 28 15:53:51 CET 2009  David Waern <david.waern@gmail.com>
  * Filter out carriage returns in doc strings
  
  We want the internal format to contain LFs only. This makes it easier to work
  with the doc strings for clients of the GHC API.
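
As a rough illustration of the idea (this is not the actual GHC patch, and filterCarriageReturns is a hypothetical name), normalizing a doc string to LF-only could look like this:

```haskell
-- Illustrative only: not the code from the GHC patch.
-- Drop every carriage return so that the doc string uses LF alone
-- as its line terminator.
filterCarriageReturns :: String -> String
filterCarriageReturns = filter (/= '\r')

-- With CRs removed up front, 'lines' yields clean lines with no
-- trailing '\r' for the backends to carry through to their output.
docLines :: String -> [String]
docLines = lines . filterCarriageReturns
```

For example, docLines "test a\r\nfoo\r\n" gives ["test a","foo"], so backends that assume LF-only strings no longer see a stray ^M.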

Changed 2 years ago by anonymous

  • milestone 2.4.2 deleted

Milestone 2.4.2 deleted
