Opened 12 years ago

Closed 10 years ago

#1886 closed bug (fixed)

GHC API should preserve and provide access to comments

Reported by: claus
Owned by:
Priority: normal
Milestone: 6.12 branch
Component: GHC API
Version: 6.9
Keywords: GHC API, comments, program transformation, layout
Cc: j.waldmann
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:


one class of GHC API applications consists of program transformations (refactoring, source-to-source optimisation, partial evaluation, ..) and code layouters (pretty-printing, 2html, syntax-colouring, ..). but, even ignoring layout, parsing and pretty-printing with the GHC API does not currently preserve the source (nor does it generate syntactically valid code..).

consider this simple test: we want to parse a module, then pretty-print it (we might want to adjust the layout, or switch between layout and explicit braces). applying the attached code to itself gives this result:

$ /cygdrive/c/fptools/ghc/compiler/stage2/ghc-inplace -package ghc -e main API_Layout.hs
module API where
import DynFlags
import GHC
import PprTyThing
import System.Process
import System.IO
import Outputable
import Data.Maybe
instance Num () where
    { fromInteger = undefined }
mode = CompManager
compileToCoreFlag = False
writer >| cmd = runInteractiveCommand cmd >>= \ (i, o, e, p) -> writer i
cmd |> reader = runInteractiveCommand cmd >>= \ (i, o, e, p) -> reader o
ghcDir = "c:/fptools/ghc/compiler/stage2/ghc-inplace --print-libdir"
         (fmap dropLineEnds . hGetContents)
           dropLineEnds = filter (not . (`elem` "\r\n"))
main = defaultErrorHandler defaultDynFlags
     $ do s <- newSession . Just =<< ghcDir
          flags <- getSessionDynFlags s
          (flags, _) <- parseDynamicFlags flags ["-package ghc"]
            GHC.defaultCleanupHandler flags
          $ do setSessionDynFlags s (flags {hscTarget = HscInterpreted})
                 addTarget s =<< guessTarget "API_Layout.hs" Nothing
               load s LoadAllTargets
               prelude <- findModule s (mkModuleName "Prelude") Nothing
               usermod <- findModule s (mkModuleName "API") Nothing
               setContext s [usermod] [prelude]
               Just cm <- checkModule s (mkModuleName "API") compileToCoreFlag
               unqual <- getPrintUnqual s
                   printForUser stdout unqual $ ppr $ parsedSource cm

this has lost all comments, including pragmas, and is syntactically invalid!

one suggestion, to avoid upsetting the rest of ghc, would be to preserve the comments, with source locations, but to separate them from the main abstract syntax tree. there would also need to be a way to link ast fragments to comments, which might be slightly awkward. perhaps something like:

-- was there a comment just preceding the current AST fragment?
commentsBefore :: AST -> Maybe String
-- was there a comment immediately following the current AST fragment?
commentsAfter :: AST -> Maybe String
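
To make the suggestion concrete, here is a minimal, self-contained sketch of comments kept in a separate table with source locations and linked back to AST fragments by position. Everything in it is hypothetical and much simpler than GHC's real types: `Span`, `Comment`, `Decl`, the `ownLine` flag and the demo values are invented for illustration, and the lookup functions take the comment table as an explicit argument instead of having the `AST -> Maybe String` shape above. "Before" is assumed to mean a comment on its own line ending directly above the fragment, and "after" a trailing comment on the fragment's last line.

```haskell
module Main where

import Data.List (find)

-- | Hypothetical line-based span; a real source span also tracks columns and files.
data Span = Span { startLine :: Int, endLine :: Int }
  deriving (Eq, Show)

-- | A comment stored outside the AST, together with where it occurred.
data Comment = Comment
  { commentSpan :: Span
  , ownLine     :: Bool    -- ^ True if nothing but the comment is on its line(s)
  , commentText :: String
  } deriving (Eq, Show)

-- | Toy AST fragment: a declaration name plus its span.
data Decl = Decl { declName :: String, declSpan :: Span }
  deriving (Eq, Show)

-- | A standalone comment that ends on the line directly above the fragment.
commentsBefore :: [Comment] -> Decl -> Maybe String
commentsBefore cs d = commentText <$> find precedes cs
  where
    precedes c = ownLine c && endLine (commentSpan c) + 1 == startLine (declSpan d)

-- | A trailing comment that starts on the fragment's last line.
commentsAfter :: [Comment] -> Decl -> Maybe String
commentsAfter cs d = commentText <$> find trails cs
  where
    trails c = not (ownLine c) && startLine (commentSpan c) == endLine (declSpan d)

-- | Invented demo data: one standalone comment on line 1, one trailing comment on line 2.
demoComments :: [Comment]
demoComments =
  [ Comment (Span 1 1) True  "-- | how modules are loaded"
  , Comment (Span 2 2) False "-- batch mode"
  ]

modeDecl, flagDecl :: Decl
modeDecl = Decl "mode" (Span 2 2)
flagDecl = Decl "compileToCoreFlag" (Span 3 3)

main :: IO ()
main = do
  print (commentsBefore demoComments modeDecl)
  print (commentsAfter  demoComments modeDecl)
  print (commentsBefore demoComments flagDecl)
```

The `ownLine` flag is one way around the awkwardness mentioned above: without it, the trailing comment on line 2 would also count as a comment "before" the declaration on line 3.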

Attachments (1)

API_Layout.hs (1.6 KB) - added by claus 12 years ago.
a module parsing and pretty-printing itself via the GHC API


Change History (11)

Changed 12 years ago by claus

Attachment: API_Layout.hs added

a module parsing and pretty-printing itself via the GHC API

comment:1 Changed 12 years ago by claus

Keywords: GHC API comments program transformation layout added

i forgot one important example of program transformations that would also need layout preservation: version updates to follow library api changes. i think someone once started a business with this kind of thing?-)

related ticket: #1467 (api reorganisation of stages)

comment:2 Changed 12 years ago by igloo

difficulty: Unknown
Milestone: 6.10 branch

comment:3 Changed 11 years ago by claus

see also this thread on cvs-ghc, messages before and after this one:

should haddock.ghc be a sub-repo of ghc?

comment:4 Changed 11 years ago by j.waldmann

Cc: j.waldmann added

comment:5 Changed 11 years ago by claus

see also this thread for a simpler breakdown of what is needed, and how it might be achieved:

comment:6 Changed 11 years ago by Jedai

My proposal is to support access to a special kind of token stream that includes comments. As the tokens themselves aren't enough to get back to the source that produced them (some aesthetic details disappear), I also create a function to add source strings to the tokens in a stream and a function to show such a "rich" token stream. HaRe uses the following model: get the AST and the token stream >>> modify AST &&& propagate changes to token stream >>> second (pretty print the token stream).

While this model may not be as convenient as we could hope for, it works and the guts of this process could eventually become a package on Hackage, separate from HaRe.
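
The idea of a "rich" token stream can be illustrated with a toy model. This is a sketch under stated assumptions, not the code from the patch: `Span`, `Tok`, `RichTok`, `attachSource`, `showRich`, `renameIdent` and the demo data are all hypothetical names, tokens never span lines, and the token positions are written by hand where a real lexer would supply them.

```haskell
module Main where

import Data.List (sortOn)

-- | Hypothetical single-line span: line, start column (1-based), length.
data Span = Span Int Int Int
  deriving (Eq, Show)

-- | Toy token classes; comments are ordinary tokens in the stream.
data Tok = TIdent | TSymbol | TLiteral | TComment
  deriving (Eq, Show)

-- | A token paired with the exact source text it covers.
type RichTok = (Span, Tok, String)

-- | Attach to every token the slice of source text its span covers.
attachSource :: String -> [(Span, Tok)] -> [RichTok]
attachSource src toks = [ (sp, t, slice sp) | (sp, t) <- toks ]
  where
    srcLines = lines src
    slice (Span l c n) = take n (drop (c - 1) (srcLines !! (l - 1)))

-- | Rebuild source text, padding with newlines and spaces up to each token's position.
showRich :: [RichTok] -> String
showRich = go (1, 1) . sortOn (\(Span l c _, _, _) -> (l, c))
  where
    go _ [] = ""
    go (line, col) ((Span l c _, _, s) : rest)
      | l > line  = replicate (l - line) '\n' ++ replicate (c - 1) ' ' ++ s ++ next
      | otherwise = replicate (c - col) ' ' ++ s ++ next
      where
        next = go (l, c + length s) rest

-- | A stand-in for "modify and propagate": rename an identifier in the stream.
renameIdent :: String -> String -> [RichTok] -> [RichTok]
renameIdent old new = map step
  where
    step (sp, TIdent, s) | s == old = (sp, TIdent, new)
    step t = t

demoSrc :: String
demoSrc = "x = 1 -- one\ny  =  x"

-- | Hand-written token positions for demoSrc.
demoToks :: [(Span, Tok)]
demoToks =
  [ (Span 1 1 1, TIdent), (Span 1 3 1, TSymbol), (Span 1 5 1, TLiteral)
  , (Span 1 7 6, TComment)
  , (Span 2 1 1, TIdent), (Span 2 4 1, TSymbol), (Span 2 7 1, TIdent)
  ]

main :: IO ()
main = putStrLn (showRich (renameIdent "x" "z" (attachSource demoSrc demoToks)))
```

The demo renames `x` to a name of the same length, so no spans need adjusting. Propagating a change that alters token lengths is the harder part of the model and is not attempted here.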

comment:7 Changed 11 years ago by simonmar

Architecture: Unknown → Unknown/Multiple

comment:8 Changed 11 years ago by simonmar

Operating System: Unknown → Unknown/Multiple

comment:9 Changed 10 years ago by igloo

Milestone: 6.10 branch → 6.12 branch

comment:10 Changed 10 years ago by simonmar

Resolution: fixed
Status: new → closed

We now have

getRichTokenStream :: GhcMonad m => Module -> m [(Located Token, String)]
showRichTokenStream :: [(Located Token, String)] -> String

among other things, thanks to Jedai. If this isn't enough, please re-open.
