Ticket #118 (closed defect: fixed)

Opened 5 years ago

Last modified 22 months ago

parsing of multiple URLs (greedy matching?)

Reported by: kowey
Owned by:
Priority: major
Milestone:
Version: 2.4.2
Keywords:
Cc:

Description

This bug is actually against Cabal/Hackage (see http://hackage.haskell.org/trac/hackage/ticket/569), but I'm cross-posting here in case it's still relevant to Haddock. Hopefully it has already been fixed and this ticket only needs to be marked closed.

It appears that strings with more than one URL in them are parsed incorrectly. For example,

See <http://www.mediawiki.org/wiki/API> and <http://haskell.forkio.com/>

is treated as though it contained only one URL. I get the impression that there's some kind of greedy matching going on, like a "<.*>" in regexp terms.
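To make the suspicion concrete, here is a minimal sketch of what a greedy "<.*>" would capture on a single line of input. The names greedyUrl and example are hypothetical and written only for this illustration; this is not Haddock's code, just a hand-rolled imitation of the suspected behaviour.

```haskell
-- Hypothetical helpers illustrating the suspected behaviour; not Haddock's code.

-- | Greedy reading of "<.*>" on one line: start at the first '<' and
-- run to the LAST '>' in the input.
greedyUrl :: String -> Maybe String
greedyUrl s =
  case dropWhile (/= '<') s of
    []         -> Nothing                 -- no opening '<' at all
    (_ : rest) ->
      -- Reverse the remainder to locate the final '>' in the input.
      case dropWhile (/= '>') (reverse rest) of
        []           -> Nothing           -- no closing '>' at all
        (_ : before) -> Just (reverse before)

-- | The string from this report.
example :: String
example =
  "See <http://www.mediawiki.org/wiki/API> and <http://haskell.forkio.com/>"
```

Evaluating greedyUrl example in GHCi gives a single "URL" that runs from the first link through the text " and <" into the second link, which would account for the output seen on the package page.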

This manifests as funny-looking output on Hackage, e.g. http://hackage.haskell.org/package/mediawiki

Change History

Changed 5 years ago by duncan

Replying to EricKow:

I get the impression that there's some kind of greedy matching going on, like a "<.*>" in regexp terms.

Indeed, in the lexer:

  \<.*\>         { strtoken $ \s -> TokURL (init (tail s)) }
  \<\<.*\>\>     { strtoken $ \s -> TokPic (init $ init $ tail $ tail s) }
  \#.*\#         { strtoken $ \s -> TokAName (init (tail s)) }

For emphasis like /blah/ it uses:

  \/ [^\/]* \/   { strtoken $ \s -> TokEmphasis (init (tail s)) }

The same trick, replacing .* with a negated character class that excludes the closing delimiter, should work for the three cases above. Note that the same code is used in ghc, haddock-0.x, hackage-scripts and hackage-server.
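As a rough illustration of the effect of that change, the sketch below hand-writes a tiny tokenizer in plain Haskell rather than Alex. The Token type reuses the constructor names from the rules above, but TokText, delimited and tokenize are hypothetical names invented for this example, and the real lexer handles far more markup than this.

```haskell
-- Hypothetical token type mirroring the constructor names in the lexer rules.
data Token
  = TokURL String
  | TokPic String
  | TokAName String
  | TokText String
  deriving (Eq, Show)

-- | Read a span body that stops at the FIRST closing character, the
-- hand-written analogue of a rule such as  \< [^\>]* \>.
-- Returns the body and the input remaining after the closing character.
delimited :: Char -> String -> Maybe (String, String)
delimited close s =
  case break (== close) s of
    (body, _ : rest) -> Just (body, rest)
    _                -> Nothing   -- no closing delimiter: not a token

tokenize :: String -> [Token]
tokenize [] = []
tokenize ('<' : '<' : s)
  -- A picture needs a second '>' immediately after the first one.
  | Just (body, '>' : rest) <- delimited '>' s = TokPic body : tokenize rest
tokenize ('<' : s)
  | Just (body, rest) <- delimited '>' s = TokURL body : tokenize rest
tokenize ('#' : s)
  | Just (body, rest) <- delimited '#' s = TokAName body : tokenize rest
tokenize (c : s) =
  -- Anything else is plain text; merge it with following text, if any.
  case tokenize s of
    TokText t : ts -> TokText (c : t) : ts
    ts             -> TokText [c] : ts
```

With this sketch, the string from the report yields two separate TokURL tokens with plain text between them, because each span ends at the nearest closing delimiter instead of the last one on the line.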

Changed 4 years ago by waern

  • status changed from new to closed
  • resolution set to fixed
  • milestone changed from 2.5.0 to 2.8.0

Changed 22 months ago by anonymous

  • milestone 2.8.0 deleted

