Ticket #78 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

control characters in prompt mess up width calculations

Reported by: judah Owned by: judah
Priority: minor Milestone: 0.6.3
Version: 0.6 Keywords:
Cc: lordrat@…

Description

Control characters in the prompt will make it look longer to Haskeline than it actually is.

Of course, that sort of thing won't work on Windows, so maybe there's a better solution.

Change History

Changed 5 years ago by phercek

I use ANSI escape sequences for coloring and this seems to work in simple cases. But it is incorrect when I search history. I would like to see this working well.

How can one compute the length of control codes in general?

If there is no general rule then maybe the used sequences of control codes can be defined in .haskeline. Each sequence would have defined there how much space it takes on the terminal. This would make it compatible with windows users (which just would not define any sequence).

Changed 5 years ago by judah

In general, any escape sequence for bold or color should have zero length in the terminal.

In bash, to signify that a control sequence in the prompt has zero width you surround it with \[ and \]. In readline itself, you surround it with \001 and \002. Maybe it would be easiest for Haskeline to do something similar?

Changed 5 years ago by phercek

Right, \[ and \] look better. It would work only for zero length control codes but that is enough. I like it more than defining the length in .haskeline.

Changed 5 years ago by judah

What Haskeline program(s) are you using color escape sequences with? I wasn't able to get them working in ghci.

As it turns out, '\[' is a bash-ism which is translated into '\001' before the prompt is passed to readline. Since Haskeline's prompt is (currently) just a String, it probably also needs to use ugly surrounding characters like readline does. Programs which call Haskeline can provide the nicer UI themselves.

Changed 5 years ago by phercek

Actually, I use only ghci with haskeline now. But it is not the stock ghci. I have about 5 private patches to it and one of them enables escape sequences in the prompt. It is a really ugly hack which uses some internal features of showSDoc functionality but does not modify showSDoc itlself. I was told rather not to modify showSDoc itslef, but I wanted this, so I just went around for my ghci instance. There is one more thing which annoys me with ghci SDoc: it assumes my terminal has 80 columns even when it has much more. But I did not do anything with this yet.

Personally I do not really care whether it is \[ and \] or \001 and \002 (e.g. zsh uses %{ and %}). The first allows us not to meddle with what user wants to pass to the actual terminal (i.e. user can pass \001 at the places he wants and not at the places we force him). But it also means we need additional escaping for \. So if somebody would want to use \ in the prompt it would need to be passed as \\. If a haskeline client program would use \ as its escaping character too, then it would mean user would need to specify haskeline \[ (\]) as \\[ (\\]) so that the client program does not process them. This is not a problem with ghci itslef which uses % as ecape character (so it is like zsh).

On the other side \001 is start of heading (SOH) and \002 is start of text (STX) in ASCII. So this choice makes sense. If haskeline is going to interpret escape sequences (by correctly recognizing their length on ansi terminals at least) it can intepret also SOH and STX. They do not have any width themselves on my terminal. So SOH/STX looks like a better way. Maybe this is even the intended way how to recognize the width of escape sequences and control codes. It would be good to verify it with the ANSI standard (to check whether this is really the intended use) but I do not have access to it so I would rather do what others (readline) do. Maybe they verified it, and if not at least we are compatible with them. Things would be easier to port.

So my summary:
* \[(\]) is more flexible and in a way more compatible with systems not using SOH/SOX "standard" (we do not assign meaning to SOH/STX - we do not meddle with what user wants to send to the terminal).
* SOH/STX is more compatible with current systems (known to use ansi sequences).
Without additional info I vote lets do what readline does (SOH/STX). I'm a linux user and I doubt windows users mind we force SOH/STX ussage at them. They will not use any escape sequences anyway since they do not work from the time MS stopped to provide ansi.sys. This way it will easily work on linux quickly and on Windows we can add our own interpretation of ansi ecape-sequences later (to translate them to the correct console api calls).
Hey, and if windows users really care now, they can get support for ansi escape sequences with even for 32 bit programs (where ansi.sys does not work) with ansicon. I used it myself working on windows. Well I had it patched because it had bugs which presented itself with ghci. When patched it works. I can provide the patch if anybody cares.

If you want to try it with ghci, here is the ugly hack I use:

diff -rN -u old-ghc/ghc/InteractiveUI.hs new-ghc/ghc/InteractiveUI.hs
--- old-ghc/ghc/InteractiveUI.hs	2009-02-25 21:54:08.879780180 +0100
+++ new-ghc/ghc/InteractiveUI.hs	2009-02-25 21:54:08.999779327 +0100
@@ -531,7 +531,7 @@
         f [] = empty
    --
   st <- getGHCiState
-  return (showSDoc (f (prompt st)))
+  return $ read $ '"' : ( showSDoc (f $ prompt st) ++ "\"" )
 
 
 queryQueue :: GHCi (Maybe String)

So far the hack did not make me any troubles.

Changed 4 years ago by guest

  • cc lordrat@… added

Example for colored GHCi prompt:

:set prompt "\27[32;1m:mod +\27[0m %s\27[32;1m>\27[0m "

And this one even sets window title:

:set prompt "\27]0;GHCi: %s\7\27[32;1m:mod +\27[0m %s\27[32;1m>\27[0m "

(\[\] nor \SOH\STX escaping was applied because neither of them works)

And, as you might expect (as I'm writing here) they are not working ... properly ;-). (And it wasn't short way to find out why... I thought that GHCi is not properly handling things around libreadline... how big was my surprise when I realized that there was no spoon^Wlibreadline...)

I tried it this way:

hunk ./System/Console/Haskeline/LineState.hs 109
 stringToGraphemes = mkString . dropWhile isCombiningChar
     where
         mkString [] = []
+        mkString ('\SOH':cs) = mkString (dropWhile (/='\STX') cs)
+        mkString ('\STX':cs) = mkString cs
         mkString (c:cs) = Grapheme c (takeWhile isCombiningChar cs)
                                 : mkString (dropWhile isCombiningChar cs)

but, as you see, it's nasty hack. However, I can't see simple and straightforward solution using current Grapheme approach (I tried another hack — prepending control sequence to next grapheme, but it had some issues(=wasn't working properly) too and there was unresolved special case — string containing nothing but escaped seqences... there was no grapheme to prepend it to...).

I'm looking forward to any progress on this ticket :-).

Changed 4 years ago by phercek

Heh, I used a different hack to get it working with ghci. One part of it is the patch to ghc itself I posted in my previous comment. The second part is a patch to haskeline itself. The patch is here:

--- ghc-6.12.org/libraries/haskeline/System/Console/Haskeline/Backend/Terminfo.hs	2009-09-24 00:35:18.000000000 +0200
+++ ghc-6.12/libraries/haskeline/System/Console/Haskeline/Backend/Terminfo.hs	2009-10-12 14:26:57.000000000 +0200
@@ -213,10 +213,10 @@
     w <- asks width
     TermPos {termRow=r,termCol=c} <- get
     let roomLeft = w - c
-    if length str < roomLeft
+    if visibleLen str < roomLeft
         then do
                 posixEncode (graphemesToString str) >>= output . text
-                put TermPos{termRow=r, termCol=c+length str}
+                put TermPos{termRow=r, termCol=c+visibleLen str}
                 return []
         else do
                 let (thisLine,rest) = splitAt roomLeft str
@@ -228,14 +228,24 @@
 drawLineDiffT :: LineChars -> LineChars -> DrawM ()
 drawLineDiffT (xs1,ys1) (xs2,ys2) = case matchInit xs1 xs2 of
     ([],[])     | ys1 == ys2            -> return ()
-    (xs1',[])   | xs1' ++ ys1 == ys2    -> changeLeft (length xs1')
-    ([],xs2')   | ys1 == xs2' ++ ys2    -> changeRight (length xs2')
+    (xs1',[])   | xs1' ++ ys1 == ys2    -> changeLeft (visibleLen xs1')
+    ([],xs2')   | ys1 == xs2' ++ ys2    -> changeRight (visibleLen xs2')
     (xs1',xs2')                         -> do
-        changeLeft (length xs1')
+        changeLeft (visibleLen xs1')
         printText (xs2' ++ ys2)
-        let m = length xs1' + length ys1 - (length xs2' + length ys2)
+        let m = visibleLen xs1' + visibleLen ys1 - (visibleLen xs2' + visibleLen ys2)
         clearDeadText m
-        changeLeft (length ys2)
+        changeLeft (visibleLen ys2)
+
+visibleLen :: [Grapheme] -> Int
+visibleLen str = countVisibleChars (map baseChar str) 0
+  where
+    countVisibleChars ( '\SOH' : rest ) cnt = countInvisibleChars rest cnt
+    countVisibleChars ( _ : rest ) cnt = countVisibleChars rest (cnt+1)
+    countVisibleChars [] cnt = cnt
+    countInvisibleChars ( '\STX' : rest ) cnt = countVisibleChars rest cnt
+    countInvisibleChars ( _ : rest ) cnt = countInvisibleChars rest cnt
+    countInvisibleChars [] cnt = cnt
 
 linesLeft :: Layout -> TermPos -> Int -> Int
 linesLeft Layout {width=w} TermPos {termCol = c} n

My ghci prompt is
:set prompt "\SOH\ESC[1;32m\STX%s>\SOH\ESC[0m\STX "

Together these two patches work pretty well. I'm not even sure whether they ever misbehave. I think they do misbehave but it is so rare that I do not care (and even do not remember when it does happen).

Both patches are probably just ugly hacks. Definitely so for the ghci prompt patch. The haskeline patch is possibly not so ugly but I do not know since I do not know haskeline design. I did whatever I could to get it kind of working in the least amount of time. I was not bothering to get good solution, I only wanted mostly working solution.

I'll probably will not mess with this till I'll be upgrading to ghc 6.14. In such a case, if this ticket is not resolved by then, I'll just port my patches forward. I would hate to return back to non-colored ghci prompt.

Changed 4 years ago by judah

  • owner set to judah
  • status changed from new to assigned

Thanks for your suggestions. I originally balked at adding what's essentially a platform-specific hack; but you're both right that it's better to at least work with the above cases. So I've applied the following patch:

Sat Aug 21 09:18:37 PDT 2010  Judah Jacobson <judah.jacobson@gmail.com>
  * #78: If "\ESC...\STX" or "\SOH\ESC...\STX" appears in the prompt, treat it as a zero-width grapheme.
    hunk ./System/Console/Haskeline/LineState.hs 109
    +        -- Minor hack: "\ESC...\STX" or "\SOH\ESC...\STX", where "\ESC..." is some
    +        -- control sequence (e.g., ANSI colors), is represented as a grapheme
    +        -- of zero length with '\ESC' as the base character.
    +        -- Note that this won't round-trip correctly with graphemesToString.
    +        -- In practice, however, that's fine since control characters can only occur
    +        -- in the prompt.
    +        mkString ('\SOH':cs) = stringToGraphemes cs
    +        mkString ('\ESC':cs) | (ctrl,'\STX':rest) <- break (=='\STX') cs
    +                    = Grapheme '\ESC' ctrl : stringToGraphemes rest

Please let me know if that doesn't work for you, or if the width calculations don't behave correctly.

I'll document this on the wiki before releasing the next version.

Changed 4 years ago by guest

Really good (& fast!) work, thanks :-).

I found another length-calculating problem: It should be calculated only from last line. See example:

"Cool prompt on next line:\nPrompt> "

(I know that this is prompt misuse (or even abuse), but it is possibility O:-).)

(Don't know whether this issue fits this ticket, sorry if not.)

-- lordrat

Changed 4 years ago by judah

Can you please open a new ticket for that issue?

I'm not sure what the right behavior is for that situation in all of the different cases; it would be good to have a separate place to discuss it.

Changed 4 years ago by guest

With pleasure :-). #111

Changed 4 years ago by judah

  • status changed from assigned to closed
  • resolution set to fixed
  • milestone set to 0.6.3

Fixed for 0.6.3, and documented in ControlSequencesInPrompt.

Note: See TracTickets for help on using tickets.