Opened 13 months ago

Last modified 9 months ago

#15617 new bug

Unboxed tuples/sum error message on `a = show 5` in expression evaluation and interactive modes

Reported by: ChaiTRex Owned by: JulianLeviston
Priority: normal Milestone: 8.6.1
Component: Compiler Version: 8.6.1-beta1
Keywords: Cc: simonpj, RolandSenn
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Poor/confusing error message Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by ChaiTRex)

With GHC 8.4.3 (on both Ubuntu 16.04.5 and Ubuntu 18.04.1) and recent GHC (not HEAD but close at ff29fc84c03c800cfa04c2a00eb8edf6fa5f4183 on Ubuntu 16.04.5), I get errors for a = show 5.

I run the following commands, showing that show 5 is usually fine:

$ ghc -fobject-code -O2 -e 'show 5'
"5"

$ ghc -fobject-code -O2 -e 'let a = show 5 in a'
"5"

But not with a = show 5:

$ ghc -fobject-code -O2 -e 'a = show 5'
<interactive>: Error: bytecode compiler can't handle unboxed tuples and sums.
  Possibly due to foreign import/export decls in source.
  Workaround: use -fobject-code, or compile this module to .o separately.

Running with ghci gives the same error:

$ ghci -fobject-code -O2
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Prelude> a = show 5
Error: bytecode compiler can't handle unboxed tuples and sums.
  Possibly due to foreign import/export decls in source.
  Workaround: use -fobject-code, or compile this module to .o separately.
Prelude> 
Leaving GHCi.

Both errors stop when optimization is turned off:

$ ghc -fobject-code -O0 -e 'a = show 5'

$ ghci -fobject-code -O0
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Prelude> a = show 5
Prelude> 
Leaving GHCi.

Change History (17)

comment:1 Changed 13 months ago by ChaiTRex

Description: modified (diff)

comment:2 Changed 12 months ago by osa1

Cc: simonpj added

This is because the simplifier introduces a call to a worker function that returns an unboxed tuple.

This is the original expression:

-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
a :: String
[LclIdX]
a = show @ Integer GHC.Show.$fShowInteger 5

The simplifier transforms this to

-- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
a :: String
[LclIdX,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 140 0}]
a = GHC.Show.$fShowInteger_$cshowsPrec
      GHC.Show.$fShow(,)1 5 (GHC.Types.[] @ Char)

and then

-- RHS size: {terms: 9, types: 11, coercions: 0, joins: 0/0}
a :: String
[LclIdX,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 160 30}]
a = case GHC.Show.$w$cshowsPrec4 0# 5 (GHC.Types.[] @ Char) of
    { (# ww3_a1Xd, ww4_a1Xe #) ->
    GHC.Types.: @ Char ww3_a1Xd ww4_a1Xe
    }

Here the scrutinee returns an unboxed tuple, and we can't compile this expression to bytecode.
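
For readers unfamiliar with the worker/wrapper shape in the Core above, here is a minimal hand-written sketch of the same pattern. The names showWorker and showWrapper are hypothetical stand-ins, not GHC's generated functions; the worker plays the role of GHC.Show.$w$cshowsPrec4 and the wrapper mirrors the final form of `a`. Since it uses an unboxed tuple, compile it with ghc rather than loading it as bytecode on the GHC versions discussed here; that is the limitation this ticket is about.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Hypothetical stand-in for a worker such as $w$cshowsPrec4: it returns the
-- head and tail of the rendered string as an unboxed tuple instead of
-- allocating the (:) cell itself.
showWorker :: Integer -> String -> (# Char, String #)
showWorker n rest =
  case shows n rest of
    c : cs -> (# c, cs #)
    []     -> (# '0', rest #)  -- not reached: shows on an Integer emits at least one character

-- Wrapper with the same shape as the simplified `a`: it scrutinises the
-- unboxed tuple and rebuilds the list cell.
showWrapper :: Integer -> String
showWrapper n =
  case showWorker n [] of
    (# c, cs #) -> c : cs
```

Here showWrapper 5 plays the role of `a`: once the wrapper is inlined at the call site, the expression handed to the bytecode compiler contains the unboxed-tuple case directly.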

Some ideas:

  • Ignore optimization settings in GHCi and always compile with all optimizations disabled, so that inlining worker functions etc. cannot introduce unboxed tuples and sums.
  • Somehow teach the simplifier not to introduce unboxed tuples/sums (this seems like too much work; we may be able to implement unboxed tuple/sum support with the same effort).
  • Implement unboxed tuple and sum support for GHCi.
  • Improve the error message to mention that this may happen as a result of optimizations (maybe only show this if optimizations are enabled).

Any other ideas?

comment:3 Changed 12 months ago by simonpj

Why doesn't this happen all the time in GHCi?

comment:4 Changed 12 months ago by osa1

monoidal pointed me to this note:

GHCi and -O
---------------

When using optimization, the compiler can introduce several things
(such as unboxed tuples) into the intermediate code, which GHCi later
chokes on since the bytecode interpreter can't handle this (and while
this is arguably a bug these aren't handled, there are no plans to fix
it.)

While the driver pipeline always checks for this particular erroneous
combination when parsing flags, we also need to check when we update
the flags; this is because API clients may parse flags but update the
DynFlags afterwords, before finally running code inside a session (see
T10052 and #10052).

I think this says that optimisation flags should be ignored by ghci. Indeed they normally are:

~ $ ghci -O2

when making flags consistent: warning:
    -O conflicts with --interactive; -O ignored.
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci

The problem is that when we also add -fobject-code, I guess the consistency check does not work as expected and accepts -O2:

~ $ ghci -fobject-code -O2
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
λ:1> a = show 5
Error: bytecode compiler can't handle unboxed tuples and sums.
  Possibly due to foreign import/export decls in source.
  Workaround: use -fobject-code, or compile this module to .o separately.

So this is probably just a matter of fixing the flag consistency check (whatever that is).

Last edited 12 months ago by osa1 (previous) (diff)

comment:5 Changed 12 months ago by osa1

Owner: set to monoidal

The bug should be in DynFlags.makeDynFlagsConsistent. monoidal will submit a patch for this.

comment:6 Changed 12 months ago by osa1

Why doesn't this happen all the time in GHCi?

This happens all the time when you combine -fobject-code and -O2, so I guess people don't combine these too much.

comment:7 Changed 12 months ago by JulianLeviston

Owner: changed from monoidal to JulianLeviston

Trying this as my first ticket.

comment:8 Changed 12 months ago by JulianLeviston

I _think_ the problem is that setting the -fobject-code flag means HscInterpreted won't ever get set, so when checkOptLevel gets called at the bottom of makeDynFlagsConsistent, the error doesn't trigger, because of an explicit check for HscInterpreted.
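
A toy model of that hypothesis, for illustration only. The Target and Flags types below are simplified stand-ins defined here, not GHC's real HscTarget or DynFlags, and this checkOptLevel only imitates the behaviour described in this comment (reject -O when the target is HscInterpreted, otherwise accept it).

```haskell
-- Simplified stand-ins; the real types in GHC carry far more information.
data Target = HscC | HscAsm | HscLlvm | HscInterpreted
  deriving (Eq, Show)

data Flags = Flags
  { target   :: Target
  , optLevel :: Int
  } deriving (Eq, Show)

-- Imitation of the check: it only fires when the target is HscInterpreted.
checkOptLevel :: Int -> Flags -> Either String Flags
checkOptLevel n flags
  | target flags == HscInterpreted && n > 0 =
      Left "-O conflicts with --interactive; -O ignored."
  | otherwise =
      Right flags { optLevel = n }
```

In this model, checkOptLevel 2 on flags whose target is HscInterpreted yields the warning, while the same call on flags whose target was switched to HscAsm (as a stand-in for what -fobject-code does) accepts -O2, even though prompt expressions would still be compiled to bytecode.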

Last edited 12 months ago by JulianLeviston (previous) (diff)

comment:9 Changed 12 months ago by osa1

Cc: RolandSenn added

I think you're right. This is another example of problems caused by the compiler not knowing whether we're in the interpreter or not (we only have information derived from the flags, but that information can't tell reliably whether we're in the interpreter, as in this ticket). @RolandSenn has a good summary of the problem in Phab:D5122 which also links to a few other issues caused by this problem. Perhaps we should do the refactoring to add a field to DynFlags about whether we're in the interpreter.

If we do this then maybe we should revisit some of the fixed tickets and fix them "properly" by using the new field.

comment:10 Changed 12 months ago by JulianLeviston

Right. It sounds like refactoring is a good idea for this. In terms of HscTarget, it feels like HscInterpreted is possibly not named correctly? The other three seem like actual targets: HscC, HscAsm, HscLlvm.

That is, how can "Interpreted" be a target? A target is the resultant output of a process. Interpreted is a method of obtaining a target. That is, the others seem like target languages (C, assembly, LLVM bytecode). What is Interpreted in that context? Is it trying to say Haskell Bytecode? (I'm not even sure what that *is*).

This is just according to my current understanding of what's going on, which is pretty shallow.

On the other hand, DynFlags already seems quite large. Is it the right place to determine what mode the compiler is in? I'd be guided by others' advice here because I'm so green.

comment:11 Changed 12 months ago by osa1

As you say, HscInterpreted means modules are compiled to bytecode and then interpreted. Interpreter can interact with native code and with -fobject-code you tell GHCi to compile the loaded modules to native code rather than to bytecode (the default, or -fbyte-code).

Either way, the expressions you type in the GHCi prompt are compiled to bytecode and interpreted, so I think those options are only applied to the loaded modules. We need to avoid optimising those expressions. I think another (simpler) way to do this might be to find the top-level function for compiling GHCi expressions to bytecode, and override the relevant DynFlags fields there so that down the line the desugarer and simplifier do not optimise it. That means no new field in DynFlags, so I think it would be even better. If the top-level function to compile a GHCi expression/statement is also used for other purposes, perhaps we can introduce a new top-level function for GHCi only, and override the relevant DynFlags fields there.

How does that sound? Sounds better to me as we don't add more to DynFlags (which is already huge).

comment:12 Changed 12 months ago by JulianLeviston

That feels like it might be a bit messy. I'll investigate more, though.

comment:13 Changed 12 months ago by JulianLeviston

I've been still looking at this, just to keep this updated.

As you say, HscInterpreted means modules are compiled to bytecode and then interpreted. Interpreter can interact with native code and with -fobject-code you tell GHCi to compile the loaded modules to native code rather than to bytecode (the default, or -fbyte-code).

The above paragraph confused me.

The -e and --interactive flags set up HscInterpreted as the language in main' in Main.hs. DynFlags can be used to override this target (i.e. with -fbyte-code or -fobject-code). It seems like a mistake to be able to override it when it's already set to the HscInterpreted target, though I don't really understand if that's actually wanted.

Like, in the case that you're using ghci, would you ever want to turn on -fobject-code from within the interpreter? What would that mean if you could do that? Would it start compiling to object code and then execute the compiled code? If you're compiling with ghc --make and you also use -fbyte-code, is that something that's intended?

I guess I'm trying to figure out if, when -e and --interactive set the HscInterpreted target it actually makes more sense to have that be a mode of the compiler that cannot be adjusted via DynFlags rather than the target which can be adjusted.

However, I don't know the intent well enough. I'm not sure it's captured anywhere? The man doc for ghc seems to be extremely brief on what these particular flags mean, or are intended for.

comment:14 Changed 12 months ago by osa1

Here's another way to say what I mean. When in GHCi we do two kinds of compilation:

  1. We compile loaded modules (if necessary)
  2. We compile expressions typed in the GHCi prompt

The target only makes sense for (1). In (2) we only compile to bytecode. So really HscInterpreted and flags like -O etc. are only applicable to (1). But currently we also apply some of those flags/settings (e.g. -O) to (2), which is what's causing this bug.

One of my suggestions in comment:11 was to separate these two compilations so that when we do (2) we never try to optimise the code. This can be done by implementing a new (or modifying the existing one if one already exists) top-level function for compiling GHCi expressions and updating DynFlags there to fix the compilation settings (e.g. by resetting optimisation level) for GHCi.
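
A rough sketch of that separation, using toy types rather than GHC's DynFlags. The names Job, Flags and effectiveFlags are hypothetical and exist only for this illustration; the point is just that the optimisation level is reset on the prompt-expression path and left alone for loaded modules.

```haskell
-- Toy stand-in for the relevant part of DynFlags.
data Flags = Flags
  { optLevel   :: Int
  , objectCode :: Bool  -- True models -fobject-code for loaded modules
  } deriving (Eq, Show)

-- The two kinds of compilation GHCi performs.
data Job = LoadedModule | PromptExpression
  deriving (Eq, Show)

-- Loaded modules keep the user's settings; prompt expressions are always
-- compiled unoptimised, because they always go to bytecode.
effectiveFlags :: Job -> Flags -> Flags
effectiveFlags LoadedModule     flags = flags
effectiveFlags PromptExpression flags = flags { optLevel = 0 }
```

With this split, a session started with the equivalent of -fobject-code -O2 would still compile loaded modules with optimisation, while `a = show 5` typed at the prompt would be compiled at level 0 and so never see the worker that returns an unboxed tuple.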

Is this any more clear than my previous comment?

The -e and --interactive flags set up HscInterpreted as the language in main' in Main.hs. DynFlags can be used to override this target (i.e. with -fbyte-code or -fobject-code). It seems like a mistake to be able to override it when it's already set to the HscInterpreted target, though I don't really understand if that's actually wanted.

So the lang/target doesn't matter when compiling GHCi expressions, as you _have to_ compile those to bytecode regardless of the lang/target. I meant overriding optimisation settings, not the lang/target.

comment:15 Changed 12 months ago by JulianLeviston

Oh that's much clearer, thank you. I'll have to dig in more to find out about the split of compilation of loaded modules versus expressions (at GHCi prompt and/or given to -e flag) to think about how to separate those two types of code.

As far as I can see so far in my explorations, HscInterpreted (i.e. the target) is the marker used to determine whether to compile to bytecode or not. Loading with -e or --interactive sets this, and setting -fbyte-code sets it too; conversely, setting -fobject-code sets it to the standard compile target for the platform (which is never HscInterpreted).

Obviously that's not the whole story, though, of course, otherwise we wouldn't have this bug, so I'll explore more.

Thanks for being so helpful and patient.

Last edited 12 months ago by JulianLeviston (previous) (diff)

comment:16 Changed 12 months ago by JulianLeviston

I'm leaving this here for myself as WIP notes for later.

Execution path:

  1. ghc/Main.hs main function
  2. (mode, argv3, flagWarnings) <- parseModeFlags argv2
  3. case mode of ... Right postStartupMode ->
  4. case postStartupMode of ->
  5. Right postLoadMode -> main' postLoadMode dflags argv3 flagWarnings
  6. this hits the main' function after getting postLoadMode of either DoInteractive or DoEval (because these are the HscInterpreted target cases)
  7. this is the meat, where we're pulling apart the flags then throwing them at ghciUI which evaluates to an expression of interactiveUI defaultGhciSettings for both cases
  8. The interactiveUI function comes from the GHCi.UI module...

Last edited 9 months ago by JulianLeviston (previous) (diff)

comment:17 Changed 9 months ago by JulianLeviston

To refresh my brain (and so I don't have to re-load this into my brain over and over)... as per above, but with annotation:

  1. In ghc/Main.hs, the main function is the entry point for all of GHC.
  2. The line with (mode, argv3, flagWarnings) <- parseModeFlags argv2 parses the mode flags out. There are two possible cases here: either we have a Left preStartupMode or Right postStartupMode. The preStartupMode case is only for doing things before GHC starts up; things such as outputting the version number, etc (ie not executing code). So, we're only interested in the Right postStartupMode variant.
  3. Continuing on, we start GHC by using GHC.runGhc with mbMinusB, then pull the flags out with GHC.getSessionDynFlags. We then case on postStartupMode which itself is an Either as well... it's similar to above, but here we have some preLoadMode values that (I'm assuming) can only be got when GHC has started up. Things like showing info, ghc usage, ghci usage and printing the flags. Here, though, we're only interested in the Right variant again — the postLoadMode value gets pattern matched out and;
  4. Next this hits the main' function as: main' postLoadMode dflags argv3 flagWarnings. This function then cases on postLoadMode... and the only branches of this we're really interested in are DoInteractive -> (CompManager, HscInterpreted, LinkInMemory) and DoEval _ -> (CompManager, HscInterpreted, LinkInMemory)... there are five other matches, but they're make, backpack, something called MkDependHS and also AbiHash, then the catchall (i.e. _). Essentially we want to block optimisation on the interactive and eval cases. So, this matches (mode, lang, link) to the values (CompManager, HscInterpreted, LinkInMemory) respectively for both of these cases (interactive and eval).
  5. Next we use a series of let expressions building up dflags1, dflags2, dflags3, etc. Then, at the very end, we have another case statement inside of an evaluation of handleSourceError to handle source code errors... again, the only two cases we care about are DoInteractive and DoEval exprs, both of which evaluate to ghciUI expressions. Respectively: ghciUI hsc_env dflags6 srcs Nothing and ghciUI hsc_env dflags6 srcs $ Just $ reverse exprs.
  6. The ghciUI function pulls the flags out after initializing the plugins with the hsc_env and dflags0, then sets this into the session with GHC.setSessionDynFlags. We then call the interactiveUI function with args thusly: interactiveUI defaultGhciSettings srcs maybe_expr. This function comes from the GHCi.UI module in ghc/GHCi/UI.hs.
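
To make step 4 concrete, here is a small self-contained model of the dispatch. All types are local stand-ins that reuse the constructor names from the notes above. OtherMode is a hypothetical placeholder for the remaining branches, and the values in its branch are illustrative assumptions, as is the applyObjectCode helper that models -fobject-code replacing the target afterwards.

```haskell
-- Local stand-ins named after the constructors mentioned in the notes.
data PostLoadMode = DoInteractive | DoEval [String] | OtherMode
  deriving (Eq, Show)

data GhcMode = CompManager
  deriving (Eq, Show)

data Target = HscAsm | HscInterpreted
  deriving (Eq, Show)

data Link = LinkInMemory | LinkBinary
  deriving (Eq, Show)

-- Models the case on postLoadMode in main'.
initialSettings :: PostLoadMode -> (GhcMode, Target, Link)
initialSettings DoInteractive = (CompManager, HscInterpreted, LinkInMemory)
initialSettings (DoEval _)    = (CompManager, HscInterpreted, LinkInMemory)
initialSettings OtherMode     = (CompManager, HscAsm, LinkBinary)  -- assumption, for illustration

-- Models a later -fobject-code flag replacing the target.
applyObjectCode :: (GhcMode, Target, Link) -> (GhcMode, Target, Link)
applyObjectCode (mode, _, link) = (mode, HscAsm, link)
```

In this model, applyObjectCode (initialSettings (DoEval ["a = show 5"])) no longer mentions HscInterpreted, which matches the observation in comment:8 that a check keyed on HscInterpreted stops firing once -fobject-code is given.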

So, we want to investigate interactiveUI in ghc/GHCi/UI.hs.
