Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#9049 closed feature request (fixed)

Expose srcLoc from the assertion architecture to allow better error messages

Reported by: nh2 Owned by: gridaphobe
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.8.2
Keywords: Cc: augustss
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #5273, #624 Differential Rev(s): Phab:D578
Wiki Page:

Description

Assertions give great error messages by including file name and line, but they are disabled for optimisation. While there is a compiler flag to turn them on even in presence of optimisation, this is an awkward external way to simply attach a useful error location to something like error.

It would be great if we could decouple getting the current srcLoc from assertions, and making it available for good error messages.

It is currently possible to do this using TemplateHaskell (e.g. the placeholders package) or CPP, but both are inconvenient for technical reasons (such as breaking tooling) and do not enjoy widespread use.

It seems a shame that the architecture for annotated error messages is already there, but only available through a mechanism that is disabled for optimisation (or at least "intended" to be disabled for optimisation).

Notably, we have the same architecture available for pattern mach failures:

f x | x > 0 = "foo"

main = putStrLn (f (-1)) -- gives srcLoc'd pattern match error message
f x | x > 0 = "foo"
f _         = error "f needs a positive number"

main = putStrLn (f (-1)) -- does not give a srcLoc'd error message

It seems non-sensical that your try to write a better error message (here using error) yields in fact a less-helpful error message.

I completely agree that one can aid bug-fixing by writing globally unique arguments to error or using TH, but from an engineering perspective, a little bit of help from the language/compiler side can improve the situation a lot, and in other languages this is acknowledged and used to everybody's benefit.

There are some alternatives we might consider, such as

  • exposing srcLoc :: String that gives an assert-style location at the lexical position of its occurrence
  • adding an error-like function that behaves like error but also prints a source location

Arguably srcLoc :: String breaks equational reasoning on the syntactic level, as let x = srcLoc in (x, x) and (srcLoc, srcLoc) have different values. However, looking at it from a program execution point of view, referential transparency is preserved since once compiled, any function using srcLoc will always return the same location.

I believe that offering some kind of improvement over the dreaded error or Prelude: undefined should be easy to implement, but let's hear what the people who actually know the compiler think about it.

Change History (42)

comment:1 Changed 5 years ago by simonpj

Cc: augustss added

This is a sensible, and long-running feature request. I can't find other tickets about it, which is puzzling, but see ExplicitCallStack for lots of material. I thought that Lennart was muttering about this too, but I can't find the trail.

It's possible I may get some time/effort to work in it this summer, but it'd be great to evolve a clear design first.

Simon

comment:2 in reply to:  1 Changed 5 years ago by refold

Replying to simonpj:

I thought that Lennart was muttering about this too, but I can't find the trail.

I think you're referring to these posts:

http://augustss.blogspot.se/2014/04/a-small-haskell-extension.html http://augustss.blogspot.se/2014/04/haskell-error-reporting-with-locations_5.html

comment:3 Changed 5 years ago by simonpj

Simon Marlow's response is this: "Compile your program with -prof, run with +RTS -xc" and you'll get exactly the backtrace you want".

You have to compile a profiled version of the packages you install, but you can do that by adding --enable-pofiling to Cabal, and you can put that in your ~/.cabal file.

Doesn't that do it?

Simon

comment:4 Changed 5 years ago by nh2

Not really.

In my view giving this kind of good error message really doesn't have to do much with profiling.

Sure, profiling currently improves error messages, but ideally (and practically) I don't want to compile with profiling to get a very orthogonal feature.

comment:5 Changed 5 years ago by carter

am i correct in understanding that the current state of things is

a) the best option for now is to build with profiling?

b) the feature request will be subsumed by the work on stacktraces thats (hopefully slated) for ghc 7.10?

(or is there some nuance i'm missing?)

comment:6 Changed 5 years ago by nh2

@carter

a) I believe so.

b) I do not know much about the stacktraces work, but as with profiling, these kind of run-time things still seem very orthogonal to embedding a source location into an error string.

Of course any form of better error messages is very appreciated, just not sure if these should be treated as the same.

comment:7 Changed 5 years ago by simonpj

I've revised ExplicitCallStack to discuss alternatives.

comment:8 Changed 5 years ago by simonmar

I actually like the wiki:ExplicitCallStack/ImplicitLocations idea, I'm just worried that if we have it then we'll have people who want location-abstracted versions of all the partial functions in base, and that's a lot of extra clutter. If we could provide the feature but draw the line at using it in APIs (except for error, undefined, throw, and other things that explicitly raise exceptions) then I'm all for it.

I realise that then you don't get automatic head [] locations. But it doesn't stop someone from defining their own location-abstracted head, or indeed making a package of location-abstracted partial functions from base.

comment:9 Changed 5 years ago by rodlogic

Closed #9794 as a duplicate of this one.

comment:10 Changed 5 years ago by rodlogic

I read the wiki:ExplicitCallStack/ImplicitLocations, and the general approach makes sense. The notion of a stack of source locations startes to loose me, though. Could you give a more explicit example?

Does it basically mean that show (?location :: Location) below will render g's location and g's call site location? Or am I misunderstanding this?

h :: [a] -> Int
h as = f as

g :: (?location :: Location) => [a] -> Int
g as = f as

f :: (?location :: Location) => [a] -> Int
f []     = error ("Failure at " ++ show (?location :: Location))
f (x:xs) = ...

Could you point me to:

  1. In what module should Location be defined?
  2. Where in the type checker should Location be constructed?
  3. Where should Locations be pushed?

I can then take a closer look and come back with additional questions.

@nh2 I am not sure how far along you are with this, so ping me if this is basically done already ;-)

Last edited 5 years ago by rodlogic (previous) (diff)

comment:11 Changed 5 years ago by simonmar

Simon PJ: you asked what I thought about ImplicitLocations, see comment:8 above.

comment:12 Changed 5 years ago by gridaphobe

Owner: set to gridaphobe

I'm starting to work on implementing wiki:ExplicitCallStack/ImplicitLocations. If anyone else has been working on it, please let me know so I don't unnecessarily duplicate work :)

simonmar: I agree that the changes to base should be kept to a minimum, at least initially.

comment:13 Changed 5 years ago by simonpj

gridaphobe, rodlogic, are you willing to reveal your email identities? I'm having trouble connecting up email discussion with Trac personae! Thanks

Simon

comment:14 Changed 5 years ago by gridaphobe

simonpj: This is Eric Seidel (eric at seidel.io). I've told trac my name and email, but it doesn't seem to expose them to others! I'd be happy to tick a "make my name+email public" box if there's one I'm missing somewhere.

comment:15 Changed 5 years ago by simonmar

Let me elaborate a bit more on my concerns here.

We currently have two imperfect ways to get a stack trace: profiling and the new -g in 7.10. This is another imperfect way, with different tradeoffs. In particular, you have to modify the source code to take advantage of it (not true of the other two ways). This is fine when it's your own program, but if implicit locations start to be used in common APIs then I'm not sure the extra clutter and user confusion is worth the benefit. It's also very hard to get rid of: APIs have greater longevity than compiler features.

Specifically I'm concerned that

  1. Users have three imperfect solutions to choose from, which is confusing.
  2. Users will want either (a) head to have a implicit location, or (b) for there to be a new headLocated function that has an implicit location. Both are bad: (a) because head no longer matches the Prelude definition, and users will be confused, and (b) because there are two versions of head, and users will be confused. Moreover, this is a slippery slope: every package author be getting bug reports wanting implicit location versions of all their partial functions. Some users will like this, others will hate it. There will be no clear policy on what to do, and different libraries will adopt different strategies, leading to yet more confusion.
  3. We have one more abstraction that must be eliminated by the simplifier to get reliable performance. People who are really concerned about performance will either avoid it, or use #ifdef to ensure that they don't have to worry about it.
  4. It relies on a very little-used extension (implicit parameters) so those who want to understand what that strange ?location means have to go and understand implicit parameters. Were it not for this we might be able to deprecate implicit parameters as a feature that didn't turn out to be that useful.

Let's think very carefully before we start down this path. It's a really useful feature for end-user code, but the problems arise when we start using it in library APIs. I'd be perfectly OK with this feature (as I mentioned in my earlier comment) as long as it does not infect library APIs except for things that are explicitly undefined, like error, undefined, and so on.

comment:16 Changed 5 years ago by gridaphobe

One use-case that the implicit parameters approach supports that neither -prof nor -g really do is reifying the source locations to a data structure that your haskell program can process. I imagine this is a fairly niche scenario, but it's extremely useful for embedded DSLs. I was recently working on tooling support for an EDSL used to write large (10+ kloc) programs, and providing useful diagnostic messages for the DSL was quite difficult, because I only had access to the internal AST, nothing remained of the original haskell program. I ended up writing a Core-to-Core plugin to insert the source locations that I mined from -prof, but it's a bit kludgy. This approach, on the other hand, makes it trivial for a function to get a hold of its call-site.

Perhaps it would make sense to discuss the implications of this feature for library authors on the core-libraries list?

comment:17 Changed 5 years ago by gridaphobe

I'd also like to add that I think it would be a shame to deprecate ImplicitParams, this feature notwithstanding. ImplicitParams is niche, yes, but it's also super convenient at times, in particular for passing around some configuration without having to restructure your whole program as a reader monad.

comment:18 in reply to:  16 Changed 5 years ago by simonpj

Replying to gridaphobe:

One use-case that the implicit parameters approach supports that neither -prof nor -g really do is reifying the source locations to a data structure that your haskell program can process.

Yes. So (importantly) let's not just give access to a String but rather to a proper data structure with line number, module, package etc. We should use the same data structures for this purpose as the StaticPtrInfo fields in StaticPointers/ImplementationPlan. The latter are not yet well fleshed out, and advertised as subject to change.

comment:19 Changed 5 years ago by gridaphobe

Differential Rev(s): Phab:D578
Status: newpatch

comment:20 in reply to:  13 Changed 5 years ago by rodlogic

Replying to simonpj:

gridaphobe, rodlogic, are you willing to reveal your email identities? I'm having trouble connecting up email discussion with Trac personae! Thanks

Simon

Simon, my real name is Yuri de Wit and my email is admin@….

comment:21 in reply to:  19 ; Changed 5 years ago by rodlogic

Replying to gridaphobe:

I've sent a patch that implements ExplicitCallStack/ImplicitLocations to Phab (https://phabricator.haskell.org/D578).

This is great. I would have taken me quite a while to get there.

In your patch you mention that you didn't change base at this point, could we at least change GHC.Base to use it and add a new assertMsg?

This would basically cover the ASSERT macros that exist in HsVersions.h:

#define ASSERT(e)      if debugIsOn && not (e) then (assertPanic __FILE__ __LINE__) else
#define ASSERT2(e,msg) if debugIsOn && not (e) then (assertPprPanic __FILE__ __LINE__ (msg)) else
#define MASSERT(e)      ASSERT(e) return ()
#define MASSERT2(e,msg) ASSERT2(e,msg) return ()
#define ASSERTM(e)      do { bool <- e; MASSERT(bool) }
#define ASSERTM2(e,msg) do { bool <- e; MASSERT2(bool,msg) }

comment:22 Changed 5 years ago by gridaphobe

Milestone: 7.12.1

comment:23 Changed 5 years ago by simonpj

Thanks. I've added some comments to Phab:D578.

Here are some other thoughts:

  • Could you modify ExplicitCallStack/ImplicitLocations to describe the exact specification of what you propose? For example, the design means that there are two new types in the base library, defined in GHC.Location:
    • SrcLoc (for a source location)
    • Location (for a stack of source locations) Both are abstract, so you can't make a new SrcLoc or Location; only GHC can do that. Is that what we want?
  • You can add a section on open questions too, for things you aren't sure about.
  • I'm very keen that SrcLoc is used also by the Typeable library (and any other reflection libraries too). Currently Data.Typeable.Internals defines TyCon which contains package/module/name information, but it should just use a SrcLoc. We are proposing to redesign Typeable anyway (see Typeable) so we can do this SrcLoc stuff at the same time
  • Given this broader use, is Location the right name? It's really more like a SrcLocStack.
  • The stack would be even better if it said something like "foo called at <location>" rather than just "<location>". That would be easy enough to implement (the information is in the CtOrigin of the constraint), but again the data type would need to be elaborated a bit, something like
    data Location = Location [(String, SrcLoc)]
    
  • At a grungy level, I wonder about all the copies of the same module-name and file-name strings. Maybe the desugarer should just generate one copy and use it? Or maybe CSE should gather them up (I don't think it does so at the moment.)

comment:24 in reply to:  23 Changed 5 years ago by gridaphobe

Replying to simonpj:

  • Could you modify ExplicitCallStack/ImplicitLocations to describe the exact specification of what you propose? For example, the design means that there are two new types in the base library, defined in GHC.Location:
    • SrcLoc (for a source location)
    • Location (for a stack of source locations) Both are abstract, so you can't make a new SrcLoc or Location; only GHC can do that. Is that what we want?
  • You can add a section on open questions too, for things you aren't sure about.

Done. It's not an open question per se, but folks following this ticket may want to look at the second bullet point under Implementation Details. I'm a bit uneasy about veering that far away from the standard behavior of ImplicitParams, but I think it's worth considering at least.

  • I'm very keen that SrcLoc is used also by the Typeable library (and any other reflection libraries too). Currently Data.Typeable.Internals defines TyCon which contains package/module/name information, but it should just use a SrcLoc. We are proposing to redesign Typeable anyway (see Typeable) so we can do this SrcLoc stuff at the same time
  • Given this broader use, is Location the right name? It's really more like a SrcLocStack.
  • The stack would be even better if it said something like "foo called at <location>" rather than just "<location>". That would be easy enough to implement (the information is in the CtOrigin of the constraint), but again the data type would need to be elaborated a bit, something like
    data Location = Location [(String, SrcLoc)]
    

In anticipation of this I've renamed Location to CallStack, but it looks like the CtOrigin that we get may not provide enough info. In my test-case at least, I'm getting a TypeEqOrigin, which isn't particularly helpful.

  • At a grungy level, I wonder about all the copies of the same module-name and file-name strings. Maybe the desugarer should just generate one copy and use it? Or maybe CSE should gather them up (I don't think it does so at the moment.)

Hmm, if this is an issue my instinct would be that CSE should be responsible for cleaning up the duplicate strings. It seems to fit with the Core philosophy of having the simplifier clean up after the various other passes, and may provide benefits outside of these IP CallStacks too.

comment:25 in reply to:  21 Changed 5 years ago by gridaphobe

Replying to rodlogic:

Replying to gridaphobe:

I've sent a patch that implements ExplicitCallStack/ImplicitLocations to Phab (https://phabricator.haskell.org/D578).

This is great. I would have taken me quite a while to get there.

In your patch you mention that you didn't change base at this point, could we at least change GHC.Base to use it and add a new assertMsg?

I'd rather keep that patch feature-only, to focus discussion around the design/implementation. But I'm happy to put together another patch, once D578 is merged, to discuss using it in base.

comment:26 Changed 5 years ago by simonpj

Re "the second bullet point under Implementation Details", I now understand. Good point. The issue is this:

f :: (?loc :: CallStack) => IO ()
f = print (?location :: CallStack)

There are two alternatives for what to pass to print:

  • Pass the call site of print, pushed onto the passed-in call stack
  • Pass a singleton stack, just the call site of print, reflecting the change in name.

You have implemented the first, but I think the latter would perhaps be more intuitive. Does anyone else have opinions?

comment:27 Changed 5 years ago by simonpj

You did not respond to bullet 5 of comment:24. I'd like to see displayed stacks looking like:

  foo, called at Bar.hs line 5
  wim, called as Wobble.hs line 9
  ...etc...

Currently we can get the call sites but not the function being called. I think it'd make the stacks a lot more informative.

comment:28 in reply to:  27 Changed 5 years ago by gridaphobe

Replying to simonpj:

You did not respond to bullet 5 of comment:24. I'd like to see displayed stacks looking like:

  foo, called at Bar.hs line 5
  wim, called as Wobble.hs line 9
  ...etc...

Currently we can get the call sites but not the function being called. I think it'd make the stacks a lot more informative.

Sorry, my response was jumbled in with the rename to CallStack

... but it looks like the CtOrigin that we get may not provide enough info. In my test-case at least, I'm getting a TypeEqOrigin, which isn't particularly helpful.

The TypeEqOrigin appears to come from call-sites; I also get an IPOccOrigin when the IP is used.

comment:29 Changed 5 years ago by simonpj

Just ignore CtOrigin for now. Just think about getting a more informative stack.

comment:30 in reply to:  26 Changed 5 years ago by gridaphobe

Replying to simonpj:

Re "the second bullet point under Implementation Details", I now understand. Good point. The issue is this:

f :: (?loc :: CallStack) => IO ()
f = print (?location :: CallStack)

There are two alternatives for what to pass to print:

  • Pass the call site of print, pushed onto the passed-in call stack
  • Pass a singleton stack, just the call site of print, reflecting the change in name.

You have implemented the first, but I think the latter would perhaps be more intuitive. Does anyone else have opinions?

The more I think about this, the more inclined I am to go with the latter behavior as well. The former, while useful, is just a bit strange. Also, from a maintainability perspective, it would probably be easier to switch from the 2nd to the 1st behavior, if needed, than vice-versa (since the 1st should allow strictly more programs to type-check).

But I'd also like to hear from others.

comment:31 Changed 5 years ago by Austin Seipp <austin@…>

In c024af131b9e2538486eb605ba8af6a8d10fe76d/ghc:

Expose source locations via Implicit Parameters of type GHC.Location.Location

Summary:
IPs with this type will always be solved for the current source
location. If another IP of the same type is in scope, the two locations will be
appended, creating a call-stack. The Location type is kept abstract so users
cannot create them, but a Location can be turned into a list of SrcLocs, which
correspond to individual locations in a program. Each SrcLoc contains a
package/module/file name and start/end lines and columns.

The only thing missing from the SrcLoc in my opinion is the name of the
top-level definition it inhabits. I suspect that would also be useful, but it's
not clear to me how to extract the current top-level binder from within the
constraint solver. (Surely I'm just missing something here?)

I made the (perhaps controversial) decision to have GHC completely ignore
the names of Location IPs, meaning that in the following code:

    bar :: (?myloc :: Location) => String
    bar = foo

    foo :: (?loc :: Location) => String
    foo = show ?loc

if I call `bar`, the resulting call-stack will include locations for

1. the use of `?loc` inside `foo`,
2. `foo`s call-site inside `bar`, and
3. `bar`s call-site, wherever that may be.

This makes Location IPs very special indeed, and I'm happy to change it if the
dissonance is too great.

I've also left out any changes to base to make use of Location IPs, since there
were some concerns about a snowball effect. I think it would be reasonable to
mark this as an experimental feature for now (it is!), and defer using it in
base until we have more experience with it. It is, after all, quite easy to
define your own version of `error`, `undefined`, etc. that use Location IPs.

Test Plan: validate, new test-case is testsuite/tests/typecheck/should_run/IPLocation.hs

Reviewers: austin, hvr, simonpj

Reviewed By: simonpj

Subscribers: simonmar, rodlogic, carter, thomie

Differential Revision: https://phabricator.haskell.org/D578

GHC Trac Issues: #9049

comment:32 Changed 5 years ago by thoughtpolice

Resolution: fixed
Status: patchclosed

comment:33 Changed 4 years ago by Herbert Valerio Riedel <hvr@…>

In 75adc352549a43d4c37bd3bdade55cecf3d75bb1/ghc:

Add missing since-annotations for c024af131b9e2538

See also #9049

comment:35 Changed 4 years ago by bgamari

This has been merged to ghc-7.10 as e3dc28046373f3183dda56b096dbebec865e3be7.

comment:36 Changed 4 years ago by bgamari

Unfortunately 00cd6173a620ef99739d97ac843258fee8e2dee9, which e3dc28046373f3183dda56b096dbebec865e3be7 depends upon, changes an existing interface, which we try very hard to avoid in minor releases.

comment:37 Changed 4 years ago by gridaphobe

Does it actually break any client packages? There was talk of building stackage with the backport to investigate.

comment:38 in reply to:  37 Changed 4 years ago by hvr

Replying to gridaphobe:

Does it actually break any client packages? There was talk of building stackage with the backport to investigate.

Yeah, one single package from Stackage seems to be affected by this:

[11 of 14] Compiling HsWalk           ( HsWalk.hs, dist/build/ide-backend-server/ide-backend-server-tmp/HsWalk.o )

HsWalk.hs:136:35:
    Couldn't match expected type ‘SrcSpan’
                with actual type ‘RealSrcSpan’
    In the first argument of ‘go’, namely ‘span’
    In the second argument of ‘($)’, namely ‘go span’

comment:39 in reply to:  36 Changed 4 years ago by gridaphobe

FWIW, LiquidHaskell is also broken by 00cd6173a620ef99739d97ac843258fee8e2dee9, but as a maintainer I can deal with it.

comment:40 Changed 4 years ago by edsko

comment:41 Changed 4 years ago by bgamari

One thing that we may want to consider is using this mechanism to address #624, which seeks backtraces for indefinitely blocking MVar and STM operations. The only slightly tricky thing here is that you would need to pipe the backtrace through the RTS.

comment:42 Changed 4 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

Note: See TracTickets for help on using tickets.