Opened 12 months ago

Last modified 8 months ago

#15646 new bug

ghci takes super long time to find the type of large fractional number

Reported by: Johannkokos Owned by: JulianLeviston
Priority: normal Milestone: 8.6.1
Component: GHCi Version: 8.4.3
Keywords: newcomer Cc: alpmestan, goldfire, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: T15646
Blocked By: Blocking:
Related Tickets: #5692 Differential Rev(s):
Wiki Page:

Description

:t 1e123456789

takes more than 3 seconds to get the type info.

:t 1e1234111111111111111111111

shows a warning/error,

GNU MP: Cannot allocate memory (size=93978265)

Attachments (1)

ghc_changes.patch (22.3 KB) - added by JulianLeviston 11 months ago.
PatchFile for 22 Oct 2018

Download all attachments as: .zip

Change History (86)

comment:1 Changed 12 months ago by Johannkokos

The problem seems to be specific to scientific notation literals. On GHCi, Both

10^1000000 :: Double

and

:t 10^1000000 :: Double

terminate near-instantly, while there is a noticeable delay with

1e1000000 :: Double

and

:t 1e1000000 :: Double

.

Comment on reddit by /u/duplode

comment:2 Changed 12 months ago by osa1

Keywords: newcomer added; type removed

It seems like adding one more digit in the literal causes 10x increase in desugaring + type checking time:

~ $ echo ':t 1e10000000' | time ghci
...
ghci  0,37s user 0,04s system 97% cpu 0,418 total

~ $ echo ':t 1e100000000' | time ghci
...
ghci  3,63s user 0,12s system 99% cpu 3,749 total

~ $ echo ':t 1e1000000000' | time ghci
...
ghci  43,77s user 1,68s system 100% cpu 45,429 total

I also tried compiling this

a = 1e1000000000

and it also takes forever. I suspect this is a type checker bug or a desugarer bug.

I think this would be a good newcomer ticket. I'd start with finding how the expression 1e1000000000 parsed, and then find relevant desugarer and type checker code. Then add some print statements around those code to which one takes so long.

comment:3 Changed 12 months ago by JulianLeviston

Owner: set to JulianLeviston

comment:4 Changed 12 months ago by JulianLeviston

Took me a very long time to realise I could use ghc -e ':t 1e100000000' -ddump-parsed to spit out the parsed file.

First I dug into the generated source for the parser trying to wrap my head around how it's built. Then I tried to write a program to spit out the parse result, but that was ultimately fruitless:

    module Blah () where

    import Parser
    import Lexer
    import GHC
    import DynFlags
    import StringBuffer (stringToStringBuffer)
    import FastString
    import SrcLoc
    import Outputable

    main :: IO ()
    main = do
      dynFlags <- getProgramDynFlags
      let x = runParser dynFlags "putStrLn \"Hey\"" parseStatement
      case x of
        POk pstate res -> putStrLn $ ppr res
        _ -> putStrLn "Fail"

    runParser :: DynFlags -> String -> P a -> ParseResult a
    runParser flags str parser = unP parser parseState
      where
        filename = "<interactive>"
        location = mkRealSrcLoc (mkFastString filename) 1 1
        buffer = stringToStringBuffer str
        parseState = mkPState flags buffer location

That was pretty silly... anyway, I subsequently realised I could use flags to spit out stage output. I should have read the docs on hacking on GHC more! That yielded this:

➜  ghc git:(master) ✗ ./inplace/bin/ghc-stage2 blah2.hs -ddump-rn -dppr-debug
[1 of 1] Compiling Blah2            ( blah2.hs, blah2.o )
blah2.hs:1:1:
    
    ==================== Renamer ====================
    nonrec {blah2.hs:3:1-24}
           main:Blah2.largevalue{v rpX}
           main:Blah2.largevalue{v rpX}
             = {blah2.hs:3:14-24}
               1e100000000 (base:GHC.Real.fromRational{v 02C})
           <>

So I'll dig into GHC.Real.fromRational next and see if I can't work out what's going.

comment:5 Changed 12 months ago by JulianLeviston

Seems to be picking Double as the type... I noticed that Double over 1e309 parses as Infinity...

ghc git:(master)time ./inplace/bin/ghc-stage2 blah2.hs -ddump-tc -dppr-debug
[1 of 1] Compiling Blah2            ( blah2.hs, blah2.o )
TYPE SIGNATURES
  (main:Blah2.$trModule{v r1} [lidx] :: ghc-prim:GHC.Types.Module{tc 622}) ::
    ghc-prim:GHC.Types.Module{tc 622}
  (main:Blah2.largeValue{v rpX} [lid] :: ghc-prim:GHC.Types.Double{(w) tc 3k}) ::
    ghc-prim:GHC.Types.Double{(w) tc 3k}
TYPE CONSTRUCTORS
COERCION AXIOMS
Dependent modules: []
Dependent packages: [base, ghc-prim, integer-gmp]
blah2.hs:1:1:
    
    ==================== Typechecker ====================
    {<no location info>}
    ((main:Blah2.$trModule{v r1} [lidx] :: ghc-prim:GHC.Types.Module{tc 622})
       :: ghc-prim:GHC.Types.Module{tc 622})
      = ghc-prim:GHC.Types.Module{d 625}
          {<no location info>}
          (ghc-prim:GHC.Types.TrNameS{d 62b}
             {<no location info>}
             "main"#)
          {<no location info>}
          (ghc-prim:GHC.Types.TrNameS{d 62b}
             {<no location info>}
             "Blah2"#)
    {blah2.hs:3:1-22}
    {blah2.hs:3:1-22}
    (largeValue{v aLS} [lid] :: ghc-prim:GHC.Types.Double{(w) tc 3k})
      :: ghc-prim:GHC.Types.Double{(w) tc 3k}
    [LclId]
    main:Blah2.largeValue{v rpX}
      = {blah2.hs:3:14-22}
        1e1000000 (base:GHC.Real.fromRational{v 02C}
                     {<no location info>}
                     1e1000000)
    <>
    
./inplace/bin/ghc-stage2 blah2.hs -ddump-tc -dppr-debug  0.27s user 0.21s system 96% cpu 0.505 total

comment:6 Changed 12 months ago by monoidal

The literal 1e100 means fromRational (100...000 :: Rational), as specified in Haskell report. Constructing this number takes time and space proportional to the number of zeroes. It's not surprising it crashes badly when the exponent has 10 digits or more. On the other hand, the type of 10^1000000000 can be found quickly because the expression is not evaluated.

The big integer is already created during parsing (as can be seen by compiling main = print 1e1000 with -ddump-parsed-ast). With -XNumDecimals, we need to process the literal before we can tell its type: 1.234e3 is a valid Integer but 1.234e2 is not. I don't see any easy way to fix this and preserve backwards compatibility. Perhaps we could show a parse error when attempting to create a number with an unrealistic exponent.

Last edited 12 months ago by monoidal (previous) (diff)

comment:7 Changed 12 months ago by osa1

The literal 1e100 means fromRational (100...000 :: Rational), as specified in Haskell report

Where is this specified in the report? I can see the syntax in section 2.5 which links to sections 3.4 and 6.4.1 but I can't see this rule in any of those linked sections. Can't find where the semantics for this literal is defined.

It's not surprising it crashes badly when the exponent has 10 digits or more.

Wait, are you saying that typing 1e1234111111111111111111111 to take minutes and use up more than 20G of memory (residence, not allocation!) not surprising? Perhaps I'm completely lost then. Could you elaborate on how is this not surprising?

FWIW I just typed in that expression in Python REPL and it gave me an answer in an instant.

With -XNumDecimals, we need to process the literal before we can tell its type: 1.234e3 is a valid Integer but 1.234e2 is not.

I'm still not convinced that we can't do better but OK. At the very least we should be able to type those expressions quickly when -XNumDecimals is not enabled (which is the default).

comment:8 Changed 12 months ago by simonpj

Wait, are you saying that typing 1.7e1234111111111111111111111 to take minutes and use up more than 20G of memory (residence, not allocation!) not surprising?

I agree: that's absurd!

HOwever, unlike Python, we can't just compute a suitable Float, because literals are overloaded. The Report does say that a floating point literal like 1e100 means fromRational (n % d) where "the integers n and d are chosen so that n/d = f".

I suspect that in computing 17 % 10000000000000000000000 we try to find the GCD of the two before we even start with fromRational, and you can see this isn't going to end well.

What to do? Probably we need a special case for fromRational :: Rational -> Float (and similarly Double); and maybe even a special literal representation inside GHC for Rationals of form N / 10000000000000 for some number of zeros.

comment:9 Changed 12 months ago by osa1

a floating point literal like 1e100 means fromRational (n % d) where "the integers n and d are chosen so that n/d = f"

But finding the n and d should happen after type checking, no? Why idoes this even effect type checking time? Shouldn't :t just do type checking?

comment:10 Changed 12 months ago by monoidal

I suspect that in computing 17 % 10000000000000000000000 we try to find the GCD of the two before we even start with fromRational, and you can see this isn't going to end well.

It's not even GCD: just computing the nominator of 1e1000000000 in the binary representation requires at least log2(10**1000000000) bits of memory, which is over 396GB.

If we'd like to fix this, I see the following options:

  1. Move computation of n,d from parsing to desugaring. This should allow :t 1e1000000000, but let x _ = 1e1000000000 will still crash.
  2. Add a method to Fractional, say fromMantissaExp :: Integer -> Integer -> Integer -> a that will be called instead of fromRational, with a default implementation that calls fromRational. This will allow let x _ = 1e1000000000 to work, we could make it return Infinity for Double. This will likely have performance implications (I'm not sure if positive or negative).
  3. Change the representation of Ratio to have an extra constructor that represents numbers of the form N / D * 10**E.

All those solutions need some extra special cases for -XNumDecimals. It's doable, I'm not convinced it's worth the maintenance cost but I won't protest if it's done.

Last edited 12 months ago by monoidal (previous) (diff)

comment:11 Changed 12 months ago by alpmestan

Cc: alpmestan added

comment:12 Changed 12 months ago by JulianLeviston

  1. Change the representation of Ratio to have an extra constructor that represents numbers of the form N / D * 10E.

It feels to me that this (C) is the best of the 3 options, probably because it's the laziest. Ideally we don't want to compute anything until the last possible moment, I'd reckon.

As far as I can tell from section 2.5 of the report, the type of any literal where there's an e marking the exponent is going to be Fractional a => a. Can we not short circuit on this, and then keep the value as Ratio (ie "c" above) until its actual value is needed?

I'm a bit curious that computing time ./inplace/bin/ghc-stage2 -e '1e10000000 :: Float' takes a long time to render Infinity but I guess this issue would go away when we fix the typechecking bug.

I don't think choosing B is really an option because that'd involve changing the Haskell report.

As I'm pretty new, some direction would be good.

comment:13 Changed 12 months ago by osa1

So, to summarize the problem here:

  • Semantics of a floating point literal (which is in our case in XeY form) is defined as fromRational (n % d).
    • So we need to find and n and d such that fromRational (n % d) gives us our float.
  • As per Haskell 2010 chapter 6.4.1 (note that section 2.5 only gives the syntax, not the type) fromRational has type Fractional a => Rational -> a so fromRational (n % d) has type Fractional a => a
    • So XeY form always has type Fractional a => a !
  • Currently we tokenize an XeY we call readRational__ on the string "XeY", which returns this: (n%1)*(10^^(k-d)). n, k and d are parsed from the input so this is very cheap (in our example n is X, k is Y, d is 0).
    • However, evaluating (10^^_) :: Rational part of the expression takes long time and uses a lot of memory. Indeed, as comment:10 says, if you have 1000000000 as the exponent then representation of this number takes GBs of memory so this is expected.
  • comment:12 asks why rendering Infinity takes this long: that's because we generate the Infinity value during desugaring (after parsing), and for this we have to first generate the Rational value for the literal first (which is what is taking all the time and using all the memory). You can observe this if you run ghci with -fdump-parsed -fdump-ds and evaluate 1e100000 :: Float.

Perhaps it makes sense to distinguish these two problems:

  • Type checking is too slow
  • Generating Infinity :: Float is too slow

(a) in comment:10 fixes (1) but not (2). (c) in comment:10 fixes (1), and it may also fix (2) if we could somehow use this new constructor fields to return Infinity without calculating 10^^e :: Integer first (as far as I understand the new consturctor will look like: ExpRational n e -- stands for (n%1)*(10^^e)).

However, (c) means a change in a public type, and we'd need to evaluate performance changes etc. caused by turning a product type to a sum type (more optimizations apply to product types than sum types! at least until we improve demand analysis for sum types and use unboxed sums in WW). Also, I'm not sure if adding one more constructor to a library type just for this is a good idea. We'd need to think about how/whether to expose this to users (perhaps we do want to expose it in Data.Ratio somehow). This means long discussions and bikeshedding etc.

What to do? I think delaying evaluation readRational until it's needed by making some code lazier would work. Looking at the lexer, we generate a ITrational for this literal like this:

tok_float        str = ITrational   $! readFractionalLit str

readFractionalLit :: String -> FractionalLit
readFractionalLit str = ((FL $! (SourceText str)) $! is_neg) $! readRational str
                        where is_neg = case str of ('-':_) -> True

Notice how we're very deliberately strict here. My guess is that simply removing strictness around readRational (e.g. the $! in readFractionalLit) may fix this. If it doesn't then some other code down the line needs to be made lazier too. However I'm not sure if this causes other problems (the fact that the thunk for readRational str will keep the str alive is not a problem because the str only holds the string for the literal). We'll have to evaluate this change in strictness somehow.

If we follow this route then we'd also need to add a test to make sure typing this expression remains fast (strictness properties are hard to maintain ...).

Here's another idea that is like (c) but does not need changes in a public type: duplicate the Ratio type so that only the version used by GHC has the special constructor. The code gen and Data.Ratio will still use the current type. Now we can use the special constructor in lexing to avoid evaluating a huge Integer, and only when desugaring we do some kind of evaluation (this is where we can perhaps be smart and generate Infinity efficiently, or in the worst case we end up with the current situation, but with fast type checking).

None of these solutions strike me as particularly satisfying though...

comment:14 Changed 12 months ago by simonpj

Some thoughts

  • I don't think we should rely on laziness in the compiler. It'll come back to bite us.
  • It's not unreasonable that programs with silly literals will blow up at runtime, if it is evaluated, because of the fromRational r semantics. But it should not blow up at compile time.
  • The underlying problem is that, in the compiler, we represent the literal as a Rational. In BasicTypes we have:
    data FractionalLit
      = FL { fl_text :: SourceText     -- How the value was written in the source
           , fl_neg :: Bool            -- See Note [Negative zero]
           , fl_value :: Rational      -- Numeric value of the literal
           }
    
    This FractionalLit is used in HsLit and HsOverLit. And it is finally desugar in Match.dsLit:
    dsLit :: HsLit GhcRn -> DsM CoreExpr
    dsLit l = ...
        HsRat _ (FL _ _ val) ty -> do
          num   <- mkIntegerExpr (numerator val)
          denom <- mkIntegerExpr (denominator val)
          return (mkCoreConApps ratio_data_con [Type integer_ty, num, denom])
    
    That is we finally generate Core for (n % d).

  • But why do we produce a compile-time Rational for the literal?? For integral values it makes sense to do so, so that we can do constant folding (turning 1 + 2 into 3 at compile time). But by the time we get to Core, we've turned it into n % d and I don't think we then do any useful compile time work.

So my solution is this:

  1. Drop the fl_value field in FractionalLit
  2. Change dsLit on FractionalLit to desugar to readRational <str>, where<str> is the string the user wrote, recorded in the fl_text field. That is, defer the construction of the Rational to runtime.

I think that'd be simple to do.

If we were going to parse a string at runtime, we'd want to be sure that parsing would succeed, but I think the lexer has ensured that it's parseble. I suppose another alternative would be to have

data FractionalLit
  = FL { fl_text :: SourceText     -- How the value was written in the source
       , fl_neg :: Bool            -- See Note [Negative zero]
       , fl_before, fl_after, fl_exp :: Integer
                                   -- Denotes <before>.<after>E<exp>
       }

THat is, parse the pieces of the string, and record them in the literal. Then desugar to makeRational before after exp which again defers to runtime the building of the Rational itself.

comment:15 Changed 12 months ago by osa1

That sounds good to me.

Just to make sure, the proposed solution will still blow up when desugaring, so currently we aim to fix the type checking, right? Or am I missing something about the proposed solution?

One problem with removing fl_value may be that any users of this type (maybe source plugins?) will have to provide a string instead of an actual float (we currently allow users to only give a value and omit fl_text by passing a NoSourceText). I don't know if there are any users (can it be used by source plugins?) of this type but maybe it's worth checking.

comment:16 Changed 12 months ago by simonpj

the proposed solution will still blow up when desugaring

No, it won'r blow up when desugaring. Just desugar the constant to fromRational (readRational "1E1000"). Nothing blows up there!

One problem with removing fl_value may be that any users of this type (maybe source plugins?) will have to provide a string instead of an actual float

Well (show f) is a good string. And I quite like the fl_before/after/exp version too.

comment:17 Changed 12 months ago by JulianLeviston

Owner: JulianLeviston deleted

Removing myself as owner because this has gone well beyond the small simple patch I was hoping for one of my first tickets :) Hopefully someone else can pick it up.

comment:18 Changed 12 months ago by osa1

The discussion may be confusing but the proposed change is actually quite simple. Just refactor one type and follow the type errors. In more details:

  • Update the FractionalLit type as shown in comment:14. There are two variants, pick one. I'd pick the one with fl_before/fl_after/fl_exp.
  • readRational (the slow function that causes this ticket) is used by DynFlags and CmmLex too, so to keep things simpler let's keep the original readRational and add a new variant that returns FractionalLit (instead of Rational). This is the fast variant.
  • Replace readFractionalLit in Lexer.x with the new fast variant of readRational.

Sounds like a good first ticket to me.

Simon,

No, it won'r blow up when desugaring. Just desugar the constant to fromRational (readRational "1E1000"). Nothing blows up there!

But this relies on laziness, no? I thought you said this is not a good idea in comment:14.

comment:19 Changed 12 months ago by simonpj

But this relies on laziness, no?

No, it doesn't rely on laziness. At no stage will GHC construct that Rational. The desugarer will generate something like

Var 'fromRational' `App` (Var 'readRational' `App` Lit (LitString "1e100"))

That is, it'll generate Core that will, when compiled and run, compute the rational (at runtime). But the compiler just manipulates this Core data structure, there is no bad Rational in it. Does that help?

comment:20 Changed 12 months ago by monoidal

In this part:

       , fl_before, fl_after, fl_exp :: Integer
                                   -- Denotes <before>.<after>E<exp>

I would be careful with fl_after: we don't want to merely convert the part after the dot to an integer, because 1.2e3 and 1.02e3 are not the same. I think we could always shift the dot and exponent so that we always have a whole number before "e" (e.g. 1.2e4 -> 12e3, 0.1e-4 -> 1e-5)

Regarding desugaring to readRational "1e100", do I understand correctly programs would then have to parse 1e100 every time a floating literal is evaluated? My feeling is that this will affect negatively performance (but it's just a feeling - I suggest benchmarking before checking this in.)

One more complication is hexadecimal floats, which have the same problem (:t 0x1.0p100000000 is slow)

I know I've said this and I already sound like a grouch, but I would still prefer to forbid big exponents and call it a day. There are less than 1e90 elementary particles in the observable universe, doubles go up to 1e308 and rarely used extended floating point representations only slightly further. If someone wants to write 2e10000000000000 we might just ask them to write the code as 2 * 10^^10000000000000 instead. This at least has a chance to work given an appropriate type and typeclass instance, unlike computing this expression as integer which will fail. After all, we shouldn't move problems known at compile time to problems at run time. (I don't want to stir up an argument here. If you disagree, you can just ignore what I said, I won't insist.)

Last edited 12 months ago by monoidal (previous) (diff)

comment:21 Changed 12 months ago by simonpj

I would be careful with fl_after: we don't want to merely convert the part after the dot to an integer, because 1.2e3 and 1.02e3 are not the same.

Excellent point.

Regarding desugaring to readRational "1e100", do I understand correctly programs would then have to parse 1e100 every time a floating literal is evaluated?

No; it'll be floated to top level, and done once in the run of the program.

I would still prefer to forbid big exponents and call it a day.

The difficulty is then you have to pick what is "big"; and one day it'll bite us. So we'll add a compiler flag, etc. It doesn't smell, to me, like something that should cause the program to be rejected.

comment:22 Changed 12 months ago by JulianLeviston

Owner: set to JulianLeviston

The discussion may be confusing but the proposed change is actually quite simple. Just refactor one type and follow the type errors.

Thanks for the direction. I can happily do that, but a) it wasn't clear to me that there *was* a proposed solution. It seemed like there was still quite a bit of a discussion going on. b) It'd be good to also know what on earth I'm doing when I do it, and it seemed to be getting deep into some really complicated territory that I felt was utterly beyond my interest or understanding.

However, I'll add myself back as owner and when it's at a point where it seems there's consensus on a proposed solution, try to work out what that exactly means and continue the process of implementing a patch (it seems like it's got to consensus now).

Thanks for your collective patience :)

comment:23 Changed 12 months ago by osa1

Hmm, I'm not sure if doing parsing in runtime is a good idea. Why not do the parsing in compile time generate Core for this instead:

(X GHC.Real.% 1) GHC.Num.* (10 :: Integer) GHC.Real.^^ Y

for XeY? Now the huge Integer will be computed in runtime but we won't be doing parsing.

I just realized that this is basically desugaring XeY to (X%1) * 10^^Y, similar to the idea in comment:20 except we don't ask users to write this, instead we convert the XeY notation to the other.

Simon, any opinions on this?

comment:24 Changed 12 months ago by simonpj

Why not do the parsing in compile time generate Core for this instead:

Yes, I agree. That's what I meant in comment:14 with fl_before etc, but I expressed it badly. Let me try to do better. Accounting for monoidal's excellent point we could have

data FractionalLit
  = FL { fl_text :: SourceText     -- How the value was written in the source
       , fl_neg :: Bool            -- See Note [Negative zero]
       , fl_mantissa, fl_exp :: Integer
                                   -- Denotes <mantissa>E<exp>
       }

So 1.077E400 would be represented with fl_mantissa = 1077 and fl_exp = 397.

Then we desugar to

Var 'makeRational'  `App` (Lit 1077) `App` (Lit 397)

where makeRational is a new library function defined thus

makeRational :: Integer -> Integer -> Integer
makeRational i e = (i % 1) * (10 ^^ e)

Probably we want it to inline, but that's a separate matter.

comment:25 Changed 12 months ago by osa1

OK, that makes sense.

JulianLeviston, feel free to ping me on IRC if you have any questions -- or let us know here.

comment:26 Changed 12 months ago by JulianLeviston

Ok, so I tried to tackle this again today. The very first thing I did was to refactor data FractionalLit as suggested, then follow the type errors.

Firstly I wasn't 100% sure how to best approach a workflow, so I tried using GHCi and loading the BasicTypes module in, but that failed because of missing dependencies, so then I tried to use make fast -j3 in the ghc/compiler directory and that seemed to work out ok in terms of speed. I was initially worried it was going to take too long, but it's fine.

So, adjusting FractionalLit is fine. I made the change, then it complained bout mkFractionalLit being incorrect, as I'd expect.

I'm not 100% sure of the intent of mkFractionalLit, though — it looks like it's a simple constructor; when I compare it to mkIntegralLit, etc, and when I looked up the use-sites it appears to passing a Rational value still makes sense, so I thought I'd keep that. Then I realised I'm going to have to effectively implement something like an adjusted version of fromBaseRational 10 from https://hackage.haskell.org/package/numeric-prelude-0.4.3/docs/src/Number-Positional.html#fromBaseRational to find the mantissa and exponent. Am I barking up the right tree here?

comment:27 Changed 12 months ago by JulianLeviston

Chatted a bit more with osa1 about this now on IRC. Seems TH has the same issue(s), because the use-sites for mkFractionalLit includes the cvtOverLit :: Lit -> CvtM (HsOverLit GhcPs) function from hsSync/Convert.hs... the cvtOverLit (RationalL r) clause, which reads:

cvtOverLit (RationalL r)
  = do { force r; return $ mkHsFractional (mkFractionalLit r) }

Last edited 12 months ago by JulianLeviston (previous) (diff)

comment:28 Changed 12 months ago by JulianLeviston

AFAICS, the only use-sites for mkFractionalLit are in HsSync/Convert.hs for the above function and also cvtLit.

Does anyone have any thoughts about what a good approach to doing this might be? Should I be changing mkFractionalLit to Integer -> Integer -> FractionalLit or leave it as Real a => a -> FractionalLit as it already is? (the latter would imply doing some conversion to the number, the former would indicate the same conversion being moved up into the TH related code).

Thoughts?

comment:29 Changed 12 months ago by osa1

Right, so just to summarize what JulianLeviston said in my words and give a concrete example, in this program:

{-# LANGUAGE TemplateHaskell #-}

module Lib where

import Language.Haskell.TH

foo :: Q Exp
foo = [| 1e1000000 |]

we use the same problematic code path when parsing the quasi-quotation and end up calculating a huge Integer when parsing (which can be seen easily with -ddump-parsed-ast). This much is probably obvious if you're familiar with TH internals.

This will be fixed when we update the parser. The actual problem is we desugar this quasi-quote to:

foo = litE (rationalL (GHC.Real.:% @ Integer <huge integer> 1))

where litE is simply return . LitE and rationalL is RationalL. RationalL takes a Rational argument, which is what we're trying to avoid doing in compile-time. Furthermore, because these are public types they are hard (if not impossible) to change. I see two options:

  • Do the computation when generating the TH quasi-quote expressions (during desugaring, in DsMeta). This way the TH syntax wouldn't change (we would generate exactly the same desugared expr for this program), and the example above would take forever to compile.
  • Add one more literal constructor to TH syntax that looks like our new FractionalLit for this purpose. Then we could use our new desugaring (comment:23) in Convert.hs when we actually splice the expression. So both quasi-quote compilation and splicing become fast.

Any thoughts?

Last edited 12 months ago by osa1 (previous) (diff)

comment:30 Changed 12 months ago by simonpj

Cc: goldfire added

I think the Right Thing is to make TH syntax look like Hs syntax. After all, rational literals in TH are (as you show) subject to precisely the problem of this ticket, and are amenable to precisely the same solution.

When we complete the Trees That Grow epic that equivalence will become true by construction.

Richard Eisenberg is the TH tzar; let's ask him. Generally we have changed the details of TH syntax in every GHC release, and I don't think it's a major source of complaints.

To get rolling, though, you could do some approximate way of converting a Rational to literal in the form we have mapped out here. The passage through TH would not be fully faithful, but it'd be close.

comment:31 Changed 11 months ago by JulianLeviston

Update: Been working on this today; I've done the changes required in BasicTypes.hs and I'm on Lexer.hs changes now. It's going well, currently in the process of refactoring readHexRational to use a new function for reading the significand and exponent out of hex floating literals. (from Util.hs). Just finished doing that for readRational.

comment:32 Changed 11 months ago by JulianLeviston

Okay so I'm a bit stuck on implementing a version of readHexRational :: String -> Rational that looks like readHexSignificandExponentPair :: String -> (Integer, Integer) instead (where the pair returned is the significand and exponent in base 10).

The hex float format has its exponent in base 2, not base 10. Should I also store the base within the FractionalLit type? Otherwise I'd have to do a (not very) precise conversion, and it strikes me that that'd bring us back to where we were before but on the hex side instead of the decimal side.

I might give that a crack, actually. Let me know if not a good idea.

comment:33 Changed 11 months ago by simonpj

Would you like to lay out the design, and its moving parts, in a draft Note that will eventually be part of the source code?

eg I think (but am not sure) that your question is this: in representation of a literal fractional number:

data FractionalLit
  = FL { fl_text :: SourceText     -- How the value was written in the source
       , fl_neg :: Bool            -- See Note [Negative zero]
       , fl_mantissa, fl_exp :: Integer
                                   -- Denotes <mantissa>E<exp>
       }

in what number base is the fl_exp expressed? For example

  • 1.7e30 has fl_mantissa = 17, and fl_exp = 29, assuming base 10
  • But what about 0x5e.ff2p12? Here the mantissa can reasonably still be 0x5eff, but the exponent must (presumably, given the spec) be in base 2.

To me it sounds as if we need another field for the exponent base to accurately represent the literal. So the value of the literal is mantissa * (base ^ exponent). My instinct is just to make the a sum-type rather than another Int or Integer:

data ExponentBase = Base10 | Base 2

comment:34 Changed 11 months ago by JulianLeviston

Simon, your supposition is just what I had in mind when I asked "Should I also store the base within the FractionalLit type?".

A sum type for the exponent base is a great idea, I hadn't even thought of it, but it makes more much sense for efficiency, intention conveyance and clarity. Thank you!

Yeah, I'll lay out the design and draft note. It would have been helpful to me in the past to have known the way hex rationals work, so this would fulfil that role, too.

Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:35 Changed 11 months ago by JulianLeviston

Here's my suggested draft, so far:

data FractionalLit
  = FL { fl_text :: SourceText                 -- How the value was written in the source
       , fl_neg :: Bool                        -- See Note [Negative zero]
       , fl_signi :: Integer                   -- The significand component of the literal
       , fl_exp :: Integer                     -- The exponent component of the literal
       , fl_exp_base :: FractionalExponentBase -- See Note [Fractional exponent bases]
       }

data FractionalExponentBase
  = Base2
  | Base10

-- Note [fractional exponent bases] For hexadecimal rationals of
-- the form 0x0.3p10 the exponent is given on base 2 rather than
-- base 10. These are the only options, hence the sum type. See also #15646.

I noted you were using mantissa rather than significand; I was working from the apparent suggestion by IEEE committee and Knuth that mantissa could be ambiguous under certain contexts (fractional part of a logarithm). Happy to use whatever is clearest or recommended. https://en.wikipedia.org/wiki/Significand

Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:36 Changed 11 months ago by JulianLeviston

Hm... now, following the refactoring of changing the types, I'm up against this, in Convert.hs:

cvtOverLit (RationalL r)
  = do { force r; return $ mkHsFractional (mkFractionalLit r) }

... which clearly doesn't type-check anymore because it's expecting a RationalL r where r :: Rational. Given the Lit value constructors for TH don't include anything remotely compatible with this, I'm not too sure what to do. Can we change TH syntax?

Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:37 Changed 11 months ago by osa1

Right, so there's a brief discussion about this in comment:29, however we can't convert a Rational to the new FractionalLit type so the idea doesn't quite work. Changing TH syntax is not easy becuase it's a user-facing type (would need a proposal). One idea comes to mind is to make FractionalLit a sum type, with an alternative for the old FractionalLit values:

data FractonalLit
  = FL { ... }
  -- | TemplateHaskell fractional lit: we lose information during conversion
  -- from Haskell syntax to TH syntax (happens when desugaring quasiquotes, in
  -- DsMeta) where we convert a `FL` to a `Rational` because that's what TH
  -- syntax wants.
  | THFL
      { fl_text :: SourceText
      , fl_next :: Bool
      , fl_value :: Rational
      }

Then we can propose a change to the TH syntax, depending on the result we can remove the THFL constructor.

Alternatively, Simon suggests finding an approximate conversion of a Rational to the new FractionalLit in comment:30. That'd fine to get the code to compile but it won't be mergeable until the conversion is fixed.

comment:38 Changed 11 months ago by JulianLeviston

Thanks osa1. I'm working on doing the sum type for FractionalLit as well. It's coming along nicely, I think. I'm looking forward to finishing it and getting what I'm sure will be a lot of feedback.

comment:39 Changed 11 months ago by JulianLeviston

I got it compiling and typechecking, then set about writing a test. Discussed with you (osa1) and a couple others in IRC about writing tests. Looked more into how the testing framework works, read up on writing tests. Pretty sure I can figure that out now (it's really not going to be hard, once I read all the test guides).

So that's fine, but I had an issue compiling after doing a git pull (for some reason I had to recompile stage 1 of the compiler again), so I recompiled and thought I'd try out my changes. So I typechecked a large scientific notational literal, and it got *** Exception: No match in record selector thfl_value I realised immediately that was because I'd just made the types match in the desugarer and renamer by using the new TH part of the FractionalLit type because the desugarer talks bout pattern group matching on HsFractional and seemed to expect a Rational.

So, the PatGroup data type, has a constructor PgN :: Rational -> PatGroup which I'm now going to refactor to be PgN :: Integer -> Integer -> PatGroup.

comment:40 Changed 11 months ago by JulianLeviston

:dance: Changed the types over properly, and now I got it compiling again and it now typechecks large exponent fractionals very quickly. :excited: I'll finish it up and get it on Phabricator.

Update: oh... famous last words. It still seems to be taking a long time to typecheck 1e100000000 hm. :( Update again: Pretty sure this is because I haven't done anything in the Hs related types in the typechecker yet. Be good if I knew more about what I was doing :-)

Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:41 Changed 11 months ago by JulianLeviston

Okay... I think I've put all the pieces into place now. I'm a little confused as to why, but dsLit seems to still be taking a while to desugar with the fromRational (readRational txt) pattern in place.

I'm not 100% sure I've done what was intended...

    HsRat _ fl ty -> do
      let txt = case fl of
                  THFL { thfl_text = SourceText t } -> t
                  FL { fl_text = SourceText t } -> t
          val = fromRational (readRational txt)
      num   <- mkIntegerExpr (numerator val)
      denom <- mkIntegerExpr (denominator val)
      return (mkCoreConApps ratio_data_con [Type integer_ty, num, denom])

However if I do ./inplace/bin/ghc-stage2 --interactive then :t 1e10000000 and it takes over 3 seconds still. (One less digit takes under 1 second) so the bug is still there — it's now taking *less* time to typecheck, but the desugarer seems to still be catching me out.

I'll attach a patch file so you can see what I've done to now.

Changed 11 months ago by JulianLeviston

Attachment: ghc_changes.patch added

PatchFile for 22 Oct 2018

comment:42 Changed 11 months ago by osa1

Could you submit your patch to Phabricator? It's really hard to review it this way.

I haven't looked at the patch in details yet, but the desugaring in comment:41 is not right. We want to avoid building a rational in compile time (the Integer components of Rational is what's taking all the time), but that's what you're doing in the code in comment:41 (in readRational). See comment:24 for how we want to desugar this.

comment:43 Changed 11 months ago by JulianLeviston

Yeah, ok, sorry I was reading comment:16 by mistake... I also thought desugaring wasn't part of type-checking. In the diagram it happens after type checking. For some reason I'd assumed once it'd type checked the expression it would stop.

Ok I can easily adjust that now I know why.

Yeah I didn't submit to Phabricator yet because I knew it was obviously wrong. Thanks for correcting me. I'll give it another shot. Again, thanks for all the help, I really really appreciate it, and hope to repay by hopefully being able to fix bugs more easily with the next ones.

Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:44 Changed 11 months ago by JulianLeviston

Have been trying to create the expression referred to by SPJ in comment:24 in MatchList.hs but having trouble creating core.

    HsRat _ fl ty    -> case fl of
                          FL { fl_signi = i, fl_exp = e } ->
                            return (mkApps (Var "makeRational") [(mkIntegerExpr i) (mkIntegerExpr e)])

I don't know how to make a Var. The String above is clearly wrong, but I couldn't find out how to construct one from scratch. I read through MkId.hs, Id.hs and Var.hs but it's not clear to me what I should be doing. Also, comment:24 refers to making "a new library function" which seems easy enough, but I don't know what a library function is (like... where should it go such that it gets picked up at runtime and registered in the list of available variables, available to core?).

comment:45 Changed 11 months ago by JulianLeviston

Nevermind, mpickering on irc helped me work out where to look. :)

comment:46 Changed 11 months ago by JulianLeviston

Still floundering :). Anyone have any suggestions about the following? it's still just as slow as before, it seems. (That's to say it's now TWICE as fast as the current release of GHC, but that's still around 4 seconds on my machine for :t 1e123456789 and 40 for one extra digit)

    HsRat _ fl _     -> case fl of
                          FL { fl_signi = fl_signi, fl_exp = fl_exp } -> do
                            mkRational <- dsLookupGlobalId mkRationalName
                            litI <- mkIntegerExpr fl_signi
                            litE <- mkIntegerExpr fl_exp
                            return ((Var mkRational) `App` litI `App` litE)
Last edited 11 months ago by JulianLeviston (previous) (diff)

comment:47 Changed 11 months ago by bgamari

Hmm, interesting. I would first start by running GHC with -v3 to make sure that the cost is still showing up in the same place. I would then try to build a profiled compiler and see more precisely where things are blowing up.

comment:48 Changed 11 months ago by JulianLeviston

Thanks for the direction. Apologies for taking so long to get back; I've had the flu.

-v3 output is fascinating (thanks, I wasn't aware of it), because nothing is marked as slow. I'll look into building a profiled compiler.

The pause happens immediately before 1e100000000 :: Fractional p => p is output. Here's the output for anyone following along's interest:

Expanse:ghc julian$ ./inplace/bin/ghc-stage2 -v3 -Rghc-timing -e ':t 1e100000000'
Glasgow Haskell Compiler, Version 8.7.20181012, stage 2 booted by GHC version 8.4.3
Using binary package database: /Users/julian/code/haskell/ghc/inplace/lib/package.conf.d/package.cache
package flags []
loading package database /Users/julian/code/haskell/ghc/inplace/lib/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.3
wired-in package integer-wired-in mapped to integer-gmp-1.0.2.0
wired-in package base mapped to base-4.12.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.14.0.0
wired-in package ghc mapped to ghc-8.7
*** Parser [source]:
!!! Parser [source]: finished in 0.83 milliseconds, allocated 0.489 megabytes
*** Desugar:
*** Simplify [expr]:
!!! Simplify [expr]: finished in 0.64 milliseconds, allocated 0.367 megabytes
*** CorePrep [expr]:
!!! CorePrep [expr]: finished in 5.45 milliseconds, allocated 5.770 megabytes
*** ByteCodeGen [Ghci1]:
!!! ByteCodeGen [Ghci1]: finished in 0.52 milliseconds, allocated 0.488 megabytes
Loading package ghc-prim-0.5.3 ... linking ... done.
Loading package integer-gmp-1.0.2.0 ... linking ... done.
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name libiconv.dylib
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name liblibiconv.dylib
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name iconv.lib
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name libiconv.lib
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name libiconv.dll.a
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name iconv.dll.a
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name libiconv.a
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name iconv.a
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name libiconv
*** gcc:
gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -B/Users/julian/code/haskell/ghc/libraries/base/dist-install/build --print-file-name iconv
Loading package base-4.12.0.0 ... linking ... done.
Search directories (user):
Search directories (gcc):
*** Parser [source]:
!!! Parser [source]: finished in 0.15 milliseconds, allocated 0.153 megabytes
*** Desugar:
*** Simplify [expr]:
!!! Simplify [expr]: finished in 0.32 milliseconds, allocated 0.203 megabytes
*** CorePrep [expr]:
!!! CorePrep [expr]: finished in 0.16 milliseconds, allocated 0.112 megabytes
*** ByteCodeGen [Ghci1]:
!!! ByteCodeGen [Ghci1]: finished in 0.31 milliseconds, allocated 0.299 megabytes
*** Parser [source]:
!!! Parser [source]: finished in 0.20 milliseconds, allocated 0.215 megabytes
*** Desugar:
*** Simplify [expr]:
!!! Simplify [expr]: finished in 0.35 milliseconds, allocated 0.224 megabytes
*** CorePrep [expr]:
!!! CorePrep [expr]: finished in 0.15 milliseconds, allocated 0.089 megabytes
*** ByteCodeGen [Ghci1]:
!!! ByteCodeGen [Ghci1]: finished in 0.36 milliseconds, allocated 0.351 megabytes
*** Parser [source]:
!!! Parser [source]: finished in 0.09 milliseconds, allocated 0.043 megabytes
1e100000000 :: Fractional p => p
Leaving GHCi.
*** Deleting temp files:
Deleting: 
*** Deleting temp dirs:
Deleting: 
<<ghc: 223511272 bytes, 80 GCs, 12572972/49798120 avg/max bytes residency (9 samples), 154M in use, 0.000 INIT (0.003 elapsed), 4.102 MUT (4.434 elapsed), 0.172 GC (0.187 elapsed) :ghc>>

comment:49 Changed 11 months ago by JulianLeviston

I've been trying to build a profiled compiler.

I adjusted build.mk by changing the BuildFlavour thusly: BuildFlavour = prof and commented out the stage = 2setting I had. I also added this to the bottom, to put profiling hooks across the typechecker:

define add_mods_flag =
  $(foreach mod,$(2),$(eval $(basename $(mod))_HC_OPTS += $(1)))
endef

$(call add_mods_flag,-fprof-auto,$(wildcard compiler/typecheck/*.hs))

I keep getting this error after make clean && make -j4 in either the ghc/compiler or ghc/ directories:

GHC error in desugarer lookup in GHC.Real:
  Can't find interface-file declaration for type constructor or class mkRational
    Probable cause: bug in .hi-boot file, or inconsistent .hi file
    Use -ddump-if-trace to get an idea of which file caused the error
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 8.7.20181012 for x86_64-apple-darwin):
	initDs

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

What should I do next?

comment:50 Changed 10 months ago by osa1

I think profiled GHC may not help that much because we already have a good idea of what's going wrong.

I'd print litI and litE in the desugaring code you showed in comment:46 and see if any of the integers are huge. Remember that the root cause is that we build huge integers in compile time and we're trying to avoid that. If this is still taking too long then we somehow still build huge integers in compile time. Given that we only build two integers, one them them should be the culprit. Printing would show us which one, and then we can figure out the rest.

Can you push your code on a git repo and give us the remote/branch?

comment:51 Changed 10 months ago by potato44

I have found an old related (duplicate?) ticket.

comment:52 Changed 10 months ago by osa1

Good find. Indeed that's a duplicate.

comment:53 Changed 10 months ago by JulianLeviston

Yeah I'll put it up on a fork on github.

comment:54 Changed 10 months ago by JulianLeviston

Here's my PR to my forked GHC branch: https://github.com/JulianLeviston/ghc/pull/1

comment:55 Changed 10 months ago by JulianLeviston

I can't seem to successfully build my ghc any more...

GHC error in desugarer lookup in GHC.Real:
  Can't find interface-file declaration for type constructor or class mkRational
    Probable cause: bug in .hi-boot file, or inconsistent .hi file
    Use -ddump-if-trace to get an idea of which file caused the error
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 8.7.20181012 for x86_64-apple-darwin):
	initDs

after I attempted to do the profiler build... that's from attempting a BuildFlavour = devel2 build without stage=2 specified. I might follow the build instructions from the ground up again without my code in place then put it back and see if that works.

comment:56 Changed 10 months ago by JulianLeviston

I got my build building again. I'm not sure why my interface files went out of sync. I guess I should re-read and watch the videos about the pipelines and stages.

comment:57 Changed 10 months ago by osa1

Cc: osa1 added

JulianLeviston, any news? I'll try to build your branch and see what's going wrong tomorrow. (CCing myself to add this to my ticket list)

comment:58 Changed 10 months ago by JulianLeviston

Sorry I've been distracting myself with thinking about 15617 and 3372. I'll switch back to this.

comment:59 Changed 10 months ago by JulianLeviston

So I tried to print out the values you suggested. I can't print out litE and litI because they don't have a show instance for Core, but I tried to print out the Integer values fl_signi and fl_exp that I use to build those Core expressions with:

    import IOEnv (liftIO)
    ...
    HsRat _ fl _     -> case fl of
                          FL { fl_signi = fl_signi, fl_exp = fl_exp } -> do
                            mkRational <- dsLookupGlobalId mkRationalName
                            litI <- mkIntegerExpr fl_signi
                            litE <- mkIntegerExpr fl_exp
                            -- temporary logging
                            liftIO $ putStrLn $ "fl_signi: " ++ show fl_signi
                            liftIO $ putStrLn $ "fl_exp: " ++ show fl_exp
                            return ((Var mkRational) `App` litI `App` litE)

That compiles, but unfortunately when I try to run it, it blows up with the same interface file stuff. I guess this is because I renumbered RubbishLit here: https://github.com/JulianLeviston/ghc/pull/1/files#diff-78b3c572cf078eb00f09974b18f3eedfR214 — I'm really interested in suggestions about a better way to do that. It didn't seem good to put it after (it's out of order then). But I'm not sure what changes I need to make if I'm going to adjust the interface file(s) as I seem to have done by doing that? (bit lost there).

So... yeah, this is what happens when I run it:

itsy:ghc julianleviston$ ./inplace/bin/ghc-stage2 -e ":t 1e302"
1e302 :: Fractional p => p
itsy:ghc julianleviston$ ./inplace/bin/ghc-stage2 -e "1e302"
GHC error in desugarer lookup in Ghci1:
  Can't find interface-file declaration for type constructor or class mkRational
    Probable cause: bug in .hi-boot file, or inconsistent .hi file
    Use -ddump-if-trace to get an idea of which file caused the error

comment:60 Changed 10 months ago by osa1

I can't print out litE and litI because they don't have a show instance for Core

In GHC you should be using Outputable instances (with pprTrace and frieds). Most GHC types don't have a Show instance, but most have Outputable. In this case pprTrace "..." (text "litI:" <+> ppr litI <+> "litE:" <+> ppr litE) should work.

So... yeah, this is what happens when I run it:

Right, so the problem is in your mkRational name definition:

mkRationalName      = tcQual  gHC_REAL (fsLit "mkRational")   mkRationalIdKey

You're saying that mkRational lives in GHC.Real, but that's not the case. It actually lives in the GHC module BasicTypes.

We need to put mkRational to somehwere in base. GHC.Real is a good place I think.

Secondly, I see some uses of mkRational in BasicTypes. Those also need to be moved, because when booting GHC, when building stage 1, we'll be using a GHC that won't have mkRational. So all those definitions that use mkRational need to be moved to GHC.Real too.

Does this make sense?

Last edited 10 months ago by osa1 (previous) (diff)

comment:61 Changed 10 months ago by JulianLeviston

Awesome explanations. Makes sense. Thanks! Having a go at it now.

comment:62 Changed 10 months ago by JulianLeviston

Ok, so I compiled and added in the line:

pprTrace "..." (text "litI:" <+> ppr litI <+> text "litE:" <+> ppr litE) $ pure ()

... however that's in MatchLit, so doesn't get executed when typechecking, only when evaluating. Makes sense because if I typecheck, I still get the slow behaviour, but if I evaluate, I get that error (coz I still haven't updated the location of mkRational yet).

So, this also seems correct, given I have made zero changes in the typechecker so far. So I'm looking in to that.

I hope I'm on the right track. I ran it with ./inplace/bin/ghc-stage2 -e ":t 1e289" -ddump-parsed-ast and it responded with something that ended with:

==================== Parser AST ====================

(Just
 ({ <interactive>:1:1-5 }
  (BodyStmt
   (NoExt)
   ({ <interactive>:1:1-5 }
    (HsOverLit
     (NoExt)
     (OverLit
      (NoExt)
      (HsFractional
       (FL
        (SourceText
         "1e289")
        (False)
        (1)
        (289)
        (Base10)))
      (HsLit
       (NoExt)
       (HsString
        (SourceText
         "noExpr")
        {FastString: "noExpr"})))))
   (SyntaxExpr
    (HsLit
     (NoExt)
     (HsString
      (NoSourceText)
      {FastString: "noSyntaxExpr"}))
    []
    (WpHole))
   (SyntaxExpr
    (HsLit
     (NoExt)
     (HsString
      (NoSourceText)
      {FastString: "noSyntaxExpr"}))
    []
    (WpHole)))))

And, seeing HsOverLit in this output, and looking at tcExpr in compiler/typecheck/TcExpr.hs, it seems to match the HsOverLit clause, which calls newOverloadedLit from compiler/typechecker/Inst.hs, which I'm not 100% sure, but it seems to be *not* matching on a short cut clause, and so applying newNonTrivialOverloadedLit.

I'll take another pass at understanding it all again tomorrow.

comment:63 Changed 9 months ago by JulianLeviston

I've been trying to move mkRational, but it doesn't seem to make any sense to put it into base itself... (ie GHC.Real). If we put it into there, won't it expose it to base? It seems like a strange thing to do, to me.

The reason it needs to be exposed *somewhere* is that it's looked up globally by id, here: https://github.com/JulianLeviston/ghc/pull/1/files#diff-116aa33edfc74c0dc68e471f30ac63d5R101 but perhaps we don't actually need to look it up there? I'm pretty lost.

Here's the code... it's the bottom of the dsLit function in compiler/deSugar/Matchlit.hs:

    HsRat _ fl _     -> case fl of
                          FL { fl_signi = fl_signi, fl_exp = fl_exp } -> do
                            mkRational <- dsLookupGlobalId mkRationalName
                            litI <- mkIntegerExpr fl_signi
                            litE <- mkIntegerExpr fl_exp
                            return ((Var mkRational) `App` litI `App` litE)

comment:64 Changed 9 months ago by osa1

As discussed on IRC, try to get it compiling for now and then we can worry about where to put the functions. In this line you specify mkRationals module as GHC.Real so that's where it should be. When you do that your dsLookupGlobalId call should work.

comment:65 Changed 9 months ago by JulianLeviston

As discussed on IRC, I encountered an issue where moving the code into GHC.Real doesn't help because Stage 1 is not using *that* GHC.Real, it's using the module from the GHC being used to compile the target GHC rather than the target GHC itself.

So, what I'm going to do is spend a little bit of time to summarize my understanding of exactly what my attempt is currently and how to understand the code I've written so that we're all on the same page and others can help me more easily. Stay tuned :)

comment:66 Changed 9 months ago by JulianLeviston

Here is the current summary of what I've been intending to do:

Problem is excessive duration of typechecking large fractionals.

Initial issue was renamer and desugarer were constructing the rational before typechecking, which was taking large amounts of time and space.

The implemented plan was to adjust FractionalLit to delay evaluation until the last possible moment by storing the significand and the exponent, as well as the FractionalExponentBase (because the fractional *may* be hex or dec) instead of just a rational, and to parse the source as an application of a new library functions mkFractionalLit and mkTHFractionalLit. In the process we also added a new core expression mkRationalExpr

Template haskell still uses a rational to represent its fractional lits, so we have *two* constructors in the FractionalLit type: FL and THFL (one for regular literals and one for Template Haskell literals).

We also had to adjust the lexer in the parser to deal with parsing the values into the new forms.

Last edited 9 months ago by JulianLeviston (previous) (diff)

comment:67 Changed 9 months ago by JulianLeviston

The current problems are:

##Problem 1##

As we need to store an application of mkRational upon desugaring, we need to have it exist in some library. Unfortunately, for Stage 1 compilation, the library functions we have access to seem to be only those that exist in the *GHC that is compiling* rather than the *GHC that is being compiled*.

This means the reference to mkRationalName that I've added to PrelNames.hs doesn't work, because we can't refer to any module that has the mkRational function in it; none exist. Of course, there is every chance we're not aware of a way to inject the bootstrapped code in — how does one refer to new library functions?

Note: initially, and in the current fork as I've written it, I refer to it via gHC_REAL but that obviously doesn't work.

##Problem 2##

Typechecking is still slow (less slow, granted, but only a little). I think this makes sense, as I haven't actually adjusted any typechecker code yet. I've been looking into TcExpr.hs which has the tcExpr function, but I need to figure through that code.

comment:68 Changed 9 months ago by JulianLeviston

(deleted)

Last edited 9 months ago by JulianLeviston (previous) (diff)

comment:69 Changed 9 months ago by JulianLeviston

(deleted)

Last edited 9 months ago by JulianLeviston (previous) (diff)

comment:70 Changed 9 months ago by mpickering

Julian, can you please paste the exact error you are experiencing as I don't see how Problem 1 can arise.

When we bootstrap GHC, we first compile stage1 with stage0. So no problems here as the call to lookup mkRational is made when stage1 is run. When we build stage2, we first build prelude with stage1, including GHC.Real so there might be some problems here if a module needs to call mkRational before GHC.Real is compiled but this is hard to debug without any error messages or other information to be going from!

I think at this point it would be very useful if you put your code up as a MR so that people can comment directly about what needs doing.

comment:71 Changed 9 months ago by JulianLeviston

This is the error (./boot && ./configure && make -j4):

WARNING: file compiler/iface/LoadIface.hs, line 428 GHC.Real
GHC error in desugarer lookup in GHC.Real:
  Can't find interface-file declaration for variable mkRational
    Probable cause: bug in .hi-boot file, or inconsistent .hi file
    Use -ddump-if-trace to get an idea of which file caused the error
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 8.7.20181219 for x86_64-apple-darwin):
	initDs

Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

make[1]: *** [libraries/base/dist-install/build/GHC/Real.o] Error 1
make: *** [all] Error 2

I'll make an MR now...

comment:73 Changed 9 months ago by JulianLeviston

It's worth noting that this MR doesn't include the changes where I attempted to move the functions into GHC.Real. I stopped doing that because it would have involved somehow getting the FractionalLit type in, too, and I didn't know how to do that while also having it in BasicTypes.

comment:74 Changed 9 months ago by JulianLeviston

Rebased the MR.

comment:75 Changed 9 months ago by JulianLeviston

Hm. Seems I didn't fully rebase properly. Trying again now.

comment:76 Changed 9 months ago by JulianLeviston

Ok. Back to having the error above again, now. I'll try moving all the types and functions around FractionalLit into GHC.Real again now, and hope that importing them into BasicTypes makes sense from there (seems weird to me, but I dno't really understand how it works fully, I guess — feels like I'm trying random stuff without understanding it, which is definitely not how I like to work).

comment:77 Changed 9 months ago by JulianLeviston

moving the functions and types relating to FractionalLit causes a bunch of errors after running ./boot && ./configure && make -j4 ... I'll paste them below.

It seems like the GHC.Real that's imported into BasicTypes is not the one that I've added things to in libraries/base/GHC/Real.hs however after talking to bgamari the other day he seemed to be sugesting that this was the only one that gets bootstrapped, so I'm pretty confused. All I did was change the import to import GHC.Real hiding (infinity) and I would have thought it'd import everything needed. Obviously something's wrong with my assumption there somewhere, because FractionalLit is *definitely* at the bottom of the GHC.Real module, and AFAICS it's exporting everything (no explicit exports)

Can anyone give me some direction about how I can understand enough to learn how to understand what's wrong here, and how to fix it?

compiler/basicTypes/BasicTypes.hs:100:26: error:
    Not in scope: type constructor or class ‘FractionalLit’
    Perhaps you meant ‘Fractional’ (imported from GHC.Real)
    |
100 |         IntegralLit(..), FractionalLit(..), FractionalExponentBase(..),
    |                          ^^^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:100:45: error:
    Not in scope: type constructor or class ‘FractionalExponentBase’
    |
100 |         IntegralLit(..), FractionalLit(..), FractionalExponentBase(..),
    |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:101:28: error:
    Not in scope: ‘negateFractionalLit’
    |
101 |         negateIntegralLit, negateFractionalLit,
    |                            ^^^^^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:102:24: error:
    Not in scope: ‘mkFractionalLit’
    |
102 |         mkIntegralLit, mkFractionalLit, mkTHFractionalLit, rationalFromFractionalLit,
    |                        ^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:102:41: error:
    Not in scope: ‘mkTHFractionalLit’
    |
102 |         mkIntegralLit, mkFractionalLit, mkTHFractionalLit, rationalFromFractionalLit,
    |                                         ^^^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:102:60: error:
    Not in scope: ‘rationalFromFractionalLit’
    |
102 |         mkIntegralLit, mkFractionalLit, mkTHFractionalLit, rationalFromFractionalLit,
    |                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^

compiler/basicTypes/BasicTypes.hs:103:9: error:
    Not in scope: ‘integralFractionalLit’
    |
103 |         integralFractionalLit,
    |         ^^^^^^^^^^^^^^^^^^^^^
<<ghc: 241948240 bytes, 67 GCs, 9308388/20905880 avg/max bytes residency (5 samples), 60M in use, 0.001 INIT (0.004 elapsed), 0.185 MUT (0.215 elapsed), 0.177 GC (0.194 elapsed) :ghc>>
make[1]: *** [compiler/stage1/build/BasicTypes.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [all] Error 2

comment:78 Changed 9 months ago by JulianLeviston

I duplicated the library code into both GHC.Real and left it in BasicTypes.hs as well as per osa1's suggestion on IRC.

Got this error after running ./boot && ./configure && make -j4

libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.hs:5:1: error:
    Bad interface file: libraries/ghc-prim/dist-install/build/GHC/Tuple.hi
        mismatched interface file versions (wanted "80720181230", got "80720181219")
  |
5 | import GHC.Tuple ()
  | ^^^^^^^^^^^^^^^^^^^
make[1]: *** [libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [all] Error 2

comment:79 Changed 9 months ago by ulysses4ever

Just in case, doing make maintainer-clean sometimes helps when changes don't seem to propagate correctly.

comment:80 Changed 9 months ago by JulianLeviston

Thanks os1. I did make clean. Then the error became:

libraries/base/GHC/Real.hs:839:21: error:
    Not in scope: type constructor or class ‘SourceText’
    |
839 |   = FL { fl_text :: SourceText                 -- How the value was written in the source
    |                     ^^^^^^^^^^

libraries/base/GHC/Real.hs:849:25: error:
    Not in scope: type constructor or class ‘SourceText’
    |
849 |   | THFL { thfl_text :: SourceText -- How the value was written in the source
    |                         ^^^^^^^^^^

and a bunch more errors of a similar type... at which point you suggested removing SourceText, but that'd require rewriting all the functions, so ... yeah, now I believe you're looking up the ticket again and refreshing your brain with it :) :thx:

comment:81 Changed 9 months ago by ulysses4ever

Is the last version you are talking about on Gitlab already? Cause I'm getting the same Can't find interface-file declaration for variable mkRational on your branch currently.

Last edited 9 months ago by ulysses4ever (previous) (diff)

comment:82 Changed 9 months ago by JulianLeviston

It's currently on gitlab, ulysses4ever

comment:83 Changed 9 months ago by osa1

I added a comment on gitlab MR. Let's maybe continue discussing there as it also shows the code and makes things easier.

comment:84 Changed 9 months ago by JulianLeviston

In the process of finalising the ticket. Working on test failures and adding an additional test. Shouldn't be too long now :)

comment:85 Changed 8 months ago by JulianLeviston

Test Case: T15646
Note: See TracTickets for help on using tickets.