Opened 20 months ago

Last modified 16 months ago

#14929 new bug

Program compiled with -O2 exhibits much worse performance

Reported by: mpickering
Owned by:
Priority: high
Milestone:
Component: Compiler
Version: 8.2.2
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

A user on reddit reports that compiling his program with -O2 makes it use a lot more memory.

https://www.reddit.com/r/haskell/comments/84qovv/an_example_of_code_where_ghcs_o2_makes_things/

It runs in constant memory without -O2 and leaks memory with -O2. It seems worth investigating as the example is quite small (< 1000 lines) and self-contained.

https://github.com/luispedro/TestingNGLESS

Change History (4)

comment:1 Changed 20 months ago by bgamari

Priority: normal → high
Type of failure: None/Unknown → Runtime performance bug

comment:2 Changed 16 months ago by George

According to https://www.reddit.com/r/haskell/comments/84qovv/an_example_of_code_where_ghcs_o2_makes_things/ it's now down to 60 lines, but I don't see any way to get those 60 lines: https://www.reddit.com/r/haskell/comments/84qovv/an_example_of_code_where_ghcs_o2_makes_things/dvshdb8

Also according to that page, compiling with -fno-full-laziness is a workaround.

comment:3 Changed 16 months ago by sgraf

Retainer profiling points to this CAF:

-- RHS size: {terms: 2, types: 3, coercions: 0, joins: 0/0}
lvl24_r7ue
  :: conduit-1.2.13:Data.Conduit.Internal.Pipe.Pipe
       B.ByteString B.ByteString Data.Void.Void () (ResourceT IO) ()
[GblId]
lvl24_r7ue
  = scc<interpretTop>
    scc<parseFile>
    scc<interpretTop>
    scc<parseFile>
    scc<>>=>
    scc<>>=.\>
    scc<>>=.\.\>
    scc<>>=>
    scc<>>=.\>
    scc<>>=.\.\>
    scc<interpretTop>
    scc<parseFile>
    scc<>>=>
    scc<>>=.\>
    scc<>>=.\.\>
    tick<>>=>
    scc<>>=>
    scctick<>>=.\>
    countC_r7tY @ Data.Void.Void @ B.ByteString @ () lvl23_r7ud

-fno-full-laziness is a work-around, but maybe we should have a more granular way to influence float out. We probably still want to float lambdas, for example.
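Until something more granular exists, the flag can at least be confined to the module that leaks rather than applied to the whole build. The sketch below assumes a hypothetical function, weightedTotal, that is not taken from the reported program; it only shows where the pragma goes and what kind of expression is a candidate for floating.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
-- The pragma above disables full laziness for this module only; other
-- modules in the same build keep whatever optimisation flags they had.

-- | Hypothetical example (not from the reported program).
-- The list [1 .. 1000] does not depend on xs, so with full laziness
-- enabled GHC may float it out of the function to the top level, where
-- it would live on as a CAF between calls.
weightedTotal :: [Int] -> Int
weightedTotal xs = sum [w * x | w <- [1 .. 1000], x <- xs]
```

The same flag can also be passed on the command line (for example ghc -O2 -fno-full-laziness), in which case it applies to every module compiled in that invocation.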

comment:4 Changed 16 months ago by simonpj

I have not looked at the code, but full laziness is definitely capable of increasing space usage. Consider even

f xs = sum [x+n | n <- [1..], x <- xs]

Full laziness will turn this into

ns = [1..]
f xs = sum [x+n | n<-ns, x<-xs]

which will retain a top-level CAF whose length is the longest list ever passed to f. In this case it is probably better to re-generate the list [1..] on every call, but the trade-off is less clear if the list is something like map expensive [1..], where recomputing the elements on every call is itself costly.
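For anyone who wants to experiment with this, here is a minimal sketch of the two shapes written out by hand. It assumes a finite range ([1 .. 100000]) so that the sum terminates, and the names fLocal, fShared and sharedNs are illustrative only. Note that when compiled with optimisation GHC may itself rewrite fLocal into the shared form, which is exactly the transformation under discussion.

```haskell
-- Shape of the original definition: the range is written inside the
-- function body, so (absent floating) it is rebuilt on every call and
-- can be garbage collected as it is consumed.
fLocal :: [Int] -> Int
fLocal xs = sum [x + n | n <- [1 .. 100000], x <- xs]

-- Shape after floating, written out by hand: the range is a top-level
-- CAF. Once evaluated it stays reachable for as long as fShared is.
sharedNs :: [Int]
sharedNs = [1 .. 100000]

fShared :: [Int] -> Int
fShared xs = sum [x + n | n <- sharedNs, x <- xs]
```

Both functions compute the same result; the difference is only in how long the list of n values stays alive. Comparing heap profiles of the two, with and without -fno-full-laziness, is one way to check whether floating is what causes a given leak.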

The OP doesn't say if the same effect happens with -O. Is there something -O2 specific going on, I wonder?

Regardless:

  • Extracting the test case and uploading it here with repro instructions would be good
  • More specific insight into exactly what is happening would be good