Opened 10 years ago

Closed 8 years ago

#3629 closed bug (fixed)

Code compiled WITHOUT profiling many times slower than compiled WITH profiling on

Reported by: gchrupala Owned by:
Priority: low Milestone: 7.2.1
Component: Compiler Version: 6.13
Keywords: Cc: aruiz, michal.terepeta@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I have a program which runs extremely slow when I compile it with profiling disabled. It only becomes usable when compiled with profiling options on (-prof -auto-all).

I reproduced it with GHC 6.10, 6.12 and 6.13.

The source is attached.

I compiled and ran like this:

ghc-6.13.20091027 --make -O2 runST.hs -fforce-recomp
time ./runST > /dev/null
real	5m9.670s
user	5m7.627s
sys	0m0.468s

ghc-6.13.20091027 --make -O2 runST.hs -fforce-recomp -prof -auto-all
real	0m39.544s
user	0m39.050s
sys	0m0.148s

In the meantime, is there is a workaround other than compiling with profiling options on? I would prefer to modify the source rather than ask users to install profiling libraries and mess with compiler options.

Best,

-- Grzegorz Chrupala

Attachments (1)

src-bug.tgz (47.7 KB) - added by gchrupala 10 years ago.

Download all attachments as: .zip

Change History (15)

Changed 10 years ago by gchrupala

Attachment: src-bug.tgz added

comment:1 Changed 10 years ago by igloo

difficulty: Unknown
Milestone: 6.12 branch

Thanks for the report. There's 256k of code there; if you are able to boil it down to a smaller testcase then that would make it easier to investigate what's going on.

comment:2 Changed 10 years ago by gchrupala

Actually most of those bytes is a data file, but sure, I'll see if I can reduce the code more.

comment:3 Changed 10 years ago by aruiz

Cc: aruiz added
Type of failure: None/Unknown

I have a similar problem using foreign calls (in the hmatrix library). A program is 10x slower without profiling (and in ghci) than compiled with profiling on. This happens in ghc-6.12.1 (ghc-6.12.1-i386-unknown-linux-n.tar.bz2 on Ubuntu ), but not in ghc-6.10. I include a small test case.

We need:

$ sudo apt-get install libgsl0-dev liblapack-dev

$ cabal install hmatrix -p

$ cat slow.hs

import Numeric.LinearAlgebra          

m = ident 2000 :: Matrix Double

main = print $ vectorMax (takeDiag (m <> m))

With ghc-6.12.1

$ ghc -V           
The Glorious Glasgow Haskell Compilation System, version 6.12.1

$ ghc --make slow.hs -fforce-recomp
[1 of 1] Compiling Main             ( slow.hs, slow.o )     
Linking slow ...                                            

$ time ./slow 
1.0                                    

real    1m15.790s
user    1m15.489s
sys     0m0.228s 

$ ghc --make slow.hs -prof -fforce-recomp
[1 of 1] Compiling Main             ( slow.hs, slow.o )           
Linking slow ...                                                  

$ time ./slow 
1.0                                    

real    0m7.366s
user    0m7.264s
sys     0m0.100s

With ghc-6.10.3

$ ghc -V
The Glorious Glasgow Haskell Compilation System, version 6.10.3

$ ghc --make slow.hs -fforce-recomp
[1 of 1] Compiling Main             ( slow.hs, slow.o )
Linking slow ...

$ time ./slow
1.0

real    0m7.142s
user    0m7.060s
sys     0m0.072s

$ ghc --make slow.hs -prof -fforce-recomp
[1 of 1] Compiling Main             ( slow.hs, slow.o )
Linking slow ...

$ time ./slow
1.0

real    0m7.242s
user    0m7.132s
sys     0m0.096s

Thanks,

Alberto Ruiz

comment:4 Changed 10 years ago by igloo

With default builds from source tarballs, I can reproduce this with 6.10.4 but not HEAD:

ProfilingNormal
6.10.40:44.889:49.92
6.13.2010030314:13.918:23.82

Also, I get different output (on stderr, I assume) for the two compilers. gchrupala, what output do you expect?

comment:5 Changed 10 years ago by gchrupala

So with HEAD, both profiled and regular code runs much slower than profiled 6.10.4. That's a pity..

I checked the differing output for the two compilers and it's because the input file is in UTF-8 and the unicode IO changed from 6.10 to 6.13. If you remove non-ascii characters from the input file there is no difference in output.

comment:6 Changed 10 years ago by simonmar

We suggest trying the following (with both 6.10.4 and 6.13) to see if they make any difference:

  • -fno-state-hack (see #1168)
  • -fno-cse
  • -fno-full-laziness

all 3 of these "optimizations" can introduce slowdowns as well as speedups, so it may be that this program is an example of that.

comment:7 Changed 10 years ago by igloo

Doesn't look like any of those is the answer.

comment:8 Changed 9 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:9 Changed 9 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:10 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:11 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:12 Changed 8 years ago by michalt

Cc: michal.terepeta@… added
Status: newinfoneeded
Type of failure: None/UnknownCompile-time performance bug

Seems that the issue is fixed. GHC 7.0.3 without profiling:

..s/ghc/3629/src-bug 0 > time ./runST > /dev/null
Iter       Acc-train  P-train    R-train    F-train    Acc-test   P-test     R-test     F-test    
1          0.7098     0.7836     0.3571     0.4907     0.7098     0.7836     0.3571     0.4907    

real    0m9.713s
user    0m9.670s
sys     0m0.010s

GHC 7.0.3 with profiling:

..s/ghc/3629/src-bug 0 > time ./runST > /dev/null
Iter       Acc-train  P-train    R-train    F-train    Acc-test   P-test     R-test     F-test    
1          0.7098     0.7836     0.3571     0.4907     0.7098     0.7836     0.3571     0.4907    

real    0m19.390s
user    0m19.290s
sys     0m0.040s

And on current HEAD without profiling:

..s/ghc/3629/src-bug 1 > time ./runST > /dev/null
Iter       Acc-train  P-train    R-train    F-train    Acc-test   P-test     R-test     F-test    
1          0.7098     0.7836     0.3571     0.4907     0.7098     0.7836     0.3571     0.4907    

real    0m6.792s
user    0m6.700s
sys     0m0.070s

I don't have hmatrix installed (it requires some external libraries) so I didn't check the other example --- it would be great if someone could try it.

comment:13 in reply to:  12 Changed 8 years ago by aruiz

The hmatrix example runs fast in GHC 7.0.3 both with and without profiling, so I also think that the issue is fixed.

comment:14 Changed 8 years ago by igloo

Resolution: fixed
Status: infoneededclosed

Thanks guys!

Note: See TracTickets for help on using tickets.