Opened 12 years ago

Closed 8 years ago

#1607 closed bug (fixed)

seq can make code slower

Reported by: simonpj Owned by: igloo
Priority: low Milestone: 7.2.1
Component: Compiler Version: 6.6.1
Keywords: Cc: michal.terepeta@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Adrian Hey produced a program that goes a little slower when he added "!" annotations to his data constructors. This was a surprise to me, but I understand why now. This bug report just captures the problem so that I don't forget about it. I'm treating it as a "bug" because it really is a performance bug.

The problem turns out to be this. In the attached code, genPush looks like this:

 put  E        = Z    E e0 E
 put (N l e r) = putN l e  r
 put (Z l e r) = putZ l e  r
 put (P l e r) = putP l e  r

With a strict data type, we know that 'r' is evaluated, but putN (although strict) does not know that its argument is evaluated, so it does an extra and unnecessary eval.

There are two difficulties. First, most of the time we don’t want to require that a strict function gets an evaluated argument. E.g. map is strict, but it does its own 'case' on the list arg, and we don't want the caller to have to do that too. What we're trying to catch is the calls to 'seq' when the caller already knows that.

If we could spot these cases, then we could w/w the function to expose the 'seq' to the caller. I'm not sure how we'd explain to the callee that the arg was definitely evaluated.

How to spot the cases? We want to spot function bodies that only 'seq' their argument without doing a proper case or function applications. Sounds like enriching the demand domain (again!).

Simon

The attached code demonstrates the problem. There are two versions of AVL routines..

AVL.hs (no strictness annotation in data type, explicit seqs instead) StrictAVL.hs (uses strictness annotation instead of seqs)

Running BUILD.BAT generates 2 executables..

Lazy.exe    which uses AVL.hs
Strict.exe  which uses StrictAVL.hs

These test the execution times of the genWriteFast and genPush functions defined in modules AVL.hs and StrictAVL.hs respectively.

For genWriteFast the two tests give virtually identical times (as expected). But for some reason the version of genPush using strictness annotation instead of explicit seqs seems to take about 15% longer.

The size of the object file seems a bit bigger too.

The fact that genWriteFast times are the same seems to indicate that my original hypothesis is wrong, but there's something strange about the genPush from StrictAVL.hs.

Attachments (1)

StrictFieldsTest.zip (10.4 KB) - added by simonpj 12 years ago.

Download all attachments as: .zip

Change History (13)

Changed 12 years ago by simonpj

Attachment: StrictFieldsTest.zip added

comment:1 Changed 12 years ago by simonpj

Component: Compilerlibraries/haskell98
Milestone: 6.1

comment:2 Changed 12 years ago by simonpj

Component: libraries/haskell98Compiler

comment:3 Changed 11 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:4 Changed 11 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:5 Changed 10 years ago by igloo

Milestone: 6.10 branch6.12 branch

comment:6 Changed 9 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:7 Changed 9 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:8 Changed 9 years ago by michalt

Cc: michal.terepeta@… added
Type of failure: Runtime performance bug

I've just tried the attached tests and I'm really not sure what to think about the results:

Lazy tests
writeTime = 928
pushTime = 929
Strict tests
writeTime = 915
pushTime = 959
Lazy tests
writeTime = 931
pushTime = 928
Strict tests
writeTime = 918
pushTime = 961

So genWriteFast is a little bit faster with the strict annotations, which would be a counter-example to your hypothesis. But at the same time the genPush is slower with strict data type, which would agree with your initial idea. And this is because seq not applied to both children:

putNL (N ll le lr) e r = let l' = putN ll le lr
                         in l' `seq` N l' e r

Notice that only l' is forced to evaluate. Whereas in case of strict AVL data type when we have

putNL (N ll le lr) e r = N (putN ll le lr) e r

I guess both putN ... and r be evaluated (i.e. as you've mentioned in the description - N doesn't know that r is already evaluated). Adding additional seq in the "Lazy" version:

putNL (N ll le lr) e r = let l' = putN ll le lr
                          in r `seq` l' `seq` N l' e r

makes the running times for genPush pretty much the same in both cases.

Anyway, there is also some good news - with current HEAD it seems that the differences are nowhere near 15% and the sizes of binaries are the almost the same. :)

comment:9 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:10 Changed 9 years ago by simonpj

Owner: set to igloo

Ian: I wonder if you could try to characterise this more closely, or alternativley decide that it doesn't matter? Thanks.

Simon

comment:11 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:12 Changed 8 years ago by igloo

Resolution: fixed
Status: newclosed

Stats from 10 runs:

Lazy tests
writeTime
c(2132, 2129, 2124, 2122, 2162, 2140, 2139, 2127, 2151, 2132)
    mean = 2135.8  sd = 12.59453
pushTime
c(2252, 2132, 2247, 2107, 2101, 2107, 2116, 2062, 2180, 2122)
    mean = 2142.6  sd = 63.48438

Strict tests
writeTime
c(2152, 2120, 2175, 2169, 2151, 2006, 2007, 2168, 2168, 2143)
    mean = 2125.9  sd = 64.94861
pushTime
c(2157, 2125, 2197, 2092, 2148, 2076, 2164, 2163, 2128, 2144)
    mean = 2139.4  sd = 35.72798

Looks like the strict code is now actually slightly faster.

Note: See TracTickets for help on using tickets.