Opened 6 years ago

Last modified 6 years ago

#8173 new bug

GHC uses nub

Reported by: nh2 Owned by: leroux
Priority: low Milestone:
Component: Compiler Version: 7.6.3
Keywords: Cc: mail@…, leroux@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

nub is O(n²).

I bet all usages have Ord instances.

https://github.com/nh2/haskell-ordnub

Attachments (4)

0001-Replace-most-nub-uses-with-ordNub-in-compiler.patch (18.1 KB) - added by leroux 6 years ago.
Initial ordNub patch
ordNub-analysis (165.8 KB) - added by leroux 6 years ago.
nofib-analyse of HEAD and ordNub-patched
0001-ordNub-Pattern-match-trival-cases.patch (1.2 KB) - added by leroux 6 years ago.
full-ordNub-analysis.tar.gz (22.9 KB) - added by leroux 6 years ago.

Download all attachments as: .zip

Change History (20)

comment:2 Changed 6 years ago by leroux

Cc: leroux@… added

comment:4 Changed 6 years ago by simonpj

By all means add nubOrd to compiler/utils and change calls to nub in GHC. Could you check what effect doing so has on the performance of the compiler? thanks!

Simon

comment:5 Changed 6 years ago by leroux

Related: Proposal to add sortNub (http://ghc.haskell.org/trac/ghc/ticket/1218) The only reason I mention this is because the compiler uses sort . nub in a few places.

comment:6 Changed 6 years ago by leroux

Owner: set to leroux

Changed 6 years ago by leroux

Initial ordNub patch

comment:7 Changed 6 years ago by leroux

Patch: attachment:0001-Replace-most-nub-uses-with-ordNub-in-compiler.patch​ (just an initial point to start at)

Based on nofib-analyse attachment:ordNub-analysis​, it seems that it may be a good idea to selectively replace nub's where it is applied over larger lists so that it cuts down on overhead.

Snippet from nofib-analysis

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           anna          +0.0%     +0.0%      0.11      0.12     +0.0%
           ansi          +0.0%     +0.0%      0.00      0.00     +0.0%
           atom          +0.0%     +0.0%     -0.9%     -0.4%     +0.0%
         awards          +0.0%     +0.0%      0.00      0.00     +0.0%
         banner          +0.0%     +0.0%      0.00      0.00     +0.0%
     bernouilli          +0.0%     +0.0%      0.17      0.18     +0.0%
   binary-trees          +0.0%     +0.0%     -7.3%     -8.8%     +0.0%
          boyer          +0.0%     +0.0%      0.04      0.05     +0.0%
         boyer2          +0.0%     +0.0%      0.00      0.00     +0.0%
           bspt          -0.1%     +0.0%      0.00      0.01     +0.0%
      cacheprof          +0.1%     +0.0%     -3.2%     -2.1%     +0.9%
       calendar          +0.0%     +0.0%      0.00      0.00     +0.0%
       cichelli          +0.0%     +0.0%      0.08      0.09     +0.0%
        circsim          +0.0%     +0.0%     +6.0%     +6.1%     +0.0%
       clausify          +0.0%     +0.0%      0.04      0.05     +0.0%
  comp_lab_zift          +0.0%     +0.0%      0.20     +5.0%     +0.0%
       compress          +0.0%     +0.0%      0.17      0.18     +0.0%
      compress2          +0.0%     +0.0%      0.16      0.17     +0.0%
    constraints          +0.0%     +0.0%     +1.7%     +2.0%     +0.0%
   cryptarithm1          +0.0%     +0.0%     -2.5%     -2.4%     +0.0%
   cryptarithm2          +0.0%     +0.0%      0.01      0.01     +0.0%
            cse          +0.0%     +0.0%      0.00      0.00     +0.0%
          eliza          +0.0%     +0.0%      0.00      0.00     +0.0%
          event          +0.0%     +0.0%      0.14      0.15     +0.0%
         exp3_8          -0.1%     +0.0%     +1.9%     +1.8%     +0.0%
         expert          +0.0%     +0.0%      0.00      0.00     +0.0%
 fannkuch-redux          -0.1%     +0.0%     +0.9%     +1.0%     +0.0%
          fasta          -0.1%     +0.0%     +2.7%     +4.5%     +0.0%
            fem          +0.0%     +0.0%      0.02      0.03     +0.0%
            fft          -0.1%     +0.0%      0.03      0.04     +0.0%
           fft2          +0.0%     +0.0%      0.05      0.05     +0.0%
       fibheaps          +0.1%     +0.0%      0.03      0.03     +0.0%
           fish          +0.0%     +0.0%      0.01      0.02     +0.0%
          fluid          +0.0%     +0.0%      0.01      0.01     +0.0%
         fulsom          +0.1%     +0.0%     +1.3%     +3.2%     -1.3%
         gamteb          +0.0%     +0.0%      0.05      0.05     +0.0%
            gcd          +0.0%     +0.0%      0.03      0.03     +0.0%
    gen_regexps          +0.0%     +0.0%      0.00      0.00     +0.0%
         genfft          -0.1%     +0.0%      0.03      0.03     +0.0%
             gg          +0.0%     +0.0%      0.00      0.02     +0.0%
           grep          +0.0%     +0.0%      0.00      0.00     +0.0%
         hidden          +0.0%     +0.0%     +2.5%     +1.9%     +0.0%
            hpg          +0.0%     +0.0%      0.10     +1.8%     +0.0%
            ida          +0.0%     +0.0%      0.08      0.08     +0.0%
          infer          +0.0%     +0.0%      0.06      0.07     +0.0%
        integer          +0.0%     +0.0%     +0.2%     +0.0%     +0.0%
      integrate          +0.0%     +0.0%      0.16      0.20     +0.0%
   k-nucleotide          -0.1%     +0.0%     +3.3%     +5.5%     +0.0%
          kahan          +0.0%     +0.0%      0.17      0.17     +0.0%
        knights          +0.0%     +0.0%      0.00      0.01     +0.0%
           lcss          +0.0%     +0.0%     -2.4%     -2.2%     +0.0%
           life          +0.0%     +0.0%     -5.2%     -8.5%     +0.0%
           lift          +0.0%     +0.0%      0.00      0.00     +0.0%
      listcompr          +0.0%     +0.0%      0.07      0.08     +0.0%
       listcopy          +0.0%     +0.0%      0.07      0.09     +0.0%
       maillist          +0.0%     -0.0%      0.05     -8.1%    +10.0%
         mandel          +0.0%     +0.0%      0.06      0.07     +0.0%
        mandel2          +0.0%     +0.0%      0.00      0.00     +0.0%
        minimax          +0.0%     +0.0%      0.00      0.00     +0.0%
        mkhprog          +0.0%     +0.0%      0.00      0.00     +0.0%
     multiplier          +0.0%     +0.0%      0.11      0.12     +0.0%
         n-body          +0.0%     +0.0%    +19.0%    +22.2%     +0.0%
       nucleic2          +0.0%     +0.0%      0.06      0.07     +0.0%
           para          +0.0%     +0.0%     -4.5%     -3.1%     +0.0%
      paraffins          +0.0%     +0.0%      0.11      0.12     +0.0%
         parser          +0.0%     +0.0%      0.03      0.03     +0.0%
        parstof          +0.0%     +0.0%      0.00      0.00     +0.0%
            pic          +0.0%     +0.0%      0.00      0.00     +0.0%
       pidigits          +0.0%     +0.0%     +2.3%     +1.7%     +0.0%
          power          +0.0%     +0.0%     +3.4%     +4.2%     +0.0%
         pretty          +0.0%     +0.0%      0.00      0.00     +0.0%
         primes          +0.0%     +0.0%      0.06      0.07     +0.0%
      primetest          +0.0%     +0.0%      0.11      0.11     +0.0%
         prolog          +0.0%     +0.0%      0.00      0.01     +0.0%
         puzzle          +0.0%     +0.0%      0.14      0.15     +0.0%
         queens          +0.0%     +0.0%      0.01      0.02     +0.0%
        reptile          +0.0%     +0.0%      0.00      0.01     +0.0%
reverse-complem          +0.0%     +0.0%      0.10      0.17     +0.0%
        rewrite          +0.0%     +0.0%      0.01      0.02     +0.0%
           rfib          +0.0%     +0.0%      0.01      0.01     +0.0%
            rsa          +0.1%     +0.0%      0.02      0.03     +0.0%
            scc          +0.0%     +0.0%      0.00      0.00     +0.0%
          sched          +0.0%     +0.0%      0.01      0.02     +0.0%
            scs          +0.0%     +0.0%    -25.4%    -32.2%     +0.0%
         simple          +0.0%     +0.0%     +0.0%     -0.7%     +0.0%
          solid          +0.0%     +0.0%      0.16      0.16     +0.0%
        sorting          +0.0%     +0.0%      0.00      0.00     +0.0%
  spectral-norm          +0.0%     +0.0%     -0.4%     -0.1%     +0.0%
         sphere          +0.0%     +0.0%      0.04      0.05     +0.0%
         symalg          +0.0%     +0.0%      0.01      0.01     +0.0%
            tak          +0.0%     +0.0%      0.01      0.01     +0.0%
      transform          +0.0%     +0.0%     +2.5%     +1.8%     +0.0%
       treejoin          +0.0%     +0.0%      0.15      0.17     +0.0%
      typecheck          +0.1%     +0.0%     +1.0%     +1.9%     +0.0%
        veritas          +0.0%     +0.0%      0.00      0.01     +0.0%
           wang          +0.1%     +0.0%      0.13      0.14     +0.0%
      wave4main          +0.0%     +0.0%     +3.0%     +7.3%     +0.0%
   wheel-sieve1          +0.0%     +0.0%     -1.1%     -1.6%     +0.0%
   wheel-sieve2          +0.0%     +0.0%      0.21     +5.6%     +0.0%
           x2n1          -0.1%     +0.0%      0.00      0.01     +0.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -0.0%    -25.4%    -32.2%     -1.3%
            Max          +0.1%     +0.0%    +19.0%    +22.2%    +10.0%
 Geometric Mean          -0.0%     -0.0%     -0.3%     -0.1%     +0.1%

There's more work to be done!

Last edited 6 years ago by leroux (previous) (diff)

comment:8 Changed 6 years ago by simonpj

You don't want to compare the performance of the compiled nofib programs. After all, they should be bit-for-bit identical whether GHC internally uses nub or nubOrd!

What you want to compare is the compile time! The compile times are in the nofib logs, and nofib-analyse should compare them, but in the log you attach it says (no modules compiled). I don't know why.

In any case, the differences in compile times will probably be mostly noise (a tiny difference in a large number, measured to only 0.1s precision or whatever). You might instead want to focus on the amount of space allocated by GHC as it compiles. That is at least precise and repeatable. I'm not sure if the nofib infrastructure measures these.

comment:9 Changed 6 years ago by leroux

nofib benchmark comparisons of HEAD vs HEAD with ordNub patch applied. (BuildFlavour=quick)

https://gist.github.com/leroux/6725810#file-headvordnub-analysis-L2988.


hvr has pointed out that having cases for [], [a], and [a, b] will most probably prevent the overhead from using Set for trivial calls. I'll post an updated benchmark later today with that implemented.

comment:10 in reply to:  9 Changed 6 years ago by nh2

Replying to leroux:

hvr has pointed out that having cases for [], [a], and [a, b] will most probably prevent the overhead from using Set for trivial calls.

Before you implement that, have you already found out why ordNub even is faster on singleton list? It is weird, but I am not joking: ordNub [1] is slightly faster than nub [1] in my Criterion benchmark.

comment:11 Changed 6 years ago by simonpj

A 5% improvement in compile time is remarkable, if it's true. Great! But I'm always worried about the noise in compile times measured in seconds. It would be great if you could do a sanity check by looking at how much GHC itself allocates in the before and after cases.

Simon

Changed 6 years ago by leroux

Attachment: ordNub-analysis added

nofib-analyse of HEAD and ordNub-patched

Changed 6 years ago by leroux

comment:12 Changed 6 years ago by leroux

Resolution: wontfix
Status: newclosed

Full nofib analysis of head, ordNub, and ordNub-cases (each ran twice): https://gist.github.com/leroux/6750216, attachment:full-ordNub-analysis.tar.gz.

I took certain measures to make sure the nofib benchmarks would return "truer" results. I think you'll find the results much more reliable now.

According to the results, sadly the 5% performance gain in compile time was complete bogus. Sorry. =(

Snippet from https://gist.github.com/leroux/6750216#file-full-ordnub-analysis-L2880-L3467 (compile times):

Compile Times
-------------------------------------------------------------------------------
        Program               head-0          head-1        ordNub-0        ordNub-1  ordNub-cases-0  ordNub-cases-1
-------------------------------------------------------------------------------
        -1 s.d.                -----           -1.6%           -1.6%           -1.7%           -1.7%           -1.7%
        +1 s.d.                -----           +1.9%           +2.1%           +1.6%           +2.0%           +1.7%
        Average                -----           +0.1%           +0.2%           -0.0%           +0.1%           -0.0%

I'm closing this ticket now since there is no obvious improvement from replacing nub with ordNub.

Last edited 6 years ago by leroux (previous) (diff)

Changed 6 years ago by leroux

Attachment: full-ordNub-analysis.tar.gz added

comment:13 Changed 6 years ago by simonpj

Don't give up so easily :-)

Did you look at the amount of heap the compiler allocated? It's easily discoverable (eg use +RTS -s, and it's in the nofib logs too, though I don't think nofib-analyse looks for it. (You could fix nofib-analyse so that it does.)

The point is that reducing allocation is almost always good; and it is a FAR more repeatable number than runtime, for which it is hard to measure small differences. My bet is that ordNub will be a tiny win in most cases.

The other thing is that while using ordNub may not benefit typical programs, it may have a dramatic effect on some weird program which happens to produce very long lists for nub to digest. Provided using ordNub doesn't slow things down for typical programs (use allocation numbers to check) I'm all for using it so that the edge cases don't fall off a cliff.

Simon

comment:14 Changed 6 years ago by leroux

difficulty: Moderate (less than a day)Unknown
Owner: leroux deleted
Priority: normallow
Resolution: wontfix
Status: closednew

Did you look at the amount of heap the compiler allocated? It's easily discoverable (eg use +RTS -s, and it's in the nofib logs too, though I > don't think nofib-analyse looks for it. (You could fix nofib-analyse so that it does.)

I'm working on fixing nofib-analyse up so that we can extract compile allocation info.

Thanks for the motivation, SPJ! =)

comment:15 Changed 6 years ago by leroux

Owner: set to leroux

comment:16 Changed 6 years ago by leroux

I've attached a patch to include compile time alloc info on #8409.

I'll keep digging in but so far it doesn't look like allocation is affected significantly either.

Compile Allocations

-------------------------------------------------------------------------------
        Program               head-0          head-1        ordNub-0        ordNub-1  ordNub-cases-0  ordNub-cases-1
-------------------------------------------------------------------------------
        -1 s.d.                -----           -0.0%           -0.0%           -0.0%           -0.0%           -0.0%
        +1 s.d.                -----           +0.0%           +0.0%           +0.0%           +0.0%           +0.0%
        Average                -----           +0.0%           +0.0%           +0.0%           +0.0%           +0.0%

0% across the board, apparently. Any other suggestions anyone? Perhaps I'll try to replace the few nub's I left laying around in the compiler.

Note: See TracTickets for help on using tickets.