Opened 20 months ago

Last modified 9 months ago

#14727 new bug

Unboxed sum performance surprisingly poor

Reported by: dfeuer Owned by:
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.2.2
Keywords: UnboxedSums Cc: michalt, maoe
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I tried performing worker-wrapper manually on Data.IntMap.lookup:

lookup# :: Int -> IntMap a -> (# (# #) | a #)
lookup# = -- The obvious modification of the current implementation

lookup :: Int -> IntMap a -> Maybe a
lookup k m = case lookup# k m of
  (# | a #) -> Just a
  _ -> Nothing

Unfortunately, the lookup benchmark slowed down. I verified that the benchmark indeed performs an immediate case analysis on the result (with fromMaybe), so it should go faster. And yet it goes slower.

Caveat: I have not yet gotten things set up to be able to check with 8.4, so if there have been improvements in UnboxedSum performance since 8.2.2, this may all be silly.

Change History (9)

comment:1 Changed 20 months ago by osa1

I'd make sure lookup is inlined, in which case allocations should be reduced.

OTOH you use one more register to return two values in lookup# instead of one as before so that may make some things worse.

It'd be helpful to see benchmark code's Cmm for both versions (with lookup inlined in the unboxed sums version).

comment:2 in reply to:  1 Changed 20 months ago by dfeuer

Replying to osa1:

I'd make sure lookup is inlined, in which case allocations should be reduced.

It is inlined. I'll have to look at allocations. Of course, it's somewhat possible that performance goes down because there's less allocation, if we're unknowingly relying on the GC to improve memory layout.

OTOH you use one more register to return two values in lookup# instead of one as before so that may make some things worse.

It'd be helpful to see benchmark code's Cmm for both versions (with lookup inlined in the unboxed sums version).

I'm not sure how much you're likely to get from a Criterion benchmark. Is there a specific thing you'd like me to try dumping?

comment:3 Changed 20 months ago by osa1

I'm not sure how much you're likely to get from a Criterion benchmark. Is there a specific thing you'd like me to try dumping?

I think looking at lookup in benchmarks/IntMap.hs (and any functions referenced by that function) would be helpful.

I also think that it may be a good idea to add {-# NOINLINE lookup #-} in benchmarks/IntMap.hs just to avoid accidentally optimising the code at the use site (i.e. the benchmarking code).

comment:4 Changed 20 months ago by michalt

Cc: michalt added

comment:5 Changed 20 months ago by simonpj

It'd be great to make a reproducible test case that Omer (the author of unboxed sums) can look at. Thanks!

comment:6 Changed 15 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed in GHC 8.6.

comment:7 Changed 10 months ago by maoe

Cc: maoe added

comment:8 Changed 9 months ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

comment:9 Changed 9 months ago by osa1

Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.