Opened 3 years ago

Closed 21 months ago

#12022 closed feature request (invalid)

unsafeShiftL and unsafeShiftR are not marked as INLINE

Reported by: Rufflewind Owned by: dfeuer
Priority: normal Milestone:
Component: libraries/base Version: 7.10.3
Keywords: performance, inline, bits Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Given that they are fairly essential operations in core libraries such as containers, I think it would be useful to have them marked as INLINE (or INLINABLE at the very least).

See also: https://github.com/haskell/containers/pull/216

Change History (12)

comment:1 Changed 3 years ago by dfeuer

I don't think INLINABLE would help this at all. I would lean toward either INLINE or INLINE CONLIKE. I don't have an example, but I think I saw GHC allocate when just INLINE for a copy of one of these.

comment:2 Changed 3 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

I was confused at first. From Data.Bits in libraries/base, it is clear that unsafeShiftL and unsafeShiftR are marked INLINE:

class Eq a => Bits a where
    ...
    unsafeShiftL            :: a -> Int -> a
    {-# INLINE unsafeShiftL #-}
    x `unsafeShiftL` i = x `shiftL` i

    unsafeShiftR            :: a -> Int -> a
    {-# INLINE unsafeShiftR #-}
    x `unsafeShiftR` i = x `shiftR` i
    ....

But I guess you're talking about these instances, which indeed aren't marked INLINE:

instance Bits Int where
    ...
    (I# x#) `unsafeShiftL` (I# i#) = I# (x# `uncheckedIShiftL#` i#)
    (I# x#) `unsafeShiftR` (I# i#) = I# (x# `uncheckedIShiftRA#` i#)
    ...
instance Bits Word where
    ...
    (W# x#) `unsafeShiftL` (I# i#) = W# (x# `uncheckedShiftL#` i#)
    (W# x#) `unsafeShiftR` (I# i#) = W# (x# `uncheckedShiftRL#` i#)
    ...

There is a Note which suggests explicit INLINEs aren't necessary, except for rotate:

{-      Note [Constant folding for rotate]
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The INLINE on the Int instance of rotate enables it to be constant
folded.  For example:
  
...

All other Bits instances seem to inline well enough on their
own to enable constant folding; for example 'shift':
     sumU . mapU (`shift` 3) . replicateU 10000000 $ (7 :: Int)
 goes to:
     Main.$wfold =
       \ (ww_sOb :: Int#) (ww1_sOf :: Int#) ->
         case ww1_sOf of wild_XM {
           __DEFAULT -> Main.$wfold (+# ww_sOb 56) (+# wild_XM 1);
           10000000 -> ww_sOb
         }
-}

But that Note is older (2008) than the commit that introduced unsafeShiftL/R (f1c593e01d740fde1202f84aa37ad4cc95ec7272, 2011).

comment:3 Changed 3 years ago by dfeuer

More importantly, the note is nonsense. Things not marked INLINE or (more strongly) INLINE CONLIKE sometimes won't be inlined even though they can be, once the code around them gets complicated enough. An operation like an unsafe shift, mask, etc., that's literally cheaper than a function call should *always* be inlined, unless I'm extremely confused.

comment:4 Changed 21 months ago by m-renaud

Hey, just following up on this, there's still an open issue in containers[1] awaiting this change to go in. Is there any fundamental reason why these instance functions cannot be marked INLINE?

[1] https://github.com/haskell/containers/pull/216

comment:5 Changed 21 months ago by mpickering

Is there a reason why they should be marked INLINE? The note seems to indicate that GHC inlines these functions anyway, they look small so I suspect this is the case.

comment:6 Changed 21 months ago by mpickering

Operating System: Unknown/MultipleWindows

comment:7 Changed 21 months ago by mpickering

Operating System: WindowsUnknown/Multiple

comment:8 Changed 21 months ago by m-renaud

This all predates me but in comment 2 above thomie@ points out: "But that Note is older (2008) than the commit that introduced unsafeShiftL/R ([f1c593e01d740fde1202f84aa37ad4cc95ec7272], 2011)." In fact, these functions are marked INLINE in the typeclass definition when they were introduced, and if the assumption is that GHC should always inline them, then why not make that explicit in the Int and Word instances? It appears this may have simply been an oversight.

In containers if these functions are not inlined it is my understanding that there is a performance hit which is why there is custom code to ensure that these shifts are inlined (https://github.com/haskell/containers/blob/master/Utils/Containers/Internal/BitUtil.hs#L74-L89).

comment:9 Changed 21 months ago by mpickering

To take that example you linked, it is peculiar as depending on whether __GLASGOW_HASKELL__ is set, shiftLL has different semantics.

Running the program

{-# LANGUAGE MagicHash #-}
module Main where
import Bits
import GHC.Exts (Word(..), Int(..))
import GHC.Prim (uncheckedShiftL#, uncheckedShiftRL#)

shiftRL :: Word -> Int -> Word
shiftRL x i   = shiftR x i

shiftRL' :: Word -> Int -> Word
shiftRL' (W# x) (I# i) = W# (uncheckedShiftRL# x i)

main = do
  print $ shiftRL 1 65
  print $ shiftRL' 1 65

Yields

0
shift: Bad shift length65

But ignoring this fact, if you inspect the core you see that the result is the same as the handwritten "optimised" definition so I'm not sure what the intention is here with the additional complexity...

In fact, these functions are marked INLINE in the typeclass definition when they were introduced, and if the assumption is that GHC should always inline them, then why not make that explicit in the Int and Word instances?

In general the compiler has much more information available at call sites to decide whether to inline a definition than a library author does at definition site. Preoptimisation, adding INLINE pragmas everywhere, can be very detrimental as the compiler can no longer decide whether it is worthwhile to do the inlining itself.

In this case, I think the compiler will decide to inline all these small functions so there is no reason to add an INLINE pragma.

comment:10 Changed 21 months ago by bgamari

Perhaps we should close this in that case?

comment:11 Changed 21 months ago by dfeuer

Owner: set to dfeuer

I'll run containers benchmarks with and without the custom shifts and see if it matters.

comment:12 Changed 21 months ago by dfeuer

Resolution: invalid
Status: newclosed

I stripped out the custom code in containers and to the (fairly small) extent that the benchmarks changed, they improved. So I'll close this.

Note: See TracTickets for help on using tickets.