Opened 5 years ago

Last modified 14 months ago

#9688 new feature request

Improve the interaction between CSE and the join point transformation

Reported by: dfeuer Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.9
Keywords: CSE, JoinPoints Cc: AndreasK
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


It appears that the join point transformation sometimes interferes with CSE when CSE would be much better. Two examples:


Suppose we define

isHexDigit              :: Char -> Bool
isHexDigit c            =  (fromIntegral (ord c - ord '0')::Word) <= 9 ||
                           (fromIntegral (ord c - ord 'a')::Word) <= 5 ||
                           (fromIntegral (ord c - ord 'A')::Word) <= 5

digitToInt c
  | (fromIntegral dec::Word) <= 9 = dec
  | (fromIntegral hexl::Word) <= 5 = hexl + 10
  | (fromIntegral hexu::Word) <= 5 = hexu + 10
  | otherwise = error ("Char.digitToInt: not a digit " ++ show c) -- sigh
    dec = ord c - ord '0'
    hexl = ord c - ord 'a'
    hexu = ord c - ord 'A'
-- We could also expand this out in cases manually, but it makes no
-- difference as far as I can tell.

Suppose we then write a naive digitToIntMaybe function:

digitToIntMaybe c
  | isHexDigit c = Just (digitToInt c)
  | otherwise    = Nothing

What I would want this to do is "zip" the nested cases and give Core like this:

$wdigitToIntMaybe =
  \ ww_s2Ag ->
    let {
      x#_a2yy = -# (ord# ww_s2Ag) 48 } in
    case tagToEnum# (leWord# (int2Word# x#_a2yy) (__word 9)) of _ {
      False ->
        let {
          x#1_X2z7 = -# (ord# ww_s2Ag) 97 } in
        case tagToEnum# (leWord# (int2Word# x#1_X2z7) (__word 5)) of _ {
          False ->
            let {
              x#2_X2zh = -# (ord# ww_s2Ag) 65 } in
            case tagToEnum# (leWord# (int2Word# x#2_X2zh) (__word 5)) of _ {
              False -> Nothing;
              True -> Just (I# (+# x#2_X2zh 10))
          True -> Just (I# (+# x#1_X2z7 10))
      True -> Just (I# x#_a2yy)

digitToIntMaybe =
  \ w_s2Ad ->
    case w_s2Ad of _ { C# ww1_s2Ag -> $wdigitToIntMaybe ww1_s2Ag }

But instead, the join point transformation triggers, and we get this:

digitToIntMaybe1 =
  \ ww_s2Cp ->
         "Char.digitToInt: not a digit "# ($w$cshowsPrec15 ww_s2Cp ([])))

$wdigitToIntMaybe =
  \ ww_s2Cp ->
    let {
      $j_s2Bc =
        \ _ ->
            (let {
               a_s2B5 = int2Word# (-# (ord# ww_s2Cp) 48) } in
             case tagToEnum# (leWord# a_s2B5 (__word 9)) of _ {
               False ->
                 let {
                   a1_s2B7 = int2Word# (-# (ord# ww_s2Cp) 97) } in
                 case tagToEnum# (leWord# a1_s2B7 (__word 5)) of _ {
                   False ->
                     let {
                       a2_s2B9 = int2Word# (-# (ord# ww_s2Cp) 65) } in
                     case tagToEnum# (leWord# a2_s2B9 (__word 5)) of _ {
                       False -> digitToIntMaybe1 ww_s2Cp;
                       True -> I# (+# (word2Int# a2_s2B9) 10)
                   True -> I# (+# (word2Int# a1_s2B7) 10)
               True -> I# (word2Int# a_s2B5)
             }) } in
    case tagToEnum#
           (leWord# (int2Word# (-# (ord# ww_s2Cp) 48)) (__word 9))
    of _ {
      False ->
        case tagToEnum#
               (leWord# (int2Word# (-# (ord# ww_s2Cp) 97)) (__word 5))
        of _ {
          False ->
            case tagToEnum#
                   (leWord# (int2Word# (-# (ord# ww_s2Cp) 65)) (__word 5))
            of _ {
              False -> Nothing;
              True -> $j_s2Bc void#
          True -> $j_s2Bc void#
      True -> $j_s2Bc void#

digitToIntMaybe =
  \ w_s2Cm ->
    case w_s2Cm of _ { C# ww1_s2Cp -> $wdigitToIntMaybe ww1_s2Cp }

We perform the same three tests twice each, and test for an error condition that obviously can't happen.

quotRem and divMod

If we define

x `quot` y = fst (x `quotRem` y)
x `rem` y = snd (x `quotRem` y)

and then write something like

f x y | x `rem` y == 0 = x `quot` y
      | otherwise = 17

then CSE works some magic and we only calculate quotRem x y once.

Unfortunately, if we do this:

whatever x y = if x `myRem` y == 0 then (x `myQuot` y) + 14 else x `myQuot` y

then the join point transformation fires, collecting the myQuot x y expressions in the case branches and preventing CSE from recognizing the much better opportunity to eliminate those calculations altogether.

The situation with divMod is much worse. The join point transformation applied to the cases defining divMod prevents CSE from working magic on it in even simple situations, unless one of the arguments is known, making this definition unusable (the resulting Core is too horrifyingly long to paste here). It would probably be possible to improve the divMod situation to something close to the quotRem one by making divMod NOINLINE and adding special divModLit rules, but I'd much rather see a general solution.

Change History (4)

comment:1 Changed 5 years ago by simonpj

Indeed! It looks like a classic code-specialisation question to me. In digitToIntMaybe you use some code isHexDigit that just happens to use the same tests as the RHS, digitToInt. Moreover there are three ways for isHexDigit to succeed. If you duplicate the RHS into those three outcomes you'll get the result you want; this amounts to inlining $j_s2BC. But that entails code duplication -- and might gain nothing whatsoever.

Or, to put it another way, if the user wrote the above Core, complete with the local function $j_s2BC, would you expect it to be optimised?

It would be cool if so. But I don't yet see how to achieve that at reasonable cost.

To put it another way, if you wrote this in C:

if blah then x = e1 else x = e2
if similar-blah then s1 else s2

you'd usually expect to test blah, assign to x, and then test similar-blah and do s1 or s2. But it might be better to duplicate (i.e. specialise) the first if into the branches:

if similar-blah then {
  if blah then x=e1 else x=e2
} else {
  if blah then x=e1 else x=e2

and now the inner if might be optimised away entirely.

These are not simple choices. By representing them as simple inlining choices ("shall I inline $j_s2Bc?") GHC piggy backs on a lot of careful heuristics for inlining. But, I agree, it does not always work well.

I have taken time to explain here, in the hope that someone can do better!


comment:2 in reply to:  1 Changed 5 years ago by dfeuer

Replying to simonpj:

Or, to put it another way, if the user wrote the above Core, complete with the local function $j_s2BC, would you expect it to be optimised?

Probably so, but I don't really know. I think the basic concept is that we should let more information flow from the analysis of the main body into the analysis of j. I think these two steps would probably be a good start, although I imagine they could be refined. Of course, you may well know of seven fatal flaws in this approach.

  1. The simple one: If the value of a certain expression is always available when calling j, check if that expression is used in j; if so, pass it in. This is really just CSE, broadened to cross the boundary between the local function and the main body.
  1. The less simple one: If the scrutinee of a case in j has always been scrutinized before j is called, "trim" the branches to only the ones that are still possible, and then consider splitting j into its remaining branches. In the examples described above, one relevant case will be the outermost one in j, and that should pretty much always be merged upwards. Doing the same thing a few times will take care of the rest. I think this basically does what would have happened had j been inlined, but we actually know that we're getting something out of it.

comment:3 Changed 3 years ago by simonpj

Keywords: JoinPoints added

comment:4 Changed 14 months ago by AndreasK

Cc: AndreasK added
Note: See TracTickets for help on using tickets.