Opened 9 years ago

Closed 8 years ago

#5173 closed feature request (fixed)

Implement forward substitution of constants in the Cmm mini-inliner

Reported by: tibbe Owned by: simonmar
Priority: high Milestone: 7.2.1
Component: Compiler Version: 7.0.3
Keywords: Cc: johan.tibell@…, dterei
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

While working on a new set of primops for copying arrays, I ran into a Cmm optimizer issue where constants weren't being inlined if the temporary that the constant was assigned to was used more than once. Example:

fn
{
    bits64 a, b;

    a = 1;
    b = a + a;
    RET_N(b);
}

is optimized as

fn()    { []
        }
    ca:
        _cb::I64 = 1;
        R1 = _cb::I64 + _cb::I64;
        jump (I64[Sp + 0]) ();
}
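
For reference, the transformation being asked for is simple: once a local register is bound to a literal, later uses of that register can be replaced by the literal itself, no matter how many times it occurs. Here is a minimal sketch of the idea in Haskell, using simplified stand-in types rather than GHC's actual Cmm representation (all names are illustrative, not the real CmmOpt code):

import qualified Data.Map as M

-- Simplified stand-ins for Cmm local registers, literals, and expressions;
-- not GHC's real Cmm types.
type Reg = String

data Expr
  = Lit Integer
  | Var Reg
  | Add Expr Expr
  deriving Show

data Stmt
  = Assign Reg Expr
  | Ret Expr
  deriving Show

-- Forward substitution of constants: remember registers bound to literals
-- and rewrite later uses of those registers to the literal itself, even
-- when a register is used more than once.
substConsts :: [Stmt] -> [Stmt]
substConsts = go M.empty
  where
    go _   []                = []
    go env (Assign r e : ss) =
      let e'   = subst env e
          -- Record the binding if the (rewritten) right-hand side is a
          -- literal; a non-constant reassignment invalidates any old binding.
          env' = case e' of
                   Lit _ -> M.insert r e' env
                   _     -> M.delete r env
      in Assign r e' : go env' ss
    go env (Ret e : ss) = Ret (subst env e) : go env ss

    subst env (Var r)   = M.findWithDefault (Var r) r env
    subst env (Add a b) = Add (subst env a) (subst env b)
    subst _   e         = e

-- The example from the description: a = 1; b = a + a; return b.
example :: [Stmt]
example =
  [ Assign "a" (Lit 1)
  , Assign "b" (Add (Var "a") (Var "a"))
  , Ret (Var "b")
  ]
-- substConsts example rewrites b's right-hand side to  1 + 1  even though
-- "a" is used twice; the now-dead assignment to "a" is left in place.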

Change History (13)

comment:1 Changed 9 years ago by tibbe

The attached patch implements forward substitution of constants, with the drawback that the original assignment isn't removed. We could either leave the assignment to a later dead-assignment elimination pass or try to improve this pass to remove it as well.
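
A later dead-assignment elimination pass is itself just a small backward liveness walk. A minimal sketch, reusing the toy Stmt/Expr types from the sketch in the description (dropping the assignment is only safe because the toy expressions have no side effects and the code is straight-line):

import qualified Data.Set as S

-- Backward pass over the toy Stmt/Expr types above: drop an assignment
-- whose register is never read by any later statement.
deadAssignElim :: [Stmt] -> [Stmt]
deadAssignElim stmts = fst (foldr step ([], S.empty) stmts)
  where
    step s@(Assign r e) (acc, live)
      | r `S.notMember` live = (acc, live)   -- never read later: drop it
      | otherwise            = (s : acc, uses e `S.union` S.delete r live)
    step s@(Ret e) (acc, live) = (s : acc, uses e `S.union` live)

    uses (Var r)   = S.singleton r
    uses (Add a b) = uses a `S.union` uses b
    uses _         = S.empty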

comment:2 Changed 9 years ago by tibbe

Status: new → patch

I've updated the implementation to also remove the dead binding. Please review.

comment:3 Changed 9 years ago by tibbe

Here are the results for validate:

OVERALL SUMMARY for test run started at Fri May  6 16:03:28 CEST 2011
    2756 total tests, which gave rise to
    9978 test cases, of which
       0 caused framework failures
    7533 were skipped

    2362 expected passes
      80 expected failures
       0 unexpected passes
       3 unexpected failures

Unexpected failures:
   3816(normal)
   T5084(normal)
   user001(normal)

The three failures don't seem related. I believe I've seen user001 and 3816 before, when working on the I/O manager.

comment:4 Changed 9 years ago by tibbe

Here are the nofib results:

            Min          -0.0%     -0.0%     -2.0%     -2.0%     -2.4%
            Max          +0.0%     +0.1%     +0.4%     +0.8%     +0.7%
 Geometric Mean          -0.0%     +0.0%     -0.3%     -0.2%     -0.0%

A tiny decrease in runtime and allocation, which is probably just noise.

Note that I added this optimization in order to enable other optimizations that depend on calls at the Cmm level having constant arguments whenever possible, so the lack of any improvement on nofib isn't a problem.

comment:5 Changed 9 years ago by tibbe

For completeness, here's some real code that benefits from this optimization:

stH_ret()
        { update_frame: <none>
          has static closure: False type: 0
          desc: 0
          tag: 32
          stack: [Just _ssK::I64]
          srt: _no_srt_
        }
    cv1:
        _cv5::I64 = I64[R1 + 7];
        _cv7::I64 = 1;
        _cv9::I64 = I64[Sp + 8];
        _cvb::I64 = 1;
        _cvd::I64 = 8;
        _cvf::I64 = _cvd::I64 * 8;
        _cvh::I64 = (_cv5::I64 + 24) + (_cv7::I64 << 3);
        _cvj::I64 = (_cv9::I64 + 24) + (_cvb::I64 << 3);
        foreign "ccall"
          MO_Memcpy((_cvj::I64, PtrHint), (_cvh::I64, PtrHint), (_cvf::I64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvl::I64 = (_cv9::I64 + 24) + (I64[_cv9::I64 + 8] << 3);
        _cvn::I64 = _cvb::I64 >> 7;
        foreign "ccall"
          MO_Memset((_cvl::I64 + _cvn::I64, PtrHint), (1,),
                    ((_cvb::I64 + _cvd::I64 >> 7) - _cvn::I64 + 1,),
                    (8,))[_unsafe_call_];
        R1 = ghczmprim_GHCziUnit_Z0T_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

(MO_Memcpy and MO_Memset are CallishMachOps that I'm currently adding. These MachOps will generate unrolled loops in the backend if the number of iterations is statically known.)
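
As a toy illustration of why a literal byte count matters (reusing the Expr type from the sketch in the description; unrolledCopyOffsets is a hypothetical helper, not the actual backend code), the copy can only be turned into a fixed sequence of word-sized moves when the size is statically known, which is exactly what forward substitution of constants exposes at the MO_Memcpy call site:

-- Hypothetical helper over the toy Expr type: unrolling is only possible
-- when the byte count is a literal.
unrolledCopyOffsets :: Expr -> Maybe [Integer]
unrolledCopyOffsets (Lit bytes)
  | bytes > 0 && bytes <= 64 && bytes `mod` 8 == 0
  = Just [0, 8 .. bytes - 8]       -- one 64-bit load/store per offset
unrolledCopyOffsets _ = Nothing    -- size not a constant: emit a call instead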

Note how _cvb and _cvd are both constant and used more than once. Here's the optimized version, without forward substitution of constants:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvb::I64 = 1;
        _cvd::I64 = 8;
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy(((_cv9::I64 + 24) + (_cvb::I64 << 3), PtrHint),
                    (_cvh::I64, PtrHint), (_cvd::I64 << 3,), (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvn::I64 = _cvb::I64 >> 7;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + ((I64[_cv9::I64 + 8] << 3) + _cvn::I64),
                     PtrHint),
                    (1,), ((_cvb::I64 + _cvd::I64 >> 7) + (1 - _cvn::I64),),
                    (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

_cv7 has been inlined, as it was used exactly once, but _cvb and _cvd haven't.

Here's the optimized version with forward substitution of constants:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy((_cv9::I64 + 32, PtrHint), (_cvh::I64, PtrHint), (64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvn::I64 = 0;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + ((I64[_cv9::I64 + 8] << 3) + _cvn::I64),
                     PtrHint),
                    (1,), (0 - _cvn::I64 + 1,), (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

The code is better, but not perfect. The remaining reference to a register with a constant value is due to the mini-inliner only inlining constants, not expressions that could be folded into a constant.

I'll leave this as future work for now.

comment:6 Changed 9 years ago by tibbe

I added the constant folding as well; the last example in the previous comment now looks great:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy((_cv9::I64 + 32, PtrHint), (_cvh::I64, PtrHint), (64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + (I64[_cv9::I64 + 8] << 3), PtrHint),
                    (1,), (1,), (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}
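
The folding step itself is small. A minimal sketch in terms of the toy Expr type from the description's sketch (the real pass folds Cmm MachOps such as shifts and multiplies; only addition is modelled here):

-- Fold operations whose operands are (or have become) literals, so that an
-- assignment like  _cvn = _cvb >> 7  with _cvb substituted to 1 becomes a
-- plain literal and can itself be forward-substituted further down.
constFold :: Expr -> Expr
constFold (Add a b) =
  case (constFold a, constFold b) of
    (Lit x, Lit y) -> Lit (x + y)
    (a', b')       -> Add a' b'
constFold e = e

Applying this to each right-hand side after substitution, and recording the folded result when it is a literal, is what makes bindings like _cvn disappear entirely in the dump above.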

comment:7 Changed 9 years ago by dterei

Cc: dterei added

comment:8 Changed 9 years ago by simonmar

Milestone: 7.2.1
Owner: set to simonmar
Priority: normal → high

Thanks - I will review for 7.2.1.

comment:9 Changed 8 years ago by tibbe

Type: bug → feature request

comment:10 Changed 8 years ago by tibbe

I've updated the patches to apply constant folding to the source expression rather than to the target expression after inlining. This allows assignments like this one to be inlined:

_ccZ::I64 = 0 << 7;
_cd3::I64 = _ccZ;

Previously, the forward substitution would only take place if _ccZ had already been substituted "into", as in this example:

_cd1::I64 = 0;
_ccZ::I64 = 0 << _cd1;
_cd3::I64 = _ccZ;
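
In terms of the toy sketches above, the change amounts to folding the right-hand side at its definition site, before deciding whether to record it as a constant, rather than only folding expressions into which something has already been substituted. A minimal sketch (recordIfConstant is a hypothetical helper; the toy Expr has no shift, so Add stands in for it):

-- Hypothetical helper, reusing the toy types and constFold from the earlier
-- sketches: fold the source expression first, so a binding whose right-hand
-- side folds to a literal is substitutable on its own.
recordIfConstant :: Reg -> Expr -> M.Map Reg Expr -> M.Map Reg Expr
recordIfConstant r rhs env =
  case constFold rhs of
    lit@(Lit _) -> M.insert r lit env   -- substitutable into later uses
    _           -> M.delete r env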

comment:11 Changed 8 years ago by simonmar

Resolution: fixed
Status: patch → closed