Opened 2 years ago

Last modified 2 years ago

#14359 new bug

C-- pipeline/NCG fails to optimize simple repeated addition

Reported by: bgamari
Priority: low
Component: Compiler
Version: 8.2.1
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug

Description (last modified by bgamari)

While debugging #14346, I noticed some rather abhorrent code in a disassembly of the newPinnedByteArray# primop:

Dump of assembler code for function stg_newPinnedByteArrayzh:
   0x00000000004a8518 <+0>:	mov    0x378(%r13),%rax
   0x00000000004a851f <+7>:	cmpq   $0x0,0x10(%rax)
   0x00000000004a8524 <+12>:	je     0x4a8593 <stg_newPinnedByteArrayzh+123>
   0x00000000004a8526 <+14>:	mov    0x4f5730,%rax
   0x00000000004a852e <+22>:	mov    0x38(%rax),%rax
   0x00000000004a8532 <+26>:	cmp    0x4f5718,%rax
   0x00000000004a853a <+34>:	jae    0x4a8593 <stg_newPinnedByteArrayzh+123>
   0x00000000004a853c <+36>:	mov    %rbx,%rax
   0x00000000004a853f <+39>:	lea    0x7(%rax),%rcx
   0x00000000004a8543 <+43>:	shr    $0x3,%rcx
   0x00000000004a8547 <+47>:	add    $0x10,%rax       <--- starts here
   0x00000000004a854b <+51>:	add    $0xf,%rax
   0x00000000004a854f <+55>:	add    $0x7,%rax
   0x00000000004a8553 <+59>:	shr    $0x3,%rax
   0x00000000004a8557 <+63>:	mov    $0x49d820,%ecx
   ...

That is three successive add instructions with constant operands (0x10, 0xf and 0x7); surely those should be collapsed into a single add of 0x26 by the Cmm-to-Cmm pipeline.
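A minimal sketch of the kind of rewrite being asked for, assuming a toy expression type (`Expr`, `Reg`, `Lit`, `Add`, `foldConstants` and `eval` are all hypothetical names; this is not GHC's `CmmExpr` or its actual Cmm optimiser). The pass folds bottom-up and re-associates `(e + c1) + c2` into `e + (c1 + c2)`. Because the additions wrap modulo 2^64, re-association is sound even when the intermediate sums overflow.

```haskell
import Data.Word (Word64)

-- Toy expression language (hypothetical; loosely modelled on Cmm's
-- word-sized addition, not GHC's real CmmExpr).
data Expr
  = Reg String     -- a machine register, named for illustration only
  | Lit Word64     -- an unsigned 64-bit literal
  | Add Expr Expr  -- wrapping 64-bit addition
  deriving (Eq, Show)

-- Bottom-up constant folding. Canonical form keeps the literal on the
-- right, so nested constant additions can be merged into one literal.
foldConstants :: Expr -> Expr
foldConstants (Add a b) =
  case (foldConstants a, foldConstants b) of
    (Lit x, Lit y)         -> Lit (x + y)      -- both operands constant
    (Add e (Lit x), Lit y) -> mkAdd e (x + y)  -- (e + x) + y  ==>  e + (x+y)
    (Lit x, Add e (Lit y)) -> mkAdd e (x + y)  -- x + (e + y)  ==>  e + (x+y)
    (Lit x, e)             -> mkAdd e x        -- move the literal to the right
    (e, Lit y)             -> mkAdd e y
    (a', b')               -> Add a' b'        -- nothing to fold
foldConstants e = e

-- Build e + c, dropping the addition entirely when c is zero.
mkAdd :: Expr -> Word64 -> Expr
mkAdd e 0 = e
mkAdd e c = Add e (Lit c)

-- Reference semantics, used to check that folding preserves meaning.
eval :: (String -> Word64) -> Expr -> Word64
eval env (Reg r)   = env r
eval _   (Lit n)   = n
eval env (Add a b) = eval env a + eval env b
```

Folding `Add (Add (Add (Reg "rax") (Lit 16)) (Lit 15)) (Lit 7)`, which mirrors the three adds in the dump, yields `Add (Reg "rax") (Lit 38)`: a single addition of 38, the same constant that appears in the `leaq 38(%rbx),%rsi` of the `-O` output.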

Change History (3)

comment:1 Changed 2 years ago by bgamari

Description: modified

comment:2 Changed 2 years ago by alexbiehl

Actually, compiling PrimOps.cmm with -O already results in the desired constant folding:

$ ghc -ddump-asm -O -c rts/PrimOps.cmm | less
...
stg_newPinnedByteArrayzh:
_cv:
        movq 888(%r13),%rax
        cmpq $0,16(%rax)
        je _cl
_cn:
        movq g0@GOTPCREL(%rip),%rax
        movq (%rax),%rax
        movq 56(%rax),%rax
        movq large_alloc_lim@GOTPCREL(%rip),%rcx
        cmpq (%rcx),%rax
        jae _cl
_co:
        subq $8,%rsp
        leaq -24(%r13),%rax
        leaq 38(%rbx),%rsi <- see here
        shrq $3,%rsi
        movq %rax,%rdi
        xorl %eax,%eax
        call allocatePinned
        addq $8,%rsp
        testq %rax,%rax
...
Last edited 2 years ago by alexbiehl

comment:3 Changed 2 years ago by bgamari

Well, that is a relief. I guess this might just be an artifact of the fact that I was using a validate build. I'll have to check this.
