Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#10676 closed bug (duplicate)

silly assembly for comparing the result of comparisons that return Int# against 0#

Reported by: rwbarton Owned by:
Priority: normal Milestone:
Component: Compiler (CodeGen) Version: 7.10.1
Keywords: Cc: jstolarek
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #8326, #8327, #9661 Differential Rev(s):
Wiki Page:

Description

Consider the module

{-# LANGUAGE MagicHash #-}
module Min where

import GHC.Exts

fgood :: Int# -> Int# -> Int
fgood x# y# = case isTrue# (x# <# y#) of
  False -> I# y#
  True  -> I# x#

fbad :: Int# -> Int# -> Int
fbad x# y# = case x# <# y# of
  0# -> I# y#
  _  -> I# x#

The code for fgood looks fine:

0000000000000130 <Min_fgood_info>:
 130:	49 83 c4 10          	add    $0x10,%r12
 134:	4d 3b a5 58 03 00 00 	cmp    0x358(%r13),%r12
 13b:	77 1a                	ja     157 <Min_fgood_info+0x27>
 13d:	49 39 f6             	cmp    %rsi,%r14
 140:	7c 29                	jl     16b <Min_fgood_info+0x3b>
 142:	49 c7 44 24 f8 00 00 	movq   $0x0,-0x8(%r12)
 149:	00 00 
			147: R_X86_64_32S	ghczmprim_GHCziTypes_Izh_con_info
 14b:	49 89 34 24          	mov    %rsi,(%r12)
 14f:	49 8d 5c 24 f9       	lea    -0x7(%r12),%rbx
 154:	ff 65 00             	jmpq   *0x0(%rbp)
 157:	49 c7 85 88 03 00 00 	movq   $0x10,0x388(%r13)
 15e:	10 00 00 00 
 162:	bb 00 00 00 00       	mov    $0x0,%ebx
			163: R_X86_64_32	Min_fgood_closure
 167:	41 ff 65 f8          	jmpq   *-0x8(%r13)
 16b:	49 c7 44 24 f8 00 00 	movq   $0x0,-0x8(%r12)
 172:	00 00 
			170: R_X86_64_32S	ghczmprim_GHCziTypes_Izh_con_info
 174:	4d 89 34 24          	mov    %r14,(%r12)
 178:	49 8d 5c 24 f9       	lea    -0x7(%r12),%rbx
 17d:	ff 65 00             	jmpq   *0x0(%rbp)

But the code for fbad has several problems:

0000000000000018 <Min_fbad_info>:
  18:	48 8d 45 f0          	lea    -0x10(%rbp),%rax
  1c:	4c 39 f8             	cmp    %r15,%rax
  1f:	72 3a                	jb     5b <Min_fbad_info+0x43>
  21:	48 89 f0             	mov    %rsi,%rax
  24:	4c 89 f3             	mov    %r14,%rbx
  27:	49 39 f6             	cmp    %rsi,%r14
  2a:	0f 9c c1             	setl   %cl
  2d:	0f b6 c9             	movzbl %cl,%ecx
  30:	48 85 c9             	test   %rcx,%rcx
  33:	75 51                	jne    86 <c1Sm_info+0xe>
  35:	49 83 c4 10          	add    $0x10,%r12
  39:	4d 3b a5 58 03 00 00 	cmp    0x358(%r13),%r12
  40:	0f 87 aa 00 00 00    	ja     f0 <c1Su_info+0x10>
  46:	49 c7 44 24 f8 00 00 	movq   $0x0,-0x8(%r12)
  4d:	00 00 
			4b: R_X86_64_32S	ghczmprim_GHCziTypes_Izh_con_info
  4f:	49 89 04 24          	mov    %rax,(%r12)
  53:	49 8d 5c 24 f9       	lea    -0x7(%r12),%rbx
  58:	ff 65 00             	jmpq   *0x0(%rbp)
  5b:	bb 00 00 00 00       	mov    $0x0,%ebx
			5c: R_X86_64_32	Min_fbad_closure
  60:	41 ff 65 f8          	jmpq   *-0x8(%r13)
	...

; c1Sm_info is the other case with its own heap check and GC entry code
; c1Su_info is another GC entry
; in total, another 160 bytes of code

For some reason, the heap checks were moved into the alternatives, which was not a good decision in this case. But the silly thing here is the cmp/setl/movzbl/test/jne sequence in Min_fbad_info, which should be replaced by a cmp/jl as in Min_fgood_info.

Same behavior in 7.8 and HEAD.

Change History (10)

comment:1 Changed 4 years ago by thomie

Cc: jstolarek added

CC jstolarek as the author of PrimBool.

comment:2 Changed 4 years ago by bgamari

I think you are essentially seeing #8326.

comment:3 Changed 4 years ago by rwbarton

Yes, I think so. But there is a problem besides the fact that the heap check is duplicated. Referring to the Cmm listing in #8326, something seems to think that _sEV (the 0#-or-1# result of the primop) is live in the alternatives, even to the point of passing it through the GC, when really it will never be used after the branch. I think that's the reason that later passes are unable to avoid this ugly cmp/setCC/movzbl/test/jne sequence. Otherwise the definition of _sEV would be inlined at its only use site, the branch, and the code generator would then generate a nice cmp/jCC sequence.

comment:4 Changed 4 years ago by rwbarton

comment:5 Changed 4 years ago by rwbarton

Resolution: duplicate
Status: newclosed

Oh, I just found #8327. Will close this one as a duplicate then.

comment:6 Changed 4 years ago by Reid Barton <rwbarton@…>

In 7e70c063/ghc:

Use isTrue# around primitive comparisons in integer-gmp

Summary:
The form
  case na# ==# nb# of
    0# -> ...
    _  -> ...
sometimes generates convoluted assembly, see #10676.
timesInt2Integer was the most spectacular offender, especially as
it is a rather cheap function overall (no calls to gmp).

I checked a few instances and some of the old generated assembly
was fine already, but I changed them all for consistency. The new
form is also more consistent with use of these primops in general.

Test Plan: validate

Reviewers: hvr, bgamari, goldfire, austin

Reviewed By: hvr

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1094

comment:7 Changed 4 years ago by simonpj

Wait!

I believe that you are saying that

 case na# ==# nb# of
    0# -> e1
    _  -> e2

generates much worse code than

  case isTrue# (na# ==# nb#) of
    False -> e1
    True  -> e2

even though the former appears more primitive. This is all very odd and either deserves to be fixed, or at least documented somewhere prominent!

Where would be a good place to document it? Perhaps with the primops for (==)#, (>=)#, etc? Or with isTrue#? And we need a ticket to say "let's fix this".

Speaking of which do you know why it behaves so badly?

Just changing the code is leaving land-mines for future generations :-).

Simon

comment:8 Changed 4 years ago by jstolarek

comment:9 Changed 4 years ago by jstolarek

#9661 might be related here.

comment:10 in reply to:  7 Changed 4 years ago by rwbarton

Replying to simonpj:

Wait!

I believe that you are saying that [...]

Yes that's right. Tickets #8326 and #8327 cover everything I had to say about the code generated here.

Where would be a good place to document it? Perhaps with the primops for (==)#, (>=)#, etc? Or with isTrue#?

Ideally with the comparison primops, but they are numerous and scattered throughout the list of primops, which is organized principally by type (Char#, then the integral types, etc.). There is some mention of this phenomenon in the Note [Optimizing isTrue#], though you have to read between the lines a bit.

It would be best to fix this of course, but if that stalls then I'll add a comment near the top of primops.txt.pp.

Speaking of which do you know why it behaves so badly?

There is a special case for cgCase on an enumeration type of a primop application: in this case we always raise heap checks out of the alternatives. I think this is the entire reason for the difference between the code generated for the two versions. I don't see why it makes sense to have different logic for how to place the heap checks in this case compared to the general case, and I'm not entirely sure it isn't a historical accident.

Further comments on #8326.

Note: See TracTickets for help on using tickets.