Opened 5 years ago

Closed 5 years ago

#9435 closed bug (fixed)

x86 sse4.2 popCnt16# needs to zero-extend its result

Reported by: rwbarton Owned by:
Priority: normal Milestone: 7.8.4
Component: Compiler (NCG) Version: 7.9
Keywords: Cc: tibbe, simonmar
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D147
Wiki Page:


make TEST=cgrun071 EXTRA_HC_OPTS=-msse42 fails for me in all the non-ghci non-llvm ways.

For popCnt16# we emit popcnt %ax,%ax which doesn't clear the high 48 bits of the result.

Patch incoming.

Change History (5)

comment:1 Changed 5 years ago by rwbarton

popCnt8# is affected too, but (possibly by luck, possibly not) we reuse the source register as the destination register and we needed to zero-extend the source anyways as there is no 8-bit popcnt instruction.

comment:2 Changed 5 years ago by rwbarton

Differential Rev(s): Phab:D147

comment:3 Changed 5 years ago by Reid Barton <rwbarton@…>

In 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1/ghc:

x86: zero extend the result of 16-bit popcnt instructions (#9435)

The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
the destination register, so we have to zero-extend the result to
a full word as popCnt16# is supposed to return a Word#.

For popCnt8# we could instead zero-extend the input to 32 bits
and then do a 32-bit popcnt, and not have to zero-extend the result.
LLVM produces the 16-bit popcnt sequence with two zero extensions,
though, and who am I to argue?

Test Plan:
 - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
 - then ran again adding "WAY=optasm", and verified that
   the popcnt sequences we generate match the ones produced
   by LLVM for its @llvm.ctpop.* intrinsics

Reviewers: austin, hvr, tibbe

Reviewed By: austin, hvr, tibbe

Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter

Differential Revision:

GHC Trac Issues: #9435

comment:4 Changed 5 years ago by rwbarton

Status: newmerge

Marking as "merge" per hvr's suggestion.

comment:5 Changed 5 years ago by thoughtpolice

Milestone: 7.8.4
Resolution: fixed
Status: mergeclosed

Merged to 7.8.4.

Note: See TracTickets for help on using tickets.