Opened 3 years ago

Closed 3 years ago

#12095 closed bug (invalid)

GHC and LLVM don't agree on what to do with byteSwap16#

Reported by: thoughtpolice
Owned by:
Priority: high
Milestone: 8.0.2
Component: Compiler (LLVM)
Version: 8.0.1
Keywords: codegen, llvm
Cc: tibbe, tab
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Consider this test case (taken from here and lightly modified to work on both big- and little-endian machines):

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash    #-}
{-# LANGUAGE CPP          #-}
module Main
  ( main -- :: IO ()
  ) where

#include "ghcconfig.h"

import           GHC.Prim
import           GHC.Word

data T = T !Addr#

t :: T
#ifndef WORDS_BIGENDIAN
t = T "\xcf\xb1"#
#else
t = T "\xb1\xcf"#
#endif

grabWord16 :: T -> Word64
grabWord16 (T addr#) = W64# (byteSwap16# (indexWord16OffAddr# addr# 0#))

trip :: Int
trip = fromIntegral (grabWord16 t)

main :: IO ()
main = print trip

With GHC 7.10.3 using the NCG, the result is correct:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3
$ ghc -Wall -fforce-recomp -O2 Issue67.hs && ./Issue67
[1 of 1] Compiling Main             ( Issue67.hs, Issue67.o )
Linking Issue67 ...
53169

The result is the same with GHC 8.0.1 using the NCG, on both PowerPC and AMD64. This answer is correct: 53169 is 0xCFB1 in hex, so byteSwap16# correctly decodes the byte-swapped number.
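
As a quick sanity check of the arithmetic, the same round trip can be written with byteSwap16 from Data.Word (available since base 4.7) rather than the primop; this is a minimal illustration, not part of the original test case:

import Data.Word (byteSwap16)

-- "\xcf\xb1" read little-endian is the Word16 0xB1CF; swapping its
-- two bytes recovers 0xCFB1 = 53169.
main :: IO ()
main = print (byteSwap16 0xB1CF)  -- prints 53169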

However, the story is not the same with GHC 7.10.3+LLVM 3.5, or GHC 8.0.1+LLVM 3.7:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3
$ llc --version | head -2
LLVM (http://llvm.org/):
  LLVM version 3.5.2
$ ghc -Wall -fforce-recomp -O2 Issue67.hs -fllvm && ./Issue67
[1 of 1] Compiling Main             ( Issue67.hs, Issue67.o )
Linking Issue67 ...
-12367

Note:

-12367 == (fromIntegral (53169 :: Word16) :: Int16)

The relevant snippet looks like this at the CMM level (GHC 7.10.3):

==================== Output Cmm ====================
[section "data" {
     Main.main2_closure:
         const Main.main2_info;
         const 0;
         const 0;
         const 0;
 },
 section "readonly" {
     c3rq_str:
         I8[] [207,177]
 },
 section "readonly" {
     c3rr_str:
         I8[] [207,177]
 },
 Main.main2_entry() //  [R1]
         { info_tbl: [(c3ru,
                       label: Main.main2_info
                       rep:HeapRep static { Thunk }),
                      (c3rD,
                       label: block_c3rD_info
                       rep:StackRep [])]
           stack_info: arg_space: 8 updfr_space: Just 8
         }
     {offset
       c3ru:
           ...
       c3ro:
           I64[Sp - 16] = stg_bh_upd_frame_info;
           I64[Sp - 8] = _c3rn::I64;
           (_c3rw::I64) = call MO_BSwap W16(%MO_UU_Conv_W16_W64(I16[c3rr_str]));
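            // (annotation, not compiler output) MO_BSwap W16 swaps the low
            // 16 bits; per the primop docs, the higher result bits are undefined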
           I64[Sp - 24] = c3rD;
           R4 = GHC.Types.[]_closure+1;
           R3 = _c3rw::I64;
           R2 = 0;
           Sp = Sp - 24;
           call GHC.Show.$wshowSignedInt_info(R4,
                                              R3,
                                              R2) returns to c3rD, args: 8, res: 8, upd: 24;
...

Pre-optimized LLVM basic block:

c3rB:
  %ln3sc = ptrtoint i8* @stg_bh_upd_frame_info to i64
  %ln3sb = load i64** %Sp_Var
  %ln3sd = getelementptr inbounds i64* %ln3sb, i32 -2
  store i64 %ln3sc, i64* %ln3sd, !tbaa !1
  %ln3sf = load i64* %lc3rA
  %ln3se = load i64** %Sp_Var
  %ln3sg = getelementptr inbounds i64* %ln3se, i32 -1
  store i64 %ln3sf, i64* %ln3sg, !tbaa !1
  %ln3sh = ptrtoint %c3rE_str_struct* @c3rE_str$def to i64
  %ln3si = inttoptr i64 %ln3sh to i16*
  %ln3sj = load i16* %ln3si, !tbaa !5
  %ln3sk = zext i16 %ln3sj to i64
  %ln3sl = trunc i64 %ln3sk to i16
  %ln3sm = call ccc i16 (i16)* @llvm.bswap.i16( i16 %ln3sl )
  %ln3sn = sext i16 %ln3sm to i64
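  ; (annotation, not compiler output) the result is widened back to i64
  ; with sext; since the primop's high bits are undefined, sign-extending
  ; here is as legitimate as the zeros the NCG happens to produce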
  store i64 %ln3sn, i64* %lc3rJ
  %ln3sp = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c3rQ_info$def to i64
  %ln3so = load i64** %Sp_Var
  %ln3sq = getelementptr inbounds i64* %ln3so, i32 -3
  store i64 %ln3sp, i64* %ln3sq, !tbaa !1
  %ln3sr = ptrtoint i8* @ghczmprim_GHCziTypes_ZMZN_closure to i64
  %ln3ss = add i64 %ln3sr, 1
  store i64 %ln3ss, i64* %R4_Var
  %ln3st = load i64* %lc3rJ
  store i64 %ln3st, i64* %R3_Var
  store i64 0, i64* %R2_Var
  %ln3su = load i64** %Sp_Var
  %ln3sv = getelementptr inbounds i64* %ln3su, i32 -3
  %ln3sw = ptrtoint i64* %ln3sv to i64
  %ln3sx = inttoptr i64 %ln3sw to i64*
  store i64* %ln3sx, i64** %Sp_Var
  %ln3sy = bitcast i8* @base_GHCziShow_zdwshowSignedInt_info to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*

Post-optimized block (opt --enable-tbaa=true -O2 out-llvm-orig.ll -o out-llvm.bc):

c3rB:                                             ; preds = %c3rU
  %ln3s8 = ptrtoint i8* %ln3s7 to i64
  %ln3sd = getelementptr inbounds i64* %Sp_Arg, i64 -2
  store i64 ptrtoint (i8* @stg_bh_upd_frame_info to i64), i64* %ln3sd, align 8, !tbaa !5
  %ln3sg = getelementptr inbounds i64* %Sp_Arg, i64 -1
  store i64 %ln3s8, i64* %ln3sg, align 8, !tbaa !5
  store i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @"c3rQ_info$def" to i64), i64* %ln3rZ, align 8, !tbaa !5
  tail call cc10 void bitcast (i8* @base_GHCziShow_zdwshowSignedInt_info to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*)(i64* %Base_Arg, i64* %ln3rZ, i64* %Hp_Arg, i64 %R1_Arg, i64 0, i64 -12367, i64 add (i64 ptrtoint (i8* @ghczmprim_GHCziTypes_ZMZN_closure to i64), i64 1), i64 undef, i64 undef, i64 %SpLim_Arg) #0
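  ; (annotation, not compiler output) note the literal i64 -12367 argument:
  ; opt constant-folded bswap(0xB1CF) = 0xCFB1 through the sext to -12367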
  ret void

Folds it right into a constant!

I haven't spent much time diagnosing this further yet.

Change History (3)

comment:1 Changed 3 years ago by rwbarton

byteSwap16# is defined in primops.txt.pp as

primop   BSwap16Op   "byteSwap16#"   Monadic   Word# -> Word#
    {Swap bytes in the lower 16 bits of a word. The higher bytes are undefined. }

So, I don't see the problem here. Since the high bits of the result are undefined, writing W64# (byteSwap16# (...)) without narrowing the result is wrong.

comment:2 Changed 3 years ago by thoughtpolice

You're right; the answer, with a little more detail, is outlined in this commit in binary-serialise-cbor, if you want to read the novel about the LLVM view of things: https://github.com/well-typed/binary-serialise-cbor/commit/d419bdeae39e56531a86d8208e059967863cffb5
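
For reference, a minimal sketch of the shape of the fix (assuming a 64-bit build where W64# wraps a plain Word#, as in the Description's test case; grabWord16' is a hypothetical name, not taken from that commit):

{-# LANGUAGE MagicHash #-}
import GHC.Prim
import GHC.Word

data T = T !Addr#

-- narrow16Word# zeroes everything above the low 16 bits, so the
-- undefined high bits of byteSwap16#'s result never reach the box.
grabWord16' :: T -> Word64
grabWord16' (T addr#) =
  W64# (narrow16Word# (byteSwap16# (indexWord16OffAddr# addr# 0#)))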

But something about this ticket still makes me feel uneasy. In particular, is there any reason why the byteSwap# primitives do not imply such narrowing themselves? Are the high bits always undefined, even if they were set before? Why specify it that way: because it can't be guaranteed on every platform? (E.g. on x86 I believe a 16-bit byte swap can be done with a simple xchg ah, al, which does not modify the high bits, so leaving them defined would be legitimate there.)

Obviously not every primitive can be safe (if it pokes memory or whatever), but I'm strongly in favor of sensible behavior for operators like this, which are merely 'bit fiddling' wrapped up in a single MachOp, as opposed to giving them 'undefined behavior' as in C, which is just a cue for clever compilers to torment me more than they need to.

(Although, if anyone thinks that's reasonable, it should probably go into another ticket and I should close this one.)

Last edited 3 years ago by thoughtpolice (previous) (diff)

comment:3 Changed 3 years ago by thomie

Cc: tibbe tab added
Resolution: invalid
Status: new → closed

CC @tab (Vincent Hanquez), who added these primops in 18087a119b47368b15231c43402c81888c75957d (#7902), and tibbe, who reviewed them. Please see comment:2.

(it should probably go into another ticket and I should close this one.)

Yes, please open a new (feature request) ticket if necessary.