Opened 2 years ago

Last modified 20 months ago

#13852 new feature request

Can we have more SIMD primops, corresponding to the untapped AVX etc. instructions?

Reported by: leftaroundabout
Owned by:
Priority: normal
Milestone:
Component: Compiler (LLVM)
Version: 8.0.1
Keywords: SIMD
Cc:
Operating System: Unknown/Multiple
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

GHC.Prim contains a fair number of vectorised primops, which libraries can use to generate fast code for, e.g., sums of floating-point vectors.

However, several instructions that modern processors could vectorise are missing there. In particular, I would like to be able to use the VPSLLVD...VPSRAVD shifting operations, and at some point perhaps VPMAXSQ...VPMINUQ maximum/minimum operations.
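For reference, here is a scalar sketch of what two of the requested shift instructions compute per 32-bit lane, assuming the usual AVX2 semantics (each lane is shifted by the count in the matching lane; for counts above 31, VPSLLVD yields 0 and VPSRAVD fills the lane with the sign bit). The names `sllv` and `srav` are made up for illustration and are not existing primops; lists stand in for vector registers.

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Int (Int32)
import Data.Word (Word32)

-- Scalar model of VPSLLVD (hypothetical name): per-lane logical left shift.
-- Counts above 31 produce 0.
sllv :: [Word32] -> [Word32] -> [Word32]
sllv = zipWith lane
  where
    lane x c
      | c > 31    = 0
      | otherwise = x `shiftL` fromIntegral c

-- Scalar model of VPSRAVD (hypothetical name): per-lane arithmetic right shift.
-- Counts above 31 fill the lane with the sign bit.
srav :: [Int32] -> [Word32] -> [Int32]
srav = zipWith lane
  where
    lane x c
      | c > 31    = if x < 0 then -1 else 0
      | otherwise = x `shiftR` fromIntegral c

main :: IO ()
main = do
  print (sllv [1, 1, 1, 1] [0, 4, 31, 32])      -- [1,16,2147483648,0]
  print (srav [-16, 16, -1, 8] [2, 2, 40, 40])  -- [-4,4,-1,0]
```

A primop for these would presumably take two vectors of the same shape, much like the existing dyadic vector primops.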

It would be great if corresponding primops could be added. Else I would like to know – where is this stuff even defined? GHC.Prim as such seems to be merely an automatically-generated dummy module, mostly for Haddock.

(On the other hand, I find it also a bit strange that there are primops for integer division, which is apparently not supported by SSE/AVX at all!)

Change History (7)

comment:1 Changed 2 years ago by simonpj

comment:2 Changed 2 years ago by leftaroundabout

That definitely helps, but I'm still far from understanding what I'd need to do to enable those AVX operations myself. The linked Wiki article doesn't seem to be quite up to date WRT Primops.cmm, in which I can find neither any hint of the vectorised instructions nor the quotIntegerzh example.

comment:3 Changed 2 years ago by hsyl20

Primitive operations on vectors are named Vec* in prelude/primops.txt.pp (e.g., VecDivOp). The genprimopcode utility generates one primop per vector type and width. For instance, in compiler/stage1/build/primop-list.hs-incl:

   , (VecDivOp FloatVec 4 W32)
   , (VecDivOp FloatVec 2 W64)
   , (VecDivOp FloatVec 8 W32)
   , (VecDivOp FloatVec 4 W64)
   , (VecDivOp FloatVec 16 W32)
   , (VecDivOp FloatVec 8 W64)

and in compiler/stage1/build/primop-primop-info.hs-incl:

primOpInfo (VecDivOp FloatVec 4 W32) = mkDyadic (fsLit "divideFloatX4#") floatX4PrimTy
primOpInfo (VecDivOp FloatVec 2 W64) = mkDyadic (fsLit "divideDoubleX2#") doubleX2PrimTy
primOpInfo (VecDivOp FloatVec 8 W32) = mkDyadic (fsLit "divideFloatX8#") floatX8PrimTy
primOpInfo (VecDivOp FloatVec 4 W64) = mkDyadic (fsLit "divideDoubleX4#") doubleX4PrimTy
primOpInfo (VecDivOp FloatVec 16 W32) = mkDyadic (fsLit "divideFloatX16#") floatX16PrimTy
primOpInfo (VecDivOp FloatVec 8 W64) = mkDyadic (fsLit "divideDoubleX8#") doubleX8PrimTy
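
The naming scheme in the listing above can be reproduced with a small standalone model. This is only a sketch and not how genprimopcode is implemented: the type `Width`, the list `floatVecShapes` and the function `primOpName` are hypothetical names chosen for illustration, and only the resulting strings are taken from the generated file.

```haskell
-- Toy model of the per-shape expansion; not GHC's genprimopcode.
data Width = W32 | W64

-- Hypothetical list of (lane count, lane width) pairs,
-- in the same order as the generated listing.
floatVecShapes :: [(Int, Width)]
floatVecShapes = [(4, W32), (2, W64), (8, W32), (4, W64), (16, W32), (8, W64)]

-- Build the user-facing primop name from a base name and a vector shape.
primOpName :: String -> (Int, Width) -> String
primOpName base (n, w) = base ++ elemName w ++ "X" ++ show n ++ "#"
  where
    elemName W32 = "Float"
    elemName W64 = "Double"

-- One name per shape, e.g. "divideFloatX4#" for 4 lanes of 32 bits.
divideNames :: [String]
divideNames = map (primOpName "divide") floatVecShapes

main :: IO ()
main = mapM_ putStrLn divideNames
```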

These are converted from Stg to Cmm by translateOp in codeGen/StgCmmPrim.hs. For instance, VecDivOp FloatVec becomes MO_VF_Quot.

Then you need to use the LLVM backend to convert Cmm into LLVM (textual) IR. This is done by genMachOp_slow in llvmGen/LlvmCodeGen/CodeGen.hs.
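
To make the two lowering steps concrete, here is a toy model of the path from a vector primop to an LLVM instruction. It is a sketch under assumptions, not GHC's code: the types below only mimic the shape of the real constructors, `toMachOp`, `toLlvm` and `lower` are hypothetical stand-ins for the translateOp and genMachOp_slow cases, and it assumes that the float division MachOp ends up as LLVM's `fdiv` on a vector type.

```haskell
data Width = W32 | W64

-- Stand-in for the vector primop (FloatVec only): lane count and lane width.
data VecPrimOp = VecDivOp Int Width

-- Stand-in for the Cmm machine operation.
data MachOp = MO_VF_Quot Int Width

-- Step 1 (Stg to Cmm): the primop becomes a MachOp of the same shape.
toMachOp :: VecPrimOp -> MachOp
toMachOp (VecDivOp n w) = MO_VF_Quot n w

-- Step 2 (Cmm to LLVM IR): the MachOp becomes an instruction on a vector type.
toLlvm :: MachOp -> String
toLlvm (MO_VF_Quot n w) = "fdiv <" ++ show n ++ " x " ++ llvmElem w ++ ">"
  where
    llvmElem W32 = "float"
    llvmElem W64 = "double"

-- Both steps composed.
lower :: VecPrimOp -> String
lower = toLlvm . toMachOp

main :: IO ()
main = do
  putStrLn (lower (VecDivOp 4 W32))  -- fdiv <4 x float>
  putStrLn (lower (VecDivOp 2 W64))  -- fdiv <2 x double>
```

In this picture, a new vector primop would need a constructor and one new case in each of the two steps.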

Finally, LLVM generates the assembly, and GHC replaces some instructions because it can't guarantee that the alignment is correct. Note that the native code generator doesn't support them yet: you have to use the LLVM backend.

If the instructions you want are supported by LLVM, they should be relatively easy to add.

Last edited 2 years ago by hsyl20

comment:4 Changed 2 years ago by bgamari

Keywords: SIMD added

comment:5 Changed 20 months ago by dominic

This looks like it would be a good GSoC project to me. What do others think?

comment:6 Changed 20 months ago by bgamari

While technically the project is of the right scale for a GSoC student, I'm a bit wary of suggesting it, as only a relatively small fraction of the community would stand to benefit from its completion.

comment:7 Changed 20 months ago by dominic

Maybe there are other bits of the code generator which would give more bang for buck? I think I will propose this anyway but with a comment to ask the GSoC student to spend a few days seeing if this is the highest priority (but doable) part of the CG that needs addressing.
