Opened 7 years ago

Last modified 13 months ago

#7741 new feature request

Add SIMD support to x86/x86_64 NCG

Reported by: shelarcy Owned by: abhir00p
Priority: normal Milestone:
Component: Compiler (NCG) Version: 7.7
Keywords: SIMD Cc: shelarcy@…, simonmar, gmainland, winter, maoe
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #3557 Differential Rev(s):
Wiki Page: wiki:SIMD

Description

ghc-7.7.20130301 has SIMD support, but currently only the LLVM backend supports it, so anyone who wants to use SIMD must use the LLVM backend. I request adding SIMD support to the x86/x86_64 NCG.

Change History (24)

comment:1 Changed 7 years ago by shelarcy

Cc: shelarcy@… added

Adding SIMD support to the x86/x86_64 NCG is important for Windows users, because the LLVM backend doesn't currently work on Windows [1], and GHC's SIMD support requires modifying LLVM on 32-bit Windows [2].

We want to use SIMD seamlessly on Windows.

comment:2 Changed 6 years ago by igloo

difficulty: Unknown
Milestone: 7.8.1

comment:3 Changed 6 years ago by simonmar

Cc: simonmar added

comment:4 Changed 6 years ago by carter

Assuming I have time over the coming months to clean up the native codegen, this is one major subtask I hope to do. (For a number of reasons, we really need feature parity between the NCG and LLVM backends.)

comment:5 Changed 5 years ago by thoughtpolice

Milestone: 7.8.3 → 7.10.1

Moving to 7.10.1

comment:6 Changed 5 years ago by thomie

Component: Compiler → Compiler (NCG)

comment:7 Changed 5 years ago by thoughtpolice

Milestone: 7.10.1 → 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:8 Changed 4 years ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

comment:9 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:10 Changed 4 years ago by thomie

Cc: gmainland added
Type of failure: None/Unknown → Runtime performance bug

comment:11 Changed 2 years ago by bgamari

Keywords: SIMD added
Wiki Page: wiki:SIMD

comment:12 Changed 2 years ago by winter

Cc: winter added

comment:13 Changed 2 years ago by bgamari

In response to recent interest in SIMD support, I pondered this for a bit while wandering in the woods yesterday. I think there's a pretty straightforward path to introducing SIMD (including AVX) support in the NCG. Here's my proposed plan:

  • Introduce a few types:
    -- | The format of a vector value
    data VecFormat = VecFormat { vecLength :: Length       -- ^ vector length (i.e. how many scalars)
                               , vecFormat :: ScalarFormat -- ^ the format of each scalar
                               , vecWidth  :: Width        -- ^ the width of each scalar
                               }
    -- | What type of quantity is a scalar?
    data ScalarFormat = FmtInt | FmtFloat
    
    -- this already exists
    type Length = Int
    
  • Rework the instructions in nativeGen/X86/Instr.hs to carry a VecFormat instead of a Format. Perhaps start with ADD, to confirm that this plan works, before moving on to the others.
  • For the purposes of register allocation, pretend there are only ZMM registers (i.e. ignore XMM and YMM). This saves us from having to worry about register aliasing. We can then use the VecFormat to determine which kind of register we really mean. I believe this can be done in nativeGen/X86/Regs.hs:allocatableRegs.
  • I think the calling convention logic (e.g. in cmm/CmmCallConv.hs) should require no change.
  • Add the necessary pretty-printing logic (in nativeGen/X86/Ppr.hs) to produce the new instructions
  • Add the necessary logic to the code generator to implement the MachOps (e.g. nativeGen/X86/CodeGen.hs)
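To make the register-allocation and pretty-printing steps more concrete, here is a minimal, self-contained Haskell sketch; it is not GHC code. It redefines Width locally so it compiles on its own, and the helpers vecBits, regView and addMnemonic are hypothetical names. It shows how one VecFormat could decide both which view (XMM, YMM or ZMM) of an allocated ZMM register to print, based on total vector size, and which packed ADD mnemonic to emit.

```haskell
-- Local stand-ins so the sketch compiles on its own; Width mirrors the
-- shape of GHC's Cmm Width type but is redefined here (assumption).
data Width = W8 | W16 | W32 | W64 | W128 | W256 | W512
  deriving (Eq, Show)

data ScalarFormat = FmtInt | FmtFloat
  deriving (Eq, Show)

type Length = Int

-- The record proposed above, with derived instances added.
data VecFormat = VecFormat
  { vecLength :: Length        -- number of scalars
  , vecFormat :: ScalarFormat  -- kind of each scalar
  , vecWidth  :: Width         -- width of each scalar
  } deriving (Eq, Show)

widthInBits :: Width -> Int
widthInBits W8   = 8
widthInBits W16  = 16
widthInBits W32  = 32
widthInBits W64  = 64
widthInBits W128 = 128
widthInBits W256 = 256
widthInBits W512 = 512

-- | Total vector size in bits, e.g. 4 x 32-bit floats = 128 bits.
vecBits :: VecFormat -> Int
vecBits v = vecLength v * widthInBits (vecWidth v)

-- | Hypothetical helper: the allocator only hands out ZMM register numbers;
-- the format decides which view of register n gets printed (AT&T syntax).
regView :: VecFormat -> Int -> Maybe String
regView v n = case vecBits v of
  128 -> Just ("%xmm" ++ show n)
  256 -> Just ("%ymm" ++ show n)
  512 -> Just ("%zmm" ++ show n)
  _   -> Nothing

-- | Hypothetical helper: packed ADD mnemonic, using the VEX/EVEX "v"
-- forms for every vector size.
addMnemonic :: VecFormat -> Maybe String
addMnemonic v = case (vecFormat v, widthInBits (vecWidth v)) of
  (FmtFloat, 32) -> Just "vaddps"
  (FmtFloat, 64) -> Just "vaddpd"
  (FmtInt,    8) -> Just "vpaddb"
  (FmtInt,   16) -> Just "vpaddw"
  (FmtInt,   32) -> Just "vpaddd"
  (FmtInt,   64) -> Just "vpaddq"
  _              -> Nothing
```

Using the "v" forms everywhere keeps the sketch short. A real backend would also have to respect the target's instruction set flags, since, for example, 256-bit integer adds need AVX2 and 512-bit byte/word adds need AVX-512BW.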

All in all, this seems quite feasible and likely no more than a day or two of work.

Last edited 2 years ago by bgamari

comment:14 Changed 2 years ago by winter

Wow, this is good news!

One thing that complicates the situation is that there are several SIMD instruction set extensions out there (SSE4, AVX, AVX2, ...), and we don't seem to have a clear plan for which ones to support, or for giving users a compile flag to fall back.

After thinking this over, I think we'd be better off splitting the SIMD primitives into GHC.Prim.SSE4, GHC.Prim.AVX, etc. modules to avoid this compatibility hell. The compiler should be able to produce whatever instructions the programmer wants.

We also have to provide runtime CPU feature detection in the RTS; with some unsafe magic we should be able to expose the supported SIMD features as constants.

Then a programmer would be able to write a portable program with adaptive SIMD optimization.

BTW, is SSE4 supposed to be the baseline of x86_64 now?

Last edited 2 years ago by winter

comment:15 Changed 2 years ago by bgamari

> After thinking this over, I think we'd be better off splitting the SIMD primitives into GHC.Prim.SSE4, GHC.Prim.AVX, etc. modules to avoid this compatibility hell.

I'm afraid all of the primops will likely remain in GHC.Prim for tiresome engineering reasons (namely, the GHC.Prim module is treated specially by the compiler; while we could perhaps add more wired-in modules, we'd rather not). However, GHC.Prim isn't intended to be used directly by users anyway. Perhaps we could re-export the SIMD primitives in a new GHC.Exts.SIMD module.

Regardless, I don't think we want to bake architecture-specific details into GHC's module naming. Really, the SIMD support provided by GHC is, like all primops, intended to be a substrate over which library authors can write safer, more convenient abstractions.

> BTW, is SSE4 supposed to be the baseline of x86_64 now?

Builds on x86_64 assume SSE2 and no more, AFAIK.

comment:16 Changed 2 years ago by winter

I see that ghc-prim's current SIMD APIs seem to support instructions up to 512 bits wide. Does that mean we are going to support up to AVX512?

comment:17 Changed 2 years ago by bgamari

It depends upon who steps up to implement this plan :)

comment:18 Changed 18 months ago by carter

Ben: what are your current thoughts about how to handle supporting different microarchitectures?

1) Allow generating the instructions, but require the application to do CPU detection to avoid illegal instructions? (I'd be fine with that.)

Or

2) Add microarchitecture logic to GHC's compilation and have a fallback path?

Or

3) Support some mix of both?

I'd think that in the near term 1 would be simpler to get working, and 3 is what we'd ultimately want to move to.

comment:19 Changed 18 months ago by bgamari

I think (1) is the best option given the amount of effort we have available to expend on this.
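Under option (1), choosing an implementation the current CPU can execute is the application's job. A minimal Haskell sketch of that pattern follows, assuming a hypothetical CpuFeatures record and a stubbed detectCpuFeatures. A real program would fill the record in once at startup, e.g. through an FFI call into a small C helper that runs CPUID; the sketch does not assume that GHC or its RTS provides this. The AVX2/AVX-512 kernels are placeholders that reuse the scalar code; in practice each would live in a separate module compiled with the matching flag (e.g. GHC's -mavx2).

```haskell
import Data.List (foldl')

-- | Hypothetical feature record; a real one would carry more flags.
data CpuFeatures = CpuFeatures
  { hasAVX2    :: Bool
  , hasAVX512F :: Bool
  } deriving (Eq, Show)

-- | Hypothetical stub: always reports a baseline CPU with neither
-- extension, so the sketch runs anywhere.
detectCpuFeatures :: IO CpuFeatures
detectCpuFeatures = pure (CpuFeatures False False)

-- | Portable fallback that needs nothing beyond the baseline.
sumScalar :: [Float] -> Float
sumScalar = foldl' (+) 0

-- | Placeholders for wide kernels built separately with -mavx2 / -mavx512f;
-- here they simply reuse the scalar code.
sumAVX2, sumAVX512F :: [Float] -> Float
sumAVX2    = sumScalar
sumAVX512F = sumScalar

-- | Option (1): the compiler emits whatever it is asked to; the library
-- picks the widest kernel the detected CPU supports.
selectSum :: CpuFeatures -> ([Float] -> Float)
selectSum feats
  | hasAVX512F feats = sumAVX512F
  | hasAVX2 feats    = sumAVX2
  | otherwise        = sumScalar
```

A program would call detectCpuFeatures once, keep the function returned by selectSum, and never reach code containing instructions the CPU lacks.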

comment:20 Changed 16 months ago by abhir00p

Owner: set to abhir00p

comment:21 Changed 15 months ago by newhoggy

Has any work been done to introduce the VecFormat type?

comment:22 Changed 15 months ago by abhir00p

Some work has been done on adding VecFormat, although we have restructured the data representation: instead of creating a new VecFormat type, we added an additional constructor to the Format data type.

The new representation is also likely to change depending on the use cases.
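For comparison with the separate VecFormat type proposed in comment:13, here is a hypothetical Haskell sketch of the "extra constructor in Format" approach described above. The scalar constructors are modelled on the X86 NCG's Format names (II8 ... FF64). The VecFormat constructor and the ScalarFormat type, which folds each lane's width into its format, are assumptions for illustration, not the representation actually chosen in this work. The sketch shows one payoff of this design: size queries such as formatInBytes extend to vectors with a single extra equation.

```haskell
-- Hypothetical lane formats; each one fixes both the kind and the width.
data ScalarFormat
  = FmtInt8 | FmtInt16 | FmtInt32 | FmtInt64
  | FmtFloat | FmtDouble
  deriving (Eq, Show)

-- Scalar constructors modelled on the X86 NCG's Format; the VecFormat
-- constructor (lane count, lane format) is the hypothetical addition.
data Format
  = II8 | II16 | II32 | II64
  | FF32 | FF64
  | VecFormat Int ScalarFormat
  deriving (Eq, Show)

scalarBytes :: ScalarFormat -> Int
scalarBytes FmtInt8   = 1
scalarBytes FmtInt16  = 2
scalarBytes FmtInt32  = 4
scalarBytes FmtInt64  = 8
scalarBytes FmtFloat  = 4
scalarBytes FmtDouble = 8

-- | Size of a value of the given format, in bytes.
formatInBytes :: Format -> Int
formatInBytes II8  = 1
formatInBytes II16 = 2
formatInBytes II32 = 4
formatInBytes II64 = 8
formatInBytes FF32 = 4
formatInBytes FF64 = 8
formatInBytes (VecFormat n s) = n * scalarBytes s

-- | Code that must treat vectors specially can test for them directly.
isVecFormat :: Format -> Bool
isVecFormat (VecFormat _ _) = True
isVecFormat _               = False
```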

comment:23 Changed 15 months ago by carter

I'm supervising abhir00p on this; there's a bunch of stuff still to do for even rudimentary portable SIMD support. There are a few gotchas that will become clear from the sharp edges of this first iteration.

comment:24 Changed 13 months ago by maoe

Cc: maoe added