Opened 7 years ago

Closed 4 years ago

#7883 closed task (fixed)

enable GHC LLVM backend to use LLVM provided CAS / Atomicity primitives?

Reported by: carter Owned by: carter
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.9
Keywords: Cc: carter.schonwald@…, pho@…, johan.tibell@…, idhameed@…, brooks.brian@…, erikd
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D1282
Wiki Page:


LLVM provides a number of atomicity / memory ordering primitives.

currently the CAS primops exposed at the CMM level are ffi'd inline assembly fun calls. this means that for certain concurrency primops, we've got fun calls within funcalls that could be easily inlined if we did it in a more llvm direct way. A notable example of this is the implementation of stg_atomicModifyMutVarzh

relevant basic llmv docs are the following Atomic llvm ops

semantics of the various ordering levels

relevant locations in the ghc source CAS inline assembly

defn of atomicModifyMutVar cmm code

Based upon my reading of the relevant GHC cmm code, and reading the semantics of the LLVM operations, the right level of atomic ordering in the generated bit code would be "SequentiallyConsistent".

a first step would be to modify the CMM -> LLVM pass to substitute the cas funcall with the right llvm operation. This SEEMS like it'd be very very easy to do and low hanging fruit.

Theres a few upsides that might come out of this.

  1. would be easy to augment to also provide a doubleCAS primitive, which would be useful in writing various lock free data structures.
  2. would increase the portability / ease of retargeting new platforms / hardware via LLVM. (not needing to worry about the target's memory consistency model / write new cases of inline assembly)
  3. (this would need benchmarks) could potentially improve the performance of certain heavy concurrency operations by eliminating a funcall in an otherwise tight loop.
  4. Theres probably other interesting possible upside, such as perhaps having more of the rts be nicely writeable in CMM? (esp since theres now native funcall support)
  5. Also would make the CMM memory model a bit more explicit perhaps?

This seems like a *relatively* managable patch to write. I'm up for spending time on this if, should it work, it'd be likely to be merged in.

it'd also be a good warm up for a number of other things I want to do, including

Change History (26)

comment:1 Changed 7 years ago by carter

the language at the end of the ticket is vague:

would this be a change folks would like / or at least value experimenting with?

comment:2 Changed 7 years ago by carter

Cc: carter.schonwald@… added
Owner: set to carter

go ahead by Simon M here

David Terei points out related work i can refer to as a model for the work, when Tibbe was adding popcount


I"ll start hacking on this in a week or so.

comment:3 Changed 7 years ago by PHO

Cc: pho@… added

comment:4 Changed 6 years ago by carter

difficulty: Unknown

In light of how some other work folks are doing would benefit from this, i'll get this finished up by the close of august

comment:5 Changed 6 years ago by rrnewton

A couple additional suggestions copied over from the mailing list:

(1) we should use more unique symbols than "cas", especially for this rewriting trick. How about "ghc_cas" or something?

(2) it would be great to get at least fetch-and-add in addition to CAS and barriers

(3) if we reliably provide this set of special symbols, libraries like atomic-primops may use them in the .cmm and benefit from the CMM->LLVM substitutions

comment:6 Changed 6 years ago by carter

Ryan, could you explain what you mean by "rewriting trick"?

I have no idea what you mean.

Ryan, could you please explain what primops you want using the terminology / vocabulary in and ?

comment:7 in reply to:  6 Changed 6 years ago by carter

hrm, I guess you mean the making the CAS ops inline native primops as "rewriting"?

comment:8 Changed 6 years ago by rrnewton

Sorry, I was just referring to the proposal to "substitute the cas funcall with the right llvm operation" as rewriting.

That is, the approach would pattern match for the CMM code "ccall cas" or "foreign "C" cas" (I'm afraid I don't know the difference between those) and replace it with the equivalent LLVM op, right?

comment:9 Changed 6 years ago by carter

nope, the idea is to provide CMM / Haskell level prim ops. Normal primops. (see my reply on list :) ).

comment:10 Changed 6 years ago by carter

Theres a few other bits that need to be added to the ticket based upon the discussion on list, but one that hasn't been addressed is how to support both compiling to a "sequential" version as well as the "-threaded" ATOMIC analogues based upon which RTS is linked into the application

comment:11 Changed 6 years ago by rrnewton

Here are some notes on which operations the "atomic-primops" library is currently exposing, and the other concurrent data structure libs (e.g. chaselev-deque) are using:

  • CAS on MutVars, MutableArray#, and MutableByteArray#
  • fetch and add on MutableByteArray#
  • barriers / memory fences

Drafts of .cmm for these can be found inside the atomic-primops library. Note that *only* casMutVar# is currently shipped with GHC.

Ultimately there's no reason that we shouldn't aim for a fairly "complete set". For example, why not have fetch-and-sub and the other "atomicrmw" variants? Relating these to the LLVM atomics and memory orderings, they become:

  • CAS variants = LLVM cmpxchg with SequentiallyConsistent ordering
  • fetch-and-X variants = LLVM atomicrmw with SequentiallyConsistent
  • store_load_barrier = LLVM fenceInst with SequentiallyConsistent
  • write_barrier and load_load_barrier = I *think* these are both covered by a FenceInst with AcquireRelease ordering...

I'm not an LLVM expert, so please double check these yourself before trusting them.

Last edited 6 years ago by rrnewton (previous) (diff)

comment:12 Changed 6 years ago by carter

note: the primary first user for this planned work, ryan newton, is ok with the atomic operations being the threaded versions even with the single threaded RTS. (this will simplify initial engineering, though it does mean there will be a small perf penalty on sequential RTS relative to theoretical peformance).

comment:13 Changed 6 years ago by carter

Type: feature requesttask

changing this from a feature request to a task since Ryan Newton is getting the ball rolling with preliminary C implementations for the proposed primops

comment:14 Changed 6 years ago by rrnewton

Update: the task in ticket #8157 is a prerequisite to this task. A preliminary solution is checked into the atomics branch.

comment:15 Changed 6 years ago by carter

i'll work on adding this stuff after the 7.8 rush.

comment:16 Changed 6 years ago by carter

Milestone: 7.10.1
Version: 7.77.9

comment:17 Changed 6 years ago by rrnewton

FYI, Johan (@tibbe) is interested in this. Carter, have you started the implementation somewhere?

comment:18 Changed 6 years ago by tibbe

Cc: johan.tibell@… added

comment:19 Changed 5 years ago by ihameed

Cc: idhameed@… added

comment:20 Changed 5 years ago by brbr

Cc: brooks.brian@… added

comment:21 Changed 5 years ago by thoughtpolice


Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:22 Changed 4 years ago by thoughtpolice


Milestone renamed

comment:23 Changed 4 years ago by erikd

Cc: erikd added

comment:24 Changed 4 years ago by bgamari

Differential Rev(s): Phab:D1282
Status: newpatch

I have Phab:D1282 up on Phabricator covering this.

comment:25 Changed 4 years ago by Ben Gamari <ben@…>

In bd41eb2a/ghc:

LLVM: Implement atomic operations in terms of LLVM primitives

This fixes Trac #7883.

This adds proper support for,
  * `MO_AtomicRMW`
  * `MO_AtomicWrite`
  * `MO_CmpXChg`

Test Plan: Validate

Reviewers: rrnewton, austin

Subscribers: thomie

Differential Revision:

GHC Trac Issues: #7883

comment:26 Changed 4 years ago by bgamari

Resolution: fixed
Status: patchclosed


Note: See TracTickets for help on using tickets.