Opened 7 years ago
Closed 4 years ago
#7883 closed task (fixed)
enable GHC LLVM backend to use LLVM provided CAS / Atomicity primitives?
Reported by: | carter | Owned by: | carter |
---|---|---|---|
Priority: | normal | Milestone: | 8.0.1 |
Component: | Compiler | Version: | 7.9 |
Keywords: | Cc: | carter.schonwald@…, pho@…, johan.tibell@…, idhameed@…, brooks.brian@…, erikd | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | None/Unknown | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | Phab:D1282 | |
Wiki Page: |
Description
LLVM provides a number of atomicity / memory ordering primitives.
currently the CAS primops exposed at the CMM level are ffi'd inline assembly fun calls. this means that for certain concurrency primops, we've got fun calls within funcalls that could be easily inlined if we did it in a more llvm direct way. A notable example of this is the implementation of stg_atomicModifyMutVarzh
relevant basic llmv docs are the following Atomic llvm ops
semantics of the various ordering levels
relevant locations in the ghc source CAS inline assembly
defn of atomicModifyMutVar cmm code
Based upon my reading of the relevant GHC cmm code, and reading the semantics of the LLVM operations, the right level of atomic ordering in the generated bit code would be "SequentiallyConsistent".
a first step would be to modify the CMM -> LLVM pass to substitute the cas funcall with the right llvm operation. This SEEMS like it'd be very very easy to do and low hanging fruit.
Theres a few upsides that might come out of this.
- would be easy to augment to also provide a doubleCAS primitive, which would be useful in writing various lock free data structures.
- would increase the portability / ease of retargeting new platforms / hardware via LLVM. (not needing to worry about the target's memory consistency model / write new cases of inline assembly)
- (this would need benchmarks) could potentially improve the performance of certain heavy concurrency operations by eliminating a funcall in an otherwise tight loop.
- Theres probably other interesting possible upside, such as perhaps having more of the rts be nicely writeable in CMM? (esp since theres now native funcall support)
- Also would make the CMM memory model a bit more explicit perhaps?
This seems like a *relatively* managable patch to write. I'm up for spending time on this if, should it work, it'd be likely to be merged in.
it'd also be a good warm up for a number of other things I want to do, including http://hackage.haskell.org/trac/ghc/ticket/5567#comment:10
Change History (26)
comment:1 Changed 7 years ago by
comment:2 Changed 7 years ago by
Cc: | carter.schonwald@… added |
---|---|
Owner: | set to carter |
go ahead by Simon M here http://www.haskell.org/pipermail/ghc-devs/2013-May/001224.html
David Terei points out related work i can refer to as a model for the work, when Tibbe was adding popcount
https://github.com/ghc/ghc/commit/2906db6c3a3f1000bd7347c7d8e45e65eb2806cb
and
https://github.com/ghc/ghc/commit/2d0438f329ac153f9e59155f405d27fac0c43d65
I"ll start hacking on this in a week or so.
comment:3 Changed 7 years ago by
Cc: | pho@… added |
---|
comment:4 Changed 6 years ago by
difficulty: | → Unknown |
---|
In light of how some other work folks are doing would benefit from this, i'll get this finished up by the close of august
comment:5 Changed 6 years ago by
A couple additional suggestions copied over from the mailing list:
(1) we should use more unique symbols than "cas", especially for this rewriting trick. How about "ghc_cas" or something?
(2) it would be great to get at least fetch-and-add in addition to CAS and barriers
(3) if we reliably provide this set of special symbols, libraries like atomic-primops may use them in the .cmm and benefit from the CMM->LLVM substitutions
comment:6 follow-up: 7 Changed 6 years ago by
Ryan, could you explain what you mean by "rewriting trick"?
I have no idea what you mean.
Ryan, could you please explain what primops you want using the terminology / vocabulary in http://llvm.org/docs/LangRef.html#ordering and http://llvm.org/docs/Atomics.html ?
comment:7 Changed 6 years ago by
hrm, I guess you mean the making the CAS ops inline native primops as "rewriting"?
comment:8 Changed 6 years ago by
Sorry, I was just referring to the proposal to "substitute the cas funcall with the right llvm operation" as rewriting.
That is, the approach would pattern match for the CMM code "ccall cas" or "foreign "C" cas" (I'm afraid I don't know the difference between those) and replace it with the equivalent LLVM op, right?
comment:9 Changed 6 years ago by
nope, the idea is to provide CMM / Haskell level prim ops. Normal primops. (see my reply on list :) ).
comment:10 Changed 6 years ago by
Theres a few other bits that need to be added to the ticket based upon the discussion on list, but one that hasn't been addressed is how to support both compiling to a "sequential" version as well as the "-threaded" ATOMIC analogues based upon which RTS is linked into the application
comment:11 Changed 6 years ago by
Here are some notes on which operations the "atomic-primops" library is currently exposing, and the other concurrent data structure libs (e.g. chaselev-deque) are using:
- CAS on
MutVars
,MutableArray#
, andMutableByteArray#
- fetch and add on
MutableByteArray#
- barriers / memory fences
Drafts of .cmm for these can be found inside the atomic-primops library. Note that *only* casMutVar# is currently shipped with GHC.
Ultimately there's no reason that we shouldn't aim for a fairly "complete set". For example, why not have fetch-and-sub and the other "atomicrmw" variants? Relating these to the LLVM atomics and memory orderings, they become:
- CAS variants = LLVM cmpxchg with
SequentiallyConsistent
ordering - fetch-and-X variants = LLVM atomicrmw with
SequentiallyConsistent
- store_load_barrier = LLVM fenceInst with
SequentiallyConsistent
- write_barrier and load_load_barrier = I *think* these are both covered by a
FenceInst
withAcquireRelease
ordering...
I'm not an LLVM expert, so please double check these yourself before trusting them.
comment:12 Changed 6 years ago by
note: the primary first user for this planned work, ryan newton, is ok with the atomic operations being the threaded versions even with the single threaded RTS. (this will simplify initial engineering, though it does mean there will be a small perf penalty on sequential RTS relative to theoretical peformance).
comment:13 Changed 6 years ago by
Type: | feature request → task |
---|
changing this from a feature request to a task since Ryan Newton is getting the ball rolling with preliminary C implementations for the proposed primops
comment:14 Changed 6 years ago by
Update: the task in ticket #8157 is a prerequisite to this task. A preliminary solution is checked into the atomics
branch.
comment:16 Changed 6 years ago by
Milestone: | → 7.10.1 |
---|---|
Version: | 7.7 → 7.9 |
comment:17 Changed 6 years ago by
FYI, Johan (@tibbe) is interested in this. Carter, have you started the implementation somewhere?
comment:18 Changed 6 years ago by
Cc: | johan.tibell@… added |
---|
comment:19 Changed 5 years ago by
Cc: | idhameed@… added |
---|
comment:20 Changed 5 years ago by
Cc: | brooks.brian@… added |
---|
comment:21 Changed 5 years ago by
Milestone: | 7.10.1 → 7.12.1 |
---|
Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
comment:23 Changed 4 years ago by
Cc: | erikd added |
---|
comment:24 Changed 4 years ago by
Differential Rev(s): | → Phab:D1282 |
---|---|
Status: | new → patch |
I have Phab:D1282 up on Phabricator covering this.
the language at the end of the ticket is vague:
would this be a change folks would like / or at least value experimenting with?