Opened 5 years ago

Last modified 9 months ago

#9251 new task

ghc does not expose branchless max/min operations as primops

Reported by: carter Owned by: osa1
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: newcomer Cc: wren@…, sjakobi, ulysses4ever, AndreasK
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #9246 Differential Rev(s):
Wiki Page:


It was pointed out in #9246 that GHC currently does not expose primops for Max and Min on various unlifted types such as Float# and Double# and Int# and Word#, but that on most modern CPU architectures there is instruction level support for these operations. (Eg, Float# and #Double min and max can be provided on any 64bit x86 system pretty easily )

This ticket is to track doing that task. I'll probably do it one of the next two weekends to get back in the GHC hacking groove, or maybe this should be used as a "getting started" ticket for a new contributor?

Change History (26)

comment:1 Changed 5 years ago by carter

Summary: ghc does not expose brancheless max/min operations as primopsghc does not expose branchless max/min operations as primops

comment:2 Changed 5 years ago by arotenberg

Question: What "should" the behaviors of min and max be for the special cases of negative zero and NaN? Currently GHC does this:

Prelude> min 0.0 (-0.0) :: Double
Prelude> min (-0.0) 0.0 :: Double
Prelude> min 0.0 (0.0/0.0) :: Double
Prelude> min (0.0/0.0) 0.0 :: Double

The SSE instructions, at least, work similarly if you get the order right:

If the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

However, according to Wikipedia, IEEE 754-2008 specifies

The min and max operations are defined but leave some leeway for the case where the inputs are equal in value but differ in representation. In particular:

min(+0,−0) or min(−0,+0) must produce something with a value of zero but may always return the first argument.

In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:

min(x,NaN) = min(NaN,x) = x max(x,NaN) = max(NaN,x) = x

In the current draft, these functions are called minNum and maxNum to indicate their preference for a number over a quiet NaN.

Some further comparisons: Java's Math.min specifies

Returns the smaller of two double values. That is, the result is the value closer to negative infinity. If the arguments have the same value, the result is that same value. If either value is NaN, then the result is NaN. Unlike the numerical comparison operators, this method considers negative zero to be strictly smaller than positive zero. If one argument is positive zero and the other is negative zero, the result is negative zero.

The .NET Framework's Math.Min specifies

Parameter val1 or val2, whichever is smaller. If val1, val2, or both val1 and val2 are equal to NaN, NaN is returned.

comment:3 Changed 5 years ago by carter

i'm working on that stuff, fret not!

There's going to be a IEEE *must* conformance compliant version of every operation, plus a possibly system dependent one.

It looks like IEEE actually specifies TWO versions of min and max.

I've got a local copy of IEEE-754-2008 downloaded after I googled around, I'll spec out the operations later this week.

Wrt the signedness of 0 piece, it says **may** always return the first argument rather than must. Point being, the primops layer will likely have more than 1 variant, and the Num / Ord instances will have some choice of IEEE must condition + H2010 satisfying definition. So I think we'd want to have the haskell default min/max to treat arguments ±0 as ±1 wrt max and min. (ie max will pick +, min will pick -)

Last edited 5 years ago by carter (previous) (diff)

comment:4 Changed 5 years ago by carter

actually, importantly, in section 5.11 of the 2008 standard they say

Four mutually exclusive relations are possible: less than , equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself. Comparisons shall ignore the sign of zero (so +0 = −0). Infinite operands of the same sign shall compare equal.

thus the Ord compare based instance should actually perhaps throw an exception or otherwise fail when either argument is Nan? I'll need read a bit more.

ANYWAY, theres several *distinct* semantics that can be chosen, and adding suitable support to ghc for the different semantics that aren't interexpressible is something I'll spend some time thinking about.

don't worry, you'll have the version you want, but there will be an IEEE compliant one that also satisfies the corresponding haskell standard (BOTH, if they disagree theres a serious problem i'll have to have some discussion about)

comment:5 Changed 5 years ago by carter

Java is full of bad floating point design issues, not looking there for ANY design input. theres a fun rant here

As i'm a very hefty user of ghc for floating point computation, i'm going to be very very thoughtful on the design :)

comment:6 Changed 5 years ago by arotenberg

Yeah, I've seen those slides before. The problem with the arguments presented is that I know there is a class of users who demands "linguistically-legislated exact reproducibility" for floating-point computations, because I'm part of it! For my day job, I use Java to write a tool for modeling safety-critical software systems on desktop hardware. For our purposes, it is far more important that the results of the simulation be exactly identical across different platforms than it is they be numerically accurate or efficiently computable. We use Java's strictfp keyword and StrictMath class ubiquitously in our simulation engine, because it is essential that the output of our program be identical, no matter where or when it is run. When it is necessary to compute different or more numerically-accurate floating-point outputs, we emulate it in software. This is slow but allows us 100% confidence that you will always get the same results.

That said, our application is a rare case. In most situations, faster and more accurate computations that can give slightly different results on different machines are preferable to calculations that are wrong in exactly-reproducible ways.

Which section (if any) of the Haskell 2010 Report specifies the behavior of floating-point operations? I've glanced through it quickly but all I've been able to find is the Prelude definitions of the instances for Float and Double, which just list them as primitives.

comment:7 Changed 5 years ago by carter

To clarify what I said: both the perf and reproducible primops will be added and the one I'll pick for base will the reproducible/ portable one.

We agree. Bring the worry to the code review once I've got a proposed patch. :-). I write tooling for both workloads, I'm well well well aware of reproducibility being valuable.

comment:8 Changed 5 years ago by thomie

Keywords: newcomer added
Type of failure: None/UnknownRuntime performance bug

Adding 'newcomer' keyword by carter's suggestion in the description.

comment:9 Changed 5 years ago by thomie

Owner: carter deleted

Sorry carter, I'm trying to get this ticket to show up in Newcomers. Reclaim it if you want of course.

comment:10 Changed 5 years ago by carter

sure thing. I think the only newcomer friendly bit would be writing the branchless min/max for Word and Int and friends. its a bit more subtle for the floating point versions

comment:11 Changed 5 years ago by thoughtpolice


Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:12 Changed 4 years ago by osa1

I'm trying to understand this ticket; do we need to compile these functions:

minDbl :: Double -> Double -> Double
minDbl !d1 !d2 = if d1 < d2 then d1 else d2

minDbl' :: Double -> Double -> Double
minDbl' !d1 !d2 = if d2 < d1 then d2 else d1

to branchless code(using movsd with right argument ordering, and similarly movss for Float), or are we only interested in branchless Ord.min, Ord.max etc.? E.g. something like this:

mindbl'' :: Double -> Double -> double
mindbl'' (D# d1) (D# d2) = ... use new primop here ...

comment:13 Changed 4 years ago by carter

the idea is not to use it for the normal comparison code, but to make it available for those who need suitable supporting primops. *THOUGH* if we can provide suitable instruction level versions of the Ord based min and max, thats a thing we can think about too.

comment:14 Changed 4 years ago by osa1

Owner: set to osa1

Just to give a status update: I made some progress on this I'm also assigning this to myself.

comment:15 Changed 4 years ago by thoughtpolice


Milestone renamed

comment:16 Changed 4 years ago by WrenThornton

Cc: wren@… added

comment:17 Changed 4 years ago by thomie

Milestone: 8.0.1

comment:18 Changed 20 months ago by sjakobi

Cc: sjakobi added

comment:19 Changed 13 months ago by rockbmb


What is the status on this issue?

comment:20 Changed 13 months ago by bgamari

I believe it is just waiting for someone to come along with a patch. Feel free to finish up osa1's patch.

comment:21 Changed 13 months ago by ulysses4ever

Ben, I'm interested in this. For osa1's work: I don't see a patch. His link above has a tiny example program using something called min## which is not defined there.

Is there a clear strategy on how to attack this ticket? Let's say, starting with the most conservative step: provide primop version of integer min / max which, for x86, is implemented in assembly (CMOV or SSE4's PMINSD / PMAXSD).

comment:22 Changed 13 months ago by bgamari

​> His link above has a tiny example program using something called min## which is not defined there.

Whoops! Sorry about that; I had just assumed there was a patch sitting there.

As you suggest, I would start by thinking about providing primops; in particular I would start by focusing on Int# and Word#. As pointed out above, floating point is a whole can of worms.

However, before this I would first try to show that such a primop would improve real code. Afterall, introducing a primop means that we are forced to maintain that primop for eternity. This carries a cost which we want to ensure would be justified by the benefits it brings.

comment:23 Changed 13 months ago by ulysses4ever

I agree with measure-first reasoning. How would one compare the performance of the current code with the one using primop which we haven't defined yet, though?

comment:24 Changed 13 months ago by ulysses4ever

Cc: ulysses4ever added

comment:25 Changed 13 months ago by bgamari

Well, implementing then measuring (and accept the fact that the implementation may be thrown away as a consequence) is one option.

However, another option would be to work incrementally: measuring a best-case speed-up in a C (or similar) program.

comment:26 Changed 9 months ago by AndreasK

Cc: AndreasK added

For cases where branch prediction is impossible this makes quite the difference in my experience.

I have some proof of concept code for Int/Word already. I plan to keep the branch [here]( updated until I got a patch ready.

If someone feels like taking this over contact me, otherwise I will probably put up a patch for the Int/Word in a month or two depending on how time I can spend on this.

Note: See TracTickets for help on using tickets.