Opened 14 years ago

Closed 10 years ago

#594 closed task (fixed)

Support use of SSE2 in the x86 native code generator

Reported by: simonmar
Owned by: simonmar
Priority: normal
Milestone: 7.0.1
Component: Compiler (NCG)
Version: 6.4.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case: N/A
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Currently only the x86_64 native code generator supports SSE2, but it would be worthwhile enabling this in the x86 backend too.

Change History (12)

comment:1 Changed 14 years ago by simonmar

Architecture: Unknown
difficulty: Moderate (1 day)
Operating System: Unknown

comment:2 Changed 13 years ago by igloo

Milestone: 6.8
Test Case: N/A

comment:3 Changed 13 years ago by simonmar

Owner: set to simonmar

I'm probably going to do this.

comment:4 Changed 12 years ago by simonmar

Milestone: 6.8 branch → 6.10 branch
Owner: simonmar deleted

comment:5 Changed 12 years ago by simonmar

See #1890 for a test case (actually we could put that test into nofib).

comment:6 Changed 11 years ago by simonmar

Architecture: Unknown → Unknown/Multiple

comment:7 Changed 11 years ago by simonmar

Operating System: Unknown → Unknown/Multiple

comment:8 Changed 10 years ago by igloo

Milestone: 6.10 branch → 6.12 branch

comment:9 Changed 10 years ago by simonmar

difficulty: Moderate (1 day) → Moderate (less than a day)

comment:10 Changed 10 years ago by igloo

Type of failure: Runtime performance bug

comment:11 Changed 10 years ago by simonmar

Milestone: 6.12 branch → 6.14.1
Owner: set to simonmar
Status: new → assigned

I'm on this.

comment:12 Changed 10 years ago by simonmar

Resolution: fixed
Status: assigned → closed

Done:

Thu Feb  4 10:48:49 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Implement SSE2 floating-point support in the x86 native code generator (#594)
  
  The new flag -msse2 enables code generation for SSE2 on x86.  It
  results in substantially faster floating-point performance; the main
  reason for doing this was that our x87 code generation is appallingly
  bad, and since we plan to drop -fvia-C soon, we need a way to generate
  half-decent floating-point code.
  
  The catch is that SSE2 is only available on CPUs that support it (P4+,
  AMD K8+).  We'll have to think hard about whether we should enable it
  by default for the libraries we ship.  In the meantime, at least
  -msse2 should be an acceptable replacement for "-fvia-C
  -optc-ffast-math -fexcess-precision".
  
  SSE2 also has the advantage of performing all operations at the
  correct precision, so floating-point results are consistent with other
  platforms.
  
  I also tweaked the x87 code generation a bit while I was here; it's
  now slightly less bad than before.

I measured the FF ray tracer benchmark, and -msse2 seems on par with, or possibly better than, "-fvia-C -optc-O3 -fexcess-precision -ffast-math", although the results are quite variable on the machine I tried it on. I suspect we're suffering from randomly misaligned Doubles on the stack and heap.
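For anyone wanting to try the flag on something smaller than the ray tracer, here is a minimal sketch of a floating-point-bound program. It is not from the patch or from nofib; the module layout, the `basel` function, and the file name `Basel.hs` are made up for illustration. On a 32-bit x86 GHC that includes this patch, one could build it once with `ghc -O2 Basel.hs` and once with `ghc -O2 -msse2 Basel.hs`, then compare run times and the printed result (the x87 build may differ in the last digits because of excess precision).

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical micro-benchmark: partial sum of 1/k^2 for k = 1..n.
-- The loop body is one multiply, one divide and one add on Doubles,
-- so run time is dominated by floating-point code generation.
basel :: Int -> Double
basel n = foldl' step 0 [1 .. n]
  where
    step acc k = acc + 1 / (fromIntegral k * fromIntegral k)

main :: IO ()
main = print (basel 10000000)
```

On x86_64 the native code generator already uses SSE2, so the flag should make no difference there.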
