Opened 9 years ago

Closed 9 years ago

#4004 closed task (fixed)

Improve performance of a few functions in Foreign.Marshal.*

Reported by: rtvd
Owned by:
Priority: normal
Milestone: 7.0.1
Component: Runtime System
Version: 6.12.2
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

A number of functions in Foreign.Marshal.* are relatively slow. The reasons are:

  • Division and multiplication are used when computing the size of a memory block in words; bit shifts should be used instead.
  • The functions are not inlined, so work that depends on the concrete data type is not optimized away.
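To illustrate the first point, here is a minimal sketch of rounding a byte count up to whole words with a division and with a shift. It is not taken from the attached patches: the names wordSizeBytes, wordShift, bytesToWordsDiv and bytesToWordsShift are hypothetical, and an 8-byte word is assumed. The two forms agree only for non-negative sizes and a power-of-two word size.

```haskell
import Data.Bits (shiftR)

-- | Assumed word size in bytes (8 on a 64-bit target). Must be a power of two.
wordSizeBytes :: Int
wordSizeBytes = 8

-- | log2 of 'wordSizeBytes'; shifting right by this divides by the word size.
wordShift :: Int
wordShift = 3

-- | Round a byte count up to whole words using integer division.
bytesToWordsDiv :: Int -> Int
bytesToWordsDiv n = (n + wordSizeBytes - 1) `quot` wordSizeBytes

-- | The same computation with a right shift; valid for non-negative n only.
bytesToWordsShift :: Int -> Int
bytesToWordsShift n = (n + wordSizeBytes - 1) `shiftR` wordShift
```

A compiler can perform this rewrite itself when the divisor is a known power of two and the operand is unsigned, so writing the shift by hand is only needed where that optimisation is missing.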

A couple of patches fix at least some of these performance issues. With both applied, a basic benchmark on the non-threaded RTS gives the following results:

TEST NAME              BEFORE     AFTER
withCString:         146.391 ns 133.646 ns
alloca:               51.424 ns  15.208 ns
allocaBytes:          31.872 ns  14.501 ns
mallocForeignPointer: 34.630 ns  17.498 ns
bytestring:           94.872 ns  58.938 ns
mvar:                 61.473 ns  54.806 ns
alloca+advancePtr:    54.480 ns  14.687 ns
new/finalizerFree:    61.172 ns  44.144 ns
with:                 69.096 ns  14.600 ns
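For readers who want to reproduce numbers of this kind, the following is a rough sketch of a per-call timing loop using only the base library. It is not the attached Benchmark.hs; nsPerIter, timeAction and allocaRoundTrip are hypothetical names, and a loop this simple includes loop overhead and is sensitive to optimisation flags, so treat its output as indicative only.

```haskell
import Control.Monad (replicateM_)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import System.CPUTime (getCPUTime)

-- | Convert a pair of CPU-time readings (in picoseconds, as returned by
-- 'getCPUTime') into average nanoseconds per iteration.
nsPerIter :: Integer -> Integer -> Int -> Double
nsPerIter startPs endPs n =
  fromIntegral (endPs - startPs) / 1000 / fromIntegral n

-- | Run an action n times and report the average CPU time per run in ns.
timeAction :: Int -> IO () -> IO Double
timeAction n act = do
  start <- getCPUTime
  replicateM_ n act
  end <- getCPUTime
  return (nsPerIter start end n)

-- | One alloca round trip: allocate space for an Int, write it, read it back.
allocaRoundTrip :: IO ()
allocaRoundTrip = alloca $ \p -> do
  poke p (42 :: Int)
  _ <- peek p
  return ()
```

Calling timeAction 1000000 allocaRoundTrip from GHCi or a small main gives a figure comparable in kind to the alloca row above, though not necessarily in magnitude.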

Could someone please review the attached patches and merge them into the repository?

One of them is for the runtime system (definitions for Cmm), the other is for Foreign.Marshal.*.

Attachments (3)

fastdivisionandmultiplication.patch (75.1 KB) - added by rtvd 9 years ago.
patch for RTS (include/Cmm.h)
inlininghints.patch (18.5 KB) - added by rtvd 9 years ago.
Inlining hints for Foreign.Marshal.*
Benchmark.hs (1.5 KB) - added by rtvd 9 years ago.
Benchmark I used to tune the performance.


Change History (4)

Changed 9 years ago by rtvd

Attachment: fastdivisionandmultiplication.patch added

patch for RTS (include/Cmm.h)

Changed 9 years ago by rtvd

Attachment: inlininghints.patch added

Inlining hints for Foreign.Marshal.*

Changed 9 years ago by rtvd

Attachment: Benchmark.hs added

Benchmark I used to tune the performance.

comment:1 Changed 9 years ago by simonmar

Milestone: 6.14.1
Resolution: fixed
Status: new → closed

Thanks for the patches and benchmark. I found the missing optimisation in the backend and fixed that:

Thu Apr 22 22:34:43 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * Add missing constant folding and optimisation for unsigned division
  Noticed by Denys Rtveliashvili <rtvd@mac.com>, see #4004

and added some inlinings:

Fri Apr 23 05:47:29 PDT 2010  Simon Marlow <marlowsd@gmail.com>
  * inline allocaArray0, to fix withCString benchmark

Mon Apr 19 14:53:33 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * INLINE alloca and malloc

With these changes I get similar results to you with the benchmark program.
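As an illustration of why the inlinings matter, here is a minimal sketch of an alloca-like wrapper marked INLINE. The name myAlloca is hypothetical and the definition is simplified (it ignores alignment), so it should not be read as the definition in base. The point is that once the wrapper is inlined at a call site with a concrete type, sizeOf can be resolved to a constant instead of going through a Storable dictionary at run time.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek, poke, sizeOf)

-- | Hypothetical stand-in for alloca: allocate enough bytes for one value
-- of type a and pass the pointer to the continuation.
-- Simplifying assumption: alignment is not handled here.
myAlloca :: forall a b. Storable a => (Ptr a -> IO b) -> IO b
myAlloca action = allocaBytes (sizeOf (undefined :: a)) action
{-# INLINE myAlloca #-}

-- | Call site at a concrete type. With myAlloca inlined here, the compiler
-- can see that a is Int and specialise the size computation.
roundTrip :: Int -> IO Int
roundTrip x = myAlloca $ \p -> poke p x >> peek p
```

Without the pragma, whether the wrapper is inlined across module boundaries is left to the optimiser's heuristics, which is one plausible source of the gap in the alloca and with rows of the benchmark.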
