#7674 closed task (fixed)
Separate StablePtr table from StableName table.
Reported by: | EyalLotem | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.8.1 |
Component: | Runtime System | Version: | 7.6.2 |
Keywords: | rts stable_ptr stable_name performance | Cc: | eyal.lotem@… |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | None/Unknown | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #15555 | Differential Rev(s): | |
Wiki Page: |
Description
Currently, there is one table for these two difference concepts which makes each one of them less efficient.
The patches that address this improve some benchmarks' performance by a factor of ~3.
Attachments (8)
Change History (22)
comment:1 Changed 7 years ago by
Status: | new → patch |
---|
comment:2 Changed 7 years ago by
Before patch | After patch |
Hash insertion time | 2.4 | 0.8 |
Total time | 3.8 | 1.2 |
Hash insertion time (-threaded) | 2.9 | 1.2 |
Total time (-threaded) | 4.8 | 1.6 |
comment:3 Changed 7 years ago by
Also ran the benchmark without the specified heap size:
unpatched | patched | |
-threaded, no heap size | 24.5s | 10.1 |
no heap size | 23.1s | 8.3s |
comment:4 Changed 7 years ago by
As an interesting reference point, this benchmark is doing 10 million insertions of unboxed ints to boxed ints and then destruction of the hash table.
The small_hash benchmark, which inserts 10 million unboxed ints to unboxed ints takes 0.39sec.
It does bulk allocations of the hash nodes, which is hard to do in the FFI binding because the allocation happens in general-purpose code, so it should support releasing back memory.
The 1.2 sec is about a factor of 3 of overhead which may be reasonable given the extra indirection due to stable ptrs, the int boxing and the more fine-grained allocation.
Changed 7 years ago by
Attachment: | 0001-only-StableName-has-hash-table-entries-not-StablePtr.patch added |
---|
Changed 7 years ago by
Attachment: | 0002-Stable.c-Indentation-unmess-up.patch added |
---|
Changed 7 years ago by
Attachment: | 0003-Separate-stable-names-table-from-stable-ptrs-table.patch added |
---|
Changed 7 years ago by
Attachment: | 0004-Expose-Stable.c-lock-unlock.-Allow-bulk-freeStablePt.patch added |
---|
Changed 7 years ago by
Attachment: | 0005-Stable.-ch-Add-some-comments.patch added |
---|
Changed 7 years ago by
Attachment: | 0006-Abstract-away-stable-ptr-tables-loop-killing-a-coupl.patch added |
---|
Changed 7 years ago by
Attachment: | 0007-Use-ALLOC_PRIM-to-only-allocate-a-StableName-if-one-.patch added |
---|
comment:5 Changed 7 years ago by
Owner: | EyalLotem deleted |
---|
comment:6 Changed 7 years ago by
difficulty: | → Unknown |
---|
I'm afraid something is broken with StableName
s in your patch: test memo001
in libraries/base
fails for me (this is a stress test for StableName
s, it isn't run by default in validate, but it should be).
comment:7 Changed 7 years ago by
Status: | patch → new |
---|
comment:8 Changed 7 years ago by
Cc: | eyal.lotem@… added |
---|
comment:9 Changed 7 years ago by
Status: | new → patch |
---|
I replaced all 7 patches with "StableTables.patch", it is more manageable.
comment:10 Changed 7 years ago by
commit 7e7a4e4d7e9e84b2c57d3d55e372e738b5f8dbf5
Author: Simon Marlow <marlowsd@gmail.com> Date: Thu Feb 14 08:46:55 2013 +0000 Separate StablePtr and StableName tables (#7674) To improve performance of StablePtr. includes/HsFFI.h | 4 + includes/rts/Stable.h | 17 +- includes/stg/MiscClosures.h | 2 +- rts/Hash.c | 5 + rts/Hash.h | 5 +- rts/HsFFI.c | 18 ++ rts/Linker.c | 8 +- rts/PrimOps.cmm | 22 +- rts/RetainerProfile.c | 2 +- rts/RtsStartup.c | 4 +- rts/Stable.c | 460 +++++++++++++++++------------- rts/Stable.h | 29 ++- rts/sm/Compact.c | 2 +- rts/sm/GC.c | 12 +- utils/deriveConstants/DeriveConstants.hs | 3 + 15 files changed, 355 insertions(+), 238 deletions(-)
comment:11 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | patch → closed |
comment:12 Changed 7 years ago by
Milestone: | → 7.8.1 |
---|
comment:13 Changed 16 months ago by
Related Tickets: | → #15555 |
---|
The benchmark is the small_hash_hs FFI binding to the small_hash library. They're at:
https://github.com/Peaker/small_hash https://github.com/Peaker/small_hash_hs
(To build the latter, you need to clone the former directly inside it).
If you run small_hash_hs, it's runtime will be dominated by GC's, because of #7670. (~8 sec on my machine).
However, if you set the min heap size to 200M (via "+RTS -H200M"), the run time becomes dominated by the stable ptr hash table/etc. To run the benchmark on ordinary GHC, you need to use small_hash_hs's "unpatched_ghc" branch.
Patches at: https://github.com/Peaker/ghc/commits/separate_stable_tables
On my hardware:
Hash insertion time 2.4 0.8
Total time 3.8 1.2
Hash insertion time 2.9 1.2 (-threaded)
Total time 4.8 1.6 (-threaded)