Opened 12 years ago

Closed 9 years ago

#2090 closed feature request (fixed)

Better stack management please

Reported by: guest Owned by:
Priority: normal Milestone: 7.4.1
Component: Runtime System Version: 6.8.2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

(From Adrian Hey)

The current ghc rts stack management has a number of undesirable properties, at least as far as I understand it. The current implementation is explained in this post:

http://haskell.org/pipermail/glasgow-haskell-users/2007-May/012472.html

But it's worse than that. What the above post doesn't explain is that if the stack temporarily "blows up" due to deep recursion, the memory claimed from the heap for that threads stack will never be released, at least according to this post:

http://www.haskell.org/pipermail/haskell-cafe/2008-February/039010.html

There have been a number of threads discussing this, all of which contain too much information and too many points of view to reproduce in this ticket, so here are some relevant starting points:

http://haskell.org/pipermail/glasgow-haskell-users/2007-May/012467.html http://www.haskell.org/pipermail/haskell-cafe/2008-January/038790.html

What I would like is summarised here:

http://www.haskell.org/pipermail/haskell-cafe/2008-February/039115.html

Change History (6)

comment:1 Changed 12 years ago by simonmar

difficulty: Unknown

Essentially there are three suggestions here, I'll deal with them one at a time:

use a different default for the max stack size (+RTS -K)

I'm happy to go along with whatever is the popular opinion here. Personally I never encounter a program that exceeds the current stack limit and is not in an infinite loop, but I'm prepared to believe that others do. Please let us know.

reduce the memory allocated to a stack when it shrinks

This is not hard to implement - probably a couple of hours work, and confined to just one place. It could be done possibly without even copying the stack: just split the TSO into two, copy the TSO structure into the higher bit, and leave the low bit as a ThreadRelocated, the GC will clean it up. Don't forget for this to be really useful we also need to do #698.

use a linked list of stack chunks instead of a single contiguous stack

This is a much harder proposition - there's lots of code in the RTS that traverses stacks, and it would all have to change (and get more complicated, at that). The current decision to use contiguous stacks was made consciously in order to keep things simple. We'd need some convincing evidence that the lack of stack chunks is really hurting, to make it worthwhile doing this. My impression is that there are more important things.

comment:2 Changed 12 years ago by igloo

Milestone: _|_

comment:3 Changed 11 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:4 Changed 11 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:5 Changed 9 years ago by simonmar

Type of failure: None/Unknown

Done:

Wed Dec 15 04:08:43 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * Implement stack chunks and separate TSO/STACK objects
  
  This patch makes two changes to the way stacks are managed:
  
  1. The stack is now stored in a separate object from the TSO.
  
  This means that it is easier to replace the stack object for a thread
  when the stack overflows or underflows; we don't have to leave behind
  the old TSO as an indirection any more.  Consequently, we can remove
  ThreadRelocated and deRefTSO(), which were a pain.
  
  This is obviously the right thing, but the last time I tried to do it
  it made performance worse.  This time I seem to have cracked it.
  
  2. Stacks are now represented as a chain of chunks, rather than
     a single monolithic object.
  
  The big advantage here is that individual chunks are marked clean or
  dirty according to whether they contain pointers to the young
  generation, and the GC can avoid traversing clean stack chunks during
  a young-generation collection.  This means that programs with deep
  stacks will see a big saving in GC overhead when using the default GC
  settings.
  
  A secondary advantage is that there is much less copying involved as
  the stack grows.  Programs that quickly grow a deep stack will see big
  improvements.
  
  In some ways the implementation is simpler, as nothing special needs
  to be done to reclaim stack as the stack shrinks (the GC just recovers
  the dead stack chunks).  On the other hand, we have to manage stack
  underflow between chunks, so there's a new stack frame
  (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
  The total amount of code is probably about the same as before.
  
  There are new RTS flags:
  
     -ki<size> Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
     -kc<size> Sets the stack chunk size (default 32k)
     -kb<size> Sets the stack chunk buffer size (default 1k)
  
  -ki was previously called just -k, and the old name is still accepted
  for backwards compatibility.  These new options are documented.

comment:6 Changed 9 years ago by simonmar

Milestone: _|_7.2.1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.