Opened 6 years ago

Closed 6 years ago

Last modified 3 years ago

#8255 closed feature request (invalid)

GC Less Operation

Reported by: sirinath Owned by:
Priority: lowest Milestone:
Component: Compiler Version: 7.7
Keywords: Cc: mentheta
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Hi,

Is it possible to eliminate the GC through Automatic Reference Counting (ARC), as in Objective-C / Clang?

S

Change History (13)

comment:1 Changed 6 years ago by ezyang

difficulty: Unknown → Project (more than a week)
Milestone: _|_
Priority: highest → lowest
Version: 7.6.3 → 7.7

A very interesting research question! It should be possible, but there are a number of problems you have to overcome (and if overcome, would probably be a very interesting publication):

  • GHC programs generate a lot of garbage, and it is probably still better to do tracing GC for nurseries (if an object is dead, a reference counting implementation has to traverse it anyway) and only turn to reference counts when objects graduate. So you'll want to implement some sort of deferred reference counting, such as Ulterior Reference Counting, where only the non-nursery space is reference counted.
  • Reference counting schemes have to deal with data cycles. Immutability gets you part of the way there, since a non-cyclic immutable data structure is guaranteed to remain non-cyclic. But mutually recursive definitions as well as traditional mutation would have to be dealt with, or you'd have garbage.
  • And of course, you'd have to actually implement reference counting.
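As a concrete illustration of the second point: laziness lets mutually recursive definitions build genuinely cyclic heap structures even though the data is immutable, and naive reference counting can never reclaim such a cycle, since each cell keeps the other's count above zero. A minimal sketch:

```haskell
-- Two mutually recursive lazy lists form a heap cycle: each cons
-- cell references the other, so under naive reference counting
-- neither count could ever drop to zero, even once both lists
-- become unreachable from the program.
xs, ys :: [Int]
xs = 1 : ys
ys = 2 : xs

main :: IO ()
main = print (take 6 xs)  -- prints [1,2,1,2,1,2]
```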

comment:2 Changed 6 years ago by sirinath

I meant compile-time ARC, not RC as a GC technique.

comment:3 Changed 6 years ago by carter

Static memory management is great when possible, but for the programs people write in GHC, that would not be possible in general.

However, static allocation / mem management would be an interesting optimization. Alternatively, some of the ideas in http://hackage.haskell.org/package/regions could perhaps be adapted, and it would be interesting to enable the application of a regions discipline to Haskell memory management.

Relatedly: you should check out the DDC compiler and language, which has a number of these ideas more deeply baked in http://disciple.ouroborus.net/
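To illustrate the regions discipline mentioned above: the standard trick (used by `runST`, and in a richer form by the `regions` package) is a rank-2 type with a phantom region parameter, so nothing allocated inside a region can escape it and the whole region can be freed in bulk. The interface below is hypothetical, invented for illustration; the real `regions` package API differs:

```haskell
{-# LANGUAGE RankNTypes #-}
-- A hypothetical region interface. The phantom parameter `s` tags
-- every handle with its region, and the rank-2 type of `runRegion`
-- means the result type `a` cannot mention `s` -- so no handle can
-- escape, and the region may be deallocated wholesale on exit.
newtype Region s a = Region { unRegion :: IO a }

newtype Handle s = Handle Int  -- a resource tagged with its region

allocate :: Region s (Handle s)
allocate = Region (pure (Handle 0))

runRegion :: (forall s. Region s a) -> IO a
runRegion r = unRegion r  -- a real implementation would free the region here

-- OK: a result type like Int doesn't mention `s`.
-- Rejected by the type checker: `runRegion allocate`,
-- because the Handle's `s` would escape its scope.
```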

comment:4 Changed 6 years ago by sirinath

Clang C++ and Objective-C both have it with no restrictions.

comment:5 Changed 6 years ago by thoughtpolice

Resolution: invalid
Status: new → closed

ARC does not make sense for a language like Haskell. Objective-C is already a language where you manually manage memory in essence anyway, and it's possible to screw ARC up too by breaking conventions. All the compiler does is insert retain/release calls for you, and it's not much more fancy than that. Haskell is a very different language and we do not manually manage memory - we do not have a distinction between things like 'values' and 'references', or have to concern ourselves with the lifetime of an object. This alone means ARC is inapplicable.

There are other actual mechanical problems with ARC (like the fact that it doesn't handle cycles without manual breaks via weak pointers), but that's really beside the point, I'm afraid, since the idea in itself doesn't really make a lot of sense in the context of a high-level garbage-collected language.
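For concreteness, the "insert retain/release calls" bookkeeping that ARC automates amounts to little more than the following toy model, written here in Haskell with explicit counters (all names invented; GHC's runtime does nothing like this):

```haskell
import Data.IORef

-- A toy model of ARC's bookkeeping: the compiler's inserted
-- retain/release calls just bump a per-object counter and run a
-- destructor when it reaches zero. Illustrative only.
data RC a = RC { count :: IORef Int, payload :: a }

newRC :: a -> IO (RC a)
newRC x = do
  c <- newIORef 1          -- the creator holds the first reference
  pure (RC c x)

retain :: RC a -> IO ()
retain (RC c _) = modifyIORef' c (+ 1)

-- `free` stands in for the destructor ARC would invoke.
release :: RC a -> IO () -> IO ()
release (RC c _) free = do
  modifyIORef' c (subtract 1)
  n <- readIORef c
  if n == 0 then free else pure ()
```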

comment:6 Changed 6 years ago by sirinath

Though memory is not manually managed, that does not mean memory in a language like Haskell can only be managed through a GC. The GC can be a fallback.

At the language level there may be no distinction between value and reference, but under the hood they live either on the stack or on the heap, and the heap has to be managed. Also, allocation / deallocation in Haskell can be done in blocks, as opposed to an object at a time, based on memory-allocation locality during execution. Allocations and deallocations can be performed by the executing thread by inserting them into the flow of the program at the compiled-code level.

ARC as it exists in the Objective-C context cannot be implemented as-is in Haskell.

Romancing the GC is more of a philosophical decision given the current state of compiler technology. Some years back, a GC would have been the only option for languages without explicit memory management.

comment:7 Changed 6 years ago by thoughtpolice

I'm not entirely sure what you're proposing. First, GHC's runtime stack is quite different from a traditional C stack - it too must be managed completely by the runtime system. There is no such distinction between 'stack allocation' and 'heap allocation' at the language level as in other languages - the stack is used to store continuations when evaluating to WHNF. Objects are always allocated on the heap - even stacks are (since they must grow and be chained). Just removing stacks through aggressive optimization or some analysis would not remove the need for a GC.

Also, allocation / deallocation in Haskell can be done in blocks, as opposed to an object at a time, based on memory-allocation locality during execution.

Based on this, I assume you mean something more like region inference, where the compiler attempts to infer memory regions which may be 'stacked' or nested in each other, and are allocated in bulk, more like a traditional call stack in an imperative language?

These schemes are still quite theoretical; Tofte and Talpin introduced this in Standard ML, and to my knowledge there's still open work here on region/GC integration. The JHC (and AJHC) Haskell compilers also employ a form of region inference, although I am not sure if the author, John Meacham, ever got it truly working - he too eventually included a garbage collector.

comment:8 Changed 6 years ago by sirinath

I am not saying you can get it right in one go. Until you get it right, you need a GC as a fallback.

Let's take a simple case.

In a function invocation which takes simple parameters and closes over simple variables, we can divide the data into: what came from the surrounding context (both closure and parameters), what is actually used only in the local evaluation, and what needs to survive the function (return values, locally created values captured by closures in locally defined functions, etc.). This translates to what memory needs to be allocated, what can be deallocated, and what needs to be retained. This can be mapped into a set holding these symbols.

For each function you maintain this information. If a function, a list of functions, etc. is passed to a function, then these sets will hold some product of the above three sets.

In the parent context, function application and closures will define what needs to survive that context if a function is returned.

Based on this, the compiler can build a region map of what needs allocating and when, as well as when it can be freed. Some memory-requirement sizes may not be known (e.g. input-dependent), but the allocations and deallocations can still be inferred. Even if this is not globally optimised, the requirement can be inferred in the function's context.

When implementing something new like this, it will not be perfect and handle all cases. The starting point may be the 20% of use cases which matter the most, or which a programme spends 80% of its time in. For the rest, fall back to the GC as a safety net. Gradually more cases can be added, and at some point you might not need the GC.
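The classification sketched in this comment can be made concrete as a toy analysis over a minimal lambda calculus: for a function body, split the variables into those captured from the enclosing context (free variables) and those bound and used locally. The `Expr` type and all names below are invented for illustration; a real escape analysis in GHC would work on Core and track far more:

```haskell
import qualified Data.Set as S

-- A toy version of the proposed analysis: for a tiny lambda
-- calculus, split a function body's variables into those coming
-- from the surrounding context and those defined locally.
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr

-- Free variables of an expression.
free :: Expr -> S.Set String
free (Var x)   = S.singleton x
free (Lam x b) = S.delete x (free b)
free (App f a) = free f `S.union` free a

-- For a lambda: (captured from the enclosing context, bound locally
-- and actually used). For anything else, everything is "captured".
classify :: Expr -> (S.Set String, S.Set String)
classify (Lam x b) =
  ( free b `S.difference`   S.singleton x
  , free b `S.intersection` S.singleton x )
classify e = (free e, S.empty)

-- classify (Lam "x" (App (Var "f") (Var "x")))
--   captures {"f"} from outside and uses local {"x"}.
```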

comment:9 Changed 6 years ago by carter

With all due respect, sirinath, if you wish for a no-GC language,

you should look at Rust-lang.

The sort of type-system machinery needed to safely do without a GC is very different from the style GHC / Haskell currently supports. The differences needed are quite substantial, and any effort to adjoin them to GHC would be quite a substantial type-system research project, in addition to all the attendant systems engineering.

Unless you have a concrete type system and engineering roadmap for such experimentation, this discussion is better suited to haskell-cafe or another community forum.

We've given you a number of background reading references to help you understand the nuances attendant on the goals you want (which are valuable, but HARD, goals).

That said, if you have specific *detailed* technical insights, please share them with us.

comment:10 Changed 6 years ago by sirinath

Could you send me the references?

comment:11 Changed 6 years ago by carter

You need only google the languages, projects, and papers we've named above. Cheers.

Please direct further questions on this topic to the haskell-cafe mailing list if you wish to discuss the ideas with fellow Haskellers.

This ticket is closed.

comment:12 Changed 6 years ago by ezyang

I'd also remark that with a copying collector, the scheme described in comment 8 probably isn't buying you much. Recall that the cost of tracing is only proportional to the data that's live: so, paradoxically, the more garbage you have, the faster GC runs. Regions only begin to buy you performance when you are able to reason about lifetimes which carry beyond life-and-death in the nursery. Additionally, when the lifetime of an object is really short, you can usually rely on the optimizer to remove the heap allocation altogether. So the current GC is quite a tough benchmark to beat!

comment:13 Changed 3 years ago by mentheta

Cc: mentheta added