Opened 12 years ago

Closed 9 years ago

Last modified 7 years ago

#1338 closed task (fixed)

base package breakup

Reported by: simonmar
Owned by:
Priority: low
Milestone: 7.0.1
Component: libraries/base
Version: 6.6.1
Keywords:
Cc: Bulat.Ziganshin@…, id@…, jpm@…, mokus@…, mail@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

This ticket replaces #710, some of which we've now done.

Latest proposal for splitting the base package: http://www.haskell.org/pipermail/libraries/2007-April/007342.html

Attachments (1)

packagegraph.png (14.2 KB) - added by igloo 11 years ago.


Change History (45)

comment:1 Changed 12 years ago by igloo

Owner: set to igloo

comment:2 Changed 12 years ago by igloo

Partially done. The packages that are already split out could do with some refactoring (e.g. changing to use filepath where appropriate). Still remaining is:

Ready to go:

System.Posix.Signals
--> unix (System.Cmd depends on it, but moves to new package process)

Control.Concurrent.*, System.Timeout
--> new package concurrent

Data.Unique
--> new package unique (dep on concurrent)

System.Console.GetOpt
---> new package getopt

Not ready:

Not clear what to do with these:
 Control.Applicative
 Data.Foldable, Data.Traversable
 Data.Map, Data.IntMap, Data.Set, Data.IntSet
 Data.Sequence, Data.Tree
 Data.HashTable
 Data.Graph
 ---> new package collections? containers?  or split further?
      (dep. on array, generics, concurrent)
 Data.Array.*
 --> new package array (maybe; I'm slightly dubious here)
      (dep. on concurrent for Data.Array.Diff)

Needs the above to happen first:
 Data.Generics.*
 --> generics (maybe; Data class is defined for everything and is derivable)

Will happen around the second half of June:
 Data.ByteString.*
 --> bytestring (dep. on base, generics, array)

Other modules we might move:
Text.Printf, Data.Monoid, System.CPUTime

Ross suggests System.Posix.Signals might belong in process too.

comment:3 Changed 12 years ago by igloo

[14:31] < ndm> Igloo: re the "generics" package
[14:31] < ndm> perhaps it should be the syb package
[14:31] < ndm> since its an implementation of syb, not an implementation of 
               generics
[14:31] < ndm> i'm going to be releasing Data.Generics.Uniplate shortly, and 
               thats going to want to be a separate package outside of generics
[14:32] < ndm> (it should have always probably been Data.Generics.SYB, but its 
               too late to change that)

comment:4 Changed 12 years ago by guest

Cc: Bulat.Ziganshin@… added

This is a very important task from my POV, especially splitting out ByteStrings, which are still improving quickly.

It would also be great to split out I/O and data structures. If there are problems that don't allow this (such as Exception being defined via Handle, and the Prelude including I/O functions), we can set up a ticket to change that too (probably in 6.10, though).

comment:5 Changed 12 years ago by Isaac Dupree

Cc: id@… added

As I think I mentioned somewhere else, we could put Prelude proper in a package other than base. Probably base needs to export Prelude, so it could re-export from some other package that base depends on, and move several things "pre-base", which will require some support from GHC. Eventually normal programs might not need to depend on "base" because they specify all their dependencies otherwise. (Or we could move the big ugly Prelude to package haskell98 and force people to depend on that if they want to import Prelude. Which wouldn't work very well until we have a way to avoid importing Prelude that works in all important Haskell compilers.)

It's important but will take a while. I'm patient. It would be nice if we could avoid each major release of GHC being incompatible in how to compile things with it, but we shouldn't let that hinder us from making progress towards a state where packages are more upgradeable. (We also need more from Cabal, which isn't exactly part of GHC proper.)

comment:6 Changed 12 years ago by igloo

Duncan is working on getting the new bytestring package into shape, and should be done comfortably before the GHC RC. At that point we'll remove Data.ByteString* from base and use that instead.

comment:7 Changed 12 years ago by igloo

Ready to go:

System.Console.GetOpt
---> new package getopt

Not ready:

Causes a unix<->process dependency loop:
System.Posix.Signals
--> unix (System.Cmd depends on it, but moves to new package process)

Not clear what to do with these:
 Control.Applicative
 Data.Foldable, Data.Traversable
 Data.Map, Data.IntMap, Data.Set, Data.IntSet
 Data.Sequence, Data.Tree
 Data.HashTable
 Data.Graph
 ---> new package collections? containers?  or split further?
      (dep. on array, generics, concurrent)
 Data.Array.*
 --> new package array (maybe; I'm slightly dubious here)
      (dep. on concurrent for Data.Array.Diff)

Needs the above to happen first:
 Data.Generics.*
 --> generics (maybe; Data class is defined for everything and is derivable)

Needs Data.Array.Diff to move out of base first:
Control.Concurrent.*, System.Timeout
--> new package concurrent

Needs concurrent to be done first:
Data.Unique
--> new package unique (dep on concurrent)

Will happen soon:
 Data.ByteString.*
 --> bytestring (dep. on base, generics, array)

Other modules we might move:
Text.Printf, Data.Monoid, System.CPUTime

Ross suggests System.Posix.Signals might belong in process too.

comment:8 Changed 12 years ago by simonmar

Regarding System.Posix.Signals, it looks like the unix->process dependency is bogus, the two internal bits that unix depends on:

import System.Process.Internals ( pPrPr_disableITimers, c_execvpe )

should be moved to the unix package, and then process can depend on unix. With any luck that will enable some bits of process to be cleaned up, too.
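A minimal sketch of the dependency loop and its proposed fix, written as a small cycle check over a package graph. The graph encoding and the function names (`Deps`, `reachable`, `cyclic`) are illustrative assumptions, and the `base` edges are added only to make the graphs look realistic; the unix and process edges follow the description above.

```haskell
import Data.Maybe (fromMaybe)

-- A dependency graph as an association list:
-- package name -> packages it depends on directly.
type Deps = [(String, [String])]

-- Direct dependencies of a package (empty if the package is not listed).
depsOf :: Deps -> String -> [String]
depsOf g p = fromMaybe [] (lookup p g)

-- Every package reachable from p by following one or more dependency edges.
reachable :: Deps -> String -> [String]
reachable g p = go [] (depsOf g p)
  where
    go seen [] = seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (depsOf g x ++ xs)

-- Packages that (transitively) depend on themselves.
cyclic :: Deps -> [String]
cyclic g = [p | (p, _) <- g, p `elem` reachable g p]

-- Before: unix imports System.Process.Internals from process, while
-- process needs System.Posix.Signals, which is to live in unix.
before :: Deps
before =
  [ ("unix",    ["base", "process"])
  , ("process", ["base", "unix"])
  , ("base",    [])
  ]

-- After: the two internal bits live in unix, so only process -> unix remains.
after :: Deps
after =
  [ ("unix",    ["base"])
  , ("process", ["base", "unix"])
  , ("base",    [])
  ]
```

In GHCi, `cyclic before` lists both unix and process, while `cyclic after` is empty, which is the property the move is meant to establish.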

comment:9 Changed 12 years ago by duncan

Please, please can we keep the class interfaces in the same package as Monad, Functor etc.? That would be Control.Applicative, Data.Foldable and Data.Traversable. Otherwise people will be highly dissuaded from making their data types instances of Applicative etc. Just imagine if Functor was not in the base package and people had to depend on another package specifically: no one would ever make their data types an instance of Functor, since people prefer to keep deps to a minimum. So common interfaces should stay relatively close to the root of the package dep tree; implementations can be further down.

So moving the concrete implementations Map, Set, etc etc to a data/collections package is fine of course.
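To illustrate the point about interfaces living near the root: a sketch of a user-defined type that becomes an instance of the common classes while depending on nothing beyond base. The `Pair` type and `halveAll` function are hypothetical examples, not taken from the ticket, and the sketch assumes a reasonably current GHC where these classes are exported by the Prelude.

```haskell
import Data.Foldable (toList)

-- A hypothetical user-defined container.
data Pair a = Pair a a
  deriving (Show, Eq)

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

instance Applicative Pair where
  pure x = Pair x x
  Pair f g <*> Pair x y = Pair (f x) (g y)

instance Foldable Pair where
  foldr f z (Pair x y) = f x (f y z)

instance Traversable Pair where
  traverse f (Pair x y) = Pair <$> f x <*> f y

-- Generic code written against the class works for Pair unchanged:
-- halve every element, failing if any element is odd.
halveAll :: Traversable t => t Int -> Maybe (t Int)
halveAll = traverse (\n -> if even n then Just (n `div` 2) else Nothing)

-- Foldable gives list conversion for free.
pairElems :: Pair a -> [a]
pairElems = toList
```

Because the classes come from the package every library already depends on, writing these instances costs the author of `Pair` no extra dependency, which is the incentive argued for above.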

comment:10 Changed 12 years ago by igloo

Milestone: 6.86.1

We're about as far as we're going to get for 6.8 now, so moving to milestone 6.10.

comment:11 Changed 11 years ago by igloo

This is currently more-or-less blocked on "extensible exceptions", as that should get rid of most of the circular import problems.

comment:12 Changed 11 years ago by igloo

Component: Compiler → libraries/base

comment:13 Changed 11 years ago by simonmar

Priority: high → normal

Not essential for 6.10.1.

comment:14 Changed 11 years ago by igloo

Updated proposal. I'll attach packagegraph.png showing the package deps.

This block is mostly as before, except timeout has to be in its own package so that unique can sit in the middle:

timeout:        System.Timeout

unique:         Data.Unique

concurrent:     Control.Concurrent
                Control.Concurrent.Chan
                Control.Concurrent.MVar
                Control.Concurrent.QSem
                Control.Concurrent.QSemN
                Control.Concurrent.SampleVar

st can be pulled out:

st:             Control.Monad.ST
                Control.Monad.ST.Lazy
                Control.Monad.ST.Strict
                Data.STRef
                Data.STRef.Lazy
                Data.STRef.Strict

control should probably actually be merged with containers, but making it its own package made my experimenting simpler:

control:        Control.Applicative
                Data.Foldable
                Data.Monoid
                Data.Traversable

ghc-exts:       GHC.Exts
                GHC.PArr

The System.Mem modules don't really seem to fit here, but I didn't have anywhere better to put them, and they are under System after all.

system:         System.CPUTime
                System.Environment
                System.Exit
                System.Info
                System.Mem
                System.Mem.StableName
                System.Mem.Weak

numeric:        Data.Complex
                Data.Fixed
                Data.Ratio

generics:       Data.Generics
                Data.Generics.Aliases
                Data.Generics.Basics
                Data.Generics.Instances
                Data.Generics.Schemes
                Data.Generics.Text
                Data.Generics.Twins

version:        Data.Version

Little misc packages; we might want to fold some of these back in later, but for now I just wanted to get them out of the way:

getopt:         System.Console.GetOpt

debug:          Debug.Trace

printf:         Text.Printf

Again, these I was just getting out of the way. They're internal to GHC, so where they end up shouldn't much matter:

ghc-bits:       GHC.ConsoleHandler
                GHC.Desugar
                GHC.Environment
                GHC.TopHandler

The rest of base I currently have cut in 2, with a foreign package stuck in the middle. If things don't improve here then I expect we'll stick them all back together for 6.10:

base-top:       Control.Exception
                Control.OldException
                Control.Category
                Control.Arrow
                Control.Monad.Fix
                Control.Monad.Instances
                Text.Show
                Text.Show.Functions
                System.IO.Error
                System.IO
                System.Posix.Types
                System.Posix.Internals
                Data.Ix
                Data.Function
                Prelude

foreign:        Foreign
                Foreign.C
                Foreign.C.Error
                Foreign.C.String
                Foreign.Concurrent (GHC-only)
                Foreign.ForeignPtr
                Foreign.Marshal
                Foreign.Marshal.Alloc
                Foreign.Marshal.Array
                Foreign.Marshal.Error
                Foreign.Marshal.Pool
                Foreign.Marshal.Utils
                Foreign.Ptr
                Foreign.StablePtr

base:           Control.Monad
                Data.Bits
                Data.Bool
                Data.Char
                Data.Dynamic
                Data.Either
                Data.Eq
                Data.HashTable
                Data.IORef
                Data.Int
                Data.List
                Data.Maybe
                Data.Ord
                Data.String
                Data.Tuple
                Data.Typeable
                Data.Word
                Foreign.C.Types
                Foreign.Storable
                Numeric
                System.IO.Unsafe
                Text.ParserCombinators.ReadP
                Text.ParserCombinators.ReadPrec
                Text.Read
                Text.Read.Lex
                Unsafe.Coerce
                (plus a load of GHC-only internal modules)

Changed 11 years ago by igloo

Attachment: packagegraph.png added

comment:15 Changed 11 years ago by simonmar

Yes to pulling out concurrent, st, generics, getopt, and moving the Control.Applicative stuff into containers. The rest don't seem to buy us a great deal, and I'm concerned that we're ending up with a plethora of tiny packages.

I'll commit the base3-compat stuff as soon as I can get it to validate on Windows, and then it'll need to be updated to reflect these changes.

comment:17 Changed 11 years ago by simonpj

I'm also a bit concerned about creating lots of tiny packages. Maybe we can do this a step at a time?

If, indeed, we need do anything at all. What is the Main Goal here? Who is pushing for further decomposition of 'base', and what gains does it bring? Are these gains the most important thing to spend our limited effort budget on? There are plenty of other pressing issues! (Untying the mutual recursion is good regardless of further break-up.)

Simon

comment:18 Changed 11 years ago by igloo

One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.

Being able to separately upgrade the different parts is another advantage. Also, it means that we can have a separate maintainer for, e.g., SYB (well, this doesn't technically need it to be a separate package, but it's conceptually simpler if it is).

Breaking base up into packages also makes it much easier to see what the hierarchy is, and makes it easier to restructure the hierarchy. Plus it means that people can't re-tangle the logically separate components, which is all too easy to do when you just have one huge package.

It also means that packages are clearer about what they depend on. One possibility, which I think would be really cool, is to separate all the IO modules from the non-IO modules; between that and looking at the extensions used (e.g. TH and FFI) it would then be clear whether or not a library could do any IO. Of course, the Prelude is a hurdle for this goal.

comment:19 Changed 11 years ago by spl

I can't speak for most packages, but I would like to see that the Data.Generics modules are broken out into a separate package for easier maintainability and upgradeability.

Also, as was discussed in the thread linked by igloo, I would like to call it "syb" instead of "generics."

comment:20 Changed 11 years ago by simonpj

Igloo: these are all good goals. The question is really how high up our priority list they are.

spl: yes, I agree there's a specific reason for the generics stuff.

comment:21 in reply to:  18 ; Changed 11 years ago by simonmar

Replying to igloo:

One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.

All the other advantages are good, but this one is false I think. If you modify GHC.Handle you do have to recompile all the modules above it in the hierarchy, regardless of whether they're in another package or not. GHC may be able to avoid actual recompilation, but you at least need to invoke GHC on every module. (currently the build system doesn't do this except within a package, which is bad, and something we hope to fix).

comment:22 in reply to:  21 ; Changed 11 years ago by igloo

Replying to simonmar:

Replying to igloo:

One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.

All the other advantages are good, but this one is false I think. If you modify GHC.Handle you do have to recompile all the modules above it in the hierarchy, regardless of whether they're in another package or not.

Let me try to clarify: If you're debugging GHC.Handle then you don't need to recompile, for example, GetOpt after adding a debugging print or when you want to test a fix.

Once you've actually fixed the bug you'll need to recompile everything so that the other libraries all work again, agreed.

comment:23 in reply to:  22 Changed 11 years ago by simonmar

Replying to igloo:

Let me try to clarify: If you're debugging GHC.Handle then you don't need to recompile, for example, GetOpt after adding a debugging print or when you want to test a fix.

Once you've actually fixed the bug you'll need to recompile everything so that the other libraries all work again, agreed.

Ok, I see what you meant. Thanks!

comment:24 Changed 11 years ago by dreixel

Cc: jpm@… added

comment:25 Changed 11 years ago by igloo

Milestone: 6.10 branch → 6.12 branch
Owner: igloo deleted

I've done the parts of this that nobody objected to, namely:

concurrent, unique, timeout
st
syb (was: generics)
getopt

comment:26 Changed 11 years ago by simonmar

Architecture: Unknown → Unknown/Multiple

comment:27 Changed 11 years ago by simonmar

Operating System: Unknown → Unknown/Multiple

comment:28 Changed 11 years ago by igloo

Although it looks, from the source, like the IO part of Data.Typeable should be able to be split off from the Typeable classes etc, this is sadly not the case.

Right down at the bottom of the module hierarchy we have

error s = throw (ErrorCall s)

which needs ErrorCall to have a Typeable instance. Although in the source this is just deriving Typeable, the generated code calls mkTyCon, which calls mkTyConKey, which does IO (hidden by unsafePerformIO).
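A small sketch of why exceptions and Typeable are tied together: recovering a concrete exception such as ErrorCall from SomeException is a type-directed cast, which relies on the Typeable superclass of Exception. This assumes a current GHC and the Control.Exception API from base; how Typeable instances are generated internally may differ from what is described above. `ParseFailure`, `asErrorCall` and `errorMessage` are hypothetical names.

```haskell
import Control.Exception
  ( ErrorCall (..)
  , Exception
  , SomeException
  , evaluate
  , fromException
  , toException
  , try
  )

-- A hypothetical exception type, used only to show a failing cast.
newtype ParseFailure = ParseFailure String
  deriving Show

instance Exception ParseFailure

-- Try to view an arbitrary exception as an ErrorCall.
-- fromException succeeds only when the runtime types match.
asErrorCall :: SomeException -> Maybe String
asErrorCall e = fmap (\(ErrorCall msg) -> msg) (fromException e)

-- Force a value and report the message if forcing it called 'error'.
errorMessage :: a -> IO (Maybe String)
errorMessage x = do
  r <- try (evaluate x)
  pure $ case r of
    Left e  -> asErrorCall e
    Right _ -> Nothing
```

Here `errorMessage (error "boom" :: Int)` recovers the message, while `asErrorCall (toException (ParseFailure "bad"))` yields Nothing because the cast to ErrorCall fails.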

comment:29 Changed 10 years ago by mokus

Would it be possible to implement at least as much of this splitting as is needed to remove all "unsafe" operations from base? I would very much like to be able to use packages as a 'unit of analysis' for a relatively simple library I'm working on that attempts to maintain a package database of "safe" packages (verified not to perform any "dangerous" IO outside the IO type, for example).

Unfortunately, as it stands now the analysis is polluted from the start due to base's inclusion of unsafePerformIO, unsafeCoerce, unsafeSTToIO, the ability to define custom Typeable instances, large parts of the whole GHC subtree, and possibly other surprises. If these could be removed from base, it would make things much simpler. At present I am using a "base-safe" wrapper that uses the PackageImports extension (thanks for that by the way, it's great!) to export a subset of the base library, but it would be much cleaner to have base itself be "pure" so that all cabal packages using it (and build-type Simple and only "safe" language extensions) could be automatically ruled "pure" as well.

Plus, doesn't it just give you lots of warm fuzzies to think of Haskell's base library as pure? Or to put it differently, doesn't it just give you the screaming heebie-jeebies that unsafePerformIO is in base at all? ;)

If proliferation of small packages is a concern, perhaps the split could be implemented by renaming the existing base to, say, "base-unsafe", and providing smaller interface wrapper packages using PackageImports as, e.g., base, st, unsafe-io, unsafe-st, foreign, and so on. I don't know whether that actually solves whatever the concern is over small packages, but it would at least allow the interface to be broken up without requiring as much effort as a full disentanglement of all the module dependencies.
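To make the concern about unsafePerformIO concrete, here is a sketch of a function whose type claims purity but which performs a side effect when its result is evaluated. It uses only System.IO.Unsafe and Data.IORef from base; the names `counter` and `sneaky` are hypothetical.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global mutable counter, created with the usual NOINLINE idiom so
-- that only one IORef is ever allocated.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- The type says "pure function", yet evaluating the result bumps the
-- counter. Nothing in the signature reveals the side effect.
{-# NOINLINE sneaky #-}
sneaky :: Int -> Int
sneaky n = unsafePerformIO $ do
  atomicModifyIORef' counter (\c -> (c + 1, ()))
  pure (n + 1)

-- Read the counter from honest IO code.
readCounter :: IO Int
readCounter = readIORef counter
```

A package-level analysis that trusts types alone would classify `sneaky` as pure, which is why any package able to import unsafePerformIO has to be treated with suspicion.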

comment:30 Changed 10 years ago by mokus

Cc: mokus@… added

comment:31 in reply to:  29 ; Changed 10 years ago by igloo

Replying to mokus:

Would it be possible to implement at least as much of this splitting as is needed to remove all "unsafe" operations from base?

Unfortunately, this isn't as easy as you might expect. Exceptions require Typeable, which uses unsafePerformIO and IORef, which is enough to allow you to both unsafePerformIO (obviously) and unsafeCoerce.

Of course, you could make them not be exported, but that's not as nice as them being completely absent.

comment:32 in reply to:  31 Changed 10 years ago by mokus

Replying to igloo:

Unfortunately, this isn't as easy as you might expect. Exceptions require Typeable, which uses unsafePerformIO and IORef, which is enough to allow you to both unsafePerformIO (obviously) and unsafeCoerce.

Of course, you could make them not be exported, but that's not as nice as them being completely absent.

As far as I can tell, Typeable is safe as long as the end user cannot inject their own implementations of typeOf. As a separate issue, if this is not correct I'm interested to know that as well ;).

In my current implementation, I export Typeable with the class methods replaced by identically-typed functions, so that Typeable may be used but no 'tricky' implementations may be created. It is still possible to derive instances as long as the class itself is in scope, and as far as I can tell this is sufficient to make Typeable both useful and safe. Likewise for Typeable1, Typeable2, and the rest, as well as Data and Ix just to be safe.

I'm not requesting that unsafe operations not be present, only that they not be available to packages only importing the base package. Clearly the unsafe operations should be exposed somehow, because they're just too darn useful not to provide. Presently I do this by compiling against the un-wrapped version of base when building other "safe" packages like bytestring (similarly wrapped with a 'safe' version hiding its unsafe operations) that make use of unsafe interfaces but cleanly encapsulate them. I expect that any practical strategy for achieving a safe base would either keep the whole unsafe base around, perhaps under a different name, or place the unsafe operations in separate package(s) depended-upon by base and other packages that need them.

comment:33 in reply to:  29 ; Changed 10 years ago by duncan

Replying to mokus:

Would it be possible to implement at least as much of this splitting as is needed to remove all "unsafe" operations from base? I would very much like to be able to use packages as a 'unit of analysis' for a relatively simple library I'm working on that attempts to maintain a package database of "safe" packages (verified not to perform any "dangerous" IO outside the IO type, for example).

I would suggest an approach using a combination of annotations and analysis. Certainly a package that contains only pure functions and depends only on other packages you've deemed to be ok, is itself ok. That requires no annotation, just analysis. Your problem of course is packages that either export genuinely unsafe things or use unsafe things to export safe abstractions. My suggestion here is to annotate the things that are safe but use unsafe things internally. Of course you then need to "believe" the annotations, so you would want a way to specify a list of packages where you will trust the annotations to be correct. You don't need to annotate things as unsafe since all primitives are considered unsafe unless marked safe.

The point is you do not need to conflate the safe/unsafe distinction with package boundaries.
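The combination of annotations and analysis could be sketched roughly as follows. This is a simplification under stated assumptions: it works at package granularity rather than per exported item, and the `Package` record, its field names and the example database are all hypothetical.

```haskell
-- Hypothetical description of a package as seen by the analysis.
data Package = Package
  { pkgName    :: String
  , pkgDeps    :: [String]  -- names of direct dependencies
  , usesUnsafe :: Bool      -- analysis result: uses unsafe primitives
  , markedSafe :: Bool      -- author's annotation: exports are safe
  }

-- A package is acceptable when all of its dependencies are acceptable
-- and either it uses nothing unsafe, or it is annotated safe and appears
-- in the list of packages whose annotations we have chosen to believe.
acceptable :: [String] -> [Package] -> String -> Bool
acceptable trusted db = go []
  where
    go visiting name
      | name `elem` visiting = False  -- dependency cycle: reject
      | otherwise =
          case filter ((== name) . pkgName) db of
            (p : _) ->
              let selfOk = not (usesUnsafe p)
                        || (markedSafe p && name `elem` trusted)
              in selfOk && all (go (name : visiting)) (pkgDeps p)
            [] -> False  -- unknown packages are never trusted

-- An illustrative database with made-up package names.
exampleDb :: [Package]
exampleDb =
  [ Package "core"     []         True  True   -- unsafe inside, annotated
  , Package "pure-lib" ["core"]   False False  -- needs analysis only
  , Package "sneaky"   ["core"]   True  True   -- annotated, but not trusted
  , Package "app"      ["sneaky"] False False  -- tainted by its dependency
  ]
```

With `acceptable ["core"] exampleDb`, pure-lib is accepted through analysis alone, sneaky is rejected because its annotation is not on the trusted list, and app is rejected because it depends on sneaky.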

comment:34 in reply to:  33 Changed 10 years ago by mokus

Replying to duncan:

I would suggest an approach using a combination of annotations and analysis. Certainly a package that contains only pure functions and depends only on other packages you've deemed to be ok, is itself ok. That requires no annotation, just analysis. Your problem of course is packages that either export genuinely unsafe things or use unsafe things to export safe abstractions. My suggestion here is to annotate the things that are safe but use unsafe things internally. Of course you then need to "believe" the annotations, so you would want a way to specify a list of packages where you will trust the annotations to be correct. You don't need to annotate things as unsafe since all primitives are considered unsafe unless marked safe.

The point is you do not need to conflate the safe/unsafe distinction with package boundaries.

I agree with this in principle; however, I believe that it introduces an unnecessary level of indirection. The type system in Haskell, and GHC in particular, is already a very powerful annotation system which under normal circumstances automatically tracks exactly the kind of safety information I need, and automatically infers it at the expression level. Any annotation system I implement in addition to that would likely be a large amount of wasted effort to provide essentially a very coarse-grained and primitive supplemental type system. What I'm doing now is essentially what you suggest, but using types as the "annotations". If I could include the base package among those packages for which I can trust the annotations, it would greatly simplify certain aspects of my implementation.

As far as conflating safety distinctions with package boundaries, I can see how I gave that impression but I don't believe that's what I'm doing. Ultimately, trust is what I am delimiting with package boundaries. As packages are the basic units of code deployment I believe this is a very natural boundary.

comment:35 Changed 10 years ago by simonmar

Why not make a safe-base package that re-exports everything from base except the unsafe bits?

comment:36 in reply to:  35 Changed 10 years ago by mokus

Replying to simonmar:

Why not make a safe-base package that re-exports everything from base except the unsafe bits?

That's exactly what I've got right now, and it works fairly well, but it seems to make the process of building safe hackage packages more complex than it could be: I either have to manually "port" each one I want to use, or do what I'm doing now, which is to use a cabal-install-like build driver that automatically converts dependencies on base to dependencies on safe-base, hoping it really works. If the "real" base were safe I would still be using a fairly customized build tool for other analysis, but it would make for two fewer things I have to worry about. First, I would not have to modify the PackageDescription, and second, I could have more confidence that a given package will actually successfully build, which I cannot generally expect if I'm dynamically substituting a false base that the package has not necessarily ever been compiled against.

I can certainly put my wrapper package up on hackage, and I'd hope it would get used when base is not needed but I'm rather doubtful that it would given that all it does is pretend to be base and provide less functionality. But then again, the Haskell community has surprised me many times before, and I would not be very surprised to be wrong about that.

If that winds up being the official recommendation then I can do so without too much trouble, but I think the idea of making base itself type-safe was at least worth bringing up. It seems to me rather appealing on a philosophical level as well, though of course that's not in itself a good enough reason to trump practical concerns.

comment:37 in reply to:  35 ; Changed 10 years ago by igloo

Replying to simonmar:

Why not make a safe-base package that re-exports everything from base except the unsafe bits?

I think it would be a pain to keep it both complete and correct.

comment:38 in reply to:  37 Changed 10 years ago by mokus

Replying to igloo:

Replying to simonmar:

Why not make a safe-base package that re-exports everything from base except the unsafe bits?

I think it would be a pain to keep it both complete and correct.

A bit, but it's not too bad. I've got the versions of my wrapper package tied by strict equality to the versions of base, so it won't compile with a base I haven't examined. base changes infrequently enough that this is quite tractable for me.

comment:39 Changed 10 years ago by mokus

For what it's worth, I am willing to do as much of the dirty work needed for a 'safe/unsafe' split as my current familiarity with the GHC build system allows, provided it is something that will actually have a good chance of inclusion.

I had already started dabbling in that direction, but wanted to know whether it has any chance of being accepted before digging too deep.

comment:40 Changed 10 years ago by igloo

I think it's strange that, given how much easier a "safe" Haskell should be compared to languages like PHP and perl, they have safe modes and we don't. Personally I think that we should definitely do something along those lines, although we should think carefully about what the options are, and what the best way to achieve it is; this strikes me as the sort of thing where it's possible to do a lot of work, and then realise that you've missed a corner.

See also #1380.

comment:41 Changed 9 years ago by igloo

Milestone: 6.12 branch → 6.12.3

comment:42 Changed 9 years ago by igloo

Milestone: 6.12.3 → 6.14.1
Priority: normal → low

comment:43 Changed 9 years ago by igloo

Resolution: fixed
Status: new → closed
Type of failure: None/Unknown

I don't think any more will happen on this without someone making library proposals.

comment:44 Changed 7 years ago by nomeata

Cc: mail@… added

Just a reference to yet another attempt at splitting up base, getting stuck at similar points: http://hackage.haskell.org/trac/ghc/wiki/SplitBase
