Opened 10 years ago

Closed 8 years ago

#3480 closed task (fixed)

Easily make Typeable keys pure, so that Typeable can be handled efficiently across communications

Reported by: guest Owned by:
Priority: normal Milestone: 7.2.1
Component: libraries/base Version:
Keywords: Typeable, efficiency Cc: illissius@…, la@…, Lennart, NeilMitchell
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

Data.Typeable: Easily make Typeable keys pure(used in Eq), so that Typeable keys don´t vary from run to run. This permits an efficient storage of the keys in files and to be transmitted trough communications as well as processed without loss of efficiency. Actually gaining efficiency probably, since the keys caches are not necessary.

Currently, whenever the user needs to communicate types, he must transmit the full string name for each type. Moreover, in the reception, the programmer is forced to handle these full string keys for mapping types to handlers, in equality checks etc. if the type keys are pure, the efficiency of key handling can be keept across communications.

short description of task: Istead of using a Hash( stringType, generatedKey) use hashString (string-of-type)

Long description:

1) drop the cache; drop newKey

2) use instead the expression:

 Key $ hashString str

whenever a new key is needed from the package Data.HashTable:

hashString :: String -> Int 

the key obtained is pure so:

3) drop the "IO" in typeRepKey signature

Attachments (1)

Typeable.diff (3.9 KB) - added by guest 10 years ago.
Diff that implement these changes (compile Ok using GHC, but still not tested) Alberto G. Corona (agocorona@…)

Download all attachments as: .zip

Change History (16)

Changed 10 years ago by guest

Attachment: Typeable.diff added

Diff that implement these changes (compile Ok using GHC, but still not tested) Alberto G. Corona (agocorona@…)

comment:1 Changed 10 years ago by igloo

difficulty: Unknown
Milestone: 6.14.1

comment:2 Changed 10 years ago by simonpj

Description: modified (diff)

Something that does a better job of TypeRep would certainly be welcome:

  1. Those unsafePerformIOs are embarrassing. And indeed they lead to trouble if (as can happen in GHCi or Template Haskell) you get two instances of that hash table.
  1. If you were (naughtily) to serialise one of those keys into a file, they'd be wrong when you read them back in. But this never happens (see the Show instance for TypeRep: what you serialise is the TypeRep data structure without the keys. When it's read back in a new set of keys is constructed. (Actually there is no Read instance but that is what would happen if there were.)

The keys are solely used to speed up comparison. Comparing the data structures and strings is the underlying ground truth. So it's perfectly fine for keys to vary from run to run.

However, it's vital that if two TypeReps with the same key really are equal. Hashing the string to an Int does not guarantee that. To be sure, you'd need a lot more bits, something like the fingerprints we store in interface files. See compiler/utils/Fingerprint.hsc.

I'm not sure about the best approach. The baseline is:

  • Compare TypeRep structures as one would any data structure.
  • At a TyCon compare the String representation of the type constructor.

An alternative, that would do the fingerprint stuff once and for all at compile time, is:

  • Compare TypeRep structures as one would any data structure.
  • At a TyCon compare the fingerprint of the string representation of the TyCon

Neither of these give constant-time comparison of TypeReps, but I doubt that such comparisons are going to be in the inner loop of any program. And both fix (1) above

Other things to think about:

  • Surely TypeRep should be in Ord!
  • We should check that the String used to make a Typeable.TyCon includes the full, versioned, package Id.
  • Even then I'm slightly worried. The expection is that foo-3.2:Foo.T uniquely defines the entire structure of T. But what if you forget to bump the package version?
  • More subtly what if T is defined thus:
    data T = MkT S   -- S comes from package bar
    
    Then if S's definition in package bar changes, but package foo is unchanged, the TyCon for T should really change too. Otherwise you might serialise a value of type T into a file along with its TypeRep, and try to read it back in with the wrong deserialiser. I'm not sure whether the package versioning rules force foo's version to be bumped. Maybe we want foo's new PackageId?
  • But in fact the PackageId is perhaps too fine-grained, becuase it changes whenever anything in package foo changes. We really only want the Typeable.TyCon for T to change when the structure of T (transitively) changes. Aha! We already have fingerprints for that, used for versioning in interface files. So perhpas, once we've calculated T's fingerprint we can use that in the Typeable.TyCon for T. That seems to me to be best story.

We could do with someone who'd like to follow these things up. Dynamic typing is important.

comment:3 in reply to:  2 Changed 10 years ago by simonmar

Replying to simonpj:

Neither of these give constant-time comparison of TypeReps, but I doubt that such comparisons are going to be in the inner loop of any program. And both fix (1) above

Aren't such comparisons in the inner loop of SYB traversals?

comment:4 Changed 10 years ago by simonpj

Um, yes, that's true. I had forgotten that. But types are typically small.

comment:5 Changed 10 years ago by simonpj

Type of failure: None/Unknown

I talked to Simon M about the question of tidying up GHC's TypeRep mechanism. A good start would be the following:

  1. Generate a fingerprint from the String (including the fully-versioned module) of the TyCon (eg "base-4.3.2.2:GHC.Arr.Array")
  1. At each App node in the TypeRep generate a fingerprint from the fingerprints of the component pieces. We do this all the time when generating fingerprints in interface files. (This would retain constant-time comparison for TypeRep.)

This would completely deal with the single-program case, because in any one program there is definitely only one type constructor base-4.3.2.2:GHC.Arr.Array. If we wanted a stronger, cross-program, story, then we'd need a stronger fingerprint under (1), based on the type structure as outlined in an earlier comment. That is quite doable too, but would probably require us to inject a binding for each TyCon fingerprint only after they'd been computed in MkIface. A bit more plumbing, that's all.

This would be a straightforward project for someone to do. Since the fingerprinting for TypeRep must be done in Data.Typeable, which is in the base package, we'd need to add fingerprinting technology to base. Not hard, since GHC itself already has it, so we can just copy it over. But we don't want to make base depend on the binary package, so something a bit less general than GHC's fingerprinting interface is wanted.

Any volunteers?

Simon

comment:6 Changed 9 years ago by igloo

Milestone: 7.0.17.0.2

comment:7 Changed 9 years ago by igloo

Milestone: 7.0.27.2.1

comment:8 Changed 9 years ago by illissius

Cc: illissius@… added

comment:9 Changed 8 years ago by simonpj

Resolution: fixed
Status: newclosed

Whizzo. Simon M has done this in GHC 7.2.

comment:10 Changed 8 years ago by lealanko

Cc: la@… added
Resolution: fixed
Status: closednew

Unfortunately the new TypeRepKey type is abstract and provides no operations apart from Eq and Ord instances. Hence, there is still no way to write a type representation into a file or across the wire. Jush showing a TypeRep won't cut it, since that loses the distinctions between similarly named types in different modules.

So, I think we need a Show instance for TypeRepKey, or better yet, a Binary instance. Moreover, a non-deprecated pure function TypeRep -> TypeRepKey is in place, since the keys have an actual use as a compact, efficient, and serializable injection of the typereps.

Alternatively, the Show instance of TypeRep could be made injective by printing out the package and module.

(I intended to open a new ticket, I hope it's ok that I reopen this old one instead.)

comment:11 Changed 8 years ago by simonpj

Summary: Easily make Typeable keys pure, so that Typeable can be handled efficiently across communicationsAdd Show and Binary instances for TypeRep

I'm changing the title; patches welcome.

Simon

comment:12 Changed 8 years ago by simonmar

Cc: Lennart NeilMitchell added

A new ticket would have been *much* better...

I've exposed the representation via the Data.Typeable.Internal module. Of course that's not guaranteed to be a stable API, but I'm happy to add a stable API if we can agree on what it should be. Lennart Augustsson and Neil Mitchell are doing exactly this (serialising TypeRep) - I think they're using the Data.Typeable.Internal API right now.

So what shall we do?

comment:13 Changed 8 years ago by Lennart

For older versions of ghc I just used (show . typeOf). When ghc 7.2 broke this I had to start using Internal, so now I use tyConModule and tyConName to emulate the behaviour of the old show.

comment:14 Changed 8 years ago by lealanko

I'm sorry for messing up the ticket. I had assumed that the missing Show instance had been a simple oversight. If so, this would suffice:

instance Show TypeRepKey where
  show (TypeRepKey (Fingerprint hi lo)) = printf "%016x%016x" hi lo

A Binary instance is more difficult: where would it be put? It cannot be in Typeable.hs (as base cannot depend on binary), but it cannot be in Binary.hs either (as TypeRepKey is opaque and not exposed from Typeable).

I don't think it's a good idea to make TypeReps themselves publicly deserializable, as that would allow the creation of fake typereps. A separate one-way injection to a serializable TypeRepKey seems much more secure.

comment:15 Changed 8 years ago by simonmar

Resolution: fixed
Status: newclosed
Summary: Add Show and Binary instances for TypeRepEasily make Typeable keys pure, so that Typeable can be handled efficiently across communications

Changing the title back and closing the ticket again; new ticket is #5568.

Note: See TracTickets for help on using tickets.