Opened 9 years ago

Closed 9 years ago

#4342 closed bug (fixed)

Review containers changes

Reported by: igloo Owned by:
Priority: high Milestone: 7.4.1
Component: libraries (other) Version: 7.1
Keywords: Cc: fox@…, johan.tibell@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The recent containers changes have caused an allocation regression in GHC:

in T1969:
   containers-0.3,                allocations: 246,402,492
   containers + dons/tibbe/milan, allocations: 283,370,180 (+15%)

in T3294:
   containers-0.3,                allocations: 832,307,880
   containers + dons/tibbe/milan, allocations: 950,107,000 (+14%)

Some of the patches were also pushed in a hurry in the buildup to the 7.0 release. We should review them.

Fri Sep 24 16:49:46 BST 2010  Milan Straka <fox@ucw.cz>
  * Fix warnings in Data.Map and Data.Set.
  Ignore-this: cb2c0a8ecf0a57acc5941ba841aa7c40

  Only trivial changes.

Fri Sep 24 16:33:53 BST 2010  Milan Straka <fox@ucw.cz>
  * Finish the started worker/wrapper transformation.
  Ignore-this: baeb24573242beb56c3bbe7ca67f5ff7

  Some methods (insert, lookup) were not modified as the rest
  (like insertWith, delete, ...). Also the `seq` were missing
  sometimes.

Fri Sep 24 16:26:42 BST 2010  Milan Straka <fox@ucw.cz>
  * Merge all the OPTIONS and LANGUAGE module pragmas.
  Ignore-this: 86067abf13f0501f29c13ec7c877533c

Fri Sep 24 16:20:08 BST 2010  Milan Straka <fox@ucw.cz>
  * Remove most INLINE from Map, Set, IntMap and IntSet.
  Ignore-this: c88c4ede21c06bfda20af131c232a720

  Because of a code bloat the INLINEs cause, remove most of
  them. The only INLINEs left are the following:
  - only in Set and Map, because in IntMap and IntSet the specialization
    does not help
  - only on functions which need Ord
  - only on 'small' functions, namely member, notMember, lookup*,
    insert*, delete*, adjust*, alter*, update*

  All other functions of Map, Set, IntMap and IntSet are marked INLINABLE,
  even if they are recursive.

  The INLINEs left are only a short-term solution. In the long run the
  auto-specialization of INLINABLE methods seems a good way (maybe
  SPECIALIZABLE).

Fri Sep 24 12:07:05 BST 2010  Milan Straka <fox@ucw.cz>
  * Comment tests and benchmarks on foldlWithKey' which
  Ignore-this: 71b988389e6ae9a78ea3b0e20156ca2f
  was commented recently by Ian Lynagh.

Thu Sep 23 13:56:04 BST 2010  Milan Straka <fox@ucw.cz>
  * Worker/wrapper transformation for Data.IntSet.
  Ignore-this: b0228582818f7bfb690d0853022a7809

Tue Sep 21 12:58:21 BST 2010  Milan Straka <fox@ucw.cz>
  * Compile only the benchmark source, not the Data/*.hs.
  Ignore-this: f94d9e3ffe126cd057d23490c973a4e9

Tue Sep 21 11:32:25 BST 2010  Milan Straka <fox@ucw.cz>
  * Add criterion-based benchmark for IntSet.hs.
  Ignore-this: 3d31a820830c7382748626bc9a1ba54

  The benchmark is nearly identical copy of Set.hs benchmark.

Tue Sep 21 11:28:02 BST 2010  Milan Straka <fox@ucw.cz>
  * Add a testsuite for Data.IntSet.
  Ignore-this: e55484ee185e71915452bdf2a7b2a2b3

Tue Sep 21 10:18:28 BST 2010  Milan Straka <fox@ucw.cz>
  * Further improve Data.Set balance function.
  Ignore-this: f23be37859224e9bbe919a3c0a71fdc6

  As suggested by Kazu Yamamoto, we split balance to balanceL and
  balanceR, which handle only one-sided inbalance, but need fewer
  tests than balance.

  As nearly all functions modifying the structure use balance, this
  results in speedup of many functions. On my 32-bit GHC 6.12.1,
  11% speedup for insert, 12% speedup for delete.

Tue Sep 21 10:15:47 BST 2010  Milan Straka <fox@ucw.cz>
  * Further improve Data.Map balance function.
  Ignore-this: 8abfd027142a5183b2b5282e96ccb414

  As suggested by Kazu Yamamoto, we split balance to balanceL and
  balanceR, which handle only one-sided inbalance, but need fewer
  tests than balance.

  As nearly all functions modifying the structure use balance, this
  results in speedup of many functions. On my 32-bit GHC 6.12.1,
  20% speedup for insert, 7% speedup for delete, 5% speedup for update.

Tue Sep 21 10:05:07 BST 2010  Milan Straka <fox@ucw.cz>
  * Changing delta to 3 in Data.Set.
  Ignore-this: a47d0c542ed9cee99ad6b17c52c977a1

  Only possible values are 3 and 4. The value 3 has much faster inserts,
  value 4 slightly faster deletes, so choosing 3.

  Also changed the inequalities to rebalance only when one subtree
  is _strictly_ larger than delta * the other one, to mimic the behaviour
  from the proof (both from the Adams' and from the one to come).

Tue Sep 21 10:03:58 BST 2010  Milan Straka <fox@ucw.cz>
  * Changing delta to 3 in Data.Map.
  Ignore-this: 85f733f836b65b2b1038383ddb92e8e1

  Only possible values are 3 and 4. The value 3 has much faster inserts,
  value 4 slightly faster deletes, so choosing 3.

  Also changed the inequalities to rebalance only when one subtree
  is _strictly_ larger than delta * the other one, to mimic the behaviour
  from the proof (both from the Adams' and from the one to come).

Tue Sep 14 16:04:42 BST 2010  Milan Straka <fox@ucw.cz>
  * Correct Data.Set Arbitrary instance never to return unbalanced trees.
  Ignore-this: b5c70fa98a56f225b8eb5faf420677b0

  The previous instance sometimes returned unbalanced trees,
  which broke the tests.

  Also the new instance mimics Data.Map instance more closely in the shape
  of the generated trees.

Tue Sep 14 15:58:41 BST 2010  Milan Straka <fox@ucw.cz>
  * Correct Data.Map Arbitrary instance never to return unbalanced trees.
  Ignore-this: 114bbcc63acdb16b77140ea56aeb0a95

  The previous instance sometimes returned unbalanced trees,
  which broke the tests.

Tue Sep 14 15:20:10 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve Data.Set benchmark.
  Ignore-this: 9b878ae3aa5a43ef083abfd7f9b22513

  Add union, difference and intersection to Data.Set benchmark.

Tue Sep 14 15:17:07 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve benchmark infrastructure and Data.Map benchmark.
  Ignore-this: 67e8dafcb4abcb9c726b9b29c7c320fd

  Renamed Benchmarks.hs to Map.hs, as it only benchmarks Data.Map.
  Improve the Makefile to work with multiple benchmarks.
  Add union, difference and intersection to Data.Map benchmark.

Tue Sep 14 15:04:17 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve the performance of Data.Set balance function.
  Ignore-this: 577c511c219695b8d483af546c7387e8

  The balance function is now one monolithic function, which allows
  to perform all pattern-matches only once.

  Nearly all functions modifying Data.Map use balance.
  The improvements are 12% for insert, 14% for delete (GHC 6.12.1).

Tue Sep 14 15:02:17 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve the performance of Data.Map balance function.
  Ignore-this: 951181e035fcac90674dff3300350a1

  The balance function is now one monolithic function, which allows
  to perform all pattern-matches only once.

  Nearly all functions modifying Data.Map use balance.
  The improvements are 7-11% for various insert*, delete*, alter,
  update or intersection functions (GHC 6.12.1).

Tue Sep 14 14:57:25 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve performance of Data.Set union and difference operations.
  Ignore-this: 6dc4a186ea060b9cdb9e783db71ca280

  Use datatype storing evaluated bound instead of high-order functions.
  The improvements are over 25% for both union and difference (GHC 6.12.1).

Tue Sep 14 14:46:14 BST 2010  Milan Straka <fox@ucw.cz>
  * Improve performance of Data.Map union* and difference* operations.
  Ignore-this: 35b23a40ef33e9fa14eb81fdee4b152d

  Use datatype storing evaluated bound instead of high-order functions.
  The improvements are 22% for union and 20% for difference (GHC 6.12.1).

Mon Sep 13 17:51:32 BST 2010  Milan Straka <fox@ucw.cz>
  * Make the Set store the elements evaluated (bang added).
  Ignore-this: b3f230db5bf30d93d3fddf2c81c5f3b4

Tue Aug 31 13:43:52 BST 2010  Johan Tibell <johan.tibell@gmail.com>
  * Improved performance of Data.Set
  Ignore-this: 38a304a0408d29a2956aa9a1fc0ce755

  Performance improvements are due to manually applying the
  worker/wrapper transformation and strictifying the keys.

  Average speed-up is 32% on a 2GHz Core 2 Duo on OS X 10.5.8

Tue Aug 31 13:42:25 BST 2010  Johan Tibell <johan.tibell@gmail.com>
  * Added benchmarks for Data.Set
  Ignore-this: fcacf88761034b8c534d936f0b336cc0

Tue Aug 31 13:40:30 BST 2010  Johan Tibell <johan.tibell@gmail.com>
  * Added a test suite for Data.Set
  Ignore-this: f430dc302c0fcb8b5d62db2272a1d6f7

  Expression coverage: 74%

Thu Sep 23 13:08:38 BST 2010  simonpj@microsoft.com
  * Remove use of lambdas with irrefutable patterns
  Ignore-this: c36e90a0258c0d5262684c585c321419

Wed Sep 15 14:51:03 BST 2010  Ian Lynagh <igloo@earth.li>
  * Revert the recent contentious changes
  Ignore-this: fe4f71ff1ade51c11421dc9974aa0fda
  These will probably be tidied up and reinstated later, but this gets
  the package back to a releasable state.

Tue Aug 31 12:45:55 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * fix warnings
  Ignore-this: 53df71bc054a779b8ad2dad89c09e02d

Tue Aug 31 10:34:46 BST 2010  Don Stewart <dons@galois.com>
  * Missing MagicHash for IntSet
  Ignore-this: d075f760adb9a2aa0ee04676e38a07cc

Tue Aug 31 10:33:16 BST 2010  Don Stewart <dons@galois.com>
  * Performance improvements for Data.IntMap (worker/wrapper and inlining)
  Ignore-this: 206036448558d270f0eb85ef4cd55368

Tue Aug 31 10:32:40 BST 2010  Don Stewart <dons@galois.com>
  * Add criterion-based benchmarking for IntMap
  Ignore-this: d7d85b9afb513532cc30f5b51a3f825e

Tue Aug 31 10:32:02 BST 2010  Don Stewart <dons@galois.com>
  * Add comprehensive testsuite for IntMap
  Ignore-this: d455fedbc615e5b63ac488e605550557

Tue Aug 31 10:29:56 BST 2010  Don Stewart <dons@galois.com>
  * -O2 -fregs-graph is a uniform 10% improvements for IntMap
  Ignore-this: 2372cf4be945fe7939d0af94e32c567f

Sun Aug 29 13:26:28 BST 2010  Don Stewart <dons@galois.com>
  * Major bump (new functions, clarified strictness properties, vastly better performance)
  Ignore-this: 9bfbc58ecaa24a86be37b8c4cb043457

Sun Aug 29 13:21:47 BST 2010  Don Stewart <dons@galois.com>
  * Add two new functions: foldlWithKey' and insertLookupWithKey'
  Ignore-this: a2f112653ba38737fe1b38609e06c314

  These two functions use strict accumulators, compared to their existing
  counterparts (which are lazy left folds, that appear not to be useful).
  Performance is significantly better.


Sun Aug 29 12:46:11 BST 2010  Don Stewart <dons@galois.com>
  * Add a criterion-based benchmark suite for Data.Map
  Ignore-this: ec61668f5bcb78bd15b72e2728c01c19

  This adds a criterion-based micro-benchmarking suite for Data.Map. It
  can be used to measure performance improvements for individual top-level
  functions.

  Examples here: http://is.gd/eJHIE


Sun Aug 29 17:33:29 BST 2010  Don Stewart <dons@galois.com>
  * Missed base case for updateAt worker. Spotted by Jan-Willem Maessen
  Ignore-this: b8daf1c55c163c16f50c3b54cca2dba1

Sun Aug 29 13:02:45 BST 2010  Don Stewart <dons@galois.com>
  * Performance improvements to Data.Map
  Ignore-this: b4830cddfa6d62e4883f4e0f58ac4e57

  Applied several standard transformations to improve the performance of
  code:

      * Worker/wrapper of all recursive functions with constant arguments
      * Inlining of all (non-recursive) wrappers
      * Consistent use of strict keys

  Average performance improvements across common API (with GHC 6.12.3):

      * Linux / x86_64 / 2.6Ghz i7        : 48%
      * Mac OSX 10.5 / x86 / 2 Ghz Xeon   : 36%

  Graphs and raw data: http://is.gd/eJHIE

  This patch is (mostly) orthogonal to the algorithmic changes suggested
  by Milan Straka in his HW 2010 paper:

      http://research.microsoft.com/~simonpj/papers/containers/containers.pdf

  Those changes could be added separately, for some additional improvments.

  Work carried out over 28/29th August, 2010 in Utrecht, NL, by Johan Tibell
  and Don Stewart.


Sun Aug 29 12:35:45 BST 2010  Don Stewart <dons@galois.com>
  * Add a comprehensive testsuite for Data.Map
  Ignore-this: 891e7fe6bac3523868714ac1ff51c0a3

  This patch adds a joint quickcheck2 / hunit testsuite, with coverage of
  91% of top level functions (remaining features are mostly in instances).

  The coverage data is here:

      http://code.haskell.org/~dons/tests/containers/hpc_index.html

  No bugs were found. It includes unit tests for known past bugs
  (balancing).

Attachments (5)

containers_performance.patch (234.5 KB) - added by milan 9 years ago.
containers-benchmark-data.tar.gz (24.8 KB) - added by milan 9 years ago.
Selected readable benchmark comparisons.
containers-benchmark-data-raw.tar.lzma (225.7 KB) - added by milan 9 years ago.
All benchmark results, raw csv files. Splitted and packed by lzma because of attachment size limit.
containers-benchmark-data-src.tar.lzma (49.1 KB) - added by milan 9 years ago.
All sources of benchmarks in containers-benchmark-data-raw. Splitted and packed by lzma because of attachment size limit.
containers_performance2.patch.gz (41.7 KB) - added by milan 9 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 9 years ago by igloo

We also need to roll back this GHC patch:

Thu Sep 23 14:01:17 BST 2010  simonpj@microsoft.com
  * For now, switch off incomplete-pattern warnings in containers
  
  Put it back on when my patch is applied to the containers repo.
  (the one that removes two refuable lambdas)

comment:2 Changed 9 years ago by milan

Cc: fox@… added

Most of the allocation overhead is causes by the INLINABLE and INLINE pragmas.

Here are my numbers:

  • T1969:
    • ghc-head, containers-from-ghc-7.0-branch: 248471644
    • ghc-head, containers-dons/tibbe/milan: 282720916
    • ghc-head, containers-dons/tibbe/milan + all INLINE converted to INLINABLE: 282689256
    • ghc-head, containers-dons/tibbe/milan + all INLINE and INLINABLE removed: 256416620
  • T3294:
    • ghc-head, containers-from-ghc-7.0-branch: 826836708
    • ghc-head, containers-dons/tibbe/milan: 943910592
    • ghc-head, containers-dons/tibbe/milan + all INLINE converted to INLINABLE: 943819492
    • ghc-head, containers-dons/tibbe/milan + all INLINE and INLINABLE removed: 859844404

By removing INLINE and INLINABLE annotations, the allocation is higher by 3.2% and 3.9%, instead of 15% and 14%.

What is also worth mentioning is that with all methods INLINABLE the allocation is approximately the same as with some explicit INLINE.

Maybe we can remove the INLINE / INLINABLE annotations for know?

I will investigate further after I return from ICFP.

comment:3 Changed 9 years ago by milan

I did investigate a little bit further.

Currently I have achieved T1969: 250761960 (+1.7% increase compared to containers-0.3), T3294: 842190504 (1.1% increase compared to containers-0.3). The time got better a bit too.

The changes I did are not yet systematically tuned, so I would expect further improvements.

Further improvements after returning from ICFP.

comment:4 in reply to:  3 Changed 9 years ago by milan

I ended up at

T1969: 247890576 (+0.6% increase wrt containers-0.3), T3294: 830932780 (-0.2% wrt containers-0.3).

I am really putting hands of it now.

comment:5 Changed 9 years ago by milan

After discussion on ICFP 2010, I am building a better benchmark. It will contrencate on memory (not only on time), will include ghc-binary size, T1969 and T3294 allocation and possibly compile time + membory for nofib.

Then we will choose the right patches.

Time frame is 1 week for a benchmark and 1 week for the patches.

comment:6 Changed 9 years ago by simonpj

Just to summarise the current state of play on the containers front:

  • For the GHC 7.0 branch, we have reverted all the container patches, so the heat is off.
  • For the HEAD we have left all patches in. That means that T1969 and T3294 are failing. (This is allocation-bloat, not code-bloat, for reasons I don't fully understand.) It would be good to fix this as soon as poss.
  • I have committed a patch that allows specialising in the importing module; the commit message is below and gives more details. I think that should allow you to get the benefits of specialisation without the code bloat.
  • All of this only concerns performance issues. I don't know what's ongoing concerning API changes, but let's treat that as a separate matter.

At the moment I think Milan is holding the token. (Is anyone else working on this?) It's not day-to-day urgent, but it is important and would be good to get it sorted. Are you ok with this Milan?

Simon

Thu Oct  7 04:10:51 PDT 2010  simonpj@microsoft.com
  * Implement auto-specialisation of imported Ids
  
  This big-ish patch arranges that if an Id 'f' is 
    * Type-class overloaded 
         f :: Ord a => [a] -> [a]
    * Defined with an INLINABLE pragma
         {-# INLINABLE f #-}
    * Exported from its defining module 'D'
  
  then in any module 'U' that imports D
  
  1. Any call of 'f' at a fixed type will generate 
     (a) a specialised version of f in U
     (b) a RULE that rewrites unspecialised calls to the
         specialised on
  
    e.g. if the call is (f Int dOrdInt xs) then the 
    specialiser will generate
       $sfInt :: [Int] -> [Int]
       $sfInt = <code for f, imported from D, specialised>
       {-# RULE forall d.  f Int d = $sfInt #-}
  
  2. In addition, you can give an explicit {-# SPECIALISE -#}
     pragma for the imported Id
       {-# SPECIALISE f :: [Bool] -> [Bool] #-}
     This too generates a local specialised definition, 
     and the corresponding RULE 
  
  The new RULES are exported from module 'U', so that any module
  importing U will see the specialised versions of 'f', and will
  not re-specialise them.
  
  There's a flag -fwarn-auto-orphan that warns you if the auto-generated
  RULES are orphan rules. It's not in -Wall, mainly to avoid lots of
  error messages with existing packages.
  
  Main implementation changes
  
   - A new flag on a CoreRule to say if it was auto-generated.
     This is persisted across interface files, so there's a small
     change in interface file format.
  
   - Quite a bit of fiddling with plumbing, to get the 
     {-# SPECIALISE #-} pragmas for imported Ids.  In particular, a
     new field tgc_imp_specs in TcGblEnv, to keep the specialise
     pragmas for imported Ids between the typechecker and the desugarer.
  
   - Some new code (although surprisingly little) in Specialise,
     to deal with calls of imported Ids

    M ./compiler/basicTypes/BasicTypes.lhs -1 +7
    M ./compiler/coreSyn/CoreSyn.lhs -1 +5
    M ./compiler/deSugar/Desugar.lhs -14 +24
    M ./compiler/deSugar/DsBinds.lhs -55 +66
    M ./compiler/deSugar/DsForeign.lhs -4 +4
    M ./compiler/hsSyn/HsBinds.lhs -9 +9
    M ./compiler/iface/BinIface.hs -2 +4
    M ./compiler/iface/IfaceSyn.lhs -1 +3
    M ./compiler/iface/MkIface.lhs -6 +12
    M ./compiler/iface/TcIface.lhs -1 +3
    M ./compiler/main/DynFlags.hs +3
    M ./compiler/rename/RnBinds.lhs -62 +84
    M ./compiler/rename/RnEnv.lhs -9 +14
    M ./compiler/rename/RnExpr.lhs -3 +3
    M ./compiler/rename/RnSource.lhs -1 +1
    M ./compiler/specialise/Rules.lhs -19 +11
    M ./compiler/specialise/SpecConstr.lhs -1 +2
    M ./compiler/specialise/Specialise.lhs -59 +156
    M ./compiler/typecheck/TcBinds.lhs -24 +66
    M ./compiler/typecheck/TcClassDcl.lhs -1 +1
    M ./compiler/typecheck/TcDeriv.lhs -4 +2
    M ./compiler/typecheck/TcHsSyn.lhs -6 +12
    M ./compiler/typecheck/TcInstDcls.lhs -10 +19
    M ./compiler/typecheck/TcRnDriver.lhs -13 +16
    M ./compiler/typecheck/TcRnMonad.lhs -14 +15
    M ./compiler/typecheck/TcRnTypes.lhs +1
    M ./compiler/utils/FiniteMap.lhs -1 +3

comment:7 in reply to:  6 Changed 9 years ago by milan

Replying to simonpj:

At the moment I think Milan is holding the token. (Is anyone else working on this?) It's not day-to-day urgent, but it is important and would be good to get it sorted. Are you ok with this Milan?

Yes, I am perfectly fine with that.

I am working on better benchmark and then I will choose the patches that improve the performance (both time and space). I hope to see the effect of your specialization in the results :)

I actually have patches that make both T1969 and T3294 pass, but I am waiting for the benchmark.

The timeframe is

  • end of this weekend -- benchmark
  • end of next weekend -- patches

Cheers, Milan

comment:8 Changed 9 years ago by milan

I am working on it and getting closer, but an illness is slowing me down.

Cheers, Milan

comment:9 Changed 9 years ago by tibbe

Cc: johan.tibell@… added

comment:10 Changed 9 years ago by tibbe

I'd like to wait for the patches referred to in #4257 and #4278 to be committed before we release a new version of containers. Those patches add a few functions to the API, have been approved through the libraries process, and are just waiting for someone to commit them.

comment:11 Changed 9 years ago by milan

Tadaa!

I finally got to polish the performance of the containers head.

The current allocation results:

  • ghc-7.0 (30-10-2010): T1969: 247656108 (worse than before), T3294: 818732238 (better than before)
  • ghc-7.1 (30-10-2010): T1969: 250184284 (worse than before), T3294: 825515292 (better than before)

It was quite tiresome to lessen the memory allocation and code growth and not decrease the time performance, but in the end I think I did it.

The patches are attached and are also present in the repository http://fox.auryn.cz/darcs/containers

For benchmarking I use a new benchmark (released as containers-benchmark on the Hackage). I attach the benchmark comparison of 0.3-best, 7.0-best, head-best, for ghc 6.12, 7.0 and 7.1. The 'best' are the current patches, head is current containers-head, 7.0 is the 7.0 branch of containers, 0.3 is 0.3.0.0 version of containers. I also attach all the raw data + versions benchmarked. Together it is a lot of data -- but be assured I read those :)

Johan asked that 4257 and 4278 are commited before this. When it happens, I will update the patches and the cited repo to accomodate to the new functions (probably change them to reflect the chosen INLINE/INLINABLE/worker-wrapper/... stuff).

Changed 9 years ago by milan

Changed 9 years ago by milan

Selected readable benchmark comparisons.

Changed 9 years ago by milan

All benchmark results, raw csv files. Splitted and packed by lzma because of attachment size limit.

Changed 9 years ago by milan

All sources of benchmarks in containers-benchmark-data-raw. Splitted and packed by lzma because of attachment size limit.

comment:12 Changed 9 years ago by igloo

Thanks Milan!

T1969 and T3294 are passing now, and T3064 is better: was:

bytes allocated 316528584 is more than maximum allowed 280000000

now:

bytes allocated 299224968 is more than maximum allowed 280000000

This patch causes a lot of warnings.

I started fixing them, but in some cases I was fixing shadowed variable warnings by SATing functions, and I'm not sure if you deliberately left them unSATed; certainly in some cases functions seem to have been deliberately unSATed, e.g. (diffing relative to the 7.0 branch's containers):

 lookup :: Ord k => k -> Map k a -> Maybe a
-lookup k = k `seq` go
+lookup = go
   where
-    go Tip = Nothing
-    go (Bin _ kx x l r) =
+    STRICT12(go)
+    go k Tip = Nothing
+    go k (Bin _ kx x l r) =
         case compare k kx of
-            LT -> go l
-            GT -> go r
+            LT -> go k l
+            GT -> go k r
             EQ -> Just x

We need it to be -Wall clean before we can apply it, though.

Incidentally, I don't like these names:

+-- Use macros to define strictness of functions.
+-- STRICTxy denotes an y-ary function strict in the x-th parameter.
+#define STRICT12(fn) fn arg _ | arg `seq` False = undefined
+#define STRICT13(fn) fn arg _ _ | arg `seq` False = undefined
+#define STRICT23(fn) fn _ arg _ | arg `seq` False = undefined
+#define STRICT24(fn) fn _ arg _ _ | arg `seq` False = undefined

I would prefer STRICT_1_OF_2, but personally I think we should just use bang patterns.

This comment:

+-- The balance function is equivalent to the following:
...
+-- It is only written in such a way that every node is pattern-matched only once.

makes me wonder if SpecConstr should be generating the more efficient code automatically?

In Data.IntSet, I don't understand the natFromInt stuff, and the change in lookup. I am suspicious about the extra

| nomatch x p m = False

clause in member, as compared to lookup.

There are some things I'll leave to the containers experts to review:

I haven't checked the balance/balanceL/balanceR changes at all (neither the definitions, nor the calls).

I haven't checked the change of delta from 4 to 3.

I haven't checked the <= to < change in join.

I haven't checked the changes of case into let, e.g. in Data.IntMap.insertLookupWithKey.

I haven't checked the trim changes.

I haven't checked the union*/diff*/hedge* changes.

comment:13 in reply to:  12 ; Changed 9 years ago by milan

T1969 and T3294 are passing now, and T3064 is better: was:

bytes allocated 316528584 is more than maximum allowed 280000000

now:

bytes allocated 299224968 is more than maximum allowed 280000000

Ah, according to the 280000000 limit, it is a 64-bit GHC. On 32-bit machine T3064 passes.

This patch causes a lot of warnings.

I started fixing them, but in some cases I was fixing shadowed variable warnings by SATing functions, and I'm not sure if you deliberately left them unSATed; certainly in some cases functions seem to have been deliberately unSATed, e.g. (diffing relative to the 7.0 branch's containers):

 lookup :: Ord k => k -> Map k a -> Maybe a
-lookup k = k `seq` go
+lookup = go
   where
-    go Tip = Nothing
-    go (Bin _ kx x l r) =
+    STRICT12(go)
+    go k Tip = Nothing
+    go k (Bin _ kx x l r) =
         case compare k kx of
-            LT -> go l
-            GT -> go r
+            LT -> go k l
+            GT -> go k r
             EQ -> Just x

We need it to be -Wall clean before we can apply it, though.

I am sorry, I did not think of that. Do you want me to correct it? I will gladly do it, but I do not want to do the work twice :)

Incidentally, I don't like these names:

+-- Use macros to define strictness of functions.
+-- STRICTxy denotes an y-ary function strict in the x-th parameter.
+#define STRICT12(fn) fn arg _ | arg `seq` False = undefined
+#define STRICT13(fn) fn arg _ _ | arg `seq` False = undefined
+#define STRICT23(fn) fn _ arg _ | arg `seq` False = undefined
+#define STRICT24(fn) fn _ arg _ _ | arg `seq` False = undefined

I would prefer STRICT_1_OF_2, but personally I think we should just use bang patterns.

Bang patterns are not in any standards, so this would rule out other compilers not supporting them. Your rename seems reasonable, I would go with that.

This comment:

+-- The balance function is equivalent to the following:
...
+-- It is only written in such a way that every node is pattern-matched only once.

makes me wonder if SpecConstr should be generating the more efficient code automatically?

Well, it is just an invariant of the structure that allows to do the pattern-matching only once. As the code was written previously, it would be incorrect to use just one pattern-matching.

In Data.IntSet, I don't understand the natFromInt stuff, and the change in lookup.

The natFromInt && friends are the same, just with explicit INLINEs (at some point it did not get inlined and that caused a lot of problems).

I am suspicious about the extra

| nomatch x p m = False

clause in member, as compared to lookup.

This is as it was in 0.3. It is semantically correct both ways. The only think is time complexity. The member is a bit faster if the key looked for is _not_ in the set, but it does slightly more work if the key is present. We can decide both ways and make it consistent.

There are some things I'll leave to the containers experts to review:

I haven't checked the balance/balanceL/balanceR changes at all (neither the definitions, nor the calls).

I haven't checked the change of delta from 4 to 3.

I haven't checked the <= to < change in join.

Also in merge.

I haven't checked the changes of case into let, e.g. in Data.IntMap.insertLookupWithKey.

The current version is from 0.3.

I haven't checked the trim changes.

I haven't checked the union*/diff*/hedge* changes.

Thanks for your work, Cheers, Milan

comment:14 in reply to:  13 ; Changed 9 years ago by igloo

Replying to milan:

We need it to be -Wall clean before we can apply it, though.

Do you want me to correct it?

That'd be great, thanks!

I would prefer STRICT_1_OF_2, but personally I think we should just use bang patterns.

Bang patterns are not in any standards, so this would rule out other compilers not supporting them. Your rename seems reasonable, I would go with that.

That's true. Renaming is fine with me.

I am suspicious about the extra clause in member, as compared to lookup.

It is semantically correct both ways.

OK, great, no problem then.

comment:15 in reply to:  14 Changed 9 years ago by milan

Replying to igloo:

Replying to milan:

We need it to be -Wall clean before we can apply it, though.

Do you want me to correct it?

That'd be great, thanks!

I would prefer STRICT_1_OF_2, but personally I think we should just use bang patterns.

Bang patterns are not in any standards, so this would rule out other compilers not supporting them. Your rename seems reasonable, I would go with that.

That's true. Renaming is fine with me.

I am suspicious about the extra clause in member, as compared to lookup.

It is semantically correct both ways.

OK, great, no problem then.

Tomorrow I will prepare updated patches. Simon also pointed that I should add several my comments from this ticket to the source comments, so people would know why.

Cheers, Milan

comment:16 Changed 9 years ago by milan

Hi,

the new version of the patch is attached. There were only trivial renames to silence the warnings (defined but not used variables and name shadowings) and comments to the source code explaining the subtleties.

The new version is attached here and also present in the mentioned repository http://fox.auryn.cz/darcs/containers/.

Looking at the igloo's mail with issues, these are still unresolved:

There are some things I'll leave to the containers experts to review: I haven't checked the balance/balanceL/balanceR changes at all (neither the definitions, nor the calls).

I haven't checked the <= to < change in join and merge.

These two points are connected -- the rebalance was not done in accordance with the original paper, but a bit sooner. Now it is rebalanced after the balance is lost, not when it is just on the boundary.

Other than that, balance is just trivial inlines of rotateL && others. The original definition is kept in the source for reference. The balanceL and balanceR are just parts of balance, where the balance condition can be broken only in one subtree.

I haven't checked the change of delta from 4 to 3.

I have a proof that it is correct. I am writing it down, but did not have enough time yet (the time frame is weeks). But there is no published proof that delta=4 is currect neither :)

I haven't checked the trim changes.

I haven't checked the union*/diff*/hedge* changes.

These two points are also connected -- replace a lambda functions performing comparison (with possibly infinity element) by a Maybe bound. The paper cites the improvement of performance, I do not recall now.

Cheers, Milan

Changed 9 years ago by milan

comment:17 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed

Thanks; I've pushed the patches, so I think we can close this ticket now. If anyone finds any issues still remain, please file a new ticket.

Note: See TracTickets for help on using tickets.