# Ticket #4312: containers-set-improvements-2.patch

File containers-set-improvements-2.patch, 63.9 KB (added by , 9 years ago) |
---|

Line | |
---|---|

1 | 10 patches for repository http://darcs.haskell.org/packages/containers: |

2 | |

3 | Tue Aug 31 14:40:30 CEST 2010 Johan Tibell <johan.tibell@gmail.com> |

4 | * Added a test suite for Data.Set |

5 | |

6 | Expression coverage: 74% |

7 | |

8 | Tue Aug 31 14:42:25 CEST 2010 Johan Tibell <johan.tibell@gmail.com> |

9 | * Added benchmarks for Data.Set |

10 | |

11 | Tue Aug 31 14:43:52 CEST 2010 Johan Tibell <johan.tibell@gmail.com> |

12 | * Improved performance of Data.Set |

13 | |

14 | Performance improvements are due to manually applying the |

15 | worker/wrapper transformation and strictifying the keys. |

16 | |

17 | Average speed-up is 32% on a 2GHz Core 2 Duo on OS X 10.5.8 |

18 | |

19 | Mon Sep 13 18:51:32 CEST 2010 Milan Straka <fox@ucw.cz> |

20 | * Make the Set store the elements evaluated (bang added). |

21 | |

22 | Tue Sep 14 15:57:25 CEST 2010 Milan Straka <fox@ucw.cz> |

23 | * Improve performance of Data.Set union and difference operations. |

24 | |

25 | Use datatype storing evaluated bound instead of high-order functions. |

26 | The improvements are over 25% for both union and difference (GHC 6.12.1). |

27 | |

28 | Tue Sep 14 16:04:17 CEST 2010 Milan Straka <fox@ucw.cz> |

29 | * Improve the performance of Data.Set balance function. |

30 | |

31 | The balance function is now one monolithic function, which allows |

32 | to perform all pattern-matches only once. |

33 | |

34 | Nearly all functions modifying Data.Map use balance. |

35 | The improvements are 12% for insert, 14% for delete (GHC 6.12.1). |

36 | |

37 | Tue Sep 14 16:20:10 CEST 2010 Milan Straka <fox@ucw.cz> |

38 | * Improve Data.Set benchmark. |

39 | |

40 | Add union, difference and intersection to Data.Set benchmark. |

41 | |

42 | Tue Sep 14 17:04:42 CEST 2010 Milan Straka <fox@ucw.cz> |

43 | * Correct Data.Set Arbitrary instance never to return unbalanced trees. |

44 | |

45 | The previous instance sometimes returned unbalanced trees, |

46 | which broke the tests. |

47 | |

48 | Also the new instance mimics Data.Map instance more closely in the shape |

49 | of the generated trees. |

50 | |

51 | Tue Sep 21 11:05:07 CEST 2010 Milan Straka <fox@ucw.cz> |

52 | * Changing delta to 3 in Data.Set. |

53 | |

54 | Only possible values are 3 and 4. The value 3 has much faster inserts, |

55 | value 4 slightly faster deletes, so choosing 3. |

56 | |

57 | Also changed the inequalities to rebalance only when one subtree |

58 | is _strictly_ larger than delta * the other one, to mimic the behaviour |

59 | from the proof (both from the Adams' and from the one to come). |

60 | |

61 | Tue Sep 21 11:18:28 CEST 2010 Milan Straka <fox@ucw.cz> |

62 | * Further improve Data.Set balance function. |

63 | |

64 | As suggested by Kazu Yamamoto, we split balance to balanceL and |

65 | balanceR, which handle only one-sided inbalance, but need fewer |

66 | tests than balance. |

67 | |

68 | As nearly all functions modifying the structure use balance, this |

69 | results in speedup of many functions. On my 32-bit GHC 6.12.1, |

70 | 11% speedup for insert, 12% speedup for delete. |

71 | |

72 | New patches: |

73 | |

74 | [Added a test suite for Data.Set |

75 | Johan Tibell <johan.tibell@gmail.com>**20100831124030 |

76 | Ignore-this: f430dc302c0fcb8b5d62db2272a1d6f7 |

77 | |

78 | Expression coverage: 74% |

79 | ] { |

80 | hunk ./Data/Set.hs 39 |

81 | |

82 | module Data.Set ( |

83 | -- * Set type |

84 | +#if !defined(TESTING) |

85 | Set -- instance Eq,Ord,Show,Read,Data,Typeable |

86 | hunk ./Data/Set.hs 41 |

87 | +#else |

88 | + Set(..) |

89 | +#endif |

90 | |

91 | -- * Operators |

92 | , (\\) |

93 | hunk ./Data/Set.hs 106 |

94 | , showTree |

95 | , showTreeWith |

96 | , valid |

97 | + |

98 | +#if defined(TESTING) |

99 | + -- Internals (for testing) |

100 | + , bin |

101 | + , balanced |

102 | + , join |

103 | + , merge |

104 | +#endif |

105 | ) where |

106 | |

107 | import Prelude hiding (filter,foldr,null,map) |

108 | hunk ./Data/Set.hs 552 |

109 | showsPrec p xs = showParen (p > 10) $ |

110 | showString "fromList " . shows (toList xs) |

111 | |

112 | -{- |

113 | -XXX unused code |

114 | - |

115 | -showSet :: (Show a) => [a] -> ShowS |

116 | -showSet [] |

117 | - = showString "{}" |

118 | -showSet (x:xs) |

119 | - = showChar '{' . shows x . showTail xs |

120 | - where |

121 | - showTail [] = showChar '}' |

122 | - showTail (x':xs') = showChar ',' . shows x' . showTail xs' |

123 | --} |

124 | - |

125 | {-------------------------------------------------------------------- |

126 | Read |

127 | --------------------------------------------------------------------} |

128 | hunk ./Data/Set.hs 608 |

129 | _ -> trim cmplo cmphi l |

130 | _ -> trim cmplo cmphi r |

131 | |

132 | -{- |

133 | -XXX unused code |

134 | - |

135 | -trimMemberLo :: Ord a => a -> (a -> Ordering) -> Set a -> (Bool, Set a) |

136 | -trimMemberLo _ _ Tip = (False,Tip) |

137 | -trimMemberLo lo cmphi t@(Bin _ x l r) |

138 | - = case compare lo x of |

139 | - LT -> case cmphi x of |

140 | - GT -> (member lo t, t) |

141 | - _ -> trimMemberLo lo cmphi l |

142 | - GT -> trimMemberLo lo cmphi r |

143 | - EQ -> (True,trim (compare lo) cmphi r) |

144 | --} |

145 | - |

146 | {-------------------------------------------------------------------- |

147 | [filterGt x t] filter all values >[x] from tree [t] |

148 | [filterLt x t] filter all values <[x] from tree [t] |

149 | hunk ./Data/Set.hs 1003 |

150 | Bin sz _ l r -> case (realsize l,realsize r) of |

151 | (Just n,Just m) | n+m+1 == sz -> Just sz |

152 | _ -> Nothing |

153 | - |

154 | -{- |

155 | -{-------------------------------------------------------------------- |

156 | - Testing |

157 | ---------------------------------------------------------------------} |

158 | -testTree :: [Int] -> Set Int |

159 | -testTree xs = fromList xs |

160 | -test1 = testTree [1..20] |

161 | -test2 = testTree [30,29..10] |

162 | -test3 = testTree [1,4,6,89,2323,53,43,234,5,79,12,9,24,9,8,423,8,42,4,8,9,3] |

163 | - |

164 | -{-------------------------------------------------------------------- |

165 | - QuickCheck |

166 | ---------------------------------------------------------------------} |

167 | -qcheck prop |

168 | - = check config prop |

169 | - where |

170 | - config = Config |

171 | - { configMaxTest = 500 |

172 | - , configMaxFail = 5000 |

173 | - , configSize = \n -> (div n 2 + 3) |

174 | - , configEvery = \n args -> let s = show n in s ++ [ '\b' | _ <- s ] |

175 | - } |

176 | - |

177 | - |

178 | -{-------------------------------------------------------------------- |

179 | - Arbitrary, reasonably balanced trees |

180 | ---------------------------------------------------------------------} |

181 | -instance (Enum a) => Arbitrary (Set a) where |

182 | - arbitrary = sized (arbtree 0 maxkey) |

183 | - where maxkey = 10000 |

184 | - |

185 | -arbtree :: (Enum a) => Int -> Int -> Int -> Gen (Set a) |

186 | -arbtree lo hi n |

187 | - | n <= 0 = return Tip |

188 | - | lo >= hi = return Tip |

189 | - | otherwise = do{ i <- choose (lo,hi) |

190 | - ; m <- choose (1,30) |

191 | - ; let (ml,mr) | m==(1::Int)= (1,2) |

192 | - | m==2 = (2,1) |

193 | - | m==3 = (1,1) |

194 | - | otherwise = (2,2) |

195 | - ; l <- arbtree lo (i-1) (n `div` ml) |

196 | - ; r <- arbtree (i+1) hi (n `div` mr) |

197 | - ; return (bin (toEnum i) l r) |

198 | - } |

199 | - |

200 | - |

201 | -{-------------------------------------------------------------------- |

202 | - Valid tree's |

203 | ---------------------------------------------------------------------} |

204 | -forValid :: (Enum a,Show a,Testable b) => (Set a -> b) -> Property |

205 | -forValid f |

206 | - = forAll arbitrary $ \t -> |

207 | --- classify (balanced t) "balanced" $ |

208 | - classify (size t == 0) "empty" $ |

209 | - classify (size t > 0 && size t <= 10) "small" $ |

210 | - classify (size t > 10 && size t <= 64) "medium" $ |

211 | - classify (size t > 64) "large" $ |

212 | - balanced t ==> f t |

213 | - |

214 | -forValidIntTree :: Testable a => (Set Int -> a) -> Property |

215 | -forValidIntTree f |

216 | - = forValid f |

217 | - |

218 | -forValidUnitTree :: Testable a => (Set Int -> a) -> Property |

219 | -forValidUnitTree f |

220 | - = forValid f |

221 | - |

222 | - |

223 | -prop_Valid |

224 | - = forValidUnitTree $ \t -> valid t |

225 | - |

226 | -{-------------------------------------------------------------------- |

227 | - Single, Insert, Delete |

228 | ---------------------------------------------------------------------} |

229 | -prop_Single :: Int -> Bool |

230 | -prop_Single x |

231 | - = (insert x empty == singleton x) |

232 | - |

233 | -prop_InsertValid :: Int -> Property |

234 | -prop_InsertValid k |

235 | - = forValidUnitTree $ \t -> valid (insert k t) |

236 | - |

237 | -prop_InsertDelete :: Int -> Set Int -> Property |

238 | -prop_InsertDelete k t |

239 | - = not (member k t) ==> delete k (insert k t) == t |

240 | - |

241 | -prop_DeleteValid :: Int -> Property |

242 | -prop_DeleteValid k |

243 | - = forValidUnitTree $ \t -> |

244 | - valid (delete k (insert k t)) |

245 | - |

246 | -{-------------------------------------------------------------------- |

247 | - Balance |

248 | ---------------------------------------------------------------------} |

249 | -prop_Join :: Int -> Property |

250 | -prop_Join x |

251 | - = forValidUnitTree $ \t -> |

252 | - let (l,r) = split x t |

253 | - in valid (join x l r) |

254 | - |

255 | -prop_Merge :: Int -> Property |

256 | -prop_Merge x |

257 | - = forValidUnitTree $ \t -> |

258 | - let (l,r) = split x t |

259 | - in valid (merge l r) |

260 | - |

261 | - |

262 | -{-------------------------------------------------------------------- |

263 | - Union |

264 | ---------------------------------------------------------------------} |

265 | -prop_UnionValid :: Property |

266 | -prop_UnionValid |

267 | - = forValidUnitTree $ \t1 -> |

268 | - forValidUnitTree $ \t2 -> |

269 | - valid (union t1 t2) |

270 | - |

271 | -prop_UnionInsert :: Int -> Set Int -> Bool |

272 | -prop_UnionInsert x t |

273 | - = union t (singleton x) == insert x t |

274 | - |

275 | -prop_UnionAssoc :: Set Int -> Set Int -> Set Int -> Bool |

276 | -prop_UnionAssoc t1 t2 t3 |

277 | - = union t1 (union t2 t3) == union (union t1 t2) t3 |

278 | - |

279 | -prop_UnionComm :: Set Int -> Set Int -> Bool |

280 | -prop_UnionComm t1 t2 |

281 | - = (union t1 t2 == union t2 t1) |

282 | - |

283 | - |

284 | -prop_DiffValid |

285 | - = forValidUnitTree $ \t1 -> |

286 | - forValidUnitTree $ \t2 -> |

287 | - valid (difference t1 t2) |

288 | - |

289 | -prop_Diff :: [Int] -> [Int] -> Bool |

290 | -prop_Diff xs ys |

291 | - = toAscList (difference (fromList xs) (fromList ys)) |

292 | - == List.sort ((List.\\) (nub xs) (nub ys)) |

293 | - |

294 | -prop_IntValid |

295 | - = forValidUnitTree $ \t1 -> |

296 | - forValidUnitTree $ \t2 -> |

297 | - valid (intersection t1 t2) |

298 | - |

299 | -prop_Int :: [Int] -> [Int] -> Bool |

300 | -prop_Int xs ys |

301 | - = toAscList (intersection (fromList xs) (fromList ys)) |

302 | - == List.sort (nub ((List.intersect) (xs) (ys))) |

303 | - |

304 | -{-------------------------------------------------------------------- |

305 | - Lists |

306 | ---------------------------------------------------------------------} |

307 | -prop_Ordered |

308 | - = forAll (choose (5,100)) $ \n -> |

309 | - let xs = [0..n::Int] |

310 | - in fromAscList xs == fromList xs |

311 | - |

312 | -prop_List :: [Int] -> Bool |

313 | -prop_List xs |

314 | - = (sort (nub xs) == toList (fromList xs)) |

315 | --} |

316 | addfile ./tests/Set.hs |

317 | hunk ./tests/Set.hs 1 |

318 | +{-# LANGUAGE CPP, ScopedTypeVariables #-} |

319 | + |

320 | +-- QuickCheck properties for Data.Set |

321 | +-- > ghc -DTESTING -fforce-recomp -O2 --make -fhpc -i.. Set.hs |

322 | + |

323 | +import Data.List (nub,sort) |

324 | +import qualified Data.List as List |

325 | +import Data.Set |

326 | +import Prelude hiding (lookup, null, map ,filter) |

327 | +import Test.QuickCheck |

328 | + |

329 | +main :: IO () |

330 | +main = do |

331 | + q $ label "prop_Valid" prop_Valid |

332 | + q $ label "prop_Single" prop_Single |

333 | + q $ label "prop_Single" prop_Single |

334 | + q $ label "prop_InsertValid" prop_InsertValid |

335 | + q $ label "prop_InsertValid" prop_InsertValid |

336 | + q $ label "prop_InsertDelete" prop_InsertDelete |

337 | + q $ label "prop_InsertDelete" prop_InsertDelete |

338 | + q $ label "prop_DeleteValid" prop_DeleteValid |

339 | + q $ label "prop_DeleteValid" prop_DeleteValid |

340 | + q $ label "prop_Join" prop_Join |

341 | + q $ label "prop_Join" prop_Join |

342 | + q $ label "prop_Merge" prop_Merge |

343 | + q $ label "prop_Merge" prop_Merge |

344 | + q $ label "prop_UnionValid" prop_UnionValid |

345 | + q $ label "prop_UnionValid" prop_UnionValid |

346 | + q $ label "prop_UnionInsert" prop_UnionInsert |

347 | + q $ label "prop_UnionInsert" prop_UnionInsert |

348 | + q $ label "prop_UnionAssoc" prop_UnionAssoc |

349 | + q $ label "prop_UnionAssoc" prop_UnionAssoc |

350 | + q $ label "prop_UnionComm" prop_UnionComm |

351 | + q $ label "prop_UnionComm" prop_UnionComm |

352 | + q $ label "prop_DiffValid" prop_DiffValid |

353 | + q $ label "prop_Diff" prop_Diff |

354 | + q $ label "prop_Diff" prop_Diff |

355 | + q $ label "prop_IntValid" prop_IntValid |

356 | + q $ label "prop_Int" prop_Int |

357 | + q $ label "prop_Int" prop_Int |

358 | + q $ label "prop_Ordered" prop_Ordered |

359 | + q $ label "prop_List" prop_List |

360 | + q $ label "prop_List" prop_List |

361 | + where |

362 | + q :: Testable prop => prop -> IO () |

363 | + q = quickCheckWith args |

364 | + |

365 | +{-------------------------------------------------------------------- |

366 | + QuickCheck |

367 | +--------------------------------------------------------------------} |

368 | + |

369 | +args :: Args |

370 | +args = stdArgs { maxSuccess = 500 |

371 | + , maxDiscard = 500 |

372 | + } |

373 | + |

374 | +{-------------------------------------------------------------------- |

375 | + Arbitrary, reasonably balanced trees |

376 | +--------------------------------------------------------------------} |

377 | +instance (Enum a) => Arbitrary (Set a) where |

378 | + arbitrary = sized (arbtree 0 maxkey) |

379 | + where maxkey = 10000 |

380 | + |

381 | +arbtree :: (Enum a) => Int -> Int -> Int -> Gen (Set a) |

382 | +arbtree lo hi n |

383 | + | n <= 0 = return Tip |

384 | + | lo >= hi = return Tip |

385 | + | otherwise = do i <- choose (lo,hi) |

386 | + m <- choose (1,30) |

387 | + let (ml,mr) | m==(1::Int) = (1,2) |

388 | + | m==2 = (2,1) |

389 | + | m==3 = (1,1) |

390 | + | otherwise = (2,2) |

391 | + l <- arbtree lo (i-1) (n `div` ml) |

392 | + r <- arbtree (i+1) hi (n `div` mr) |

393 | + return (bin (toEnum i) l r) |

394 | + |

395 | +{-------------------------------------------------------------------- |

396 | + Valid tree's |

397 | +--------------------------------------------------------------------} |

398 | +forValid :: (Enum a,Show a,Testable b) => (Set a -> b) -> Property |

399 | +forValid f = forAll arbitrary $ \t -> |

400 | +-- classify (balanced t) "balanced" $ |

401 | + classify (size t == 0) "empty" $ |

402 | + classify (size t > 0 && size t <= 10) "small" $ |

403 | + classify (size t > 10 && size t <= 64) "medium" $ |

404 | + classify (size t > 64) "large" $ |

405 | + balanced t ==> f t |

406 | + |

407 | +forValidUnitTree :: Testable a => (Set Int -> a) -> Property |

408 | +forValidUnitTree f = forValid f |

409 | + |

410 | +prop_Valid :: Property |

411 | +prop_Valid = forValidUnitTree $ \t -> valid t |

412 | + |

413 | +{-------------------------------------------------------------------- |

414 | + Single, Insert, Delete |

415 | +--------------------------------------------------------------------} |

416 | +prop_Single :: Int -> Bool |

417 | +prop_Single x = (insert x empty == singleton x) |

418 | + |

419 | +prop_InsertValid :: Int -> Property |

420 | +prop_InsertValid k = forValidUnitTree $ \t -> valid (insert k t) |

421 | + |

422 | +prop_InsertDelete :: Int -> Set Int -> Property |

423 | +prop_InsertDelete k t = not (member k t) ==> delete k (insert k t) == t |

424 | + |

425 | +prop_DeleteValid :: Int -> Property |

426 | +prop_DeleteValid k = forValidUnitTree $ \t -> valid (delete k (insert k t)) |

427 | + |

428 | +{-------------------------------------------------------------------- |

429 | + Balance |

430 | +--------------------------------------------------------------------} |

431 | +prop_Join :: Int -> Property |

432 | +prop_Join x = forValidUnitTree $ \t -> |

433 | + let (l,r) = split x t |

434 | + in valid (join x l r) |

435 | + |

436 | +prop_Merge :: Int -> Property |

437 | +prop_Merge x = forValidUnitTree $ \t -> |

438 | + let (l,r) = split x t |

439 | + in valid (merge l r) |

440 | + |

441 | +{-------------------------------------------------------------------- |

442 | + Union |

443 | +--------------------------------------------------------------------} |

444 | +prop_UnionValid :: Property |

445 | +prop_UnionValid |

446 | + = forValidUnitTree $ \t1 -> |

447 | + forValidUnitTree $ \t2 -> |

448 | + valid (union t1 t2) |

449 | + |

450 | +prop_UnionInsert :: Int -> Set Int -> Bool |

451 | +prop_UnionInsert x t = union t (singleton x) == insert x t |

452 | + |

453 | +prop_UnionAssoc :: Set Int -> Set Int -> Set Int -> Bool |

454 | +prop_UnionAssoc t1 t2 t3 = union t1 (union t2 t3) == union (union t1 t2) t3 |

455 | + |

456 | +prop_UnionComm :: Set Int -> Set Int -> Bool |

457 | +prop_UnionComm t1 t2 = (union t1 t2 == union t2 t1) |

458 | + |

459 | +prop_DiffValid :: Property |

460 | +prop_DiffValid = forValidUnitTree $ \t1 -> |

461 | + forValidUnitTree $ \t2 -> |

462 | + valid (difference t1 t2) |

463 | + |

464 | +prop_Diff :: [Int] -> [Int] -> Bool |

465 | +prop_Diff xs ys = toAscList (difference (fromList xs) (fromList ys)) |

466 | + == List.sort ((List.\\) (nub xs) (nub ys)) |

467 | + |

468 | +prop_IntValid :: Property |

469 | +prop_IntValid = forValidUnitTree $ \t1 -> |

470 | + forValidUnitTree $ \t2 -> |

471 | + valid (intersection t1 t2) |

472 | + |

473 | +prop_Int :: [Int] -> [Int] -> Bool |

474 | +prop_Int xs ys = toAscList (intersection (fromList xs) (fromList ys)) |

475 | + == List.sort (nub ((List.intersect) (xs) (ys))) |

476 | + |

477 | +{-------------------------------------------------------------------- |

478 | + Lists |

479 | +--------------------------------------------------------------------} |

480 | +prop_Ordered :: Property |

481 | +prop_Ordered = forAll (choose (5,100)) $ \n -> |

482 | + let xs = [0..n::Int] |

483 | + in fromAscList xs == fromList xs |

484 | + |

485 | +prop_List :: [Int] -> Bool |

486 | +prop_List xs = (sort (nub xs) == toList (fromList xs)) |

487 | } |

488 | [Added benchmarks for Data.Set |

489 | Johan Tibell <johan.tibell@gmail.com>**20100831124225 |

490 | Ignore-this: fcacf88761034b8c534d936f0b336cc0 |

491 | ] { |

492 | addfile ./benchmarks/Set.hs |

493 | hunk ./benchmarks/Set.hs 1 |

494 | +{-# LANGUAGE BangPatterns #-} |

495 | + |

496 | +-- > ghc -DTESTING --make -O2 -fforce-recomp -i.. Set.hs |

497 | +module Main where |

498 | + |

499 | +import Control.DeepSeq |

500 | +import Control.Exception (evaluate) |

501 | +import Control.Monad.Trans (liftIO) |

502 | +import Criterion.Config |

503 | +import Criterion.Main |

504 | +import Data.List (foldl') |

505 | +import qualified Data.Set as S |

506 | + |

507 | +instance NFData a => NFData (S.Set a) where |

508 | + rnf S.Tip = () |

509 | + rnf (S.Bin _ a l r) = rnf a `seq` rnf l `seq` rnf r |

510 | + |

511 | +main = do |

512 | + let s = S.fromAscList elems :: S.Set Int |

513 | + s2 = S.fromAscList [-1, -2 .. -(2^10)] :: S.Set Int |

514 | + defaultMainWith |

515 | + defaultConfig |

516 | + (liftIO . evaluate $ rnf [s, s2]) |

517 | + [ bench "member" $ nf (member elems) s |

518 | + , bench "insert" $ nf (ins elems) S.empty |

519 | + , bench "map" $ nf (S.map (+ 1)) s |

520 | + , bench "filter" $ nf (S.filter ((== 0) . (`mod` 2))) s |

521 | + , bench "partition" $ nf (S.partition ((== 0) . (`mod` 2))) s |

522 | + , bench "fold" $ nf (S.fold (:) []) s |

523 | + , bench "delete" $ nf (del elems) s |

524 | + , bench "findMin" $ nf S.findMin s |

525 | + , bench "findMax" $ nf S.findMax s |

526 | + , bench "deleteMin" $ nf S.deleteMin s |

527 | + , bench "deleteMax" $ nf S.deleteMax s |

528 | + , bench "unions" $ nf S.unions [s, s2] |

529 | + , bench "union" $ nf (S.union s) s2 |

530 | + ] |

531 | + where |

532 | + elems = [1..2^10] |

533 | + |

534 | +member :: [Int] -> S.Set Int -> Int |

535 | +member xs s = foldl' (\n x -> if S.member x s then n + 1 else n) 0 xs |

536 | + |

537 | +ins :: [Int] -> S.Set Int -> S.Set Int |

538 | +ins xs s0 = foldl' (\s a -> S.insert a s) s0 xs |

539 | + |

540 | +del :: [Int] -> S.Set Int -> S.Set Int |

541 | +del xs s0 = foldl' (\s k -> S.delete k s) s0 xs |

542 | } |

543 | [Improved performance of Data.Set |

544 | Johan Tibell <johan.tibell@gmail.com>**20100831124352 |

545 | Ignore-this: 38a304a0408d29a2956aa9a1fc0ce755 |

546 | |

547 | Performance improvements are due to manually applying the |

548 | worker/wrapper transformation and strictifying the keys. |

549 | |

550 | Average speed-up is 32% on a 2GHz Core 2 Duo on OS X 10.5.8 |

551 | ] { |

552 | hunk ./Data/Set.hs 23 |

553 | -- trees of /bounded balance/) as described by: |

554 | -- |

555 | -- * Stephen Adams, \"/Efficient sets: a balancing act/\", |

556 | --- Journal of Functional Programming 3(4):553-562, October 1993, |

557 | --- <http://www.swiss.ai.mit.edu/~adams/BB/>. |

558 | +-- Journal of Functional Programming 3(4):553-562, October 1993, |

559 | +-- <http://www.swiss.ai.mit.edu/~adams/BB/>. |

560 | -- |

561 | -- * J. Nievergelt and E.M. Reingold, |

562 | hunk ./Data/Set.hs 27 |

563 | --- \"/Binary search trees of bounded balance/\", |

564 | --- SIAM journal of computing 2(1), March 1973. |

565 | +-- \"/Binary search trees of bounded balance/\", |

566 | +-- SIAM journal of computing 2(1), March 1973. |

567 | -- |

568 | -- Note that the implementation is /left-biased/ -- the elements of a |

569 | -- first argument are always preferred to the second, for example in |

570 | hunk ./Data/Set.hs 63 |

571 | , delete |

572 | |

573 | -- * Combine |

574 | - , union, unions |

575 | + , union |

576 | + , unions |

577 | , difference |

578 | , intersection |

579 | |

580 | hunk ./Data/Set.hs 75 |

581 | , splitMember |

582 | |

583 | -- * Map |

584 | - , map |

585 | - , mapMonotonic |

586 | + , map |

587 | + , mapMonotonic |

588 | |

589 | -- * Fold |

590 | , fold |

591 | hunk ./Data/Set.hs 146 |

592 | -- | /O(n+m)/. See 'difference'. |

593 | (\\) :: Ord a => Set a -> Set a -> Set a |

594 | m1 \\ m2 = difference m1 m2 |

595 | +{-# INLINE (\\) #-} |

596 | |

597 | {-------------------------------------------------------------------- |

598 | Sets are size balanced trees |

599 | hunk ./Data/Set.hs 189 |

600 | --------------------------------------------------------------------} |

601 | -- | /O(1)/. Is this the empty set? |

602 | null :: Set a -> Bool |

603 | -null t |

604 | - = case t of |

605 | - Tip -> True |

606 | - Bin {} -> False |

607 | +null Tip = True |

608 | +null (Bin {}) = False |

609 | +{-# INLINE null #-} |

610 | |

611 | -- | /O(1)/. The number of elements in the set. |

612 | size :: Set a -> Int |

613 | hunk ./Data/Set.hs 195 |

614 | -size t |

615 | - = case t of |

616 | - Tip -> 0 |

617 | - Bin sz _ _ _ -> sz |

618 | +size = go |

619 | + where |

620 | + go Tip = 0 |

621 | + go (Bin sz _ _ _) = sz |

622 | +{-# INLINE size #-} |

623 | |

624 | -- | /O(log n)/. Is the element in the set? |

625 | member :: Ord a => a -> Set a -> Bool |

626 | hunk ./Data/Set.hs 203 |

627 | -member x t |

628 | - = case t of |

629 | - Tip -> False |

630 | - Bin _ y l r |

631 | - -> case compare x y of |

632 | - LT -> member x l |

633 | - GT -> member x r |

634 | - EQ -> True |

635 | - |

636 | +member x = x `seq` go |

637 | + where |

638 | + go Tip = False |

639 | + go (Bin _ y l r) = case compare x y of |

640 | + LT -> go l |

641 | + GT -> go r |

642 | + EQ -> True |

643 | +{-# INLINE member #-} |

644 | + |

645 | -- | /O(log n)/. Is the element not in the set? |

646 | notMember :: Ord a => a -> Set a -> Bool |

647 | hunk ./Data/Set.hs 214 |

648 | -notMember x t = not $ member x t |

649 | +notMember a t = not $ member a t |

650 | +{-# INLINE notMember #-} |

651 | |

652 | {-------------------------------------------------------------------- |

653 | Construction |

654 | hunk ./Data/Set.hs 222 |

655 | --------------------------------------------------------------------} |

656 | -- | /O(1)/. The empty set. |

657 | empty :: Set a |

658 | -empty |

659 | - = Tip |

660 | +empty = Tip |

661 | +{-# INLINE empty #-} |

662 | |

663 | -- | /O(1)/. Create a singleton set. |

664 | singleton :: a -> Set a |

665 | hunk ./Data/Set.hs 227 |

666 | -singleton x |

667 | - = Bin 1 x Tip Tip |

668 | +singleton x = Bin 1 x Tip Tip |

669 | +{-# INLINE singleton #-} |

670 | |

671 | {-------------------------------------------------------------------- |

672 | Insertion, Deletion |

673 | hunk ./Data/Set.hs 237 |

674 | -- If the set already contains an element equal to the given value, |

675 | -- it is replaced with the new value. |

676 | insert :: Ord a => a -> Set a -> Set a |

677 | -insert x t |

678 | - = case t of |

679 | - Tip -> singleton x |

680 | - Bin sz y l r |

681 | - -> case compare x y of |

682 | - LT -> balance y (insert x l) r |

683 | - GT -> balance y l (insert x r) |

684 | - EQ -> Bin sz x l r |

685 | - |

686 | +insert x = x `seq` go |

687 | + where |

688 | + go Tip = singleton x |

689 | + go (Bin sz y l r) = case compare x y of |

690 | + LT -> balance y (go l) r |

691 | + GT -> balance y l (go r) |

692 | + EQ -> Bin sz x l r |

693 | +{-# INLINE insert #-} |

694 | |

695 | -- | /O(log n)/. Delete an element from a set. |

696 | delete :: Ord a => a -> Set a -> Set a |

697 | hunk ./Data/Set.hs 248 |

698 | -delete x t |

699 | - = case t of |

700 | - Tip -> Tip |

701 | - Bin _ y l r |

702 | - -> case compare x y of |

703 | - LT -> balance y (delete x l) r |

704 | - GT -> balance y l (delete x r) |

705 | - EQ -> glue l r |

706 | +delete x = x `seq` go |

707 | + where |

708 | + go Tip = Tip |

709 | + go (Bin _ y l r) = case compare x y of |

710 | + LT -> balance y (go l) r |

711 | + GT -> balance y l (go r) |

712 | + EQ -> glue l r |

713 | +{-# INLINE delete #-} |

714 | |

715 | {-------------------------------------------------------------------- |

716 | Subset |

717 | hunk ./Data/Set.hs 308 |

718 | deleteMax (Bin _ x l r) = balance x l (deleteMax r) |

719 | deleteMax Tip = Tip |

720 | |

721 | - |

722 | {-------------------------------------------------------------------- |

723 | Union. |

724 | --------------------------------------------------------------------} |

725 | hunk ./Data/Set.hs 313 |

726 | -- | The union of a list of sets: (@'unions' == 'foldl' 'union' 'empty'@). |

727 | unions :: Ord a => [Set a] -> Set a |

728 | -unions ts |

729 | - = foldlStrict union empty ts |

730 | - |

731 | +unions = foldlStrict union empty |

732 | +{-# INLINE unions #-} |

733 | |

734 | -- | /O(n+m)/. The union of two sets, preferring the first set when |

735 | -- equal elements are encountered. |

736 | hunk ./Data/Set.hs 324 |

737 | union Tip t2 = t2 |

738 | union t1 Tip = t1 |

739 | union t1 t2 = hedgeUnion (const LT) (const GT) t1 t2 |

740 | +{-# INLINE union #-} |

741 | |

742 | hedgeUnion :: Ord a |

743 | => (a -> Ordering) -> (a -> Ordering) -> Set a -> Set a -> Set a |

744 | hunk ./Data/Set.hs 347 |

745 | difference Tip _ = Tip |

746 | difference t1 Tip = t1 |

747 | difference t1 t2 = hedgeDiff (const LT) (const GT) t1 t2 |

748 | +{-# INLINE difference #-} |

749 | |

750 | hedgeDiff :: Ord a |

751 | => (a -> Ordering) -> (a -> Ordering) -> Set a -> Set a -> Set a |

752 | hunk ./Data/Set.hs 397 |

753 | --------------------------------------------------------------------} |

754 | -- | /O(n)/. Filter all elements that satisfy the predicate. |

755 | filter :: Ord a => (a -> Bool) -> Set a -> Set a |

756 | -filter _ Tip = Tip |

757 | -filter p (Bin _ x l r) |

758 | - | p x = join x (filter p l) (filter p r) |

759 | - | otherwise = merge (filter p l) (filter p r) |

760 | +filter p = go |

761 | + where |

762 | + go Tip = Tip |

763 | + go (Bin _ x l r) |

764 | + | p x = join x (go l) (go r) |

765 | + | otherwise = merge (go l) (go r) |

766 | +{-# INLINE filter #-} |

767 | |

768 | -- | /O(n)/. Partition the set into two sets, one with all elements that satisfy |

769 | -- the predicate and one with all elements that don't satisfy the predicate. |

770 | hunk ./Data/Set.hs 409 |

771 | -- See also 'split'. |

772 | partition :: Ord a => (a -> Bool) -> Set a -> (Set a,Set a) |

773 | -partition _ Tip = (Tip,Tip) |

774 | -partition p (Bin _ x l r) |

775 | - | p x = (join x l1 r1,merge l2 r2) |

776 | - | otherwise = (merge l1 r1,join x l2 r2) |

777 | +partition p = go |

778 | where |

779 | hunk ./Data/Set.hs 411 |

780 | - (l1,l2) = partition p l |

781 | - (r1,r2) = partition p r |

782 | + go Tip = (Tip, Tip) |

783 | + go (Bin _ x l r) = case (go l, go r) of |

784 | + ((l1, l2), (r1, r2)) |

785 | + | p x -> (join x l1 r1, merge l2 r2) |

786 | + | otherwise -> (merge l1 r1, join x l2 r2) |

787 | +{-# INLINE partition #-} |

788 | |

789 | {---------------------------------------------------------------------- |

790 | Map |

791 | hunk ./Data/Set.hs 430 |

792 | |

793 | map :: (Ord a, Ord b) => (a->b) -> Set a -> Set b |

794 | map f = fromList . List.map f . toList |

795 | +{-# INLINE map #-} |

796 | |

797 | -- | /O(n)/. The |

798 | -- |

799 | hunk ./Data/Set.hs 443 |

800 | -- > where ls = toList s |

801 | |

802 | mapMonotonic :: (a->b) -> Set a -> Set b |

803 | -mapMonotonic _ Tip = Tip |

804 | -mapMonotonic f (Bin sz x l r) = |

805 | - Bin sz (f x) (mapMonotonic f l) (mapMonotonic f r) |

806 | - |

807 | +mapMonotonic f = go |

808 | + where |

809 | + go Tip = Tip |

810 | + go (Bin sz x l r) = Bin sz (f x) (go l) (go r) |

811 | +{-# INLINE mapMonotonic #-} |

812 | |

813 | {-------------------------------------------------------------------- |

814 | Fold |

815 | hunk ./Data/Set.hs 454 |

816 | --------------------------------------------------------------------} |

817 | -- | /O(n)/. Fold over the elements of a set in an unspecified order. |

818 | fold :: (a -> b -> b) -> b -> Set a -> b |

819 | -fold f z s |

820 | - = foldr f z s |

821 | +fold = foldr |

822 | +{-# INLINE fold #-} |

823 | |

824 | -- | /O(n)/. Post-order fold. |

825 | foldr :: (a -> b -> b) -> b -> Set a -> b |

826 | hunk ./Data/Set.hs 459 |

827 | -foldr _ z Tip = z |

828 | -foldr f z (Bin _ x l r) = foldr f (f x (foldr f z r)) l |

829 | +foldr f = go |

830 | + where |

831 | + go z Tip = z |

832 | + go z (Bin _ x l r) = go (f x (go z r)) l |

833 | +{-# INLINE foldr #-} |

834 | |

835 | {-------------------------------------------------------------------- |

836 | List variations |

837 | hunk ./Data/Set.hs 470 |

838 | --------------------------------------------------------------------} |

839 | -- | /O(n)/. The elements of a set. |

840 | elems :: Set a -> [a] |

841 | -elems s |

842 | - = toList s |

843 | +elems = toList |

844 | +{-# INLINE elems #-} |

845 | |

846 | {-------------------------------------------------------------------- |

847 | Lists |

848 | hunk ./Data/Set.hs 478 |

849 | --------------------------------------------------------------------} |

850 | -- | /O(n)/. Convert the set to a list of elements. |

851 | toList :: Set a -> [a] |

852 | -toList s |

853 | - = toAscList s |

854 | +toList = toAscList |

855 | +{-# INLINE toList #-} |

856 | |

857 | -- | /O(n)/. Convert the set to an ascending list of elements. |

858 | toAscList :: Set a -> [a] |

859 | hunk ./Data/Set.hs 483 |

860 | -toAscList t |

861 | - = foldr (:) [] t |

862 | - |

863 | +toAscList = foldr (:) [] |

864 | +{-# INLINE toAscList #-} |

865 | |

866 | -- | /O(n*log n)/. Create a set from a list of elements. |

867 | fromList :: Ord a => [a] -> Set a |

868 | hunk ./Data/Set.hs 488 |

869 | -fromList xs |

870 | - = foldlStrict ins empty xs |

871 | +fromList = foldlStrict ins empty |

872 | where |

873 | ins t x = insert x t |

874 | hunk ./Data/Set.hs 491 |

875 | +{-# INLINE fromList #-} |

876 | |

877 | {-------------------------------------------------------------------- |

878 | Building trees from ascending/descending lists can be done in linear time. |

879 | hunk ./Data/Set.hs 629 |

880 | LT -> join x (filterGt cmp l) r |

881 | GT -> filterGt cmp r |

882 | EQ -> r |

883 | +{-# INLINE filterGt #-} |

884 | |

885 | filterLt :: (a -> Ordering) -> Set a -> Set a |

886 | filterLt _ Tip = Tip |

887 | hunk ./Data/Set.hs 638 |

888 | LT -> filterLt cmp l |

889 | GT -> join x l (filterLt cmp r) |

890 | EQ -> l |

891 | - |

892 | +{-# INLINE filterLt #-} |

893 | |

894 | {-------------------------------------------------------------------- |

895 | Split |

896 | hunk ./Data/Set.hs 880 |

897 | Utilities |

898 | --------------------------------------------------------------------} |

899 | foldlStrict :: (a -> b -> a) -> a -> [b] -> a |

900 | -foldlStrict f z xs |

901 | - = case xs of |

902 | - [] -> z |

903 | - (x:xx) -> let z' = f z x in seq z' (foldlStrict f z' xx) |

904 | - |

905 | +foldlStrict f = go |

906 | + where |

907 | + go z [] = z |

908 | + go z (x:xs) = z `seq` go (f z x) xs |

909 | +{-# INLINE foldlStrict #-} |

910 | |

911 | {-------------------------------------------------------------------- |

912 | Debugging |

913 | } |

914 | [Make the Set store the elements evaluated (bang added). |

915 | Milan Straka <fox@ucw.cz>**20100913165132 |

916 | Ignore-this: b3f230db5bf30d93d3fddf2c81c5f3b4 |

917 | ] hunk ./Data/Set.hs 153 |

918 | --------------------------------------------------------------------} |

919 | -- | A set of values @a@. |

920 | data Set a = Tip |

921 | - | Bin {-# UNPACK #-} !Size a !(Set a) !(Set a) |

922 | + | Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a) |

923 | |

924 | type Size = Int |

925 | |

926 | [Improve performance of Data.Set union and difference operations. |

927 | Milan Straka <fox@ucw.cz>**20100914135725 |

928 | Ignore-this: 6dc4a186ea060b9cdb9e783db71ca280 |

929 | |

930 | Use datatype storing evaluated bound instead of high-order functions. |

931 | The improvements are over 25% for both union and difference (GHC 6.12.1). |

932 | ] { |

933 | hunk ./Data/Set.hs 246 |

934 | EQ -> Bin sz x l r |

935 | {-# INLINE insert #-} |

936 | |

937 | +-- Insert an element to the set only if it is not in the set. Used by |

938 | +-- `union`. |

939 | +insertR :: Ord a => a -> Set a -> Set a |

940 | +insertR x = x `seq` go |

941 | + where |

942 | + go Tip = singleton x |

943 | + go t@(Bin sz y l r) = case compare x y of |

944 | + LT -> balance y (go l) r |

945 | + GT -> balance y l (go r) |

946 | + EQ -> t |

947 | +{-# INLINE insertR #-} |

948 | + |

949 | -- | /O(log n)/. Delete an element from a set. |

950 | delete :: Ord a => a -> Set a -> Set a |

951 | delete x = x `seq` go |

952 | hunk ./Data/Set.hs 335 |

953 | union :: Ord a => Set a -> Set a -> Set a |

954 | union Tip t2 = t2 |

955 | union t1 Tip = t1 |

956 | -union t1 t2 = hedgeUnion (const LT) (const GT) t1 t2 |

957 | +union (Bin _ x Tip Tip) t = insert x t |

958 | +union t (Bin _ x Tip Tip) = insertR x t |

959 | +union t1 t2 = hedgeUnion NothingS NothingS t1 t2 |

960 | {-# INLINE union #-} |

961 | |

962 | hedgeUnion :: Ord a |

963 | hunk ./Data/Set.hs 341 |

964 | - => (a -> Ordering) -> (a -> Ordering) -> Set a -> Set a -> Set a |

965 | + => MaybeS a -> MaybeS a -> Set a -> Set a -> Set a |

966 | hedgeUnion _ _ t1 Tip |

967 | = t1 |

968 | hunk ./Data/Set.hs 344 |

969 | -hedgeUnion cmplo cmphi Tip (Bin _ x l r) |

970 | - = join x (filterGt cmplo l) (filterLt cmphi r) |

971 | -hedgeUnion cmplo cmphi (Bin _ x l r) t2 |

972 | - = join x (hedgeUnion cmplo cmpx l (trim cmplo cmpx t2)) |

973 | - (hedgeUnion cmpx cmphi r (trim cmpx cmphi t2)) |

974 | +hedgeUnion blo bhi Tip (Bin _ x l r) |

975 | + = join x (filterGt blo l) (filterLt bhi r) |

976 | +hedgeUnion blo bhi (Bin _ x l r) t2 |

977 | + = join x (hedgeUnion blo bmi l (trim blo bmi t2)) |

978 | + (hedgeUnion bmi bhi r (trim bmi bhi t2)) |

979 | where |

980 | hunk ./Data/Set.hs 350 |

981 | - cmpx y = compare x y |

982 | + bmi = JustS x |

983 | |

984 | {-------------------------------------------------------------------- |

985 | Difference |

986 | hunk ./Data/Set.hs 360 |

987 | difference :: Ord a => Set a -> Set a -> Set a |

988 | difference Tip _ = Tip |

989 | difference t1 Tip = t1 |

990 | -difference t1 t2 = hedgeDiff (const LT) (const GT) t1 t2 |

991 | +difference t1 t2 = hedgeDiff NothingS NothingS t1 t2 |

992 | {-# INLINE difference #-} |

993 | |

994 | hedgeDiff :: Ord a |

995 | hunk ./Data/Set.hs 364 |

996 | - => (a -> Ordering) -> (a -> Ordering) -> Set a -> Set a -> Set a |

997 | + => MaybeS a -> MaybeS a -> Set a -> Set a -> Set a |

998 | hedgeDiff _ _ Tip _ |

999 | = Tip |

1000 | hunk ./Data/Set.hs 367 |

1001 | -hedgeDiff cmplo cmphi (Bin _ x l r) Tip |

1002 | - = join x (filterGt cmplo l) (filterLt cmphi r) |

1003 | -hedgeDiff cmplo cmphi t (Bin _ x l r) |

1004 | - = merge (hedgeDiff cmplo cmpx (trim cmplo cmpx t) l) |

1005 | - (hedgeDiff cmpx cmphi (trim cmpx cmphi t) r) |

1006 | +hedgeDiff blo bhi (Bin _ x l r) Tip |

1007 | + = join x (filterGt blo l) (filterLt bhi r) |

1008 | +hedgeDiff blo bhi t (Bin _ x l r) |

1009 | + = merge (hedgeDiff blo bmi (trim blo bmi t) l) |

1010 | + (hedgeDiff bmi bhi (trim bmi bhi t) r) |

1011 | where |

1012 | hunk ./Data/Set.hs 373 |

1013 | - cmpx y = compare x y |

1014 | + bmi = JustS x |

1015 | |

1016 | {-------------------------------------------------------------------- |

1017 | Intersection |

1018 | hunk ./Data/Set.hs 603 |

1019 | |

1020 | {-------------------------------------------------------------------- |

1021 | Utility functions that return sub-ranges of the original |

1022 | - tree. Some functions take a comparison function as argument to |

1023 | - allow comparisons against infinite values. A function [cmplo x] |

1024 | - should be read as [compare lo x]. |

1025 | + tree. Some functions take a `Maybe value` as an argument to |

1026 | + allow comparisons against infinite values. These are called `blow` |

1027 | + (Nothing is -\infty) and `bhigh` (here Nothing is +\infty). |

1028 | + We use MaybeS value, which is a Maybe strict in the Just case. |

1029 | |

1030 | hunk ./Data/Set.hs 608 |

1031 | - [trim cmplo cmphi t] A tree that is either empty or where [cmplo x == LT] |

1032 | - and [cmphi x == GT] for the value [x] of the root. |

1033 | - [filterGt cmp t] A tree where for all values [k]. [cmp k == LT] |

1034 | - [filterLt cmp t] A tree where for all values [k]. [cmp k == GT] |

1035 | + [trim blow bhigh t] A tree that is either empty or where [x > blow] |

1036 | + and [x < bhigh] for the value [x] of the root. |

1037 | + [filterGt blow t] A tree where for all values [k]. [k > blow] |

1038 | + [filterLt bhigh t] A tree where for all values [k]. [k < bhigh] |

1039 | |

1040 | [split k t] Returns two trees [l] and [r] where all values |

1041 | in [l] are <[k] and all keys in [r] are >[k]. |

1042 | hunk ./Data/Set.hs 619 |

1043 | was found in the tree. |

1044 | --------------------------------------------------------------------} |

1045 | |

1046 | +data MaybeS a = NothingS | JustS !a |

1047 | + |

1048 | {-------------------------------------------------------------------- |

1049 | hunk ./Data/Set.hs 622 |

1050 | - [trim lo hi t] trims away all subtrees that surely contain no |

1051 | - values between the range [lo] to [hi]. The returned tree is either |

1052 | - empty or the key of the root is between @lo@ and @hi@. |

1053 | + [trim blo bhi t] trims away all subtrees that surely contain no |

1054 | + values between the range [blo] to [bhi]. The returned tree is either |

1055 | + empty or the key of the root is between @blo@ and @bhi@. |

1056 | --------------------------------------------------------------------} |

1057 | hunk ./Data/Set.hs 626 |

1058 | -trim :: (a -> Ordering) -> (a -> Ordering) -> Set a -> Set a |

1059 | -trim _ _ Tip = Tip |

1060 | -trim cmplo cmphi t@(Bin _ x l r) |

1061 | - = case cmplo x of |

1062 | - LT -> case cmphi x of |

1063 | - GT -> t |

1064 | - _ -> trim cmplo cmphi l |

1065 | - _ -> trim cmplo cmphi r |

1066 | +trim :: Ord a => MaybeS a -> MaybeS a -> Set a -> Set a |

1067 | +trim NothingS NothingS t = t |

1068 | +trim (JustS lx) NothingS t = greater t where greater (Bin _ x _ r) | x <= lx = greater r |

1069 | + greater t = t |

1070 | +trim NothingS (JustS hx) t = lesser t where lesser (Bin _ x l _) | x >= hx = lesser l |

1071 | + lesser t = t |

1072 | +trim (JustS lx) (JustS hx) t = middle t where middle (Bin _ x _ r) | x <= lx = middle r |

1073 | + middle (Bin _ x l _) | x >= hx = middle l |

1074 | + middle t = t |

1075 | |

1076 | {-------------------------------------------------------------------- |

1077 | hunk ./Data/Set.hs 637 |

1078 | - [filterGt x t] filter all values >[x] from tree [t] |

1079 | - [filterLt x t] filter all values <[x] from tree [t] |

1080 | + [filterGt b t] filter all values >[b] from tree [t] |

1081 | + [filterLt b t] filter all values <[b] from tree [t] |

1082 | --------------------------------------------------------------------} |

1083 | hunk ./Data/Set.hs 640 |

1084 | -filterGt :: (a -> Ordering) -> Set a -> Set a |

1085 | -filterGt _ Tip = Tip |

1086 | -filterGt cmp (Bin _ x l r) |

1087 | - = case cmp x of |

1088 | - LT -> join x (filterGt cmp l) r |

1089 | - GT -> filterGt cmp r |

1090 | - EQ -> r |

1091 | +filterGt :: Ord a => MaybeS a -> Set a -> Set a |

1092 | +filterGt NothingS t = t |

1093 | +filterGt (JustS b) t = filter' t |

1094 | + where filter' Tip = Tip |

1095 | + filter' (Bin _ x l r) = case compare b x of LT -> join x (filter' l) r |

1096 | + EQ -> r |

1097 | + GT -> filter' r |

1098 | {-# INLINE filterGt #-} |

1099 | |

1100 | hunk ./Data/Set.hs 649 |

1101 | -filterLt :: (a -> Ordering) -> Set a -> Set a |

1102 | -filterLt _ Tip = Tip |

1103 | -filterLt cmp (Bin _ x l r) |

1104 | - = case cmp x of |

1105 | - LT -> filterLt cmp l |

1106 | - GT -> join x l (filterLt cmp r) |

1107 | - EQ -> l |

1108 | +filterLt :: Ord a => MaybeS a -> Set a -> Set a |

1109 | +filterLt NothingS t = t |

1110 | +filterLt (JustS b) t = filter' t |

1111 | + where filter' Tip = Tip |

1112 | + filter' (Bin _ x l r) = case compare x b of LT -> join x l (filter' r) |

1113 | + EQ -> l |

1114 | + GT -> filter' l |

1115 | {-# INLINE filterLt #-} |

1116 | |

1117 | {-------------------------------------------------------------------- |

1118 | } |

1119 | [Improve the performance of Data.Set balance function. |

1120 | Milan Straka <fox@ucw.cz>**20100914140417 |

1121 | Ignore-this: 577c511c219695b8d483af546c7387e8 |

1122 | |

1123 | The balance function is now one monolithic function, which allows |

1124 | to perform all pattern-matches only once. |

1125 | |

1126 | Nearly all functions modifying Data.Map use balance. |

1127 | The improvements are 12% for insert, 14% for delete (GHC 6.12.1). |

1128 | ] { |

1129 | hunk ./Data/Set.hs 805 |

1130 | size of one of them. (a rotation). |

1131 | |

1132 | [delta] is the maximal relative difference between the sizes of |

1133 | - two trees, it corresponds with the [w] in Adams' paper, |

1134 | - or equivalently, [1/delta] corresponds with the $\alpha$ |

1135 | - in Nievergelt's paper. Adams shows that [delta] should |

1136 | - be larger than 3.745 in order to garantee that the |

1137 | - rotations can always restore balance. |

1138 | - |

1139 | + two trees, it corresponds with the [w] in Adams' paper. |

1140 | [ratio] is the ratio between an outer and inner sibling of the |

1141 | heavier subtree in an unbalanced setting. It determines |

1142 | whether a double or single rotation should be performed |

1143 | hunk ./Data/Set.hs 812 |

1144 | to restore balance. It is correspondes with the inverse |

1145 | of $\alpha$ in Adam's article. |

1146 | |

1147 | - Note that: |

1148 | + Note that according to the Adam's paper: |

1149 | - [delta] should be larger than 4.646 with a [ratio] of 2. |

1150 | - [delta] should be larger than 3.745 with a [ratio] of 1.534. |

1151 | hunk ./Data/Set.hs 815 |

1152 | - |

1153 | + |

1154 | + But the Adam's paper is errorneous: |

1155 | + - it can be proved that for delta=2 and delta>=5 there does |

1156 | + not exist any ratio that would work |

1157 | + - delta=4.5 and ratio=2 does not work |

1158 | + |

1159 | + That leaves two reasonable variants, delta=3 and delta=4, |

1160 | + both with ratio=2. |

1161 | + |

1162 | - A lower [delta] leads to a more 'perfectly' balanced tree. |

1163 | - A higher [delta] performs less rebalancing. |

1164 | |

1165 | hunk ./Data/Set.hs 827 |

1166 | - - Balancing is automatic for random data and a balancing |

1167 | - scheme is only necessary to avoid pathological worst cases. |

1168 | - Almost any choice will do in practice |

1169 | - |

1170 | - - Allthough it seems that a rather large [delta] may perform better |

1171 | - than smaller one, measurements have shown that the smallest [delta] |

1172 | - of 4 is actually the fastest on a wide range of operations. It |

1173 | - especially improves performance on worst-case scenarios like |

1174 | - a sequence of ordered insertions. |

1175 | + In the benchmarks, delta=3 is faster on insert operations, |

1176 | + but delta=4 has better overall performance, so we use delta=4. |

1177 | + |

1178 | + Note: in contrast to Adam's paper, we perform the rebalance |

1179 | + even in the case when (size left == delta * size right), instead |

1180 | + when (size left > delta * size) as in the paper. Both are correct, |

1181 | + but the former is slightly faster overall. |

1182 | |

1183 | hunk ./Data/Set.hs 835 |

1184 | - Note: in contrast to Adams' paper, we use a ratio of (at least) 2 |

1185 | - to decide whether a single or double rotation is needed. Allthough |

1186 | - he actually proves that this ratio is needed to maintain the |

1187 | - invariants, his implementation uses a (invalid) ratio of 1. |

1188 | - He is aware of the problem though since he has put a comment in his |

1189 | - original source code that he doesn't care about generating a |

1190 | - slightly inbalanced tree since it doesn't seem to matter in practice. |

1191 | - However (since we use quickcheck :-) we will stick to strictly balanced |

1192 | - trees. |

1193 | --------------------------------------------------------------------} |

1194 | delta,ratio :: Int |

1195 | delta = 4 |

1196 | hunk ./Data/Set.hs 840 |

1197 | ratio = 2 |

1198 | |

1199 | -balance :: a -> Set a -> Set a -> Set a |

1200 | -balance x l r |

1201 | - | sizeL + sizeR <= 1 = Bin sizeX x l r |

1202 | - | sizeR >= delta*sizeL = rotateL x l r |

1203 | - | sizeL >= delta*sizeR = rotateR x l r |

1204 | - | otherwise = Bin sizeX x l r |

1205 | - where |

1206 | - sizeL = size l |

1207 | - sizeR = size r |

1208 | - sizeX = sizeL + sizeR + 1 |

1209 | - |

1210 | --- rotate |

1211 | -rotateL :: a -> Set a -> Set a -> Set a |

1212 | -rotateL x l r@(Bin _ _ ly ry) |

1213 | - | size ly < ratio*size ry = singleL x l r |

1214 | - | otherwise = doubleL x l r |

1215 | -rotateL _ _ Tip = error "rotateL Tip" |

1216 | +-- The balance function is equivalent to the following: |

1217 | +-- |

1218 | +-- balance :: a -> Set a -> Set a -> Set a |

1219 | +-- balance x l r |

1220 | +-- | sizeL + sizeR <= 1 = Bin sizeX x l r |

1221 | +-- | sizeR >= delta*sizeL = rotateL x l r |

1222 | +-- | sizeL >= delta*sizeR = rotateR x l r |

1223 | +-- | otherwise = Bin sizeX x l r |

1224 | +-- where |

1225 | +-- sizeL = size l |

1226 | +-- sizeR = size r |

1227 | +-- sizeX = sizeL + sizeR + 1 |

1228 | +-- |

1229 | +-- rotateL :: a -> Set a -> Set a -> Set a |

1230 | +-- rotateL x l r@(Bin _ _ ly ry) | size ly < ratio*size ry = singleL x l r |

1231 | +-- | otherwise = doubleL x l r |

1232 | +-- rotateR :: a -> Set a -> Set a -> Set a |

1233 | +-- rotateR x l@(Bin _ _ ly ry) r | size ry < ratio*size ly = singleR x l r |

1234 | +-- | otherwise = doubleR x l r |

1235 | +-- |

1236 | +-- singleL, singleR :: a -> Set a -> Set a -> Set a |

1237 | +-- singleL x1 t1 (Bin _ x2 t2 t3) = bin x2 (bin x1 t1 t2) t3 |

1238 | +-- singleR x1 (Bin _ x2 t1 t2) t3 = bin x2 t1 (bin x1 t2 t3) |

1239 | +-- |

1240 | +-- doubleL, doubleR :: a -> Set a -> Set a -> Set a |

1241 | +-- doubleL x1 t1 (Bin _ x2 (Bin _ x3 t2 t3) t4) = bin x3 (bin x1 t1 t2) (bin x2 t3 t4) |

1242 | +-- doubleR x1 (Bin _ x2 t1 (Bin _ x3 t2 t3)) t4 = bin x3 (bin x2 t1 t2) (bin x1 t3 t4) |

1243 | +-- |

1244 | +-- It is only written in such a way that every node is pattern-matched only once. |

1245 | |

1246 | hunk ./Data/Set.hs 870 |

1247 | -rotateR :: a -> Set a -> Set a -> Set a |

1248 | -rotateR x l@(Bin _ _ ly ry) r |

1249 | - | size ry < ratio*size ly = singleR x l r |

1250 | - | otherwise = doubleR x l r |

1251 | -rotateR _ Tip _ = error "rotateL Tip" |

1252 | - |

1253 | --- basic rotations |

1254 | -singleL, singleR :: a -> Set a -> Set a -> Set a |

1255 | -singleL x1 t1 (Bin _ x2 t2 t3) = bin x2 (bin x1 t1 t2) t3 |

1256 | -singleL _ _ Tip = error "singleL" |

1257 | -singleR x1 (Bin _ x2 t1 t2) t3 = bin x2 t1 (bin x1 t2 t3) |

1258 | -singleR _ Tip _ = error "singleR" |

1259 | +balance :: a -> Set a -> Set a -> Set a |

1260 | +balance x l r = case l of |

1261 | + Tip -> case r of |

1262 | + Tip -> Bin 1 x Tip Tip |

1263 | + r@(Bin rs rx Tip Tip) -> Bin 2 x Tip r |

1264 | + r@(Bin rs rx Tip rr@(Bin _ _ _ _)) -> Bin 3 rx (Bin 1 x Tip Tip) rr |

1265 | + r@(Bin rs rx rl@(Bin _ rlx _ _) Tip) -> Bin 3 rlx (Bin 1 x Tip Tip) (Bin 1 rx Tip Tip) |

1266 | + r@(Bin rs rx rl@(Bin rls rlx rll rlr) rr@(Bin rrs rrx rrl rrr)) |

1267 | + | rls < ratio*rrs -> Bin (1+rs) rx (Bin (1+rls) x Tip rl) rr |

1268 | + | otherwise -> Bin (1+rs) rlx (Bin (1+size rll) x Tip rll) (Bin (1+rrs+size rlr) rx rlr rr) |

1269 | |

1270 | hunk ./Data/Set.hs 881 |

1271 | -doubleL, doubleR :: a -> Set a -> Set a -> Set a |

1272 | -doubleL x1 t1 (Bin _ x2 (Bin _ x3 t2 t3) t4) = bin x3 (bin x1 t1 t2) (bin x2 t3 t4) |

1273 | -doubleL _ _ _ = error "doubleL" |

1274 | -doubleR x1 (Bin _ x2 t1 (Bin _ x3 t2 t3)) t4 = bin x3 (bin x2 t1 t2) (bin x1 t3 t4) |

1275 | -doubleR _ _ _ = error "doubleR" |

1276 | + l@(Bin ls lx ll lr) -> case r of |

1277 | + Tip -> case (ll, lr) of |

1278 | + (Tip, Tip) -> Bin 2 x l Tip |

1279 | + (Tip, lr@(Bin _ lrx _ _)) -> Bin 3 lrx (Bin 1 lx Tip Tip) (Bin 1 x Tip Tip) |

1280 | + (ll@(Bin _ _ _ _), Tip) -> Bin 3 lx ll (Bin 1 x Tip Tip) |

1281 | + (ll@(Bin lls llx lll llr), lr@(Bin lrs lrx lrl lrr)) |

1282 | + | lrs < ratio*lls -> Bin (1+ls) lx ll (Bin (1+lrs) x lr Tip) |

1283 | + | otherwise -> Bin (1+ls) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+size lrr) x lrr Tip) |

1284 | + r@(Bin rs rx rl rr) |

1285 | + | rs >= delta*ls -> case (rl, rr) of |

1286 | + (Bin rls rlx rll rlr, Bin rrs rrx rrl rrr) |

1287 | + | rls < ratio*rrs -> Bin (1+ls+rs) rx (Bin (1+ls+rls) x l rl) rr |

1288 | + | otherwise -> Bin (1+ls+rs) rlx (Bin (1+ls+size rll) x l rll) (Bin (1+rrs+size rlr) rx rlr rr) |

1289 | + | ls >= delta*rs -> case (ll, lr) of |

1290 | + (Bin lls llx lll llr, Bin lrs lrx lrl lrr) |

1291 | + | lrs < ratio*lls -> Bin (1+ls+rs) lx ll (Bin (1+rs+lrs) x lr r) |

1292 | + | otherwise -> Bin (1+ls+rs) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+rs+size lrr) x lrr r) |

1293 | + | otherwise -> Bin (1+ls+rs) x l r |

1294 | |

1295 | |

1296 | {-------------------------------------------------------------------- |

1297 | } |

1298 | [Improve Data.Set benchmark. |

1299 | Milan Straka <fox@ucw.cz>**20100914142010 |

1300 | Ignore-this: 9b878ae3aa5a43ef083abfd7f9b22513 |

1301 | |

1302 | Add union, difference and intersection to Data.Set benchmark. |

1303 | ] { |

1304 | hunk ./benchmarks/Set.hs 20 |

1305 | |

1306 | main = do |

1307 | let s = S.fromAscList elems :: S.Set Int |

1308 | - s2 = S.fromAscList [-1, -2 .. -(2^10)] :: S.Set Int |

1309 | + s_even = S.fromAscList elems_even :: S.Set Int |

1310 | + s_odd = S.fromAscList elems_odd :: S.Set Int |

1311 | defaultMainWith |

1312 | defaultConfig |

1313 | hunk ./benchmarks/Set.hs 24 |

1314 | - (liftIO . evaluate $ rnf [s, s2]) |

1315 | + (liftIO . evaluate $ rnf [s, s_even, s_odd]) |

1316 | [ bench "member" $ nf (member elems) s |

1317 | , bench "insert" $ nf (ins elems) S.empty |

1318 | , bench "map" $ nf (S.map (+ 1)) s |

1319 | hunk ./benchmarks/Set.hs 36 |

1320 | , bench "findMax" $ nf S.findMax s |

1321 | , bench "deleteMin" $ nf S.deleteMin s |

1322 | , bench "deleteMax" $ nf S.deleteMax s |

1323 | - , bench "unions" $ nf S.unions [s, s2] |

1324 | - , bench "union" $ nf (S.union s) s2 |

1325 | + , bench "unions" $ nf S.unions [s_even, s_odd] |

1326 | + , bench "union" $ nf (S.union s_even) s_odd |

1327 | + , bench "difference" $ nf (S.difference s) s_even |

1328 | + , bench "intersection" $ nf (S.intersection s) s_even |

1329 | ] |

1330 | where |

1331 | elems = [1..2^10] |

1332 | hunk ./benchmarks/Set.hs 43 |

1333 | + elems_even = [2,4..2^10] |

1334 | + elems_odd = [1,3..2^10] |

1335 | |

1336 | member :: [Int] -> S.Set Int -> Int |

1337 | member xs s = foldl' (\n x -> if S.member x s then n + 1 else n) 0 xs |

1338 | } |

1339 | [Correct Data.Set Arbitrary instance never to return unbalanced trees. |

1340 | Milan Straka <fox@ucw.cz>**20100914150442 |

1341 | Ignore-this: b5c70fa98a56f225b8eb5faf420677b0 |

1342 | |

1343 | The previous instance sometimes returned unbalanced trees, |

1344 | which broke the tests. |

1345 | |

1346 | Also the new instance mimics Data.Map instance more closely in the shape |

1347 | of the generated trees. |

1348 | ] { |

1349 | move ./tests/Set.hs ./tests/set-properties.hs |

1350 | hunk ./tests/set-properties.hs 4 |

1351 | {-# LANGUAGE CPP, ScopedTypeVariables #-} |

1352 | |

1353 | -- QuickCheck properties for Data.Set |

1354 | --- > ghc -DTESTING -fforce-recomp -O2 --make -fhpc -i.. Set.hs |

1355 | +-- > ghc -DTESTING -fforce-recomp -O2 --make -fhpc -i.. set-properties.hs |

1356 | |

1357 | import Data.List (nub,sort) |

1358 | import qualified Data.List as List |

1359 | hunk ./tests/set-properties.hs 65 |

1360 | where maxkey = 10000 |

1361 | |

1362 | arbtree :: (Enum a) => Int -> Int -> Int -> Gen (Set a) |

1363 | -arbtree lo hi n |

1364 | - | n <= 0 = return Tip |

1365 | - | lo >= hi = return Tip |

1366 | - | otherwise = do i <- choose (lo,hi) |

1367 | - m <- choose (1,30) |

1368 | - let (ml,mr) | m==(1::Int) = (1,2) |

1369 | - | m==2 = (2,1) |

1370 | - | m==3 = (1,1) |

1371 | - | otherwise = (2,2) |

1372 | - l <- arbtree lo (i-1) (n `div` ml) |

1373 | - r <- arbtree (i+1) hi (n `div` mr) |

1374 | - return (bin (toEnum i) l r) |

1375 | +arbtree lo hi n = do t <- gentree lo hi n |

1376 | + if balanced t then return t else arbtree lo hi n |

1377 | + where gentree lo hi n |

1378 | + | n <= 0 = return Tip |

1379 | + | lo >= hi = return Tip |

1380 | + | otherwise = do i <- choose (lo,hi) |

1381 | + m <- choose (1,70) |

1382 | + let (ml,mr) | m==(1::Int) = (1,2) |

1383 | + | m==2 = (2,1) |

1384 | + | m==3 = (1,1) |

1385 | + | otherwise = (2,2) |

1386 | + l <- gentree lo (i-1) (n `div` ml) |

1387 | + r <- gentree (i+1) hi (n `div` mr) |

1388 | + return (bin (toEnum i) l r) |

1389 | |

1390 | {-------------------------------------------------------------------- |

1391 | Valid tree's |

1392 | } |

1393 | [Changing delta to 3 in Data.Set. |

1394 | Milan Straka <fox@ucw.cz>**20100921090507 |

1395 | Ignore-this: a47d0c542ed9cee99ad6b17c52c977a1 |

1396 | |

1397 | Only possible values are 3 and 4. The value 3 has much faster inserts, |

1398 | value 4 slightly faster deletes, so choosing 3. |

1399 | |

1400 | Also changed the inequalities to rebalance only when one subtree |

1401 | is _strictly_ larger than delta * the other one, to mimic the behaviour |

1402 | from the proof (both from the Adams' and from the one to come). |

1403 | ] { |

1404 | hunk ./Data/Set.hs 724 |

1405 | join x Tip r = insertMin x r |

1406 | join x l Tip = insertMax x l |

1407 | join x l@(Bin sizeL y ly ry) r@(Bin sizeR z lz rz) |

1408 | - | delta*sizeL <= sizeR = balance z (join x l lz) rz |

1409 | - | delta*sizeR <= sizeL = balance y ly (join x ry r) |

1410 | - | otherwise = bin x l r |

1411 | + | delta*sizeL < sizeR = balanceL z (join x l lz) rz |

1412 | + | delta*sizeR < sizeL = balanceR y ly (join x ry r) |

1413 | + | otherwise = bin x l r |

1414 | |

1415 | |

1416 | -- insertMin and insertMax don't perform potentially expensive comparisons. |

1417 | hunk ./Data/Set.hs 750 |

1418 | merge Tip r = r |

1419 | merge l Tip = l |

1420 | merge l@(Bin sizeL x lx rx) r@(Bin sizeR y ly ry) |

1421 | - | delta*sizeL <= sizeR = balance y (merge l ly) ry |

1422 | - | delta*sizeR <= sizeL = balance x lx (merge rx r) |

1423 | - | otherwise = glue l r |

1424 | + | delta*sizeL < sizeR = balanceL y (merge l ly) ry |

1425 | + | delta*sizeR < sizeL = balanceR x lx (merge rx r) |

1426 | + | otherwise = glue l r |

1427 | |

1428 | {-------------------------------------------------------------------- |

1429 | [glue l r]: glues two trees together. |

1430 | hunk ./Data/Set.hs 828 |

1431 | - A higher [delta] performs less rebalancing. |

1432 | |

1433 | In the benchmarks, delta=3 is faster on insert operations, |

1434 | - but delta=4 has better overall performance, so we use delta=4. |

1435 | - |

1436 | - Note: in contrast to Adam's paper, we perform the rebalance |

1437 | - even in the case when (size left == delta * size right), instead |

1438 | - when (size left > delta * size) as in the paper. Both are correct, |

1439 | - but the former is slightly faster overall. |

1440 | + and delta=4 has slightly better deletes. As the insert speedup |

1441 | + is larger, we currently use delta=3. |

1442 | |

1443 | --------------------------------------------------------------------} |

1444 | delta,ratio :: Int |

1445 | hunk ./Data/Set.hs 833 |

1446 | -delta = 4 |

1447 | +delta = 3 |

1448 | ratio = 2 |

1449 | |

1450 | -- The balance function is equivalent to the following: |

1451 | hunk ./Data/Set.hs 840 |

1452 | -- |

1453 | -- balance :: a -> Set a -> Set a -> Set a |

1454 | -- balance x l r |

1455 | --- | sizeL + sizeR <= 1 = Bin sizeX x l r |

1456 | --- | sizeR >= delta*sizeL = rotateL x l r |

1457 | --- | sizeL >= delta*sizeR = rotateR x l r |

1458 | --- | otherwise = Bin sizeX x l r |

1459 | +-- | sizeL + sizeR <= 1 = Bin sizeX x l r |

1460 | +-- | sizeR > delta*sizeL = rotateL x l r |

1461 | +-- | sizeL > delta*sizeR = rotateR x l r |

1462 | +-- | otherwise = Bin sizeX x l r |

1463 | -- where |

1464 | -- sizeL = size l |

1465 | -- sizeR = size r |

1466 | hunk ./Data/Set.hs 886 |

1467 | | lrs < ratio*lls -> Bin (1+ls) lx ll (Bin (1+lrs) x lr Tip) |

1468 | | otherwise -> Bin (1+ls) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+size lrr) x lrr Tip) |

1469 | r@(Bin rs rx rl rr) |

1470 | - | rs >= delta*ls -> case (rl, rr) of |

1471 | + | rs > delta*ls -> case (rl, rr) of |

1472 | (Bin rls rlx rll rlr, Bin rrs rrx rrl rrr) |

1473 | | rls < ratio*rrs -> Bin (1+ls+rs) rx (Bin (1+ls+rls) x l rl) rr |

1474 | | otherwise -> Bin (1+ls+rs) rlx (Bin (1+ls+size rll) x l rll) (Bin (1+rrs+size rlr) rx rlr rr) |

1475 | hunk ./Data/Set.hs 890 |

1476 | - | ls >= delta*rs -> case (ll, lr) of |

1477 | + | ls > delta*rs -> case (ll, lr) of |

1478 | (Bin lls llx lll llr, Bin lrs lrx lrl lrr) |

1479 | | lrs < ratio*lls -> Bin (1+ls+rs) lx ll (Bin (1+rs+lrs) x lr r) |

1480 | | otherwise -> Bin (1+ls+rs) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+rs+size lrr) x lrr r) |

1481 | } |

1482 | [Further improve Data.Set balance function. |

1483 | Milan Straka <fox@ucw.cz>**20100921091828 |

1484 | Ignore-this: f23be37859224e9bbe919a3c0a71fdc6 |

1485 | |

1486 | As suggested by Kazu Yamamoto, we split balance to balanceL and |

1487 | balanceR, which handle only one-sided inbalance, but need fewer |

1488 | tests than balance. |

1489 | |

1490 | As nearly all functions modifying the structure use balance, this |

1491 | results in speedup of many functions. On my 32-bit GHC 6.12.1, |

1492 | 11% speedup for insert, 12% speedup for delete. |

1493 | ] { |

1494 | hunk ./Data/Set.hs 241 |

1495 | where |

1496 | go Tip = singleton x |

1497 | go (Bin sz y l r) = case compare x y of |

1498 | - LT -> balance y (go l) r |

1499 | - GT -> balance y l (go r) |

1500 | + LT -> balanceL y (go l) r |

1501 | + GT -> balanceR y l (go r) |

1502 | EQ -> Bin sz x l r |

1503 | {-# INLINE insert #-} |

1504 | |

1505 | hunk ./Data/Set.hs 253 |

1506 | where |

1507 | go Tip = singleton x |

1508 | go t@(Bin sz y l r) = case compare x y of |

1509 | - LT -> balance y (go l) r |

1510 | - GT -> balance y l (go r) |

1511 | + LT -> balanceL y (go l) r |

1512 | + GT -> balanceR y l (go r) |

1513 | EQ -> t |

1514 | {-# INLINE insertR #-} |

1515 | |

1516 | hunk ./Data/Set.hs 264 |

1517 | where |

1518 | go Tip = Tip |

1519 | go (Bin _ y l r) = case compare x y of |

1520 | - LT -> balance y (go l) r |

1521 | - GT -> balance y l (go r) |

1522 | + LT -> balanceR y (go l) r |

1523 | + GT -> balanceL y l (go r) |

1524 | EQ -> glue l r |

1525 | {-# INLINE delete #-} |

1526 | |

1527 | hunk ./Data/Set.hs 311 |

1528 | -- | /O(log n)/. Delete the minimal element. |

1529 | deleteMin :: Set a -> Set a |

1530 | deleteMin (Bin _ _ Tip r) = r |

1531 | -deleteMin (Bin _ x l r) = balance x (deleteMin l) r |

1532 | +deleteMin (Bin _ x l r) = balanceR x (deleteMin l) r |

1533 | deleteMin Tip = Tip |

1534 | |

1535 | -- | /O(log n)/. Delete the maximal element. |

1536 | hunk ./Data/Set.hs 317 |

1537 | deleteMax :: Set a -> Set a |

1538 | deleteMax (Bin _ _ l Tip) = l |

1539 | -deleteMax (Bin _ x l r) = balance x l (deleteMax r) |

1540 | +deleteMax (Bin _ x l r) = balanceL x l (deleteMax r) |

1541 | deleteMax Tip = Tip |

1542 | |

1543 | {-------------------------------------------------------------------- |

1544 | hunk ./Data/Set.hs 735 |

1545 | = case t of |

1546 | Tip -> singleton x |

1547 | Bin _ y l r |

1548 | - -> balance y l (insertMax x r) |

1549 | + -> balanceR y l (insertMax x r) |

1550 | |

1551 | insertMin x t |

1552 | = case t of |

1553 | hunk ./Data/Set.hs 741 |

1554 | Tip -> singleton x |

1555 | Bin _ y l r |

1556 | - -> balance y (insertMin x l) r |

1557 | + -> balanceL y (insertMin x l) r |

1558 | |

1559 | {-------------------------------------------------------------------- |

1560 | [merge l r]: merges two trees. |

1561 | hunk ./Data/Set.hs 762 |

1562 | glue Tip r = r |

1563 | glue l Tip = l |

1564 | glue l r |

1565 | - | size l > size r = let (m,l') = deleteFindMax l in balance m l' r |

1566 | - | otherwise = let (m,r') = deleteFindMin r in balance m l r' |

1567 | + | size l > size r = let (m,l') = deleteFindMax l in balanceR m l' r |

1568 | + | otherwise = let (m,r') = deleteFindMin r in balanceL m l r' |

1569 | |

1570 | |

1571 | -- | /O(log n)/. Delete and find the minimal element. |

1572 | hunk ./Data/Set.hs 774 |

1573 | deleteFindMin t |

1574 | = case t of |

1575 | Bin _ x Tip r -> (x,r) |

1576 | - Bin _ x l r -> let (xm,l') = deleteFindMin l in (xm,balance x l' r) |

1577 | + Bin _ x l r -> let (xm,l') = deleteFindMin l in (xm,balanceR x l' r) |

1578 | Tip -> (error "Set.deleteFindMin: can not return the minimal element of an empty set", Tip) |

1579 | |

1580 | -- | /O(log n)/. Delete and find the maximal element. |

1581 | hunk ./Data/Set.hs 784 |

1582 | deleteFindMax t |

1583 | = case t of |

1584 | Bin _ x l Tip -> (x,l) |

1585 | - Bin _ x l r -> let (xm,r') = deleteFindMax r in (xm,balance x l r') |

1586 | + Bin _ x l r -> let (xm,r') = deleteFindMax r in (xm,balanceL x l r') |

1587 | Tip -> (error "Set.deleteFindMax: can not return the maximal element of an empty set", Tip) |

1588 | |

1589 | -- | /O(log n)/. Retrieves the minimal key of the set, and the set |

1590 | hunk ./Data/Set.hs 896 |

1591 | | otherwise -> Bin (1+ls+rs) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+rs+size lrr) x lrr r) |

1592 | | otherwise -> Bin (1+ls+rs) x l r |

1593 | |

1594 | +-- Functions balanceL and balanceR are specialised versions of balance. |

1595 | +-- balanceL only checks whether the left subtree is too big, |

1596 | +-- balanceR only checks whether the right subtree is too big. |

1597 | + |

1598 | +-- balanceL is called when left subtree might have been inserted to or when |

1599 | +-- right subtree might have been deleted from. |

1600 | +balanceL :: a -> Set a -> Set a -> Set a |

1601 | +balanceL x l r = case r of |

1602 | + Tip -> case l of |

1603 | + Tip -> Bin 1 x Tip Tip |

1604 | + l@(Bin ls lx Tip Tip) -> Bin 2 x l Tip |

1605 | + l@(Bin ls lx Tip lr@(Bin _ lrx _ _)) -> Bin 3 lrx (Bin 1 lx Tip Tip) (Bin 1 x Tip Tip) |

1606 | + l@(Bin ls lx ll@(Bin _ _ _ _) Tip) -> Bin 3 lx ll (Bin 1 x Tip Tip) |

1607 | + l@(Bin ls lx ll@(Bin lls llx lll llr) lr@(Bin lrs lrx lrl lrr)) |

1608 | + | lrs < ratio*lls -> Bin (1+ls) lx ll (Bin (1+lrs) x lr Tip) |

1609 | + | otherwise -> Bin (1+ls) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+size lrr) x lrr Tip) |

1610 | + |

1611 | + r@(Bin rs rx rl rr) -> case l of |

1612 | + Tip -> Bin (1+rs) x Tip r |

1613 | + |

1614 | + l@(Bin ls lx ll lr) |

1615 | + | ls > delta*rs -> case (ll, lr) of |

1616 | + (Bin lls llx lll llr, Bin lrs lrx lrl lrr) |

1617 | + | lrs < ratio*lls -> Bin (1+ls+rs) lx ll (Bin (1+rs+lrs) x lr r) |

1618 | + | otherwise -> Bin (1+ls+rs) lrx (Bin (1+lls+size lrl) lx ll lrl) (Bin (1+rs+size lrr) x lrr r) |

1619 | + | otherwise -> Bin (1+ls+rs) x l r |

1620 | + |

1621 | +-- balanceR is called when right subtree might have been inserted to or when |

1622 | +-- left subtree might have been deleted from. |

1623 | +balanceR :: a -> Set a -> Set a -> Set a |

1624 | +balanceR x l r = case l of |

1625 | + Tip -> case r of |

1626 | + Tip -> Bin 1 x Tip Tip |

1627 | + r@(Bin rs rx Tip Tip) -> Bin 2 x Tip r |

1628 | + r@(Bin rs rx Tip rr@(Bin _ _ _ _)) -> Bin 3 rx (Bin 1 x Tip Tip) rr |

1629 | + r@(Bin rs rx rl@(Bin _ rlx _ _) Tip) -> Bin 3 rlx (Bin 1 x Tip Tip) (Bin 1 rx Tip Tip) |

1630 | + r@(Bin rs rx rl@(Bin rls rlx rll rlr) rr@(Bin rrs rrx rrl rrr)) |

1631 | + | rls < ratio*rrs -> Bin (1+rs) rx (Bin (1+rls) x Tip rl) rr |

1632 | + | otherwise -> Bin (1+rs) rlx (Bin (1+size rll) x Tip rll) (Bin (1+rrs+size rlr) rx rlr rr) |

1633 | + |

1634 | + l@(Bin ls lx ll lr) -> case r of |

1635 | + Tip -> Bin (1+ls) x l Tip |

1636 | + |

1637 | + r@(Bin rs rx rl rr) |

1638 | + | rs > delta*ls -> case (rl, rr) of |

1639 | + (Bin rls rlx rll rlr, Bin rrs rrx rrl rrr) |

1640 | + | rls < ratio*rrs -> Bin (1+ls+rs) rx (Bin (1+ls+rls) x l rl) rr |

1641 | + | otherwise -> Bin (1+ls+rs) rlx (Bin (1+ls+size rll) x l rll) (Bin (1+rrs+size rlr) rx rlr rr) |

1642 | + | otherwise -> Bin (1+ls+rs) x l r |

1643 | |

1644 | {-------------------------------------------------------------------- |

1645 | The bin constructor maintains the size of the tree |

1646 | } |

1647 | |

1648 | Context: |

1649 | |

1650 | [fix warnings |

1651 | Simon Marlow <marlowsd@gmail.com>**20100831114555 |

1652 | Ignore-this: 53df71bc054a779b8ad2dad89c09e02d |

1653 | ] |

1654 | [Missing MagicHash for IntSet |

1655 | Don Stewart <dons@galois.com>**20100831093446 |

1656 | Ignore-this: d075f760adb9a2aa0ee04676e38a07cc |

1657 | ] |

1658 | [Performance improvements for Data.IntMap (worker/wrapper and inlining) |

1659 | Don Stewart <dons@galois.com>**20100831093316 |

1660 | Ignore-this: 206036448558d270f0eb85ef4cd55368 |

1661 | ] |

1662 | [Add criterion-based benchmarking for IntMap |

1663 | Don Stewart <dons@galois.com>**20100831093240 |

1664 | Ignore-this: d7d85b9afb513532cc30f5b51a3f825e |

1665 | ] |

1666 | [Add comprehensive testsuite for IntMap |

1667 | Don Stewart <dons@galois.com>**20100831093202 |

1668 | Ignore-this: d455fedbc615e5b63ac488e605550557 |

1669 | ] |

1670 | [-O2 -fregs-graph is a uniform 10% improvements for IntMap |

1671 | Don Stewart <dons@galois.com>**20100831092956 |

1672 | Ignore-this: 2372cf4be945fe7939d0af94e32c567f |

1673 | ] |

1674 | [Missed base case for updateAt worker. Spotted by Jan-Willem Maessen |

1675 | Don Stewart <dons@galois.com>**20100829163329 |

1676 | Ignore-this: b8daf1c55c163c16f50c3b54cca2dba1 |

1677 | ] |

1678 | [Major bump (new functions, clarified strictness properties, vastly better performance) |

1679 | Don Stewart <dons@galois.com>**20100829122628 |

1680 | Ignore-this: 9bfbc58ecaa24a86be37b8c4cb043457 |

1681 | ] |

1682 | [Add two new functions: foldlWithKey' and insertLookupWithKey' |

1683 | Don Stewart <dons@galois.com>**20100829122147 |

1684 | Ignore-this: a2f112653ba38737fe1b38609e06c314 |

1685 | |

1686 | These two functions use strict accumulators, compared to their existing |

1687 | counterparts (which are lazy left folds, that appear not to be useful). |

1688 | Performance is significantly better. |

1689 | |

1690 | ] |

1691 | [Performance improvements to Data.Map |

1692 | Don Stewart <dons@galois.com>**20100829120245 |

1693 | Ignore-this: b4830cddfa6d62e4883f4e0f58ac4e57 |

1694 | |

1695 | Applied several standard transformations to improve the performance of |

1696 | code: |

1697 | |

1698 | * Worker/wrapper of all recursive functions with constant arguments |

1699 | * Inlining of all (non-recursive) wrappers |

1700 | * Consistent use of strict keys |

1701 | |

1702 | Average performance improvements across common API (with GHC 6.12.3): |

1703 | |

1704 | * Linux / x86_64 / 2.6Ghz i7 : 48% |

1705 | * Mac OSX 10.5 / x86 / 2 Ghz Xeon : 36% |

1706 | |

1707 | Graphs and raw data: http://is.gd/eJHIE |

1708 | |

1709 | This patch is (mostly) orthogonal to the algorithmic changes suggested |

1710 | by Milan Straka in his HW 2010 paper: |

1711 | |

1712 | http://research.microsoft.com/~simonpj/papers/containers/containers.pdf |

1713 | |

1714 | Those changes could be added separately, for some additional improvments. |

1715 | |

1716 | Work carried out over 28/29th August, 2010 in Utrecht, NL, by Johan Tibell |

1717 | and Don Stewart. |

1718 | |

1719 | ] |

1720 | [Add a criterion-based benchmark suite for Data.Map |

1721 | Don Stewart <dons@galois.com>**20100829114611 |

1722 | Ignore-this: ec61668f5bcb78bd15b72e2728c01c19 |

1723 | |

1724 | This adds a criterion-based micro-benchmarking suite for Data.Map. It |

1725 | can be used to measure performance improvements for individual top-level |

1726 | functions. |

1727 | |

1728 | Examples here: http://is.gd/eJHIE |

1729 | |

1730 | ] |

1731 | [Add a comprehensive testsuite for Data.Map |

1732 | Don Stewart <dons@galois.com>**20100829113545 |

1733 | Ignore-this: 891e7fe6bac3523868714ac1ff51c0a3 |

1734 | |

1735 | This patch adds a joint quickcheck2 / hunit testsuite, with coverage of |

1736 | 91% of top level functions (remaining features are mostly in instances). |

1737 | |

1738 | The coverage data is here: |

1739 | |

1740 | http://code.haskell.org/~dons/tests/containers/hpc_index.html |

1741 | |

1742 | No bugs were found. It includes unit tests for known past bugs |

1743 | (balancing). |

1744 | |

1745 | ] |

1746 | [Oops, get the #ifdef symbol correct. |

1747 | Malcolm.Wallace@me.com**20100902081938] |

1748 | [Protect a gratuitous GHC-ism with #ifdefs. |

1749 | Malcolm.Wallace@me.com**20100902081217] |

1750 | [Set Data.Map's delta to 4; fixes #4242 |

1751 | Ian Lynagh <igloo@earth.li>**20100815131954] |

1752 | [Add a test for #4242 |

1753 | Ian Lynagh <igloo@earth.li>**20100815131856] |

1754 | [Add a local type signature |

1755 | simonpj@microsoft.com**20100730124447 |

1756 | Ignore-this: b581d3f2c80a7a860456d589960f12f2 |

1757 | ] |

1758 | [Add type signature in local where clause |

1759 | simonpj@microsoft.com**20100727151709 |

1760 | Ignore-this: 5929c4156500b25b280eb414b508c508 |

1761 | ] |

1762 | [Fix Data.Sequence's breakr, and add a test for it; fixes trac #4157 |

1763 | Ian Lynagh <igloo@earth.li>**20100704140627] |

1764 | [Fix proposal #4109: Make Data.Map.insertWith's strictness consistent |

1765 | Ian Lynagh <igloo@earth.li>**20100615133055] |

1766 | [Tweak layout to work with the alternative layout rule |

1767 | Ian Lynagh <igloo@earth.li>**20091129154519] |

1768 | [Disable building Data.Sequence (and dependents) for nhc98. |

1769 | Malcolm.Wallace@cs.york.ac.uk**20091124025653 |

1770 | There is some subtlety of polymorphically recursive datatypes and |

1771 | type-class defaulting that nhc98's type system barfs over. |

1772 | ] |

1773 | [Fix another instance of non-ghc breakage. |

1774 | Malcolm.Wallace@cs.york.ac.uk**20091123092637] |

1775 | [Add #ifdef around ghc-only (<$) as member of Functor class. |

1776 | Malcolm.Wallace@cs.york.ac.uk**20091123085155] |

1777 | [Fix broken code in non-GHC branch of an ifdef. |

1778 | Malcolm.Wallace@cs.york.ac.uk**20091123084824] |

1779 | [doc bugfix: correct description of index argument |

1780 | Ross Paterson <ross@soi.city.ac.uk>**20091028105532 |

1781 | Ignore-this: 9790e7bf422c4cb528722c03cfa4fed9 |

1782 | |

1783 | As noted by iaefai on the libraries list. |

1784 | |

1785 | Please merge to STABLE. |

1786 | ] |

1787 | [Bump version to 0.3.0.0 |

1788 | Ian Lynagh <igloo@earth.li>**20090920141847] |

1789 | [update base dependency |

1790 | Ross Paterson <ross@soi.city.ac.uk>**20090916073125 |

1791 | Ignore-this: ad382ffc6c6a18c15364e6c072f19edb |

1792 | |

1793 | The package uses mkNoRepType and Data.Functor, which were not in the |

1794 | stable branch of base-4. |

1795 | ] |

1796 | [add fast version of <$ for Seq |

1797 | Ross Paterson <ross@soi.city.ac.uk>**20090916072812 |

1798 | Ignore-this: 5a39a7d31d39760ed589790b1118d240 |

1799 | ] |

1800 | [new methods for Data.Sequence (proposal #3271) |

1801 | Ross Paterson <ross@soi.city.ac.uk>**20090915173324 |

1802 | Ignore-this: cf17bedd709a6ab3448fd718dcdf62e7 |

1803 | |

1804 | Adds a lot of new methods to Data.Sequence, mostly paralleling those |

1805 | in Data.List. Several of these are significantly faster than versions |

1806 | implemented with the previous public interface. In particular, replicate |

1807 | takes O(log n) time and space instead of O(n). |

1808 | (by Louis Wasserman) |

1809 | ] |

1810 | [Fix "Cabal check" warnings |

1811 | Ian Lynagh <igloo@earth.li>**20090811215900] |

1812 | [TAG 2009-06-25 |

1813 | Ian Lynagh <igloo@earth.li>**20090625160202] |

1814 | Patch bundle hash: |

1815 | 37cecbd8d3b137d4c7087324ef19e72cf299eccb |