Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#3076 closed bug (wontfix)

Make genericLength tail-recursive so it doesn't overflow stack

Reported by: Syzygies Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.10.1
Keywords: genericLength Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

A likely use for genericLength is to count lists of more than Int elements. However, the source code is not tail-recursive, making a poor "consumer", so genericLength easily overflows stack.

Here is the source code from 6.10.1:

genericLength           :: (Num i) => [b] -> i
genericLength []        =  0
genericLength (_:l)     =  1 + genericLength l

Here is a proposed alternative:

genericLength ∷ (Num i) ⇒ [b] → i
genericLength = len 0 where
  len n [] = n
  len n (_:xt) = len (n+1) xt

In my test application (enumerating the 66,960,965,307 atomic lattices on six atoms) this alternative avoids overflowing the stack.

[This is not the same issue as http://hackage.haskell.org/trac/ghc/ticket/2962]

Attachments (1)

Main.hs (1.6 KB) - added by Syzygies 11 years ago.

Download all attachments as: .zip

Change History (6)

comment:1 Changed 11 years ago by igloo

difficulty: Unknown
Resolution: invalid
Status: newclosed

That isn't an equivalent definition, e.g.:

data Nat = Zero | Succ Nat
    deriving (Show, Eq)

works :: Bool
works = genericLength1 (cycle "foo") > (5 :: Nat)

fails :: Bool
fails = genericLength2 (cycle "foo") > (5 :: Nat)

instance Num Nat where
    fromInteger 0 = Zero
    fromInteger (n + 1) = Succ (fromInteger n)
    Zero + n = n
    Succ m + n = Succ (m + n)

instance Ord Nat where
    compare Zero Zero = EQ
    compare Zero _    = LT
    compare _    Zero = GT
    compare (Succ m) (Succ n) = compare m n

genericLength1           :: (Num i) => [b] -> i
genericLength1 []        =  0
genericLength1 (_:l)     =  1 + genericLength1 l

genericLength2 :: (Num i) => [b] -> i
genericLength2 = len 0 where
  len n [] = n
  len n (_:xt) = len (n+1) xt

works works, but fails doesn't terminate:

*Main> works
True
*Main> fails
^CInterrupted.

If you restrict the type to Int or Integer, then it will work thanks to #2962, though.

comment:2 Changed 11 years ago by Syzygies

Keywords: genericLength added
Resolution: invalid
Status: closedreopened

That was enlightening. Thanks.

The library genericLength blows stack on very short lists. My first proposal fails on infinite lists, using Peano arithmetic. So they both fail reasonable tests. However, one can make a logarithmic improvement in the stack usage of the library function, and pass both tests:

  module Main where

  import Data.Int

  data Nat = Zero | Succ Nat
    deriving (Show, Eq)

  instance Num Nat where
    fromInteger 0 = Zero
    fromInteger n = Succ $ fromInteger $ n - 1
    Zero + n = n
    Succ m + n = Succ (m + n)
    Zero * _ = Zero
    Succ m * n = n + m * n
    abs n = n
    signum Zero = Zero
    signum (Succ _) = Succ Zero

  instance Ord Nat where
    compare Zero Zero = EQ
    compare Zero _    = LT
    compare _    Zero = GT
    compare (Succ m) (Succ n) = compare m n

  genericLength1       ∷ (Num i) ⇒ [b] → i
  genericLength1 []    =  0
  genericLength1 (_:l) =  1 + genericLength1 l

  genericLength2 ∷ (Num i) ⇒ [b] → i
  genericLength2 = len 0 where
    len n [] = n
    len n (_:xt) = len (n+1) xt

  genericLength3 ∷ (Num i) ⇒ [b] → i
  genericLength3 = len 1 0 where
    len _ n [] = n
    len m n (_:xt)
      | m == n = n + len (n+n) 1 xt
      | otherwise = len m (n+1) xt

  intLength1, intLength2, intLength3 ∷ [a] → Int64
  intLength1 = genericLength1
  intLength2 = genericLength2
  intLength3 = genericLength3

  natLength1, natLength2, natLength3 ∷ [a] → Nat
  natLength1 = genericLength1
  natLength2 = genericLength2
  natLength3 = genericLength3

  list :: Int64 -> [Bool]
  list 0 = []
  list n = True : list (n-1)

  main ∷ IO ()
  main = do

--  print $ intLength1 $ list $ 2^19 -- fails w/ 8388608 byte stack
    print $ intLength2 $ list $ 2^32
    print $ intLength3 $ list $ 2^32

    print $ natLength1 (repeat 1) > 5
--  print $ natLength2 (repeat 1) > 5 -- fails
    print $ natLength3 (repeat 1) > 5

Getting this to work requires specializing genericLength, to avoid class overhead. This is nevertheless preferable to writing a length function from scratch, which might fall into either of these traps.

Getting this to work also requires a "good producer". I was surprised to discover that in this context, [1..n] isn't a "good producer".

Changed 11 years ago by Syzygies

Attachment: Main.hs added

comment:3 Changed 11 years ago by Syzygies

I ran a timing test, to compare intLengthN for N=1,2,3 on an example that N=1 (the existing library code) could handle:

print $ sum $ map intLength1 $ take 1000 $ iterate tail $ list $ 2^18

intLength3 runs slower (of course) than the invalid intLength2, but 5.5x faster than the current library code intlength1.

comment:4 Changed 10 years ago by igloo

Resolution: wontfix
Status: reopenedclosed

This assumes that (==) terminates, which it might not do, e.g. with some datatypes representing the reals.

comment:5 Changed 10 years ago by simonmar

Type of failure: Runtime performance bug
Note: See TracTickets for help on using tickets.