Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#5926 closed feature request (fixed)

Add strict versions of modifyIORef and atomicModifyIORef

Reported by: joeyadams Owned by: simonmar
Priority: normal Milestone: 7.6.1
Component: libraries/base Version: 7.4.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

It is easy to misuse modifyIORef and atomicModifyIORef due to their lack of strictness. For example, if you use an IORef as a counter, use modifyIORef to increment it, and never evaluate the value until the very end, your program will leak memory, and you may even get a stack overflow:

ref <- newIORef 0
replicateM_ 1000000 $ modifyIORef ref (+1)
readIORef ref >>= print

Today, I found a space leak in the tls package. Repeatedly calling sendData would leak memory. It didn't take me long to find the cause once I noticed a module named "Measurement" used for gathering connection statistics. It used modifyIORef to update the Measurement structure. When I changed it to this:

x <- readIORef (ctxMeasurement ctx)
writeIORef (ctxMeasurement ctx) $! f x

the space leak went away.

A more subtle mistake can be made using atomicModifyIORef. Can you spot the problem with this code?

atomicModifyIORef ref (\_ -> (new_value, ()))

It's not incrementing anything, it's just replacing the value. However, it's still deferring evaluation of the function. This mistake was pointed out in The Monad.Reader Issue 19, where they suggested the following idiom:

x <- atomicModifyIORef ref (\_ -> (new_value, ()))
x `seq` return ()

Thus, I believe there should be strict variants of modifyIORef and atomicModifyIORef, if only to warn programmers of these pitfalls.

modifyIORef' is pretty straightforward: force the result of applying the function:

modifyIORef' ref f = do
    x <- readIORef ref
    let x' = f x
    x' `seq` writeIORef ref x'

The only question is: would it be better to force x' after writeIORef instead of before it, in case a caller is trying to share the IORef among threads (which they shouldn't be)?

atomicModifyIORef is less straightforward. Should we force the values themselves? It is possible to avoid the space leak above without forcing either the new value or the return value:

atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef' ref f = do
    p <- atomicModifyIORef ref (\a -> let p = f a in (fst p, p))
    p `seq` return (snd p)

It also allows f to decide what to force. For example, with this definition of atomicModifyIORef', the following program prints 10000000 and does not leak memory:

ref <- newIORef 0
replicateM_ 10000000 $
    atomicModifyIORef' ref
        (\n -> let n' = n + 1
                in n' `seq` (n', undefined))
readIORef ref >>= print

In the attached patch, I didn't implement atomicModifyIORef' this way. Instead, I made it force both the old and new values, and added a separate function called atomicWriteIORef that has the same signature as writeIORef, but is based on atomicModifyIORef.

I believe the real value of such functions is to warn programmers about these pitfalls.

Attachments (1)

0001-Add-strict-versions-of-modifyIORef-and-atomicModifyI.patch (3.5 KB) - added by joeyadams 8 years ago.
Add strict versions of modifyIORef and atomicModifyIORef

Download all attachments as: .zip

Change History (4)

Changed 8 years ago by joeyadams

Add strict versions of modifyIORef and atomicModifyIORef

comment:1 Changed 8 years ago by simonmar

difficulty: Unknown
Milestone: 7.6.1
Owner: set to simonmar

Ok, I'll apply your patch as is. There's another variant of strict atomicModifyIORef that I have been wanting to experiment with:

atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef' (IORef (STRef r#)) f = IO $ \s0 -> loop s0
  where
    loop s0 =
      case readMutVar# r# s0     of { (# s1, a #) ->
      let !(a',b) = f a in
      case casMutVar# r# a a' s1 of { (# s2, ok, _ #) ->
      case ok of { 0# -> (# s2, b #); _ -> loop s2 }}}

This is like a mini-STM transaction that evaluates f a and then attempts to compare-and-swap the result in. Since evaluating f a might take an arbitrary amount of time, the CAS might fail, and we have to loop. On the other hand, using this we can avoid ever writing thunks into the IORef, which avoids any problems with black holes.

comment:2 Changed 7 years ago by simonmar

Resolution: fixed
Status: newclosed
commit 531c09ed708b394c43775c2e07f14a39256bc498
Author: Joey Adams <joeyadams3.14159@gmail.com>
Date:   Sat Mar 10 19:15:40 2012 -0500

    Add strict versions of modifyIORef and atomicModifyIORef

comment:3 Changed 7 years ago by dons

For historical reference, see this 2009 thread on the issue (I was trying to find this thread, but google doesn't seem to like to index it).

Note: See TracTickets for help on using tickets.