Opened 5 years ago

Last modified 4 years ago

#9989 new bug

GHCI is slow for precompiled code

Reported by: remdezx Owned by:
Priority: normal Milestone:
Component: Runtime System (Linker) Version: 7.8.3
Keywords: Cc: kgadek@…
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

Hello! We found that loading compiled and optimised accelerate code into ghci works much much slower than code run directly.

(I'm marking it as a ghc-api bug, because I noticed this behavior while using it)

Here is the example:

module Main where

import Data.Array.Accelerate as A
import Data.Array.Accelerate.CUDA as C
import Data.Time.Clock       (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
    start <- getCurrentTime
    print $ C.run $ A.maximum $ A.map (+1) $ A.use (fromList (Z:.1000000) [1..1000000] :: Vector Double)
    end   <- getCurrentTime
    print $ diffUTCTime end start

When I compile it and run

    $ ghc -O2 Main.hs -threaded
    [1 of 1] Compiling Main             ( Main.hs, Main.o )
    Linking Main ...
    $ ./Main
    Array (Z) [1000001.0]
    0.162204s

It takes 0.16 s to finish When I compile it and load into ghci:

    $ ghc -O2 -dynamic -c Main.hs
    $ ghci 
    GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :l Main
    Ok, modules loaded: Main.
    Prelude Main> main
    Loading package transformers-0.3.0.0 ... linking ... done.
    Loading package array-0.5.0.0 ... linking ... done.
    Loading package deepseq-1.3.0.2 ... linking ... done.
    Loading package bytestring-0.10.4.0 ... linking ... done.
    Loading package containers-0.5.5.1 ... linking ... done.
    Loading package binary-0.7.1.0 ... linking ... done.
    Loading package pretty-1.1.1.1 ... linking ... done.
    Loading package filepath-1.3.0.2 ... linking ... done.
    Loading package old-locale-1.0.0.6 ... linking ... done.
    Loading package time-1.4.2 ... linking ... done.
    Loading package unix-2.7.0.1 ... linking ... done.
    Loading package directory-1.2.1.0 ... linking ... done.
    Loading package process-1.2.0.0 ... linking ... done.
    Loading package stm-2.4.4 ... linking ... done.
    Loading package SafeSemaphore-0.10.1 ... linking ... done.
    Loading package mtl-2.1.3.1 ... linking ... done.
    Loading package template-haskell ... linking ... done.
    Loading package fclabels-2.0.2.2 ... linking ... done.
    Loading package text-1.1.1.3 ... linking ... done.
    Loading package hashable-1.2.3.1 ... linking ... done.
    Loading package primitive-0.5.4.0 ... linking ... done.
    Loading package vector-0.10.12.2 ... linking ... done.
    Loading package hashtables-1.2.0.1 ... linking ... done.
    Loading package unordered-containers-0.2.5.1 ... linking ... done.
    Loading package accelerate-0.15.0.0 ... linking ... done.
    Loading package byteable-0.1.1 ... linking ... done.
    Loading package cryptohash-0.11.6 ... linking ... done.
    Loading package cuda-0.6.5.1 ... linking ... done.
    Loading package exception-transformers-0.3.0.4 ... linking ... done.
    Loading package exception-mtl-0.3.0.5 ... linking ... done.
    Loading package old-time-1.1.0.2 ... linking ... done.
    Loading package polyparse-1.11 ... linking ... done.
    Loading package cpphs-1.18.6 ... linking ... done.
    Loading package haskell-src-exts-1.16.0.1 ... linking ... done.
    Loading package syb-0.4.4 ... linking ... done.
    Loading package th-lift-0.7 ... linking ... done.
    Loading package safe-0.3.8 ... linking ... done.
    Loading package th-expand-syns-0.3.0.4 ... linking ... done.
    Loading package th-reify-many-0.1.2 ... linking ... done.
    Loading package th-orphans-0.8.3 ... linking ... done.
    Loading package haskell-src-meta-0.6.0.8 ... linking ... done.
    Loading package srcloc-0.4.1 ... linking ... done.
    Loading package mainland-pretty-0.2.7 ... linking ... done.
    Loading package symbol-0.2.4 ... linking ... done.
    Loading package language-c-quote-0.8.0 ... linking ... done.
    Loading package accelerate-cuda-0.15.0.0 ... linking ... done.
    Array (Z) [1000001.0]
    0.256128s

It takes 0.26 s to finish. On other computer, using criterion, we observed even a 50x difference.

Why is it working so slow, isn't ghci just to load function and simply call it?

Here is how execution time changes for different matrix sizes. For comparison we have also measured time of interpreted code (without precompiling).

    size     | compiled  | precompiled  | interpreted        
    ---------+-----------+--------------+------------
    100      | 0.054076s | 0.082686s    | 0.151857s   
    1000     | 0.054509s | 0.08305s     | 0.135452s  
    10000    | 0.055405s | 0.08469s     | 0.12632s   
    100000   | 0.057768s | 0.093011s    | 0.155359s   
    1000000  | 0.089811s | 0.251359s    | 0.202022s   
    10000000 | 0.397642s | 1.400603s    | 0.898547s   

We believe that problem lies on the side of ghci rather than accelerate, how could we confirm this?

Moreover and even more important is there any workaround for it, if we need this precompiled code running fast in production environment?

Any guidance will be appreciated.

Change History (6)

comment:1 Changed 5 years ago by simonpj

Does the same thing happen if you tell Accelerate to generate code for a CPU rather than a GPU?

comment:2 Changed 5 years ago by simonpj

Description: modified (diff)

comment:3 Changed 5 years ago by simonpj

Description: modified (diff)

comment:4 Changed 5 years ago by remdezx

Yes, it is still slower when it is run on CPU.

When I change import Data.Array.Accelerate.CUDA as C to import Data.Array.Accelerate.Interpreter as C compiled code for matrix size 10000 executes 0.007s and precompiled and run under ghci is 0.035s (5x slower). For 1000000 matrix size execution times are 0.57s (compiled) vs 0.90s (precompiled).

Last edited 5 years ago by remdezx (previous) (diff)

comment:5 Changed 5 years ago by simonpj

OK that's helpful because it removes the dependency on GPU hardware

comment:6 Changed 4 years ago by ezyang

Component: GHC APIRuntime System (Linker)
Note: See TracTickets for help on using tickets.