Description
This is a general performance issue in which I'll discuss some measurements and the problems they reveal.
I thought I would do a very simple measurement of the standard SDL event polling loop, with no window or anything:
import SDL
import Weigh

main :: IO ()
main =
  mainWith
    (do setColumns [Case, Allocated, GCs]
        sequence_
          [ action ("pollEvents " ++ show i) (sdl i)
          | i <- [10, 100, 1000, 10000, 100000]
          ])

sdl :: Int -> IO ()
sdl iters = do
  initializeAll
  let go :: Int -> IO ()
      go 0 = pure ()
      go i = do
        _events <- pollEvents
        go (i - 1)
  go iters
It turns out to allocate quite a bit: 😱
Case                Allocated  GCs
pollEvents 10           4,632    0
pollEvents 100         38,416    0
pollEvents 1000       369,112    0
pollEvents 10000    3,680,088    3
pollEvents 100000  36,802,144   35
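To put a per-call number on that table, here is a small sketch that computes the marginal bytes allocated per extra iteration between consecutive rows. The figures are copied from the table above; the names `measurements` and `slope` are only illustrative.

```haskell
-- Allocation figures copied from the Weigh table: (iterations, bytes allocated).
measurements :: [(Int, Int)]
measurements =
  [ (10, 4632)
  , (100, 38416)
  , (1000, 369112)
  , (10000, 3680088)
  , (100000, 36802144)
  ]

-- Marginal bytes allocated per extra iteration between two measurements.
slope :: (Int, Int) -> (Int, Int) -> Double
slope (n1, b1) (n2, b2) = fromIntegral (b2 - b1) / fromIntegral (n2 - n1)

main :: IO ()
main = mapM_ print (zipWith slope measurements (tail measurements))
```

Between the two largest runs this works out to about 368 bytes per pollEvents call, so the growth is linear in the number of polls rather than a one-off setup cost.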
If instead I do this, i.e. allocate once and repeatedly write to the same pointer and peek when there's something available:
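-- Additionally needs: import Control.Monad (void), import Foreign (alloca, peek),
-- and import qualified SDL.Raw as Raw
sdl :: Int -> IO ()
sdl iters = do
  initializeAll
  -- Allocate one event buffer up front and reuse it for every poll.
  alloca
    (\e -> do
       let go :: Int -> IO ()
           go 0 = pure ()
           go i = do
             do n <- Raw.pollEvent e
                if n == 0
                  then return ()
                  else void (peek e)
             go (i - 1)
       go iters)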
Allocations are constant, no GCs: 💪
Case               Allocated  GCs
pollEvents 10          1,400    0
pollEvents 100         1,400    0
pollEvents 1000        1,400    0
pollEvents 10000       1,400    0
pollEvents 100000      1,400    0
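The same reuse-one-buffer idea can be written independently of SDL. The sketch below is hypothetical: `drainWith` and `mockPoll` are names made up for illustration, and the mock queue stands in for a real event source. With SDL, the assumption is that `Raw.pollEvent` would be passed as the `poll` argument. I have not measured the allocation behaviour of this helper; it only shows the shape of the loop.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign (Ptr, Storable, alloca, peek, poke)
import Foreign.C.Types (CInt)

-- Drain a queue through one reused buffer. `poll` writes the next item into
-- the pointer and returns non-zero, or returns 0 when nothing is pending.
-- Returns how many items were handled.
drainWith :: Storable a => (Ptr a -> IO CInt) -> (a -> IO ()) -> IO Int
drainWith poll onEvent =
  alloca $ \buf ->
    let go n = do
          ret <- poll buf
          if ret == 0
            then pure n
            else do
              peek buf >>= onEvent
              go $! n + 1
     in go 0

-- Hypothetical stand-in for an event source: pops from a list in an IORef.
mockPoll :: IORef [CInt] -> Ptr CInt -> IO CInt
mockPoll queue buf = do
  pending <- readIORef queue
  case pending of
    [] -> pure 0
    (x : rest) -> do
      writeIORef queue rest
      poke buf x
      pure 1

main :: IO ()
main = do
  queue <- newIORef [3, 1, 4]
  handled <- drainWith (mockPoll queue) print
  putStrLn ("handled " ++ show handled ++ " events")
```

The buffer is allocated once per drain rather than once per event, which is the property the measurement above relies on.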
So currently the example program in the docs, which uses pollEvents in a loop, is pretty inefficient. ☝️
I suspect waitEvent is probably better, but the SDL docs don't endorse it the way they do pollEvent:
SDL_PollEvent() is the favored way of receiving system events since it can be done from the main loop and does not suspend the main loop while waiting on an event to be posted.
But using pollEvent in a loop is guaranteed to allocate whether or not there are any events to be processed.
I realise there are worse things in life than a bit of allocation; I just have a hobby project as part of haskell-perf called "gcless" Haskell, in which I try out little programs and see what useful toys I can make that happen to not allocate beyond a constant initial factor.
Anyway, one alternative is to tweak pollEvent so that it checks whether there is an event before allocating:
{-# LANGUAGE LambdaCase #-}

import Foreign
import SDL
import qualified SDL.Raw as Raw
import Criterion.Main

main :: IO ()
main = do
  initializeAll
  defaultMain
    [ bgroup
        "pollEvents"
        [ bench "orig" (whnfIO orig)
        , bench "check" (whnfIO check)
        ]
    ]

-- Poll with a null pointer first: SDL then only reports whether an event is
-- pending, so the buffer is allocated only when there is something to read.
check :: IO ()
check = do
  ret <- Raw.pollEvent nullPtr
  if ret == 0
    then pure ()
    else alloca
           (\e -> do
              _ <- Raw.pollEvent e
              peek e >>= \case
                Raw.AudioDeviceEvent {} -> pure ()
                _ -> pure ())

-- The library's existing pollEvent, for comparison.
orig :: IO (Maybe Event)
orig = pollEvent
No discernible speed difference:
benchmarking pollEvents/orig
time                 11.43 μs   (11.37 μs .. 11.49 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 11.41 μs   (11.37 μs .. 11.48 μs)
std dev              167.3 ns   (122.4 ns .. 274.7 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking pollEvents/check
time                 11.05 μs   (11.00 μs .. 11.11 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 11.04 μs   (10.99 μs .. 11.13 μs)
std dev              216.5 ns   (154.8 ns .. 296.8 ns)
variance introduced by outliers: 19% (moderately inflated)
So we just avoid allocating where not necessary. 👍