A Wander through GHC’s New IO library
Simon Marlow
The 100-mile view
• the API changes:
  – Unicode
    • putStr “A légpárnás hajóm tele van angolnákkal” works! (if your editor is set up right…)
    • locale-encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
    • changing the encoding on the fly

    hSetEncoding :: Handle -> TextEncoding -> IO ()
    hGetEncoding :: Handle -> IO (Maybe TextEncoding)

    data TextEncoding
    latin1, utf8, utf16, utf32, … :: TextEncoding
    mkTextEncoding :: String -> IO TextEncoding
    localeEncoding :: TextEncoding
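To make the effect of changing a Handle's encoding concrete, the following is a minimal sketch (not taken from the talk) that writes the same string through two encodings and compares the resulting file sizes. The helper name `encodedSize` and the temporary file name are hypothetical; `getTemporaryDirectory` and `removeFile` come from the `directory` package that ships with GHC.

```haskell
import System.IO
import System.Directory (getTemporaryDirectory, removeFile)

-- | Write a string through a text-mode Handle using the given encoding and
-- report how many bytes ended up on disk. (Hypothetical helper for this sketch.)
encodedSize :: TextEncoding -> String -> IO Integer
encodedSize enc str = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "encoding-demo.txt"
  hSetEncoding h enc            -- change the encoding on the fly
  hPutStr h str
  hClose h
  size <- withBinaryFile path ReadMode hFileSize
  removeFile path
  return size

main :: IO ()
main = do
  let phrase = "légpárnás"      -- 9 characters, 3 of them non-ASCII
  viaUtf8   <- encodedSize utf8 phrase
  viaLatin1 <- encodedSize latin1 phrase
  -- Encodings can also be looked up by name at run time.
  named     <- mkTextEncoding "UTF-8"
  viaNamed  <- encodedSize named phrase
  putStrLn ("utf8:          " ++ show viaUtf8 ++ " bytes")
  putStrLn ("latin1:        " ++ show viaLatin1 ++ " bytes")
  putStrLn ("named \"UTF-8\": " ++ show viaNamed ++ " bytes")
  current <- hGetEncoding stdout
  putStrLn ("stdout uses:   " ++ show current)
```

Each accented character takes two bytes in UTF-8 and one in Latin-1, so the UTF-8 file should come out three bytes larger than the Latin-1 one.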
The 100-mile view (cont.)
• Better newline support
  – teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF {- "\n" -} | CRLF {- "\r\n" -}
    nativeNewline :: Newline

    data NewlineMode = NewlineMode { inputNL :: Newline, outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,            outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF,          outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }
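As an illustration of how the newline modes behave, here is a small sketch (an assumption-laden example, not from the slides) that writes with `outputNL = CRLF`, then reads the file back once in binary mode and once with `universalNewlineMode`. The temporary file name is hypothetical.

```haskell
import Control.Exception (evaluate)
import System.IO
import System.Directory (getTemporaryDirectory, removeFile)

-- | Read a whole Handle strictly, so the contents survive closing it.
readAll :: Handle -> IO String
readAll h = do
  s <- hGetContents h
  _ <- evaluate (length s)      -- force the lazy read before the Handle closes
  return s

main :: IO ()
main = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "newline-demo.txt"
  -- Translate "\n" to "\r\n" on output, regardless of platform.
  hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
  hPutStr h "one\ntwo\n"
  hClose h

  -- Binary mode: no translation, so the CR characters are visible.
  raw <- withBinaryFile path ReadMode readAll
  print raw

  -- Universal mode: "\r\n" on input is translated back to "\n".
  cooked <- withFile path ReadMode $ \ht -> do
    hSetNewlineMode ht universalNewlineMode
    readAll ht
  print cooked

  removeFile path
```

The first `print` should show the `\r\n` pairs that were written, while the second should show the original string with plain `\n` line endings.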
The 10-mile view
• Unicode codecs:
  – built-in codecs for UTF-8, UTF-16(LE,BE), UTF-32(LE,BE).
  – Other codecs use iconv on Unix systems
  – Built-in codecs only on Windows (no code pages)
    • yet…
  – The pieces for building a codec are provided…
The 10-mile view (cont.)
• Build your own codec: API in GHC.IO.Encoding

    data BufferCodec from to state = BufferCodec {
        encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
        close    :: IO (),
        getState :: IO state,
        setState :: state -> IO ()
      }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate . TextEncoding {
        mkTextDecoder :: IO (TextDecoder dstate),
        mkTextEncoder :: IO (TextEncoder estate)
      }

Saving and restoring state is important since Handles support buffering, random access, and changing encodings.
The 1-mile view
• Make your own Handles!
  – why mkFileHandle, not mkHandle?

    mkFileHandle :: (IODevice dev,     -- type class providing I/O device operations: close, seek, getSize, …
                     BufferedIO dev,   -- type class providing buffered reading/writing
                     Typeable dev)     -- Typeable, in case we need to take the Handle apart again later
                 => dev
                 -> FilePath           -- for error messages
                 -> IOMode             -- ReadMode/WriteMode/…
                 -> Maybe TextEncoding
                 -> NewlineMode
                 -> IO Handle
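A quick way to try mkFileHandle without writing a device first is to reuse a device that already exists. The sketch below is an illustration under stated assumptions, not code from the talk: it wraps the standard-output file descriptor exported by GHC.IO.FD in a second, hand-built Handle. It assumes that `mkFileHandle` is exported from GHC.IO.Handle and that `FD` satisfies every class constraint the function requires, which may vary between versions of base. The label "<custom stdout>" is arbitrary and only appears in error messages.

```haskell
import qualified GHC.IO.FD as FD      -- the file-descriptor device; FD.stdout :: FD
import GHC.IO.Handle (mkFileHandle)
import System.IO

main :: IO ()
main = do
  -- Build a fresh Handle on top of the existing stdout device:
  -- device, name for error messages, mode, encoding, newline translation.
  h <- mkFileHandle FD.stdout "<custom stdout>" WriteMode
                    (Just utf8) noNewlineTranslation
  hPutStrLn h "hello from a hand-built Handle"
  -- Flush explicitly; we deliberately do not hClose, since that would
  -- close the underlying file descriptor shared with the normal stdout.
  hFlush h
```

Passing `Nothing` instead of `Just utf8` would give a binary Handle with no encoding layer.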
IODevice

    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device. Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …

For most methods, the default is for the operation to be unsupported.
BufferedIO

    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)

      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)

      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

The device gets to allocate the buffer. This allows the device to make the buffer point directly at the data in memory, for example.
The 0-versions are non-blocking; the non-0 versions must read or write at least one byte (but may transfer less than the whole buffer).
RawIO

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
Example: a memory-mapped Handle
• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
  – Let’s implement a Handle backed by a memory-mapped file
  – We need to
    1. define our device type
    2. make it an instance of IODevice and BufferedIO
    3. provide a way to create instances
Example: memory-mapped files
1. Define our device type

    data MemoryMappedFile
      = MemoryMappedFile {
          mmap_fd     :: FD,             -- ordinary file descriptor, provided by GHC.IO.FD
          mmap_addr   :: !(Ptr Word8),   -- address in memory where our file is mapped,
          mmap_length :: !Int,           -- and its length
          mmap_ptr    :: !(IORef Int)    -- the current file pointer
        }
      deriving Typeable

Handles have a built-in notion of the “current position”, which we have to emulate with mmap_ptr.
Typeable is one of the requirements for making a Handle.
aside: Buffers

    module GHC.IO.Buffer ( Buffer(..), .. ) where

    data Buffer e = Buffer {
        bufRaw   :: !(ForeignPtr e),
        bufState :: BufferState,   -- ReadBuffer | WriteBuffer
        bufSize  :: !Int,          -- in elements, not bytes
        bufL     :: !Int,          -- offset of first item in the buffer
        bufR     :: !Int           -- offset of last item + 1
      }

[Diagram: the buffer memory starts at bufRaw and holds bufSize elements; the valid data lies between offsets bufL and bufR.]
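The way bufL and bufR move can be seen by poking at a buffer directly. This is a minimal sketch, assuming the helpers `newByteBuffer`, `bufferAdd`, `bufferRemove`, `bufferElems` and `bufferAvailable` are exported by GHC.IO.Buffer in your version of base; the `report` function is a hypothetical helper for this example only. No data is actually written; only the offsets are adjusted.

```haskell
import GHC.IO.Buffer

main :: IO ()
main = do
  -- A fresh 16-element byte buffer: no data yet.
  fresh <- newByteBuffer 16 WriteBuffer
  let filled   = bufferAdd 5 fresh       -- pretend 5 bytes were added: bufR advances
      consumed = bufferRemove 2 filled   -- pretend 2 bytes were consumed: bufL advances
  mapM_ report
    [ ("fresh",            fresh)
    , ("after adding 5",   filled)
    , ("after removing 2", consumed)
    ]
  where
    -- Print the offsets together with the derived element counts.
    report (label, buf) =
      putStrLn $ label
        ++ ": bufL="  ++ show (bufL buf)
        ++ " bufR="   ++ show (bufR buf)
        ++ " bufSize=" ++ show (bufSize buf)
        ++ " elems="  ++ show (bufferElems buf)
        ++ " free="   ++ show (bufferAvailable buf)
```

The number of valid elements is always bufR minus bufL, and the free space is what remains between bufR and bufSize.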
Example: memory-mapped files
2. (a) make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      -- fillReadBuffer returns the entire file!
      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if (p >= l)
          then do return (0, buf{ bufL=p, bufR=p })
          else do writeIORef (mmap_ptr m) l
                  return (l-p, buf{ bufL=p, bufR=l })

      -- flush is a no-op: just remember where to read from next
      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }
Example: memory-mapped files
2. (b) make it an instance of IODevice

    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …
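The offset arithmetic in seek can be tried out in isolation. The following sketch restates it as a pure function; `resolveSeek` is a hypothetical name introduced here, and returning `Nothing` stands in for the ioe_seekOutOfRange error raised in the instance above.

```haskell
import System.IO (SeekMode(..))

-- | Compute the new file position for a seek, given the file size and the
-- current position. Nothing means the target is out of range.
resolveSeek :: Int -> Int -> SeekMode -> Integer -> Maybe Int
resolveSeek size current mode val
  | target < 0 || target >= size = Nothing
  | otherwise                    = Just target
  where
    target = case mode of
      AbsoluteSeek -> fromIntegral val            -- offset from the start
      RelativeSeek -> current + fromIntegral val  -- offset from the current position
      SeekFromEnd  -> size + fromIntegral val     -- offset from the end (usually negative)

main :: IO ()
main = do
  print (resolveSeek 100 10 AbsoluteSeek 42)    -- Just 42
  print (resolveSeek 100 10 RelativeSeek (-4))  -- Just 6
  print (resolveSeek 100 10 SeekFromEnd (-1))   -- Just 99
  print (resolveSeek 100 10 SeekFromEnd 0)      -- Nothing
```

Note that with this bounds check, as in the instance it mirrors, seeking to exactly the end of the file counts as out of range.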
Example: memory-mapped files
3. provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      -- Open the file and mmap() it
      (fd,_devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0

      let m = MemoryMappedFile { mmap_fd = fd,
                                 mmap_addr = castPtr addr,
                                 mmap_length = fromIntegral sz,
                                 mmap_ptr = ptr }

      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)

      -- Call mkFileHandle to build the Handle
      mkFileHandle m filepath iomode encoding newline
Demo…

    $ ./Setup configure
    Configuring mmap-handle-0.0...
    $ ./Setup build
    Preprocessing library mmap-handle-0.0...
    Building mmap-handle-0.0...
    [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
    Registering mmap-handle-0.0...
    $ ./Setup register --inplace --user
    Registering mmap-handle-0.0...
    $ ghc-pkg list --user
    /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
        mmap-handle-0.0
Demo…

    $ cat test.hs
    import System.IO
    import System.Posix.IO.MMap
    import System.Environment
    import Data.Char

    main = do
      [file,test] <- getArgs
      h <- if test == "mmap"
              then mmapFile file ReadWriteMode True
              else openBinaryFile file ReadWriteMode

      sequence_ [ do hSeek h SeekFromEnd (-n)
                     c <- hGetChar h
                     hSeek h AbsoluteSeek n
                     hPutChar h c
                | n <- [ 1..10000] ]

      hClose h
      putStrLn "done"
    $ ghc test.hs --make
    [1 of 1] Compiling Main ( test.hs, test.o )
    Linking test ...
Timings…
    $ time ./test /tmp/words file
    done
    0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file
    $ time ./test /tmp/words mmap
    done
    0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap
    $ time ./test ./words file    # ./ is NFS-mounted
    done
    10.44s real 0.20s user 0.52s system 6% ./test tmp file
    $ time ./test ./words mmap    # ./ is NFS-mounted
    done
    0.10s real 0.09s user 0.00s system 93% ./test tmp mmap
More examples
• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• Handle that reads from a ByteString/text
• Handle that reads from text
The -1 mile view
• Inside the IO library
  – The file-descriptor functionality is cleanly separated from the implementation of Handles:
    • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
    • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
    • GHC.IO.Handle has nothing to do with FDs
Implementation of Handle
    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,   -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
        deriving Typeable

Existential: the constructor packs up the IODevice, BufferedIO and Typeable dictionaries, and the codec state is existentially quantified.
Two buffers: one for bytes, one for Chars.
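The role of the existential and of the Typeable constraint can be shown on a much smaller scale. In this sketch every name (`Device`, `FileDevice`, `MemDevice`, `AnyDevice`, `describe`, `asFile`) is hypothetical and invented for illustration; none of them are part of GHC's IO library. The wrapper hides the concrete device type while keeping its class dictionary, and Typeable lets us recover the concrete type later, in the spirit of "taking the Handle apart again".

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- A toy stand-in for the IODevice class.
class Device dev where
  deviceName :: dev -> String

data FileDevice = FileDevice FilePath
data MemDevice  = MemDevice Int

instance Device FileDevice where
  deviceName (FileDevice p) = "file:" ++ p

instance Device MemDevice where
  deviceName (MemDevice n) = "memory:" ++ show n

-- The existential wrapper: any type with Device and Typeable instances fits,
-- and both dictionaries are stored alongside the value.
data AnyDevice = forall dev . (Device dev, Typeable dev) => AnyDevice dev

-- Works for every wrapped device via the packed-up Device dictionary.
describe :: AnyDevice -> String
describe (AnyDevice d) = deviceName d

-- Uses the Typeable dictionary to recover a specific device type, if it matches.
asFile :: AnyDevice -> Maybe FilePath
asFile (AnyDevice d) = case cast d of
  Just (FileDevice p) -> Just p
  Nothing             -> Nothing

main :: IO ()
main = do
  let devices = [AnyDevice (FileDevice "/tmp/words"), AnyDevice (MemDevice 4096)]
  mapM_ (putStrLn . describe) devices
  print (map asFile devices)    -- [Just "/tmp/words",Nothing]
```

Without the Typeable constraint, `describe` would still work, but there would be no safe way to write `asFile`.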
Where to go from here
• This is a step in the right direction, but there is still some obvious ugliness
  – We haven’t changed the external API, only added to it
  – There should be a binary I/O layer
    • hPutBuf working on Handles is wrong: binary Handles should have a different type
    • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
  – FilePath should be an abstract type.
    • On Windows, FilePath = String, but on Unix, FilePath = [Word8].
  – Should we rethink Handles entirely?
    • OO-style layers: binary IO, buffering, encoding
    • Separate read Handles from write Handles?
      – read/write Handles are a pain