为什么 Haskell 中基于 [Char] 的输入比基于 [Char] 的输出慢得多?
众所周知,在 Haskell 中不使用 [Char]
读取大量数据。使用 ByteString 来完成这项工作。 对此的通常解释是 Char
很大并且列表增加了它们的开销。
然而,这似乎不会对输出造成任何问题。
例如,以下程序:
main = interact $ const $ unwords $ map show $ replicate 500000 38000000
在我的计算机上运行只需 131 毫秒,而以下程序:
import Data.List
sum' :: [Int] -> Int
sum' = foldl' (+) 0
main = interact $ show . sum' . map read . words
如果将第一个程序的输出作为输入,则需要 3.38 秒!
使用String
的输入和输出性能之间存在如此差异的原因是什么?
It is a common knowledge that one does not use [Char]
to read large amounts of data in Haskell. One uses ByteString
s to do the job.
The usual explanation for this is that Char
s are large and lists add their overhead.
However, this does not seem to cause any problems with the output.
For example the following program:
main = interact $ const $ unwords $ map show $ replicate 500000 38000000
takes just 131 ms to run on my computer, while the following one:
import Data.List
sum' :: [Int] -> Int
sum' = foldl' (+) 0
main = interact $ show . sum' . map read . words
takes 3.38 seconds if fed the output of the first program as an input!
What is the reason for such a disparity between the input and output performance using String
s?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为这个问题不一定与 I/O 有关。相反,它表明
Int
的Read
实例效率相当低。首先,考虑以下仅处理惰性列表的程序。在我的机器上需要 4.1 秒(使用
-O2
编译):用
length
替换read
函数将时间降至 0.48 秒:此外,用手写版本替换
read
函数会导致 0.52 秒的时间:我猜测为什么
read
效率如此低下,因为它的实现使用了Text.ParserCombinators.ReadP
模块,对于读取单个整数的简单情况来说,这可能不是最快的选择。I don't think that this issue necessarily has to do with I/O. Rather, it demonstrates that the
Read
instance forInt
is pretty inefficient.First, consider the following program which just processes a lazy list. It takes 4.1s on my machine (compiled with
-O2
):Replacing the
read
function withlength
drops the time down to 0.48s:Furthermore, replacing the
read
function with a handwritten version results in a time of 0.52s:My guess as to why
read
is so inefficient is that its implementation uses theText.ParserCombinators.ReadP
module, which may not be the fastest choice for the simple case of reading a single integer.