Why is [Char]-based input so much slower than [Char]-based output in Haskell?

Posted 2024-12-05 14:55:06

It is common knowledge that one does not use [Char] to read large amounts of data in Haskell; one uses ByteStrings for the job. The usual explanation is that Chars are large and that lists add their own overhead.

However, this does not seem to cause any problems with the output.

For example, the following program:

main = interact $ const $ unwords $ map show $ replicate 500000 38000000

takes just 131 ms to run on my computer, while the following one:

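import Data.List (foldl')

-- Strict left fold, so the running total is not built up as a chain of thunks.
sum' :: [Int] -> Int
sum' = foldl' (+) 0

main :: IO ()
main = interact $ show . sum' . map read . words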

takes 3.38 seconds if fed the output of the first program as an input!

What is the reason for such a disparity between the input and output performance using Strings?

篱下浅笙歌 2024-12-12 14:55:06

I don't think that this issue necessarily has to do with I/O. Rather, it demonstrates that the Read instance for Int is pretty inefficient.

First, consider the following program, which only processes a lazy list in memory and reads no input at all (sum' is the function defined in the question). It takes 4.1 s on my machine (compiled with -O2):

main = print $ sum' $ map read $ words
        $ unwords $ map show $ replicate 500000 38000000

Replacing the read function with length brings the time down to 0.48 s:

main = print $ sum' $ map length $ words
        $ unwords $ map show $ replicate 500000 38000000

Furthermore, replacing the read function with a handwritten version gives a time of 0.52 s:

main = print $ sum' $ map myread $ words
        $ unwords $ map show $ replicate 500000 38000000

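-- Hand-written parser: assumes the input consists only of ASCII digits
-- (no sign, no whitespace, no error handling).
myread :: String -> Int
myread = loop 0
  where
    loop n [] = n
    loop n (d:ds) = let d' = fromEnum d - fromEnum '0' :: Int
                        n' = 10 * n + d'
                    in loop n' ds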
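
The handwritten parser above trusts its input completely. As a rough sketch of what a slightly more defensive variant could look like, the following rejects empty input and non-digit characters. The name readDigits is hypothetical and not part of the original answer, overflow is not checked, and no timing was measured for it.

```haskell
import Data.Char (isDigit, ord)

-- Hypothetical helper (not from the original answer): parse a non-negative
-- decimal Int in a single pass. Returns Nothing for empty input or as soon
-- as a character outside '0'..'9' is seen. Overflow is not checked.
readDigits :: String -> Maybe Int
readDigits [] = Nothing
readDigits s  = go 0 s
  where
    go n [] = Just n
    go n (d:ds)
      | isDigit d = let n' = 10 * n + (ord d - ord '0')
                    in n' `seq` go n' ds   -- force the accumulator each step
      | otherwise = Nothing

main :: IO ()
main = do
  print (readDigits "38000000")
  print (readDigits "38x")
  print (readDigits "")
```

The accumulator is forced with seq so that the loop stays strict even without optimisation, which mirrors what the handwritten version relies on -O2 to achieve.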

My guess is that read is so slow because its implementation is built on the Text.ParserCombinators.ReadP module, which may not be the fastest choice for the simple case of reading a single integer.
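
One way to see why a general parser has more work to do than a digit loop: read for Int accepts more than a bare run of digits. The small sketch below uses only the Prelude and shows a few inputs that read handles and a handwritten digit loop would not. It illustrates the generality of read and is not a measurement of its cost.

```haskell
main :: IO ()
main = do
  print (read "38000000" :: Int)       -- plain digits
  print (read "  38000000  " :: Int)   -- surrounding whitespace is skipped
  print (read "(38000000)" :: Int)     -- enclosing parentheses are accepted
  print (read "-38000000" :: Int)      -- a leading minus sign is accepted
```

Each of these forms has to be considered for every word of the input, whereas the handwritten loop only multiplies and adds.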
