我在 SPOJ 上的 PRIME1 问题上做了相当糟糕的尝试。我发现使用 ByteString 确实有助于提高阅读问题文本的性能。然而,使用 ByteString 写出结果实际上比使用 Prelude 函数稍慢。我试图弄清楚我是否做错了,或者这是否是预期的。
我使用 (putStrLn.show) 和 ByteString 等价物以三种不同的方式进行了分析和计时:
- 我测试每个候选者,看看它是否
是素数。如果是这样,我将其添加到列表中
并用 (putStrLn .
显示)
- 我列出了所有素数
并使用写出列表
(putStrLn . unlines. show)
- 我列出了所有素数
并使用写出列表
map (putStrLn . show)
当您在一个函数中构建列表并在另一个函数中使用它时,我预计数字 2 和 3 的执行速度会较慢。通过在生成数字时打印数字,我可以避免为列表分配任何内存。另一方面,每次调用 putStrLn 时都会进行调用系统调用。正确的?所以我测试了一下,#1 实际上是最快的。
使用选项 #1 和 Prelude ([Char]) 函数实现了最佳性能。我预计我的最佳性能是使用 ByteString 的选项#1,但事实并非如此。我只使用了惰性字节字符串,但我认为这并不重要。会吗?
一些问题:
- 您希望 ByteStrings
写一堆表现更好
整数到标准输出?
- 我是否缺少一种方式模式
生成并写出答案
这会带来更好的结果
表现?
- 如果我只写出数字
文本,什么时候,如果有的话,有一个
使用 ByteString 有什么好处?
我的工作假设是,如果您不将它们与其他文本组合,则用 ByteString 写出 Integer 会更慢。如果将整数与 [Char] 结合使用,那么使用字节字符串会获得更好的性能。即,ByteString 重写:
putStrLn $ "the answer is: " ++ (show value)
将比上面编写的版本快得多。这是真的吗?
感谢您的阅读!
I've been making rather poor attempts at the PRIME1 problem on SPOJ. I discovered using that using ByteString really helped performance for reading in the problem text. However, using ByteString to write out the results is actually slightly slower than using Prelude functions. I'm trying to figure out if I'm doing it wrong, or if this is expected.
I've conducted profiling and timing using (putStrLn.show) and the ByteString equivalents three different ways:
- I test each candidate to see if it
is prime. If so, I add it to a list
and write it out with (putStrLn .
show)
- I make a list of all primes
and write out the list using
(putStrLn . unlines. show)
- I make a list of all primes
and write out the list using
map (putStrLn . show)
I expected numbers 2 and 3 to perform slower as you are building a list in one function and consuming it in another. By printing the numbers as I generate them, I avoid allocating any memory for the list. On the other hand, you are making a call system call with each call to putStrLn. Right? So I tested and #1 was in fact the fastest.
The best performance was achieved with option #1 and the Prelude ([Char]) functions. I expected that my best performance would be option #1 with ByteString, but this was not the case. I only used lazy ByteStrings, but I didn't think this would matter. Would it?
Some questions:
- would you expect the ByteStrings to
perform better for writing a bunch of
Integers to stdout?
- Am I missing a way pattern to
generate and write out the answers
that would lead to better
performance?
- If I am only writing out numbers as
text, when, if ever, is there a
benefit to using ByteString?
My working hypothesis is that writing out Integer's with ByteString is slower iff you aren't combining them with other text. If you are combining Integers with [Char], then you'd get better performance working with ByteStrings. I.e., the ByteString rewrite of:
putStrLn $ "the answer is: " ++ (show value)
will be much faster than the version written above. Is this true?
Thanks for reading!
发布评论
评论(2)
使用字节串进行批量输入通常会更快,因为数据很密集,从磁盘转移到内存中的数据更少。
然而,将数据写入输出略有不同。通常,您会序列化一个结构,生成许多小写入。因此,在这种情况下,密集的、批量的字节串写入对你没有多大帮助。即使是常规的
字符串
也能在增量输出中合理地工作。然而,一切并没有失去。我们可以通过在内存中有效地构建字节串来恢复快速批量写入。各种
*-builder
包都采用这种方法:而不是将值转换为大量微小的字节串,然后一次一个地写出它们时间,我们将转换流式传输到不断增长的缓冲区,然后将该缓冲区写入一大块。与字符串 IO 相比,这会大大减少 IO 开销,并提高性能(通常是显着的)。
这种方法被 Haskell 中的网络服务器或高效的 HTML 系统所采用,blaze 。
此外,即使是批量写入,性能也将取决于类型和字节串之间的任何转换函数的效率。对于
Integer
,您可以简单地将内存中的位模式复制到输出,或者通过一些低效的解码器。因此,有时您必须考虑一下所使用的编码函数的质量,而不仅仅是是否使用 Char/String 还是 bytestring IO。Doing bulk input is usually faster with bytestrings, since the data is dense, there's simply less data to shuffle from the disk into memory.
Writing data as output however, is a little different. Typically, you're serializing a structure, generating many small writes. So the dense, bulk writes of bytestrings don't help you much in that case. Even regular
Strings
will do reasonably at incremental output.However, all is not lost. We can recover fast bulk writes by efficiently building up bytestrings in memory. This approach is taken by the various
*-builder
packages:Instead of converting values to lots of tiny bytestrings, and writing them out one at a time, we stream the conversion into an ever-growing buffer, and in turn, write that buffer in one big piece. This results in a lot less IO overhead, and performance improvements (often signficant) over string IO.
This kind of approach is taken by e.g. webservers in Haskell, or the efficient HTML system, blaze.
Also, the performance, even with bulk writes, will depend on the efficiency of whatever conversion function you have between your types and bytestrings. For
Integer
, you could be simply copying the bit pattern in memory to output, or instead going through some inefficient decoder. As a result, you sometimes have to think a bit about the quality of the encoding function you're using, and not just whether to use Char/String or bytestring IO.请注意,性能并不是
ByteString
和String
之间的主要区别。前者适用于二进制数据,后者适用于 Unicode 文本。如果您有二进制数据,请使用ByteString
,如果您有 Unicode 文本,请使用 文本包。Note that performance isn't the main difference between
ByteString
andString
. The former is for binary data while the latter is for Unicode text. If you have binary data, useByteString
, if you have Unicode text, use theText
type from the text package.