Comparing newline-counting speed between wc and Smalltalk
I am comparing the performance of counting how many lines a file contains.
I did it first using the wc command line tool:
$ time wc -l bigFile.csv
1673820 bigFile.csv
real 0m0.157s
user 0m0.124s
sys 0m0.062s
and then in a clean image of the latest Pharo Core Smalltalk 1.3:
| file lineCount |
Smalltalk garbageCollect.
( Duration milliSeconds: [
    file := FileStream readOnlyFileNamed: 'bigFile.csv'.
    lineCount := 0.
    [ file atEnd ] whileFalse: [
        file nextLine.
        lineCount := lineCount + 1 ].
    file close.
    lineCount. ] timeToRun ) asSeconds.
15
How can I speed up the Smalltalk code so that it matches, or at least gets closer to, the performance of wc?
Comments (2)
The above reports ~207 milliseconds, where time reported:
I'm kidding, but also serious. No need to reinvent the wheel. FFI, OSProcess, Zinc, etc. provide ample opportunity to utilize things like UNIX utilities that have been battle-tested over decades.
If your question was really more about Smalltalk itself, a start would be:
That will get you down to 2.5 seconds:
A cleaner option, though about half a second slower, would be:
file contents occurrencesOf: 10.
Of course, if better performance is needed, and you don't want to use FFI/OSProcess, you would then write a plugin.
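As a rough illustration of counting byte 10 (LF) without building a String for every line, the sketch below reads the file in binary chunks and sums the LF bytes per chunk. It is not the code from this answer. The block name `countLf` and the 65536-byte chunk size are arbitrary choices, and it assumes that `next:` returns a shorter collection at the end of the stream instead of failing.

```smalltalk
| countLf file lineCount |
"countLf is a hypothetical helper: it sums the LF bytes (value 10) found in
 each chunk read from a binary stream."
countLf := [ :stream | | total |
    total := 0.
    [ stream atEnd ] whileFalse: [
        "Assumption: next: answers a truncated collection at end of stream."
        total := total + ((stream next: 65536) occurrencesOf: 10) ].
    total ].

file := FileStream readOnlyFileNamed: 'bigFile.csv'.
file binary.    "read raw bytes, so no character decoding is done per element"
lineCount := [ countLf value: file ] ensure: [ file close ].
lineCount
```

Like `wc -l`, this counts LF bytes only, so a last line without a trailing newline is not counted.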
If you can afford to read the whole file into memory, then the simplest code is
This will handle the zoo of LF (Linux), CR (Old Mac), CR-LF (you name it).
The code from Sean only handles LF, for approximately the same cost.
I'd say a factor of 10 for Smalltalk vs C is to be expected for such basic operations, so I doubt you will get much more efficiency without adding your own primitives.
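To make the difference between the line-ending conventions concrete, here is a minimal sketch that counts line terminators in an in-memory String and treats CR-LF as a single break. It is only an illustration, not the code from this answer. The variable names and the sample text are made up for the example.

```smalltalk
| text count previousWasCr |
"Sample text with one CR-LF, one LF and one lone CR terminator."
text := 'a', (String with: Character cr with: Character lf),
        'b', (String with: Character lf),
        'c', (String with: Character cr).
count := 0.
previousWasCr := false.
text do: [ :ch |
    ch = Character cr
        ifTrue: [
            "A CR always ends a line (old Mac style, or first half of CR-LF)."
            count := count + 1.
            previousWasCr := true ]
        ifFalse: [
            "An LF ends a line unless it only completes a CR-LF pair."
            (ch = Character lf and: [ previousWasCr not ])
                ifTrue: [ count := count + 1 ].
            previousWasCr := false ] ].
count
```

For this sample the result is 3, whereas counting LF characters alone would give 2 because the lone CR would be missed.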