Fastest way to split a text file on spaces in JavaScript
I'm looking at doing some text processing in the browser and am trying to get a rough idea of whether I will be CPU-bound or I/O-bound. To test the CPU side of the equation, I am measuring how quickly I can split a piece of text (~8.9 MB; it's Project Gutenberg's Sherlock Holmes repeated a number of times) in JavaScript once it is in memory. At the moment I'm simply doing:
pieces = theText.split(" ");
and executing it 100 times and taking the average. On a 2011 MacBook Pro (i5), the average split takes 92.81 ms in Firefox and 237.27 ms in Chrome. So 8.9 MB / 92.81 ms ≈ 95.9 MB/s on the CPU in Firefox, which is probably a little faster than hard disk I/O, but not by much.
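For reference, the measurement described above could be sketched roughly as follows. The helper names `averageMs` and `throughputMBps` are hypothetical, the sample string is a small stand-in for the real 8.9 MB text, and `performance.now()` is assumed to be available (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: average wall-clock time of fn over `runs` calls, in ms.
function averageMs(fn, runs) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  return (performance.now() - start) / runs;
}

// Hypothetical helper: throughput in MB/s for `megabytes` processed in `ms`.
function throughputMBps(megabytes, ms) {
  return megabytes / (ms / 1000);
}

// Stand-in text (assumption); the real test used ~8.9 MB of Sherlock Holmes.
const sampleText = "the quick brown fox ".repeat(20000);
let sampleChunks;
const sampleAvg = averageMs(() => {
  sampleChunks = sampleText.split(" ");
}, 20);
console.log(sampleAvg.toFixed(2) + " ms per split on the stand-in text");

// Throughput for the Firefox figure quoted in the question.
console.log(throughputMBps(8.9, 92.81).toFixed(1) + " MB/s");
```

Timing a whole batch of runs and dividing, rather than timing each call separately, keeps timer resolution from dominating when a single split is fast.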
So my question is really three parts:
- Are there JavaScript alternatives to split() that tend to be faster when doing simple text processing (e.g. splitting at spaces, newlines, etc.)?
- Are the lackluster CPU results I'm seeing here likely due to fundamental string matching/algorithmic constraints, or is the JavaScript execution just slow?
- If you think JavaScript is likely the limiting factor, can you demonstrate substantially better performance on a comparable machine/comparable text in any other programming language?
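As a point of comparison for the first question, one hand-rolled alternative to split() is to scan with indexOf() and cut pieces out with slice(). This is only a sketch: the function name `splitOnSeparator` is hypothetical, and whether it beats the built-in split() would need to be measured per browser, since nothing here shows that it does.

```javascript
// Hypothetical alternative to text.split(sep) for a non-empty string separator.
// It scans forward with indexOf and copies each piece out with slice.
function splitOnSeparator(text, sep) {
  if (sep.length === 0) {
    // An empty separator would never advance the scan position.
    throw new Error("separator must be non-empty");
  }
  const pieces = [];
  let start = 0;
  let end = text.indexOf(sep, start);
  while (end !== -1) {
    pieces.push(text.slice(start, end));
    start = end + sep.length;
    end = text.indexOf(sep, start);
  }
  // Whatever remains after the last separator is the final piece.
  pieces.push(text.slice(start));
  return pieces;
}

console.log(splitOnSeparator("to be or not to be", " "));
```

Like split(" "), this keeps empty strings between consecutive separators, so the two can be swapped in a benchmark without changing the result.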
Edit: I also suspect this could be sped up with Web Workers, though for now I am primarily interested in single-threaded approaches.
Comments (1)
As far as I know, split() followed by a for loop over the pieces is the fastest way to do simple text processing in JavaScript. It is faster than using a regex; here is the jsperf link: http://jsperf.com/query-str-parsing-regex-vs-split/2
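One possible reading of this comparison, sketched under assumptions: both function names below are hypothetical, and counting non-empty words is just a stand-in task, not what the linked jsperf test does. The two functions give the same answer, so they could be timed against each other; relative speed will vary by engine.

```javascript
// Approach A (split + for loop): split on spaces, then walk the pieces.
function countWordsSplit(text) {
  const pieces = text.split(" ");
  let count = 0;
  for (let i = 0; i < pieces.length; i++) {
    // Consecutive spaces produce empty pieces; skip them.
    if (pieces[i].length > 0) count++;
  }
  return count;
}

// Approach B (regex): repeatedly match runs of non-space characters.
function countWordsRegex(text) {
  const re = /[^ ]+/g;
  let count = 0;
  while (re.exec(text) !== null) count++;
  return count;
}

const sample = "a b  c ";
console.log(countWordsSplit(sample), countWordsRegex(sample));
```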