String comparison vs. array comparison in .NET
I have a speed-critical piece of code that compares two 4-byte arrays. I've been trying to work out the fastest way to do this, and have looked at this.
Doing 100,000,000 comparisons using P/Invoke and memcmp takes ~9.5 seconds; using the UnsafeCompare method posted in the above link takes ~3.5 seconds.
If I set up two 4-character strings and compare them using s1 == s2, it takes ~0.5 seconds. If I use string.Compare(s1, s2), it takes ~12 seconds.
Is there some way to make my byte array comparisons comparable in speed to s1 == s2? And if not, could there be any problems with doing something like the below, basically storing my byte arrays as strings?
string s1 = Convert.ToChar(1).ToString() + Convert.ToChar(2).ToString() + Convert.ToChar(3).ToString() + Convert.ToChar(4).ToString();
string s2 = Convert.ToChar(1).ToString() + Convert.ToChar(2).ToString() + Convert.ToChar(3).ToString() + Convert.ToChar(4).ToString();
if (s1 == s2)
.....
Hoping someone can help me out with this. Thanks!
Comments (2)
I haven't tested this for speed, but what about hard-coding the length and doing a comparison like this:
I'd recommend these steps:
1. Compare the two 4-byte arrays byte by byte, i.e. a[0] == b[0] && … && a[3] == b[3]. This will be much faster than any call to memcmp and the like. Most likely, the JIT compiler will compile this into an efficient (and inlined) sequence of instructions. You are comparing only four bytes; you cannot expect an algorithm designed for comparing arbitrarily long memory chunks to perform better.
2. Think about storing the data as 32-bit integers instead of 4-byte arrays. That will provide another performance gain, because the comparison (a == b) will be translated into a single 32-bit comparison instruction.
3. Try to rethink your algorithm. Is it really necessary to perform 100 million comparisons? Aren't there any options to reduce the time complexity of the algorithm? That would yield a considerable performance boost.
However, without knowing a broader context it's hard to recommend any better and specific optimizations.
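The first two steps above can be sketched roughly as follows. This is only an illustration: the class and method names (FourByteCompare, EqualsBytewise, Pack) are hypothetical and do not come from either answer, and the code assumes both arrays are non-null and hold at least four bytes. No timings are claimed here; measure in your own build before relying on it.

```csharp
using System;

// Hypothetical helper; names are illustrative, not from the original posts.
public static class FourByteCompare
{
    // Step 1: compare exactly four bytes with the length hard-coded.
    // Assumes a and b are non-null and have at least 4 elements.
    public static bool EqualsBytewise(byte[] a, byte[] b)
    {
        return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
    }

    // Step 2: pack the first four bytes into one 32-bit value once,
    // so every later comparison is a plain integer comparison.
    // Assumes a is non-null and has at least 4 elements.
    public static uint Pack(byte[] a)
    {
        return BitConverter.ToUInt32(a, 0);
    }
}
```

With the packed form, the conversion is done once per value (for example `uint ka = FourByteCompare.Pack(a);`) and the hot loop then only evaluates `ka == kb`. Whether this helps depends on whether the data can be kept as integers throughout, rather than being converted immediately before each comparison.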