Using 64-bit types?
I am writing some hash functions for a compiler and I use the __int64 datatype frequently. The compiler is intended to be supported (and so far is) on different OSes. I know that __int64 is a type that most major C++ compilers for my target systems can compile, so that's not the problem. I am using hash functions to make large character strings smaller and quicker to compare, and they work wonders on 64-bit-capable OSes; but would there be a large enough performance decrease on 32-bit OSes to cancel out the benefits? I could use 32-bit integers, but that would greatly lessen the effectiveness of the hash functions.
Edit:
It is custom code and very simple. The first hash function generates a unique 64-bit int from 12 alphanumeric (including underscore) characters. A class then handles strings longer than 12 characters by creating address-linked lists of 64-bit hashes, and it overloads the comparison operators. The overloaded comparisons short-circuit and walk down the address-linked list. I've run tests on my machine comparing randomly generated large hashes (100 - 300 characters) against themselves (the worst-case scenario), and they proved to be faster than string compares. In order to better simulate the overhead of generating hashes, I've also run compare tests of pre-generated large hashes against themselves. This is all running with code optimization turned off. With ~1 billion hash compares vs. ~1 billion string compares, the hash compares took around 16% of the time. This was all in a 64-bit environment, though; I don't have a 32-bit machine to run tests on.
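For readers who want something concrete, here is a minimal sketch of one way such a packing could work. It is not the asker's code. One assumption matters: with 63 distinct symbols (upper case, lower case, digits, underscore), 12 characters need about 72 bits, so this sketch folds letters to lower case and uses base 38 (37 symbols plus 0 for "no character"), which fits because 38^12 is below 2^64. The names `symbol_digit` and `pack12` are hypothetical, and the letter arithmetic assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: map one identifier character to a digit in 1..37.
// Letters are case-folded (an assumption of this sketch); digit 0 is
// reserved to mean "no character", so shorter strings stay distinct.
inline int symbol_digit(char c) {
    if (c >= '0' && c <= '9') return 1 + (c - '0');    // 1..10
    if (c >= 'a' && c <= 'z') return 11 + (c - 'a');   // 11..36
    if (c >= 'A' && c <= 'Z') return 11 + (c - 'A');   // folded to lower case
    if (c == '_') return 37;
    return -1;                                         // outside the alphabet
}

// Pack up to 12 characters into one base-38 number.
// Returns false if the string is too long or has an unsupported character.
inline bool pack12(const std::string& s, std::uint64_t& out) {
    if (s.size() > 12) return false;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        int d = 0;                                     // padding digit
        if (i < s.size()) {
            d = symbol_digit(s[i]);
            if (d < 0) return false;
        }
        assert(d >= 0 && d < 38);
        value = value * 38 + static_cast<std::uint64_t>(d);
    }
    out = value;                                       // always < 38^12 < 2^64
    return true;
}
```

Two packed values are equal exactly when the two case-folded strings are equal, so a single integer comparison replaces a character-by-character one for names of up to 12 characters.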
Comments (4)
64-bit integers aren't substantially slower on a 32-bit x86 architecture. They're not as fast as 32-bit ints, obviously, but they aren't notably slower. It's not at all reckless to use a 64-bit int for hashes, whether on x86 or x64. The additional overhead will likely be minimal compared to, say, a couple of unneeded dynamic allocations or a poorly chosen algorithm.
I don't think that comparing four 32-bit variables will be faster than comparing two 64-bit variables, since I'd expect the compiler to generate the fastest code: if your processor doesn't support 64-bit operations, the compiler will generate code that does the comparison in two steps, just as you would by hand.
This of course depends on your compiler.
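To make the "two steps" concrete, here is a hand-written sketch of a 64-bit equality test done on 32-bit halves. It is only an illustration of the idea, not actual compiler output; a real compiler may pick a different instruction sequence. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compare two 64-bit values as a 32-bit target has to: one half at a time,
// stopping early when the low halves already differ.
inline bool equal_by_halves(std::uint64_t a, std::uint64_t b) {
    const std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    const std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    if (a_lo != b_lo) return false;                    // step 1: low 32 bits
    const std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
    const std::uint32_t b_hi = static_cast<std::uint32_t>(b >> 32);
    return a_hi == b_hi;                               // step 2: high 32 bits
}

// Sanity check: the two-step version must agree with the built-in operator.
inline void check_equivalence(std::uint64_t a, std::uint64_t b) {
    assert(equal_by_halves(a, b) == (a == b));
}
```

The cost on a 32-bit target is therefore at most two word comparisons per hash, which is the same work you would do with a pair of 32-bit variables.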
Anyway, there are other tools that can make your comparisons even faster but are not available everywhere, for example vector operations: SSE2 can compare 16 bytes at once, and AVX2 can compare 8*4 = 32 bytes at once.
If you need to optimize your code as much as possible, I'd suggest adding some preprocessor directives so that these optimizations are enabled only when the system supports them.
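As a rough sketch of that suggestion, the snippet below picks an SSE2 path at compile time and falls back to `memcmp` elsewhere. The macro name `HASHCMP_HAVE_SSE2` and the function `equal16` are made up for illustration; `__SSE2__`, `_M_X64` and `_M_IX86_FP` are the predefined macros GCC/Clang and MSVC use to signal SSE2 availability. Note that this is a compile-time check, so it reflects the build target, not the CPU the program ends up running on.

```cpp
#include <cassert>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define HASHCMP_HAVE_SSE2 1   // hypothetical macro name
#else
#  define HASHCMP_HAVE_SSE2 0
#endif

// Compare two 16-byte blocks (for example, two adjacent 64-bit hash words).
inline bool equal16(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
#if HASHCMP_HAVE_SSE2
    const __m128i va = _mm_loadu_si128(static_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(static_cast<const __m128i*>(b));
    // Byte-wise equality yields 0xFF for each matching byte; movemask gathers
    // the top bit of every byte, so 0xFFFF means all 16 bytes matched.
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
#else
    return std::memcmp(a, b, 16) == 0;   // portable fallback
#endif
}
```

Callers see the same `equal16` either way, so the rest of the code does not need to know which path was compiled in.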
Are you sure it would greatly lessen the effectiveness of the hash function? Have you run tests? Certainly a 64-bit hash is better than a 32-bit one if (i) the number of items hashed is significantly more than 2^16 and (ii) computing the 64-bit hash is cheap. Which of (i) or (ii) (or both) is true in your case? If performance is important, you might want to use different hash functions depending on the underlying operating system. Otherwise, I would say: write a 32-bit version and a 64-bit version; try them both out on a 64-bit system and on a 32-bit system; and you'll see whether it's worth busting a gut over.
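The 2^16 threshold comes from the birthday bound: with a b-bit hash, collisions become likely once the item count approaches 2^(b/2). Below is a small sketch that estimates the collision probability, assuming an ideal, uniformly distributed hash (real hash functions may do worse). The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n items share a hash value,
// assuming an ideal (uniform) hash with the given number of output bits.
// Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
inline double collision_probability(double n, int bits) {
    assert(n >= 1.0 && bits > 0);
    const double pairs_over_space = n * (n - 1.0) / std::ldexp(1.0, bits + 1);
    return -std::expm1(-pairs_over_space);   // 1 - exp(-x), accurate for tiny x
}
```

Under that assumption, `collision_probability(65536.0, 32)` comes out at roughly 0.39, while `collision_probability(65536.0, 64)` is on the order of 1e-10, which is why the item count matters so much when choosing between the two widths.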
All the hash functions I've used return the value as an array of bytes (uchar) to avoid your problem.