Force a comparison to be done in a 128-bit register in C
I'm doing a binary search with bsearch(array, arrays, NUM_ARRAYS, 16, compare_func)
and
int compare(const void *p1, const void *p2)
{
    return memcmp(p1, p2, 16); // unsigned char arrays[][16]
}
Since it is 16 bytes, it would fit in a single 128-bit register.
How to modify this C function to force the comparison to be done with a 128-bit register CPU instruction? It should be much faster.
Linked questions: "Comparison of 128 bit unsigned integers in x86-32 assembly" and "What are the 128-bit to 512-bit registers used for?", but they don't answer this directly.
Comments (1)
If the numbers are stored in big-endian order and the pointers are aligned on 16-byte boundaries, comparing them as unsigned 128-bit values will produce the same ordering as memcmp. Whether it will be more efficient depends on the compiler and optimisation settings. Here is a modified function:
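A minimal sketch of such a function, assuming a GCC- or Clang-style compiler that provides the non-standard unsigned __int128 type (the uint128_t typedef is introduced here for illustration). It also assumes both pointers are 16-byte aligned and refer to objects that may legitimately be read as 128-bit integers:

```c
/* unsigned __int128 is a compiler extension (GCC/Clang on 64-bit targets),
   not standard C; uint128_t is a local alias chosen for this sketch. */
typedef unsigned __int128 uint128_t;

/* Assumption: p1 and p2 are 16-byte aligned and point to 128-bit objects.
   The result follows the native integer ordering, which matches memcmp
   only when the bytes are stored in big-endian order. */
int compare(const void *p1, const void *p2)
{
    const uint128_t *u1 = p1;
    const uint128_t *u2 = p2;

    /* (a > b) - (a < b) yields 1, 0 or -1 without any risk of overflow. */
    return (*u1 > *u2) - (*u1 < *u2);
}
```

Whether the compiler emits a single 128-bit instruction or a pair of 64-bit compares for this is up to the compiler and target; inspecting the generated assembly is the only way to be sure.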
The problem is that your target system likely uses little-endian order (e.g. x86 CPUs). If your goal is to find an array in an array of arrays, you can still use this trick as long as the outer array is sorted using the same comparison.
Using bsearch requires a function pointer that returns a signed value equal to 0 for elements that compare equal, a negative value if the element pointed to by p1 is less than the one pointed to by p2, and a positive value otherwise. Another problem with this approach is type punning and alignment issues, which produce undefined behavior.
It would be safer and more efficient to write a binary search function that operates on an array of unions and uses a single comparison per iteration to locate the matching entry. This array must be sorted, and sorting it can be performed using qsort() with the compare128() function. Here is an example:
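A sketch of that idea, again assuming the non-standard unsigned __int128 extension. The names key128 and bsearch128 are hypothetical, chosen for illustration; compare128 follows the name used in the text. The search loop is a lower-bound search, so each iteration performs a single 128-bit comparison and equality is checked once at the end:

```c
#include <stddef.h>
#include <stdlib.h>

typedef unsigned __int128 uint128_t;   /* compiler extension, not standard C */

/* The union gives the 16 bytes the alignment of a 128-bit integer and
   lets them be read as one without pointer casts. */
typedef union {
    unsigned char bytes[16];
    uint128_t     value;
} key128;                               /* hypothetical name */

/* Comparison callback suitable for qsort(). */
int compare128(const void *p1, const void *p2)
{
    const key128 *a = p1;
    const key128 *b = p2;
    return (a->value > b->value) - (a->value < b->value);
}

/* Returns a pointer to the matching entry, or NULL if there is none.
   The array must already be sorted with compare128. */
const key128 *bsearch128(const key128 *array, size_t count, const key128 *key)
{
    uint128_t k = key->value;
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid].value < k)       /* the only comparison per iteration */
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < count && array[lo].value == k) ? &array[lo] : NULL;
}
```

A typical use would be qsort(table, n, sizeof table[0], compare128) once, then bsearch128(table, n, &key) for each lookup. On a little-endian machine the resulting order differs from memcmp order, which is harmless as long as sorting and searching use the same comparison.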
On platforms without 128-bit integer support, you can use this:
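One possible fallback, sketched under the assumption that uint64_t is available: treat each 16-byte key as two 64-bit halves and compare them in a fixed order. The names key128_pair and compare128_pair are hypothetical and exist only for this example:

```c
#include <stdint.h>

/* 16 bytes viewed as two 64-bit halves; the union provides suitable
   alignment and avoids pointer casts. */
typedef union {
    unsigned char bytes[16];
    uint64_t      half[2];
} key128_pair;                          /* hypothetical name */

/* Comparison callback usable with both qsort() and bsearch().
   half[0] is treated as the more significant part; any fixed order works
   as long as sorting and searching agree on it. */
int compare128_pair(const void *p1, const void *p2)
{
    const key128_pair *a = p1;
    const key128_pair *b = p2;

    if (a->half[0] != b->half[0])
        return (a->half[0] > b->half[0]) - (a->half[0] < b->half[0]);
    return (a->half[1] > b->half[1]) - (a->half[1] < b->half[1]);
}
```

As with the 128-bit version, the ordering this produces is not memcmp order on little-endian hardware, so the array has to be sorted with the same function that is later used to search it.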