数组反向 - XOR 比使用临时对象切换慢
我正在阅读一个关于反转数组的最快方法的问题(最终结果并不令人兴奋),我在此处的链接中发现了一条有趣的评论:
https://stackoverflow.com/a/1129028/857994
引用的解决方案显示了这两种可能性:
//Possibility #1
void reverse(char word[])
{
int len=strlen(word);
char temp;
for (int i=0;i<len/2;i++)
{
temp=word[i];
word[i]=word[len-i-1];
word[len-i-1]=temp;
}
}
//Possibility #2
void reverse(char word[])
{
int len=strlen(word);
for (int i=0;i<len/2;i++)
{
word[i]^=word[len-i-1];
word[len-i-1]^=word[i];
word[i]^=word[len-i-1];
}
}
并且评论指出:“使用 XOR 会慢得多而不是使用临时对象进行交换。”
没有人对此提出异议。所以,我的问题是:
- 这是真的吗?
- 为什么这是真的?
- 如果这是一个非内置类型的数组,这仍然是真的吗?
I was reading a question regarding the fastest way to reverse an array (which ended up being less than thrilling), and I came across an interesting comment located at the link here:
https://stackoverflow.com/a/1129028/857994
The solution referenced shows these two possibilities:
//Possibility #1
void reverse(char word[])
{
int len=strlen(word);
char temp;
for (int i=0;i<len/2;i++)
{
temp=word[i];
word[i]=word[len-i-1];
word[len-i-1]=temp;
}
}
//Possibility #2
void reverse(char word[])
{
int len=strlen(word);
for (int i=0;i<len/2;i++)
{
word[i]^=word[len-i-1];
word[len-i-1]^=word[i];
word[i]^=word[len-i-1];
}
}
and the comment states: "Using XOR will be far slower than swapping using a temp object."
Nobody disputed this. So, my questions are:
- Is this true?
- Why is it true?
- Would it still be true if this was an array of a non-built-in-type?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
xor 循环每行包含 2 次内存读取和 1 次内存写入,每次循环迭代总共 6 次读取和 3 次写入。此外,写入 word[i] 的第一行与从 word[i] 读取的下一行之间存在很强的依赖性。这将阻止流水线操作,或者如果两行并行执行,则第二行从 word[i] 的读取将停止,直到第一行的写入完成。第二行和第三行之间还有另一个这样的依赖关系。
在临时变量循环中,临时变量几乎肯定会存储在 CPU 寄存器中,而不是主内存中。因此 temp var 循环的总内存 I/O 计数为 2 次读取和 2 次写入。语句之间存在松散的数据流依赖关系,但它们是先读后写的,可以管道化。异或示例中的数据流依赖关系是先读后写的,在不停止管道的情况下很难做到这一点。
6 次读取 + 3 次写入与 2 次读取 + 2 次写入相比。 2+2有明显的优势。
The xor loop contains 2 memory reads and 1 memory write per line, for a total of 6 reads and 3 writes for each loop iteration. Furthermore, there is a strong dependency between the first line the writes to word[i] and the next line that reads from word[i]. This will prevent pipelining, or if the two lines execute in parallel, the second line's read from word[i] will stall until the first line's write is complete. There is another such dependency between the 2nd and 3rd lines.
In the temp var loop, the temp var will almost certainly be stored in a CPU register, not in main memory. So the total memory I/O count for the temp var loop is 2 reads and 2 writes. There are loose data flow dependencies between the statements, but they are read-before-write which can be pipelined. The data flow dependencies in the xor example are read-after-write, which are much harder to do without stalling the pipeline.
6 reads + 3 writes compared to 2 reads + 2 writes. 2 + 2 has a distinct advantage.