Accessing array values via pointer arithmetic vs. subscripting in C
I keep reading that, in C, using pointer arithmetic is generally faster than subscripting for array access. Is this true even with modern (supposedly-optimizing) compilers?
If so, is this still the case as I begin to move away from learning C into Objective-C and Cocoa on Macs?
Which is the preferred coding style for array access, in both C and Objective-C? Which is considered (by professionals of their respective languages) more legible, more "correct" (for lack of a better term)?
Nah. It's the same operation either way. Subscripting is syntactic sugar for adding (element size * index) to the array's start address.
That said, when iterating over the elements in an array, taking a pointer to the first element and incrementing it each time through the loop will usually be slightly faster than calculating the current element's position from the loop variable each time. (Though it is unusual for this to matter much in a real-life application. Examine your algorithm first; premature optimisation is the root of all evil, etc.)
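You can check the "syntactic sugar" point directly. The C standard defines E1[E2] as (*((E1)+(E2))). The minimal sketch below asserts three things for every element: &a[i] and a + i are the same address, a[i] and *(a + i) are the same value, and the byte distance from the start is i * sizeof(int). The function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: verifies that subscripting is pointer arithmetic
   plus a dereference, with the scaling by sizeof(int) done implicitly. */
void check_subscript_is_pointer_arithmetic(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(&a[i] == a + i);    /* same address */
        assert(a[i] == *(a + i));  /* same value */
        /* the byte offset from the start is i elements of sizeof(int) bytes */
        assert((size_t)((const char *)&a[i] - (const char *)a) == i * sizeof(int));
    }
}
```

These checks are compiled out when NDEBUG is defined, like any other use of assert().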
This might be a bit off topic (sorry) because it doesn't answer your question regarding execution speed, but you should consider that premature optimization is the root of all evil (Knuth). In my opinion, especially when still (re)learning the language, by all means write it the way that is easiest to read first.
Then, if your program runs correctly, consider optimizing for speed.
Most of the time your code will be fast enough anyway.
Mecki has a great explanation. From my experience, one of the things that often matters with indexing vs. pointers is what other code sits in the loop. Example:
On a fast Core 2-based system (g++ 4.1.2, x64), here's the timing:
Sometimes indexing is faster, sometimes pointer arithmetic is. It depends on how the CPU and compiler are able to pipeline the loop execution.
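The answer's own example and timings are not preserved here. As a hypothetical stand-in, the harness below times an index-based and a pointer-based version of the same loop body. The loop body (y[i] = 2 * x[i] + y[i]) and all function names are assumptions, not the answerer's code. It uses the standard clock() function, which measures processor time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical loop body: y[i] = 2 * x[i] + y[i], written with subscripts. */
void update_indexed(float *y, const float *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = 2.0f * x[i] + y[i];
}

/* Same loop body, walking two pointers instead of using an index. */
void update_pointer(float *y, const float *x, size_t n)
{
    const float *end = x + n;
    for (; x != end; x++, y++)
        *y = 2.0f * *x + *y;
}

/* Returns the processor time, in seconds, spent on `reps` calls of `fn`. */
double time_update(void (*fn)(float *, const float *, size_t),
                   float *y, const float *x, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(y, x, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_update once with update_indexed and once with update_pointer on large arrays. Compile with the optimization flags you actually ship with, and repeat the runs. Calling through a function pointer may keep the compiler from inlining, and the result on one machine and compiler says little about another.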
If you're dealing with array-type data, I'd say using subscripts makes the code more readable. On today's machines (especially for something simple like this), readable code is more important.
Now if you're dealing explicitly with a chunk of data you malloc()'d and you want to get a pointer inside that data, say 20 bytes into an audio file header, then I think address arithmetic more clearly expresses what you're trying to do.
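Here is a minimal sketch of that case. It assumes a hypothetical header format with a 32-bit little-endian field at byte offset 20; FIELD_OFFSET and read_field are made-up names, not part of any real audio format library. The pointer arithmetic reads as "20 bytes past the start of the buffer", which is the intent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a 32-bit little-endian field 20 bytes into the header. */
#define FIELD_OFFSET 20

uint32_t read_field(const unsigned char *header, size_t header_len)
{
    assert(header != NULL && header_len >= FIELD_OFFSET + 4);
    /* Address arithmetic says exactly what we mean: 20 bytes past the start. */
    const unsigned char *p = header + FIELD_OFFSET;
    /* Assemble the bytes one by one to avoid alignment and endianness issues. */
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

In practice the buffer would come from malloc() or calloc() and be filled from the file before calling read_field(buf, len).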
I'm not sure about compiler optimizations in this regard, but even if subscripting is slower, it's only slower by maybe a few clock cycles at most. That's hardly anything compared to what you gain from keeping your train of thought clear.
EDIT: According to some of the other responses, subscripting is just a syntactic element and has no effect on performance, as I figured. In that case, definitely use whichever form best expresses what you are doing with the data inside the block the pointer points to.
Please keep in mind that execution speed is hard to predict, even when looking at the machine code, with superscalar CPUs and the like with
It's not just about counting machine instructions, and not even only about counting clock cycles.
It seems easier to just measure in cases where it's really necessary. It's not impossible to calculate the correct cycle count for a given program (we had to do it at university), but it's hardly fun and it's hard to get right.
Side note: measuring correctly is also hard in multithreaded / multiprocessor environments.
The C standard doesn't say which is faster. The observable behavior is the same, and it is up to the compiler to implement it in any way it wants. More often than not, it won't even read memory at all.
In general, you have no way to say which is "faster" unless you specify a compiler, version, architecture, and compile options. Even then, optimization will depend on the surrounding context.
So the general advice is to use whatever gives clearer and simpler code. Using array[i] gives some tools the ability to try to find index-out-of-bounds conditions, so if you are using arrays, it's better to just treat them as such.
If it is critical, look at the assembly your compiler generates. But keep in mind that it may change as you change the code around it.
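A minimal pair of functions you might compile to assembly to make that comparison yourself. The file name sum.c and the function names are hypothetical. -S, -Os and -O2 are standard GCC options, and -S writes the assembly to sum.s by default.

```c
/* sum.c (hypothetical file name). Compile to assembly with, e.g.:
 *     gcc -Os -S sum.c
 *     gcc -O2 -S sum.c
 * then compare the loop bodies of the two functions in sum.s. */
#include <assert.h>
#include <stddef.h>

/* Subscript version: the element address is derived from i each iteration. */
long sum_subscript(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer version: p walks from the first element to one past the last. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Whether the two listings differ depends on the compiler version, the flags and the target. That is exactly why it is worth looking rather than assuming.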
No, using pointer arithmetic is not faster, and is most probably slower. An optimizing compiler may use instructions like LEA (Load Effective Address) on Intel processors, or similar instructions on other processors, for pointer arithmetic, and these are faster than add or add/mul. LEA has the advantage of doing several things at once and NOT affecting the flags, and it takes one cycle to compute. BTW, the following is from the GCC manual. So -Os does not optimize primarily for speed.
I also completely agree with themarko. First try to write clean, readable and reusable code, then think about optimization and use some profiling tools to find the bottleneck. Most of the time the performance problem is I/O-related, or some bad algorithm, or some bug that you have to hunt down. Knuth is the man ;-)
It just occurred to me: what will you do with an array of structs? If you want to do pointer arithmetic, then you definitely should do it for each member of the struct. Does that sound like overkill? Yes, of course it is overkill, and it also opens a wide door to obscure bugs.
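To make the contrast concrete, here is a sketch with a hypothetical particle struct. The first function uses plain subscripts. The second does the byte-level arithmetic for each member by hand, using offsetof, which is the kind of code the answer warns against. Both produce the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type for illustration. */
struct particle {
    float x, y;
    float vx, vy;
};

/* Subscripting: the compiler handles element size and member offsets. */
void step_subscript(struct particle *ps, size_t n, float dt)
{
    for (size_t i = 0; i < n; i++) {
        ps[i].x += ps[i].vx * dt;
        ps[i].y += ps[i].vy * dt;
    }
}

/* Hand-rolled byte arithmetic for every member: same result, but a wrong
   stride, offset or type here would compile silently and corrupt data. */
void step_manual(struct particle *ps, size_t n, float dt)
{
    assert(n == 0 || ps != NULL);
    unsigned char *base = (unsigned char *)ps;
    for (size_t i = 0; i < n; i++) {
        unsigned char *elem = base + i * sizeof(struct particle);
        float *x  = (float *)(elem + offsetof(struct particle, x));
        float *y  = (float *)(elem + offsetof(struct particle, y));
        float *vx = (float *)(elem + offsetof(struct particle, vx));
        float *vy = (float *)(elem + offsetof(struct particle, vy));
        *x += *vx * dt;
        *y += *vy * dt;
    }
}
```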
It's not true. It's exactly as fast as with subscript operators. In Objective-C, you can use arrays as in C, or in object-oriented style. The object-oriented style is a lot slower, because the dynamic nature of method calls adds some work to every call.
It's unlikely that there will be any difference in speed.
Using the array operator [] is probably preferred, as in C++ you can use the same syntax with other containers (e.g. vector).
I've worked on C++/assembly optimization for several AAA titles for 10 years, and I can say that on the particular platforms/compilers I've worked on, pointer arithmetic made a quite measurable difference.
As an example to put things in perspective, I was able to make a really tight loop in our particle generator 40% faster by replacing all array accesses with pointer arithmetic, to the complete disbelief of my coworkers. I'd heard of it from one of my teachers as a good trick back in the day, but I assumed there was no way it would make a difference with the compilers/CPUs we have today. I was wrong ;)
It must be pointed out that many of the console ARM processors don't have all the cute features of modern CISC CPUs, and the compilers were a bit shaky sometimes.
You need to understand the reason behind this claim. Have you ever asked yourself why it is faster? Let's compare some code:
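The snippet itself did not survive here. The surrounding text says the printed values are all zero, and the later code walks a pointer b over the same array. So the first snippet was presumably a subscript loop along these lines. The array size of 1000, the global declaration and the function name are assumptions.

```c
#include <assert.h>
#include <stdio.h>

int a[1000];  /* global, so every element is zero-initialized */

/* Prints each element via a[i]; returns the number of elements printed. */
int print_with_subscript(void)
{
    int i;
    for (i = 0; i < 1000; i++) {
        printf("%d\n", a[i]);
    }
    return i;
}
```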
They are all zero, what a surprise :-P The question is, what does a[i] actually mean in low-level machine code? It means:
1. Take the address of a in memory.
2. Add i times the size of a single item of a to that address (an int is usually four bytes).
3. Fetch the value from that address.
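The three steps can be spelled out in C with byte addresses. This is illustrative only: fetch_by_steps is a made-up name, and a compiler is free to fold all three steps into a single instruction with a scaled addressing mode.

```c
#include <assert.h>
#include <stddef.h>

int fetch_by_steps(const int *a, size_t i)
{
    assert(a != NULL);
    /* 1. Take the address of a (as a byte address). */
    const unsigned char *base = (const unsigned char *)a;
    /* 2. Add i times the size of one element. */
    const unsigned char *addr = base + i * sizeof a[0];
    /* 3. Fetch the value from that address. */
    return *(const int *)addr;
}
```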
So each time you fetch a value from a, the base address of a is added to the result of multiplying i by four. If you just dereference a pointer, steps 1 and 2 don't need to be performed, only step 3. Consider the code below.
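This snippet did not survive either. Below is a sketch of the pointer version the following paragraph describes. It again assumes a zero-initialized global array of 1000 ints, and the function name is made up. b starts at the first element, is dereferenced to get the value, and is incremented after each iteration.

```c
#include <assert.h>
#include <stdio.h>

int a[1000];  /* global, so every element is zero-initialized */

/* Prints each element through a pointer that is advanced after each use. */
int print_with_pointer(void)
{
    int i;
    int *b = a;                /* b points at the first element */
    for (i = 0; i < 1000; i++) {
        printf("%d\n", *b);    /* only a dereference at the source level */
        b++;                   /* move to the next int */
    }
    return i;
}
```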
This code might be faster... but even if it is, the difference is tiny. Why might it be faster? "*b" is the same as step 3 above. However, "b++" is not the same as steps 1 and 2: "b++" will increase the pointer by 4 (one int).
Okay, but why might it be faster? Because adding four to a pointer is faster than multiplying i by four and adding that to a pointer. You have an addition in either case, but in the second one there is no multiplication (you avoid the CPU time needed for one multiplication). Considering the speed of modern CPUs, though, even if the array had 1 million elements, I wonder whether you could really benchmark a difference.
Whether a modern compiler can optimize either one to be equally fast is something you can check by looking at the assembly output it produces. You do so by passing the "-S" option (capital S) to GCC.
Here's the assembly for the first C code. Optimization level -Os was used, which means: optimize for code size and speed, but don't do speed optimizations that would noticeably increase code size, unlike -O2 and very much unlike -O3.
Same with the second code:
Well, it's different, that's for sure. The difference between the numbers 104 and 108 comes from the variable b (in the first code there was one variable fewer on the stack; now we have one more, which changes the stack addresses). The real code difference in the for loop is
compared to
Actually, to me it rather looks like the first approach is faster(!), since it issues a single machine instruction to perform all the work (the CPU does it all for us), instead of two instructions. On the other hand, the two assembly instructions below might have a lower runtime altogether than the one above.
As a closing word, I'd say that depending on your compiler and the CPU's capabilities (which instructions the CPU offers for accessing memory, and in what ways), the result might go either way. Either one might be faster or slower. You cannot say for sure unless you limit yourself to exactly one compiler (meaning also one version) and one specific CPU. CPUs can do more and more in a single assembly instruction (ages ago, a compiler really had to manually fetch the address, multiply i by four and add both together before fetching the value). So statements that were absolute truths ages ago are more and more questionable nowadays. Also, who knows how CPUs work internally? Above, I compared one assembly instruction to two others.
The number of instructions is different, and the time each instruction needs can be different as well. The amount of memory these instructions take up in their machine representation is also different, and they need to be transferred from memory to the CPU cache after all. However, modern CPUs don't execute instructions the way you feed them. They split big instructions (often referred to as CISC) into small sub-instructions (often referred to as RISC), which also lets them optimize program flow for speed internally. In fact, the first, single instruction and the two other instructions below might result in the same set of sub-instructions, in which case there is no measurable speed difference whatsoever.
Regarding Objective-C, it is just C with extensions. So everything that holds true for C will hold true for Objective-C as well in terms of pointers and arrays. If you use objects, on the other hand (for example, an NSArray or NSMutableArray), this is a completely different beast. However, in that case you must access these arrays with methods anyway; there is no pointer/array access to choose from.