C performance of global variables vs. function-local variables on a PIC board

Posted 2025-01-03 11:01:04

All,

I have C functions that are called many times a second as part of a control loop on a PIC18 board. These functions have variables that only need function scope, but I was wondering what overhead, if any, there is in constantly allocating these variables compared with using a global or at least a higher-scoped variable. (If performance rules out function-local variables, I thought of typedef'ing a struct and passing it around from a higher scope to avoid using globals.)

There are some good threads on here that cover this topic, but I have yet to see a definitive answer. Most preach best practices, which I agree with and would follow as long as there are no performance gains to be had, since every microsecond counts.

One thread mentioned using file-scoped static variables as a substitute for global variables, but I can't help wondering whether even that is necessary.
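To make the three options in the question concrete, here is a minimal sketch of the same step written with function-local variables, with file-scoped statics, and with a typedef'd struct passed in from a higher scope. The function names, the `step_scratch_t` type, and the gain of 1/2 are all hypothetical and only there for illustration; they are not from any PIC library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option A: function-local variables. Where they live (registers,
 * a software stack, or statically overlaid RAM) is up to the compiler. */
int16_t step_local(int16_t setpoint, int16_t measured)
{
    int16_t error = (int16_t)(setpoint - measured);
    int16_t output = (int16_t)(error / 2);   /* hypothetical gain of 1/2 */
    return output;
}

/* Option B: file-scoped statics. Fixed addresses, invisible to other
 * translation units, but shared by every call to step_static(). */
static int16_t s_error;
static int16_t s_output;

int16_t step_static(int16_t setpoint, int16_t measured)
{
    s_error = (int16_t)(setpoint - measured);
    s_output = (int16_t)(s_error / 2);
    return s_output;
}

/* Option C: scratch storage owned by the caller and passed by pointer. */
typedef struct {
    int16_t error;
    int16_t output;
} step_scratch_t;

int16_t step_ctx(step_scratch_t *ctx, int16_t setpoint, int16_t measured)
{
    assert(ctx != NULL);
    ctx->error = (int16_t)(setpoint - measured);
    ctx->output = (int16_t)(ctx->error / 2);
    return ctx->output;
}
```

All three compute the same result, so the only way to know which is cheapest on a given PIC18 toolchain is to compare the generated assembly. Option C is likely to go through indirect addressing on that target, so it may well be the slowest of the three rather than a free way to avoid globals.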

What does everyone think?

Comments (3)

晨曦慕雪 2025-01-10 11:01:04

Accessing a local variable requires doing something like *(SP + offset) (where SP is the stack-pointer), whereas accessing a static (which includes globals) requires something like *(address).

From what I recall, the PIC instruction set has very limited addressing modes. So it's very likely that accessing the global will be faster, at least for the first time it's accessed. Subsequent accesses may be identical if the compiler holds the computed address in a register.
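As a rough illustration of the difference, the following sketch models data memory as a byte array in portable C. The array, the stack pointer value, and the two addresses are made up for the example and do not correspond to any particular PIC18 memory map.

```c
#include <stdint.h>

static uint8_t ram[256];        /* hypothetical model of data memory */
static uint8_t sp = 0x80u;      /* hypothetical stack pointer value */

#define STATIC_ADDR  0x20u      /* fixed address chosen at link time */
#define LOCAL_OFFSET 0x03u      /* offset of a local within its frame */

/* Static/global access: the address is a constant, i.e. *(address). */
void write_static(uint8_t v) { ram[STATIC_ADDR] = v; }
uint8_t read_static(void)    { return ram[STATIC_ADDR]; }

/* Stack-local access: the address must be formed at run time,
 * i.e. *(SP + offset), before the load or store can happen. */
void write_local(uint8_t v) { ram[(uint8_t)(sp + LOCAL_OFFSET)] = v; }
uint8_t read_local(void)    { return ram[(uint8_t)(sp + LOCAL_OFFSET)]; }
```

On a CPU with a base-plus-offset addressing mode the extra addition is free; on one without, it costs instructions unless the compiler keeps the computed address around.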

As @unwind said in the comments, you should take a look at the compiler output, and profile to confirm. I would only sacrifice clarity/maintainability if you've proved that it's worthwhile in terms of the runtime of your program.

情栀口红 2025-01-10 11:01:04

While I've not used every single PIC compiler in existence, there are two styles. The style I've used allocates all local variables statically by analyzing the program's call graph. If every possible call were in fact performed, the amount of stack memory consumed by locals would match what would be required by static allocation, with a couple of caveats (this describes the behavior of HiTech's PICC-18 "standard" compiler; others may vary):

  1. Variadic functions are handled by defining local-variable storage in the scope of the caller, and passing a two-byte pointer to that storage to the function being called.
  2. For every different signature of indirect function pointer, the compiler generates a "pseudo-function" in the call graph; everything that calls a function of that signature calls the pseudo-function, and that pseudo-function calls every function with that signature that has its address taken.
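The effect of call-graph allocation can be imitated by hand in portable C. In the sketch below, two leaf functions that are never active at the same time share one block of static storage through a union. All names here (`overlay`, `sum_bytes`, `clamp_u8`, `control_tick`) are hypothetical, and a real compiler does this overlaying automatically and invisibly rather than through a union.

```c
#include <stdint.h>

/* sum_bytes() and clamp_u8() never call each other, so their "locals"
 * are never live at the same time and can occupy the same bytes. */
static union {
    struct { uint8_t i; uint16_t sum; } sum_frame;
    struct { uint8_t lo; uint8_t hi; } clamp_frame;
} overlay;

uint16_t sum_bytes(const uint8_t *data, uint8_t n)
{
    overlay.sum_frame.sum = 0;
    for (overlay.sum_frame.i = 0; overlay.sum_frame.i < n; overlay.sum_frame.i++) {
        overlay.sum_frame.sum =
            (uint16_t)(overlay.sum_frame.sum + data[overlay.sum_frame.i]);
    }
    return overlay.sum_frame.sum;
}

uint8_t clamp_u8(uint8_t value, uint8_t lo, uint8_t hi)
{
    overlay.clamp_frame.lo = lo;
    overlay.clamp_frame.hi = hi;
    if (value < overlay.clamp_frame.lo) return overlay.clamp_frame.lo;
    if (value > overlay.clamp_frame.hi) return overlay.clamp_frame.hi;
    return value;
}

/* Caller: the two frames are used one after the other, never together. */
uint16_t control_tick(const uint8_t *samples, uint8_t n)
{
    uint16_t total = sum_bytes(samples, n);
    uint8_t avg = n ? (uint8_t)(total / n) : 0;
    return clamp_u8(avg, 10, 100);
}
```

Every "local" ends up at a fixed address, which is why access costs the same as for a global. The price is that such functions are not reentrant: recursion, or a call from an interrupt while the main-line copy is active, would clobber the shared storage.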

In this style of compiler, consecutive accesses to local variables will be just as fast as consecutive accesses to globals. However, apart from global and static variables explicitly declared as "near", which must total no more than 64-128 bytes (the limit varies between PIC models), the global and static variables for each module are located separately from local variables, and bank-switching instructions are needed to access things in different banks.
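To see why variable placement matters more than the local/global distinction here, the following toy model counts bank switches for a sequence of accesses. It assumes 256-byte banks in a 4096-byte data space; the function names are hypothetical, and the run-time check stands in for a decision a real compiler makes at compile time, when it emits a bank-select instruction wherever it cannot prove the right bank is already selected.

```c
#include <stdint.h>

#define BANK_SIZE 256u
#define RAM_SIZE  4096u

static uint8_t  ram[RAM_SIZE];   /* model of banked data memory */
static uint8_t  current_bank;    /* models the bank select register */
static unsigned bank_switches;   /* extra instructions we are counting */

void model_reset(void)
{
    current_bank = 0;
    bank_switches = 0;
}

/* Select the bank holding addr, counting a switch only when it changes. */
static void select_bank(uint16_t addr)
{
    uint8_t bank = (uint8_t)((addr % RAM_SIZE) / BANK_SIZE);
    if (bank != current_bank) {
        current_bank = bank;
        bank_switches++;
    }
}

void banked_write(uint16_t addr, uint8_t v)
{
    select_bank(addr);
    ram[addr % RAM_SIZE] = v;
}

uint8_t banked_read(uint16_t addr)
{
    select_bank(addr);
    return ram[addr % RAM_SIZE];
}

unsigned bank_switch_count(void)
{
    return bank_switches;
}
```

In this model, a loop that alternates between a variable in one bank and a variable in another pays for a switch on every access, while variables that share a bank cost nothing extra after the first access.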

Some compilers which I have not used employ the "enhanced instruction set" option. This option gobbles up 96 bytes of the "near" bank (or all of it, on PICs with fewer than 96 bytes) and uses it to access 96 bytes relative to the FSR2 register. This would be a wonderful concept if it used the first 16, or maybe 32, bytes as a stack frame. Using 96 bytes means giving up all of the "near" storage, which is a pretty severe limitation. Nonetheless, compilers which use this instruction set can access local variables on a stack just as fast as, if not faster than, global variables (no bank switch required). I really wish Microchip had an option to set aside only 16 bytes or so for the stack frame, leaving a useful amount of "common bank" RAM, but some people have good luck with that mode all the same.

泛泛之交 2025-01-10 11:01:04

I would imagine that this depends a lot on which compiler you are using. I don't know PIC, but I'm guessing some (all?) PIC compilers will optimize the code so that local variables are stored in CPU registers whenever possible. If so, then local variables will likely be just as fast as globals.

Otherwise, if the local variable is allocated on the stack, the global may be a bit faster to access (see Oli's answer).
