C performance of global variables vs. method-local variables on a PIC board

Posted 2025-01-03 11:01:04

All,

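I have C functions that are called many times a second, as they are part of a control loop on a PIC18 board. These functions have variables that only need method scope, but I was wondering whether constantly allocating these variables incurs any overhead compared with using globals, or at least higher-scoped variables. (If performance rules out method-local variables, I am considering typedef'ing a struct to pass in from a higher scope, to avoid using globals.)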

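There are some good threads here covering this topic, but I have yet to see a definitive answer, as most people preach best practice. I agree with best practice and will follow it, provided there is no performance gain to be had, because every microsecond counts.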

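One thread mentioned using file-scope static variables as a substitute for globals, but I can't help wondering whether even that is necessary.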

What does everyone think?
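For concreteness, the three options being compared might look roughly like the sketch below. The control step itself (a proportional term with a gain of 1/4) and every name in it are made up for illustration; only the placement of the scratch variables matters.

```c
#include <stdint.h>

/* Hypothetical control step; all names and the 1/4 gain are illustrative. */

/* (a) Automatic locals: the scratch variables exist only during the call. */
int16_t step_local(int16_t setpoint, int16_t measured)
{
    int16_t error = (int16_t)(setpoint - measured);
    int16_t output = (int16_t)(error / 4);
    return output;
}

/* (b) File-scope statics: fixed addresses, invisible outside this file. */
static int16_t s_error;
static int16_t s_output;

int16_t step_static(int16_t setpoint, int16_t measured)
{
    s_error = (int16_t)(setpoint - measured);
    s_output = (int16_t)(s_error / 4);
    return s_output;
}

/* (c) Scratch struct owned by a higher scope and passed in by pointer. */
typedef struct {
    int16_t error;
    int16_t output;
} step_ctx_t;

int16_t step_ctx(step_ctx_t *ctx, int16_t setpoint, int16_t measured)
{
    ctx->error = (int16_t)(setpoint - measured);
    ctx->output = (int16_t)(ctx->error / 4);
    return ctx->output;
}
```

All three compute the same result; which one is fastest depends on how the particular compiler places and addresses each kind of variable. Note that option (c) adds a pointer indirection of its own on every access.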


晨曦慕雪 2025-01-10 11:01:04

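Accessing a local variable requires doing something like *(SP + offset), where SP is the stack pointer, whereas accessing a static variable (which includes globals) requires something like *(address).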

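From what I recall, the PIC instruction set has very limited addressing modes. So it is quite likely that accessing a global will be faster, at least for the first access. Subsequent accesses may cost the same if the compiler keeps the computed address in a register.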
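The two access patterns can be pictured with a toy model in plain C. This is only a sketch of the idea, not PIC code: the `ram` array, the `sp` variable, and `GLOBAL_ADDR` are hypothetical stand-ins for data memory, a stack pointer, and a link-time address.

```c
#include <stdint.h>

/* Toy model of data memory; all names and values are hypothetical. */
static uint8_t ram[256];
static uint8_t sp = 128;        /* stand-in for a stack pointer */

#define GLOBAL_ADDR 0x20u       /* stand-in for an address fixed at link time */

/* Local access, *(SP + offset): the address is computed at run time. */
uint8_t read_local(uint8_t offset)
{
    uint8_t addr = (uint8_t)(sp + offset);  /* extra add before the load */
    return ram[addr];
}

/* Static/global access, *(address): the address is a constant. */
uint8_t read_global(void)
{
    return ram[GLOBAL_ADDR];
}
```

On a core with few addressing modes, the extra address computation in `read_local` is the part that may cost additional instructions.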

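As @unwind said in the comments, you should look at the compiler output and profile to confirm. I would only sacrifice clarity and maintainability once you have proven that it is worthwhile in terms of your program's runtime.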


情栀口红 2025-01-10 11:01:04

While I've not used every single PIC compiler in existence, there are two styles. The style I've used allocates all local variables statically by analyzing the program's call graph. If every possible call were in fact performed, the amount of stack memory consumed by locals would match what static allocation requires, with a couple of caveats (this describes the behavior of HiTech's PICC-18 "standard" compiler; others may vary):

Variadic functions are handled by defining local-variable storage in the scope of the caller, and passing a two-byte pointer to that storage to the function being called.
For every different signature of indirect function pointer, the compiler generates a "pseudo-function" in the call graph; everything that calls a function of that signature calls the pseudo-function, and that pseudo-function calls every function with that signature that has its address taken.
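The static allocation described above can be pictured as an overlay: functions that are never active at the same time may share one fixed block of memory for their locals. The sketch below models that by hand with a union. It is an illustration of the idea, not what PICC-18 actually emits, and `leaf_a`, `leaf_b`, and `overlay` are hypothetical names.

```c
#include <stdint.h>

/* Hand-written model of call-graph-based static allocation.
 * leaf_a and leaf_b never call each other, so their "locals"
 * can occupy the same fixed storage. All names are hypothetical. */
typedef struct { uint8_t i; uint16_t sum; } leaf_a_locals_t;
typedef struct { uint8_t lo; uint8_t hi; } leaf_b_locals_t;

static union {
    leaf_a_locals_t a;
    leaf_b_locals_t b;
} overlay;

/* Sum of 1..n, using only the shared fixed-address storage. */
uint16_t leaf_a(uint8_t n)
{
    overlay.a.sum = 0;
    for (overlay.a.i = 0; overlay.a.i < n; overlay.a.i++) {
        overlay.a.sum = (uint16_t)(overlay.a.sum + overlay.a.i + 1u);
    }
    return overlay.a.sum;
}

/* Swap the two nibbles of x, reusing the same storage as leaf_a. */
uint8_t leaf_b(uint8_t x)
{
    overlay.b.lo = (uint8_t)(x & 0x0Fu);
    overlay.b.hi = (uint8_t)(x >> 4);
    return (uint8_t)((overlay.b.lo << 4) | overlay.b.hi);
}
```

Because every "local" sits at a fixed address, accessing it costs the same as accessing a global. The trade-off is that functions allocated this way cannot safely be re-entered, for example recursively or from an interrupt while already running.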

In this style of compiler, consecutive accesses to local variables will be just as fast as consecutive accesses to globals. However, apart from global and static variables explicitly declared as "near" (which must total no more than 64-128 bytes, depending on the PIC model), the global and static variables for each module are located separately from local variables, and bank-switching instructions are needed to access things in different banks.

Some compilers which I have not used employ the "enhanced instruction set" option. This option gobbles up 96 bytes of the "near" bank (or all of it, on PICs with fewer than 96 bytes) and uses it to access 96 bytes relative to the FSR2 register. This would be a wonderful concept if it used the first 16, or maybe 32, bytes as a stack frame. Using 96 bytes means giving up all of the "near" storage, which is a pretty severe limitation. Nonetheless, compilers which use this instruction set can access local variables on a stack just as fast as, if not faster than, global variables (no bank switch required). I really wish Microchip had an option to set aside only 16 bytes or so for the stack frame, leaving a useful amount of "common bank" RAM, but nonetheless some people have good luck with that mode.
