Optimizing for space instead of speed in C++
When you say "optimization", people tend to think "speed". But what about embedded systems where speed isn't all that critical, but memory is a major constraint? What are some guidelines, techniques, and tricks that can be used for shaving off those extra kilobytes in ROM and RAM? How does one "profile" code to see where the memory bloat is?
P.S. One could argue that "prematurely" optimizing for space in embedded systems isn't all that evil, because you leave yourself more room for data storage and feature creep. It also allows you to cut hardware production costs because your code can run on smaller ROM/RAM.
P.P.S. References to articles and books are welcome too!
P.P.P.S. These questions are closely related: 404615, 1561629
16 Answers
My experience from an extremely constrained embedded memory environment:
There are many things you can do to reduce your memory footprint; I'm sure people have written books on the subject, but a few of the major ones are:
Compiler options to reduce code size (including -Os and packing/alignment options)
Linker options to strip dead code
If you're loading from flash (or ROM) to ram to execute (rather than executing from flash), then use a compressed flash image, and decompress it with your bootloader.
Use static allocation: a heap is an inefficient way to allocate limited memory, and it might fail due to fragmentation if memory is constrained.
Tools to find the stack high-watermark (typically they fill the stack with a pattern, execute the program, then see where the pattern remains), so you can set the stack size(s) optimally; a sketch of this technique follows the list
And of course, optimising the algorithms you use for memory footprint (often at the expense of speed)
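As an illustration of the stack high-watermark bullet above, here is a minimal sketch. It assumes a bare-metal target whose linker script exports symbols bounding the stack region; the symbol names (__stack_limit, __stack_top) and the paint value are invented for the example.

    #include <stdint.h>
    #include <stddef.h>

    // Hypothetical linker-script symbols bounding the stack region.
    extern "C" uint32_t __stack_limit[];   // lowest stack address
    extern "C" uint32_t __stack_top[];     // highest stack address

    static const uint32_t kStackPaint = 0xDEADBEEFu;

    // Call very early, while almost none of the stack is in use:
    // fill the unused part of the stack with a known pattern.
    extern "C" void paint_stack(void)
    {
        uint32_t marker;                   // roughly the current stack pointer
        for (uint32_t *p = __stack_limit; p < &marker; ++p)
            *p = kStackPaint;
    }

    // Call periodically or at shutdown: everything still holding the
    // pattern was never touched, so the rest is the high-watermark.
    extern "C" size_t stack_high_watermark(void)
    {
        const uint32_t *p = __stack_limit;
        while (p < __stack_top && *p == kStackPaint)
            ++p;
        return (size_t)((const uint8_t *)__stack_top - (const uint8_t *)p);
    }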
A few obvious ones
Declare constant data tables const. This will avoid the data being copied from flash to RAM.
Folding knowledge into data
One of the rules of Unix philosophy can help make code more compact - the Rule of Representation: "Fold knowledge into data, so program logic can be stupid and robust."
I can't count how many times I've seen elaborate branching logic, spanning many pages, that could've been folded into a nice compact table of rules, constants, and function pointers. State machines can often be represented this way (State Pattern). The Command Pattern also applies. It's all about the declarative vs imperative styles of programming.
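A sketch of what that folding can look like in practice; the events and handlers here are invented for illustration.

    #include <stddef.h>

    enum Event { EV_BUTTON, EV_TIMER, EV_OVERTEMP };

    typedef void (*Handler)(void);

    static void on_button(void)   { /* ... */ }
    static void on_timer(void)    { /* ... */ }
    static void on_overtemp(void) { /* ... */ }

    // One small const (flash-resident) table replaces pages of branching.
    struct Rule { Event event; Handler handler; };

    static const Rule kRules[] = {
        { EV_BUTTON,   on_button   },
        { EV_TIMER,    on_timer    },
        { EV_OVERTEMP, on_overtemp },
    };

    void dispatch(Event e)
    {
        for (size_t i = 0; i < sizeof kRules / sizeof kRules[0]; ++i)
            if (kRules[i].event == e) { kRules[i].handler(); return; }
    }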
Log codes + binary data instead of text
Instead of logging plain text, log event codes and binary data. Then use a "phrasebook" to reconstitute the event messages. The messages in the phrasebook can even contain printf-style format specifiers, so that the event data values are displayed neatly within the text.
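A rough sketch of the scheme, with made-up event codes; the phrasebook mapping codes to format strings lives in a host-side tool, so the text never occupies target ROM, and write_to_log_buffer() is only a placeholder for whatever transport the application uses.

    #include <stdint.h>

    // Event codes (invented for the example).
    static const uint16_t EVT_BOOT        = 1;   // phrasebook: "System booted, build %u"
    static const uint16_t EVT_VOLTAGE_LOW = 2;   // phrasebook: "Supply voltage low: %u mV"

    struct LogRecord {
        uint16_t code;
        uint16_t value;
    };

    // On the target, only 4 bytes per event are stored or transmitted.
    void log_event(uint16_t code, uint16_t value)
    {
        LogRecord rec = { code, value };
        // write_to_log_buffer(&rec, sizeof rec);  // transport is application-specific
        (void)rec;
    }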
Minimize the number of threads
Each thread needs its own memory block for a stack and TSS. Where you don't need preemption, consider making your tasks execute co-operatively within the same thread (cooperative multi-tasking).
Use memory pools instead of hoarding
To avoid heap fragmentation, I've often seen separate modules hoard large static memory buffers for their own use, even when the memory is only occasionally required. A memory pool could be used instead so that the memory is only used "on demand". However, this approach may require careful analysis and instrumentation to make sure pools are not depleted at runtime.
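A minimal fixed-block pool sketch, with arbitrary block size and count; a real one would add alignment guarantees and the usage counters needed for the instrumentation mentioned above.

    #include <stddef.h>

    // Fixed-size-block pool: O(1) acquire/release, no fragmentation.
    // BlockSize must be at least sizeof(void*).
    template <size_t BlockSize, size_t BlockCount>
    class Pool {
    public:
        Pool() : free_list_(0) {
            for (size_t i = 0; i < BlockCount; ++i)
                release(storage_ + i * BlockSize);
        }
        void *acquire() {                    // returns 0 when the pool is exhausted
            Node *n = free_list_;
            if (n) free_list_ = n->next;
            return n;
        }
        void release(void *p) {              // return a block obtained from acquire()
            Node *n = static_cast<Node *>(p);
            n->next = free_list_;
            free_list_ = n;
        }
    private:
        struct Node { Node *next; };
        Node *free_list_;
        unsigned char storage_[BlockSize * BlockCount];
    };

    // e.g. 16 communication buffers of 128 bytes, shared "on demand".
    static Pool<128, 16> g_packet_pool;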
Dynamic allocation only at initialization
In embedded systems where only one application runs indefinitely, you can use dynamic allocation in a sensible way that doesn't lead to fragmentation: Just dynamically allocate once in your various initialization routines, and never free the memory.
reserve() your containers to the correct capacity and don't let them auto-grow. If you need to frequently allocate/free buffers of data (say, for communication packets), then use memory pools. I once even extended the C/C++ runtime so that it would abort my program if anything tried to dynamically allocate memory after the initialization sequence.
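Putting the last two points together, a small sketch (the container and sizes are arbitrary for illustration):

    #include <vector>

    struct Sample { unsigned short raw; };

    class Acquisition {
    public:
        void init() {
            samples_.reserve(512);            // one allocation, done at initialization
        }
        void push(const Sample &s) {
            if (samples_.size() < samples_.capacity())
                samples_.push_back(s);        // never grows past the reserved capacity
        }
    private:
        std::vector<Sample> samples_;
    };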
As with all optimization, first optimize algorithms, second optimize the code and data, finally optimize the compiler.
I don't know what your program does, so I can't advise on algorithms. Many others have written about the compiler. So, here's some advice on code and data:
Generate a map file from your linker. It will show how the memory is allocated. This is a good start when optimizing for memory usage. It also will show all the functions and how the code-space is laid out.
Here's a book on the subject: Small Memory Software: Patterns for Systems with Limited Memory.
Compile in VS with /Os. Often this is even faster than optimizing for speed anyway, because smaller code size == less paging.
Comdat folding should be enabled in the linker (it is by default in release builds)
Be careful about data structure packing; often this results in the compiler generating more code (== more memory) to access unaligned memory. Using 1 bit for a boolean flag is a classic example (see the sketch below).
Also, be careful when choosing a memory efficient algorithm over an algorithm with a better runtime. This is where premature optimizations come in.
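To illustrate that packing caveat with a made-up example: the bit-field version saves RAM, but every access to a flag compiles to extra mask/shift instructions, so code size (and time) can grow.

    #include <stdint.h>

    // 1 byte of data, but each read/write needs load + mask/shift + store.
    struct FlagsPacked {
        uint8_t ready   : 1;
        uint8_t error   : 1;
        uint8_t overrun : 1;
    };

    // 3 bytes of data, but each flag is a plain byte access.
    struct FlagsPlain {
        uint8_t ready;
        uint8_t error;
        uint8_t overrun;
    };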
OK, most were mentioned already, but here is my list anyway:
statically allocated buffer (pool or maximum instance sized static buffer).
rather Super-C and C++ is used where it counts: in high level logic, GUI, etc.
Last but not least - while hunting for smallest possible code size - don't overdo it. Watch out also for performance and maintainability. Over-optimized code tends to decay very quickly.
Firstly, tell your compiler to optimize for code size. GCC has the -Os flag for this. Everything else is at the algorithmic level - use similar tools that you would for finding memory leaks, but instead look for allocs and frees that you could avoid.
Also take a look at commonly used data structure packing - if you can shave a byte or two off them, you can cut down memory use substantially.
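As a sketch of the kind of saving meant here (exact sizes depend on the ABI; these numbers assume 4-byte alignment for uint32_t):

    #include <stdint.h>

    // Typically 12 bytes: padding after 'type' and after 'channel'.
    struct RecordUnordered {
        uint8_t  type;
        uint32_t timestamp;
        uint8_t  channel;
        uint16_t value;
    };

    // Typically 8 bytes: members ordered largest-first, padding eliminated.
    struct RecordOrdered {
        uint32_t timestamp;
        uint16_t value;
        uint8_t  type;
        uint8_t  channel;
    };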
If you're looking for a good way to profile your application's heap usage, check out valgrind's massif tool. It will let you take snapshots of your app's memory usage profile over time, and you can then use that information to better see where the "low hanging fruit" is, and aim your optimizations accordingly.
Profiling code or data bloat can be done via map files: for gcc see here, for VS see here.
I have yet to see a useful tool for size profiling though (and don't have time to fix my VS AddIn hack).
On top of what others suggest:
Limit use of C++ features; write as in ANSI C with minor extensions. Standard (std::) templates use a large system of dynamic allocation. If you can, avoid templates altogether. While not inherently harmful, they make it way too easy to generate lots and lots of machine code from just a couple of simple, clean, elegant high-level instructions. This encourages writing in a way that - despite all the "clean code" advantages - is very memory hungry.
If you must use templates, write your own or use ones designed for embedded use, pass fixed sizes as template parameters, and write a test program so you can test your template AND check your -S output to ensure the compiler is not generating horrible assembly code to instantiate it (a sketch follows this list).
Align your structures by hand, or use #pragma pack
For the same reason, use a centralized global data storage structure instead of scattered local static variables.
Intelligently balance usage of malloc()/new and static structures.
If you need a subset of functionality of given library, consider writing your own.
Unroll short loops: for a loop of only two or three iterations, the counter, compare and branch can take more code than simply writing the statements out (an illustration follows this list). Don't do that for longer ones.
Pack multiple files together to let the compiler inline short functions and perform various optimizations the linker can't.
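For the template point above, a sketch of passing a fixed size as a template parameter; the ring buffer is a made-up example.

    #include <stddef.h>

    // Capacity is a compile-time constant: no heap, storage lives in the object.
    template <typename T, size_t N>
    class RingBuffer {
    public:
        RingBuffer() : head_(0), count_(0) {}
        bool put(const T &v) {
            if (count_ == N) return false;
            buf_[(head_ + count_) % N] = v;
            ++count_;
            return true;
        }
        bool get(T &v) {
            if (count_ == 0) return false;
            v = buf_[head_];
            head_ = (head_ + 1) % N;
            --count_;
            return true;
        }
    private:
        T      buf_[N];
        size_t head_, count_;
    };

    static RingBuffer<unsigned char, 64> g_uart_rx;   // exactly 64 bytes of payload storage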
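And for the loop-unrolling point, an illustration under my own assumptions (transform() is a stand-in for the real per-element work):

    void transform(int &x) { x *= 2; }       // stand-in for the real per-element work

    void loop_version(int (&v)[3])
    {
        for (int i = 0; i < 3; ++i)          // counter, compare and branch in the codegen
            transform(v[i]);
    }

    void unrolled_version(int (&v)[3])
    {
        transform(v[0]);                     // for a trip count this small, often smaller
        transform(v[1]);
        transform(v[2]);
    }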
Don't be afraid to write 'little languages' inside your program. Sometimes a table of strings and an interpreter can get a LOT done. For instance, in a system I've worked on, we have a lot of internal tables, which have to be accessed in various ways (loop through, whatever). We've got an internal system of commands for referencing the tables that forms a sort of half-way language that's quite compact for what it gets done.
But, BE CAREFUL! Know that you are writing such things (I wrote one accidentally, myself), and DOCUMENT what you are doing. The original developers do NOT seem to have been conscious of what they were doing, so it's much harder to manage than it should be.
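A toy sketch of the "table of strings plus an interpreter" idea; the one-letter command set is invented for the example.

    #include <stddef.h>
    #include <stdio.h>

    // A tiny command "language": one letter per operation, optional operand after it.
    static const char *const kScript[] = {
        "S 10",     // set accumulator to 10
        "A 5",      // add 5
        "P",        // print the accumulator
    };

    void run_script(void)
    {
        int acc = 0;
        for (size_t i = 0; i < sizeof kScript / sizeof kScript[0]; ++i) {
            const char *line = kScript[i];
            int arg = 0;
            sscanf(line + 1, "%d", &arg);        // harmlessly leaves arg == 0 if absent
            switch (line[0]) {
            case 'S': acc = arg;           break;
            case 'A': acc += arg;          break;
            case 'P': printf("%d\n", acc); break;
            }
        }
    }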
Optimizing is a popular term but often technically incorrect. It literally means to make optimal. Such a condition is never actually achieved for either speed or size. We can simply take measures to move toward optimization.
Many (but not all) of the techniques used to move toward minimum time to a computing result sacrifice memory, and many (but not all) of the techniques used to move toward minimum memory requirements lengthen the time to result.
Reduction of memory requirements amounts to a fixed number of general techniques. It is difficult to find a specific technique that does not neatly fit into one or more of these. If you did all of them, you'd have something very close to the minimal space requirement for the program if not the absolute minimum possible. For a real application, it could take a team of experienced programmers a thousand years to do it.
This is a computer science view of the topic, not a developer's one.
For instance, packing a data structure is an effort that combines (3) and (9) above. Compressing data is a way to at least partly achieve (1) above. Reducing overhead of higher level programming constructs is a way to achieve some progress in (7) and (8). Dynamic allocation is an attempt to exploit a multitasking environment to employ (3). Compilation warnings, if turned on, can help with (5). Destructors attempt to assist with (6). Sockets, streams, and pipes can be used to accomplish (2). Simplifying a polynomial is a technique to gain ground in (8).
Understanding the meaning of the nine and the various ways to achieve them is the result of years of learning and checking the memory maps resulting from compilation. Embedded programmers often learn them more quickly because of the limited memory available.
Using the -Os option on a gnu compiler makes a request to the compiler to attempt to find patterns that can be transformed to accomplish these, but -Os is an aggregate flag that turns on a number of optimization features, each of which attempts to perform transformations to accomplish one of the nine tasks above.
Compiler directives can produce results without programmer effort, but automated processes in the compiler rarely correct problems created by lack of awareness in the writers of the code.
Bear in mind the implementation cost of some C++ features, such as virtual function tables and overloaded operators that create temporary objects.
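A small illustration of the second point (not from the original answer): a binary operator+ has to produce a temporary object, while the compound form updates in place.

    struct Vec3 {
        float x, y, z;
        Vec3 operator+(const Vec3 &o) const {      // returns a temporary Vec3
            Vec3 t = { x + o.x, y + o.y, z + o.z };
            return t;
        }
        Vec3 &operator+=(const Vec3 &o) {          // updates in place, no temporary
            x += o.x; y += o.y; z += o.z;
            return *this;
        }
    };

    void accumulate(Vec3 &total, const Vec3 &v)
    {
        // total = total + v;   // creates and then copies a temporary Vec3
        total += v;             // same result without the temporary
    }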
Along with what everyone else said, I'd just like to add: don't use virtual functions, because with virtual functions a vtable must be created, which can take up who knows how much space.
Also watch out for exceptions. With gcc, I don't believe there is a growing size for each try-catch block (except for two function calls per try-catch), but there is a fixed-size function which must be linked in, which could be wasting precious bytes.
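If vtable overhead matters, one alternative (my own sketch, not from the answer) is explicit dispatch on a small tag, which keeps objects free of hidden vtable pointers at the cost of a switch:

    enum SensorKind { KIND_TEMP, KIND_PRESSURE };

    struct Sensor {
        unsigned char kind;    // one byte of tag instead of a vtable pointer
        int           raw;
    };

    static int read_temp(const Sensor &s)     { return s.raw / 10; }
    static int read_pressure(const Sensor &s) { return s.raw * 2;  }

    int read_sensor(const Sensor &s)
    {
        switch (s.kind) {
        case KIND_TEMP:     return read_temp(s);
        case KIND_PRESSURE: return read_pressure(s);
        default:            return 0;
        }
    }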