Dynamic memory allocation failure recovery

Posted on 2024-07-09 22:28:16

I'm working on an embedded processor (400 MHz Intel PXA255 XScale), and I thought I saw one case where there wasn't enough memory to satisfy a 'new' operation. The program didn't crash, so I assumed other threads had freed their memory and it was just a transient thing. This is some pretty critical code, so exiting is not an option, and some sort of error needs to be returned to the remote user.

Would the following small fix be enough to solve the problem, or is there a better way? Before replacing every 'new' with the following code, I thought I'd ask.

char *someArr = NULL;  // must be a pointer: new[] returns char*
do {
    someArr = new char[10];
    Sleep(100); // no justification for choosing 100 ms
} while ( someArr == NULL );

Does the Sleep help? Should I set some max number of retries? Is it possible to use static initialization everywhere?

FINAL UPDATE: Thank you very much for the helpful responses, but it turns out there was an error in the code checking for failed memory allocation. I will keep all of these answers in mind, and replace as many malloc's and new's as I can, though (especially in error-handling code).

Comments (9)

友欢 2024-07-16 22:28:16

You are trying to solve a global problem through local reasoning. The global problem is that the entire device has a limited amount of RAM (and possibly backing store) for the operating system and all of the applications. To make sure this amount of RAM is not exceeded, you have a few options:

  • Each process operates in a fixed amount of RAM to be determined per process at startup time; the programmer does the reasoning to make sure everything fits. So, yes, it is possible to allocate everything statically. It's just a lot of work, and every time you change your system's configuration, you have to reconsider the allocations.

  • Processes are aware of their own memory usage and needs and continually advise each other about how much memory they need. They cooperate so they don't run out of memory. This assumes that at least some processes in the system can adjust their own memory requirements (e.g., by changing the size of an internal cache). Alonso and Appel wrote a paper about this approach.

  • Each process is aware that memory can become exhausted and can fail over to a state in which it consumes a minimum amount of memory. Often this strategy is implemented by having an out-of-memory exception. The exception is handled in or near main() and the out-of-memory event essentially restarts the program from scratch. This failover mode can work if memory grows in response to user requests; if the program's memory requirements grow independent of what the user does, it can lead to thrashing.
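The third option above can be sketched roughly as follows. This is only an illustration: `Service`, `handle_request`, `reset_to_minimal_state`, and `run_with_failover` are hypothetical names, and the `budget` field simulates a small heap so the failure path is easy to trigger.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical service whose cache is the only memory that grows with requests.
struct Service {
    std::size_t budget;    // simulated heap size in bytes (assumption for this sketch)
    std::size_t used = 0;
    std::vector<std::vector<char>> cache;

    explicit Service(std::size_t bytes) : budget(bytes) {}

    // Drop everything that is not strictly required to keep running.
    void reset_to_minimal_state() {
        cache.clear();
        cache.shrink_to_fit();
        used = 0;
    }

    // Throws std::bad_alloc when the simulated heap is exhausted.
    void handle_request(std::size_t bytes) {
        if (bytes > budget - used) throw std::bad_alloc();
        cache.emplace_back(bytes);
        used += bytes;
    }
};

// Body of the top-level loop in (or near) main(): a failed allocation anywhere
// below lands here, and the service falls back to its minimal state.
// Returns true if the request succeeded, false if it was dropped.
bool run_with_failover(Service& svc, std::size_t request_bytes) {
    try {
        svc.handle_request(request_bytes);
        return true;
    } catch (const std::bad_alloc&) {
        svc.reset_to_minimal_state();
        return false;  // this is where an error would go back to the remote user
    }
}
```

The design point is that only one place knows about out-of-memory; everything below it simply lets `std::bad_alloc` propagate.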

Your proposal above matches none of the scenarios. Instead, you are hoping some other process will solve the problem and the memory you need will eventually appear. You might get lucky. You might not.

If you want your system to work reliably, you would do well to reconsider the design of every process running on the system in light of the need to share limited memory. It might be a bigger job than you expected, but if you understand the problem, you can do this. Good luck!

白鸥掠海 2024-07-16 22:28:16

There are a few different ways to attack this - note that the tool instructions will vary a bit, based on what version of Windows CE / Windows Mobile you are using.

Some questions to answer:

1. Is your application leaking memory, leading to this low memory condition?

2. Does your application simply use too much memory at certain stages, leading to this low memory condition?

1 and 2 can be investigated using the Windows CE AppVerifier tool, which can provide detailed memory logging tools for your product. Other heap wrapping tools can also provide similar information (and may be higher-performance), depending on your product design.

http://msdn.microsoft.com/en-us/library/aa446904.aspx

3. Are you allocating and freeing memory very frequently in this process?

Windows CE, prior to OS version 6.0 (don't confuse with Windows Mobile 6.x) had a 32MB / process virtual memory limit, which tends to cause lots of fun fragmentation issues. In this case, even if you have sufficient physical memory free, you might be running out of virtual memory. Use of custom block allocators is usually a mitigation for this problem.
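For illustration, a custom block allocator can be as small as the sketch below. `BlockPool` is a hypothetical name, the pool is not thread-safe as written, and the block size and count are whatever the application measures it needs.

```cpp
#include <cstddef>

// Fixed-size block pool: every block comes from one array reserved up front,
// so repeated allocate/release cycles cannot fragment the address space.
// Not thread-safe; wrap calls in a lock if several threads share a pool.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
    union Block {
        Block* next;  // free-list link, used only while the block is free
        alignas(std::max_align_t) unsigned char bytes[BlockSize];
    };

    Block storage_[BlockCount];
    Block* free_list_ = nullptr;

public:
    BlockPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            storage_[i].next = free_list_;
            free_list_ = &storage_[i];
        }
    }

    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    // Returns a block of BlockSize bytes, or nullptr when the pool is empty.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Block* block = free_list_;
        free_list_ = block->next;
        return block;
    }

    // Returns a block previously obtained from allocate() to the pool.
    void release(void* p) {
        if (p == nullptr) return;
        Block* block = static_cast<Block*>(p);
        block->next = free_list_;
        free_list_ = block;
    }
};
```

Because the pool hands out and takes back whole blocks in constant time, its worst-case memory use is known at build time.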

4. Are you allocating very large blocks of memory? (> 2MB)

Related to 3, you could just be exhausting the process virtual memory space. There are tricks, somewhat dependent on OS version, to allocate memory in a shared VM space, outside the process space. If you are running out of VM, but not physical RAM, this could help.

5. Are you using large numbers of DLLs?

Also related to 3: depending on the OS version, DLLs may also reduce the total available VM very quickly.

Further jumping off points:

Overview of CE memory tools

http://blogs.msdn.com/ce_base/archive/2006/01/11/511883.aspx

Target control window 'mi' tool

http://msdn.microsoft.com/en-us/library/aa450013.aspx

朦胧时间 2024-07-16 22:28:16

There are lots of good things in the other answers, but I did think it worth adding that if all the threads get in a similar loop, then the program will be deadlocked.

The "correct" answer to this situation is probably to have strict limits for the different parts of the program to ensure that they don't overconsume memory. That would probably require rewriting major sections across all parts of the program.

The next best solution would be to have some callback where a failed allocation attempt can tell the rest of the program that more memory is needed. Perhaps other parts of the program can release some buffers more aggressively than they normally would, or release memory used to cache search results, or something. This would require new code for other parts of the program. However, this could be done incrementally, rather than requiring a rewrite across the entire program.
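A rough sketch of such a callback scheme, under the assumption that each subsystem can register a function that frees what it can and reports how many bytes it gave back. `LowMemoryRegistry`, `register_releaser`, `request_release`, and `allocate_or_reclaim` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <new>
#include <utility>
#include <vector>

// Subsystems register a callback that frees what they can spare and
// returns the number of bytes released.
class LowMemoryRegistry {
    std::mutex mutex_;
    std::vector<std::function<std::size_t()>> releasers_;

public:
    void register_releaser(std::function<std::size_t()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        releasers_.push_back(std::move(fn));
    }

    // Asks every registered subsystem to release memory; returns the total freed.
    std::size_t request_release() {
        std::vector<std::function<std::size_t()>> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = releasers_;  // call back without holding the lock
        }
        std::size_t freed = 0;
        for (auto& fn : snapshot) freed += fn();
        return freed;
    }
};

// Try once; on failure ask the rest of the program for memory and retry once.
// Returns nullptr if the second attempt also fails.
inline char* allocate_or_reclaim(LowMemoryRegistry& registry, std::size_t n) {
    char* p = new (std::nothrow) char[n];
    if (p == nullptr && registry.request_release() > 0) {
        p = new (std::nothrow) char[n];
    }
    return p;
}
```

Subsystems can be converted one at a time: each new `register_releaser` call makes the retry a little more likely to succeed.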

Another solution would be to have the program protect large (temporary) memory requests with a mutex. It sounds like you are confident that memory will be released soon if you can just try again later. I suggest that you use a mutex for operations that might consume a lot of memory; this will allow the thread to be woken up immediately when another thread has released the memory that is needed. Otherwise your thread will sleep for a tenth of a second even if the memory frees up immediately.

You might also try sleep(0), which will simply hand off control to any other thread that is ready to run. This will allow your thread to regain control immediately if all other threads go to sleep, rather than having to wait out its 100 millisecond sentence. But if at least one thread still wants to run, you will still have to wait until it gives up control. This is typically 10 milliseconds on Linux machines, last I checked. I don't know about other platforms. Your thread may also have a lower priority in the scheduler if it has voluntarily gone to sleep.

蓝海似她心 2024-07-16 22:28:16

Based on your question, I'm assuming that your heap is shared between multiple threads.

If it isn't then the code above won't work, because nothing will be freed from the heap while the loop is running.

If the heap is shared, then the above would probably work. However, if you have a shared heap, then calling "new" will probably result in either a spin lock ( a similar loop to the one you have, but using CAS instructions), or it will block based on some kernel resources.

In both cases, the loop you have will decrease the throughput of your system. This is because you will either incur more context switches than you need to, or will take longer to respond to the "memory is now available" event.

I would consider overriding the "new" and "delete" operators. When new fails you can block (or spin lock on a counter variable of some sort) waiting for another thread to free memory, and then delete can either signal the blocked "new" thread or increment the counter variable using CAS.
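The wait-and-signal idea can be sketched without touching the global operators. In the sketch below, `MemoryBudget` is a hypothetical byte counter shared by all threads; an overridden `operator new` would call `acquire` before allocating, and `operator delete` would call `release` afterwards. The timeout is an addition of this sketch so that a waiter cannot block forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Shared byte budget: allocating threads block until enough bytes are free,
// and freeing threads wake them up immediately.
class MemoryBudget {
    std::mutex mutex_;
    std::condition_variable freed_;
    std::size_t available_;

public:
    explicit MemoryBudget(std::size_t bytes) : available_(bytes) {}

    // Blocks until `bytes` can be reserved or `timeout` expires.
    // Returns false on timeout so the caller can report an error.
    bool acquire(std::size_t bytes, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool ok = freed_.wait_for(lock, timeout,
                                  [&] { return available_ >= bytes; });
        if (!ok) return false;
        available_ -= bytes;
        return true;
    }

    // Returns bytes to the budget and wakes every waiting allocator.
    void release(std::size_t bytes) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_ += bytes;
        }
        freed_.notify_all();
    }
};
```

Compared with a fixed `Sleep(100)`, a waiter here resumes as soon as another thread calls `release`, and pays no context switches while nothing changes.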

That should give you better throughput and be a bit more efficient.

童话 2024-07-16 22:28:16

A few points:

  • Embedded programs often allocate all memory at startup or use only static memory to avoid situations like this.
  • Unless there is something else running on the device that frees memory on a regular basis your solution isn't likely to be effective.
  • The Viper I have has 64MB of RAM, and I don't think they come with less than 32MB. How much memory is your application using?
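As an illustration of the first point, the sketch below keeps a message queue entirely in static storage, so it can never fail for lack of heap at run time. `MessageQueue`, `g_queue`, `kMaxMessages`, and `kMessageSize` are hypothetical names and values chosen for the example.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative limits; a real system would derive them from its requirements.
constexpr std::size_t kMaxMessages = 8;
constexpr std::size_t kMessageSize = 64;

// Fixed-capacity ring buffer of C strings with no dynamic allocation.
class MessageQueue {
    char slots_[kMaxMessages][kMessageSize] = {};
    std::size_t head_ = 0;
    std::size_t count_ = 0;

public:
    // Returns false if the queue is full or the text does not fit in a slot.
    bool push(const char* text) {
        if (count_ == kMaxMessages) return false;
        std::size_t len = std::strlen(text);
        if (len >= kMessageSize) return false;
        std::size_t tail = (head_ + count_) % kMaxMessages;
        std::memcpy(slots_[tail], text, len + 1);  // include the terminator
        ++count_;
        return true;
    }

    // Copies the oldest message into `out`; returns false if the queue is empty.
    bool pop(char (&out)[kMessageSize]) {
        if (count_ == 0) return false;
        std::strcpy(out, slots_[head_]);  // stored strings always fit
        head_ = (head_ + 1) % kMaxMessages;
        --count_;
        return true;
    }
};

// Static storage duration: the memory is reserved when the program is built.
static MessageQueue g_queue;
```

The cost is that a full queue must be handled explicitly (`push` returns false), which is exactly the decision that dynamic allocation tends to postpone.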

只涨不跌 2024-07-16 22:28:16

I second that the most sensible thing to do is to use static allocation of memory, so you have some idea of what is going on. Dynamic memory allocation is a bad habit from desktop programming that is not suited to restricted-resource machines (unless you spend a fair bit of time and effort creating a good managed and controlled memory management system).

Also, check what features the OS in your device (assuming it has one; high-end ARM devices like this one tend to run an OS) has for handling memory.

傲娇萝莉攻 2024-07-16 22:28:16

You use C++. So you can make use of some C++ utilities to make your life easier. For example, why not use new_handler?

#include <new>  // std::set_new_handler, std::bad_alloc

void my_new_handler() {
    // make room for memory, then return, or throw bad_alloc if
    // nothing can be freed.
}

int main() {
    std::set_new_handler(&my_new_handler);

    // every allocation done will ask my_new_handler if there is
    // no memory for use anymore. This answer tells you what the
    // standard allocator function does: 
    // https://stackoverflow.com/questions/377178
}

In the new_handler, you could send all applications a signal so that they know that memory is needed for some application, and then wait a bit to give other applications the time to fulfill the request for memory. The important thing is that you do something and not silently hope for available memory. The new operator will call your handler again if there is still not enough memory available, so you don't have to worry about whether or not all applications have freed the needed memory already. You can also overload operator new if you need to know the size of memory that is needed in the new_handler. See my other answer on how to do that. This way, you have one central place to handle memory problems, instead of many places concerned with that.
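One concrete way to "do something" in the handler is to give back an emergency reserve. This is a swapped-in technique for illustration, not the signalling scheme described above; `g_reserve`, `init_reserve`, `reserve_new_handler`, and `install_handler` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace {
// Hypothetical emergency reserve, grabbed at startup while memory is plentiful.
void* g_reserve = nullptr;
}

// Sets aside `bytes` for emergencies; returns false if even that fails.
bool init_reserve(std::size_t bytes) {
    g_reserve = std::malloc(bytes);
    return g_reserve != nullptr;
}

// Called by operator new each time an allocation attempt fails.
// Returning means "try the allocation again"; throwing ends the retry loop.
void reserve_new_handler() {
    if (g_reserve != nullptr) {
        std::free(g_reserve);  // hand the reserve back to the heap
        g_reserve = nullptr;
        return;                // operator new now retries the allocation
    }
    throw std::bad_alloc();    // nothing left to release
}

void install_handler() {
    std::set_new_handler(reserve_new_handler);
}
```

Once the reserve has been spent, the program is running on borrowed memory, so the usual follow-up is to log the event and shed load before the next failure.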

又爬满兰若 2024-07-16 22:28:16

As others have mentioned, ideally, you would avoid this problem by up-front design and software architecture, but I’m assuming at this point that really isn’t an option.

As another post mentions, it would be good to wrap the logic in some utility functions so that you don’t end up writing the out-of-memory code all over the place.

To get to the real problem: you try to use a shared resource, memory, but are unable to because that shared resource is being used by another thread in the system. Ideally, what you would like to do is wait for one of the other threads in the system to release the resource you need, and then acquire that resource. If you had a way of intercepting all the allocation and free calls, you could set something up so that the allocating thread blocked until memory was available, and the freeing thread signalled the allocating thread when memory was available. But I’m going to assume that is simply too much work.

Given the constraints of not being able to totally re-architect the system or rewrite the memory allocator, I think your solution is the most practical, as long as you (and the others on your team) understand the limitations and the problems it will cause down the track.

Now, to improve your specific approach, you may want to measure the workloads to see how often memory is allocated and freed. This would give you a better way of calculating what the retry interval should be.

Secondly, you may want to try increasing the time-out for each iteration to reduce the load of that thread on the system.

Finally, you definitely should have some kind of error/panic case if the thread is unable to make progress after some number of iterations. This will let you at least see the potential live-lock case you may encounter if all threads are waiting for another thread in the system to free memory. You could simply pick a number of iterations based on what is empirically shown to work, or you could get smarter about it and keep track of how many threads are stuck waiting for memory, and panic if that ends up being all threads.
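The three suggestions above (a utility function, a growing time-out, and a bounded number of retries) might be combined roughly as in this sketch. `allocate_with_backoff`, `nothrow_alloc`, and `TryAlloc` are hypothetical names; the allocation step is injectable so the failure path can be exercised without actually exhausting memory.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <new>
#include <thread>

// One allocation attempt that returns nullptr on failure.
using TryAlloc = std::function<void*(std::size_t)>;

// Default attempt: the non-throwing form of operator new.
inline void* nothrow_alloc(std::size_t n) {
    return ::operator new(n, std::nothrow);
}

// Retries a failed allocation with a doubling delay and gives up after
// max_attempts, returning nullptr so the caller can report an error.
void* allocate_with_backoff(std::size_t bytes, int max_attempts,
                            std::chrono::milliseconds first_delay,
                            const TryAlloc& try_alloc = nothrow_alloc) {
    std::chrono::milliseconds delay = first_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (void* p = try_alloc(bytes)) return p;
        if (attempt == max_attempts) break;  // no point sleeping before giving up
        std::this_thread::sleep_for(delay);
        delay *= 2;  // back off so the waiting thread loads the system less
    }
    return nullptr;
}
```

Unlike the loop in the question, this sleeps only after a failed attempt, and memory obtained through the default path is released with `::operator delete`.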

Note: This is obviously not a perfect solution and as the other posters have mentioned a more global view of the application as a whole is needed to solve the problem correctly, but the above is a practical technique that should work in the short-term.

懒的傷心 2024-07-16 22:28:16

Surely it would depend on whether you have a reasonable expectation of memory becoming available in the 100 (millisecond?) sleep? Certainly, you should limit the number of times it attempts.

To me something doesn't smell right here. Hmmm...

Embedded systems typically need to be extremely deterministic. Perhaps you should review the entire system and head off the potential for this to fail up front, and then just fail hard if it actually happens in practice.
