Is "Out of Memory" a recoverable error?

Posted on 2024-07-09 02:15:25

I've been programming a long time, and the programs I see, when they run out of memory, attempt to clean up and exit, i.e. fail gracefully. I can't remember the last time I saw one actually attempt to recover and continue operating normally.

So much processing relies on being able to successfully allocate memory, especially in garbage collected languages, it seems that out of memory errors should be classified as non-recoverable. (Non-recoverable errors include things like stack overflows.)

What is the compelling argument for making it a recoverable error?

Comments (24)

朮生 2024-07-16 02:15:26

I know you asked for arguments for, but I can only see arguments against.

I don't see any way to achieve this in a multi-threaded application. How do you know which thread is actually responsible for the out-of-memory error? One thread could be allocating new memory constantly and hold GC roots to 99% of the heap, yet the first allocation that fails occurs in another thread.

A practical example: whenever an OutOfMemoryError has occurred in our Java application (running on a JBoss server), it's not as if one thread dies and the rest of the server continues to run: no, there are several OOMEs, killing several threads (some of which are JBoss's internal threads). I don't see what I as a programmer could do to recover from that, or even what JBoss could do to recover from it. In fact, I am not even sure you CAN: the javadoc for VirtualMachineError suggests that the JVM may be "broken" after such an error is thrown. But maybe the question was more targeted at language design.

三生殊途 2024-07-16 02:15:26

uClibc has an internal static buffer of 8 bytes or so for file I/O when there is no more memory to be allocated dynamically.

↙温凉少女 2024-07-16 02:15:26

What is the compelling argument for making it a recoverable error?

In Java, a compelling argument for not making it a recoverable error is that Java allows OOM to be signalled at any time, including at times where the result could be your program entering an inconsistent state. Reliable recovery from an OOM is therefore impossible; if you catch the OOM exception, you cannot rely on any of your program state. See "No-throw VirtualMachineError guarantees".

想你的星星会说话 2024-07-16 02:15:26

I'm working on SpiderMonkey, the JavaScript VM used in Firefox (and gnome and a few others). When you're out of memory, you may want to do any of the following things:

  1. Run the garbage collector. We don't run the garbage collector all the time, as it would kill performance and battery, so by the time you hit an out-of-memory error, some garbage may have accumulated.
  2. Free memory. For instance, get rid of some of the in-memory cache.
  3. Kill or postpone non-essential tasks. For instance, unload from memory some tabs that haven't been used in a long time.
  4. Log things to help the developer troubleshoot the out-of-memory error.
  5. Display a semi-nice error message to let the user know what's going on.
  6. ...

So yes, there are many reasons to handle out-of-memory errors manually!

洛阳烟雨空心柳 2024-07-16 02:15:26

I have this:

#include <stdlib.h>  /* malloc, NULL */
#include <unistd.h>  /* sleep */

/* Retry the allocation once per second until it succeeds. */
void *smalloc(size_t size) {
  void *mem = NULL;
  for (;;) {
    mem = malloc(size);
    if (mem != NULL)
      break;
    sleep(1);  /* give the rest of the system time to release memory */
  }
  return mem;
}

This has saved a system a few times already. Just because you're out of memory now doesn't mean that some other part of the system, or another process running on it, won't give some memory back soon. You had better be very careful before attempting such tricks, though, and have full control over every piece of memory you allocate in your program.
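A loop that waits forever hangs the caller if memory never comes back. Here is a minimal sketch of a bounded variant, assuming a POSIX system with `nanosleep`; the name `smalloc_bounded` and its parameters are made up for illustration and are not part of the answer above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: try malloc up to max_attempts times, pausing
 * delay_ms milliseconds between failed attempts.
 * Returns NULL if every attempt fails, so the caller decides what to do. */
void *smalloc_bounded(size_t size, unsigned max_attempts, long delay_ms) {
    assert(max_attempts > 0 && delay_ms >= 0);
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L
    };
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        void *mem = malloc(size);
        if (mem != NULL)
            return mem;
        if (attempt + 1 < max_attempts)
            nanosleep(&delay, NULL);  /* wait before the next try */
    }
    return NULL;
}
```

A caller might use `smalloc_bounded(n, 5, 1000)` to wait up to roughly four seconds in total before falling back to its normal error path.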

久伴你 2024-07-16 02:15:25

It really depends on what you're building.

It's not entirely unreasonable for a webserver to fail one request/response pair but then keep on going for further requests. You'd have to be sure that the single failure didn't have detrimental effects on the global state, however - that would be the tricky bit. Given that a failure causes an exception in most managed environments (e.g. .NET and Java) I suspect that if the exception is handled in "user code" it would be recoverable for future requests - e.g. if one request tried to allocate 10GB of memory and failed, that shouldn't harm the rest of the system. If the system runs out of memory while trying to hand off the request to the user code, however - that kind of thing could be nastier.

忘东忘西忘不掉你 2024-07-16 02:15:25

In a library, you want to copy a file efficiently. When you do that, you'll usually find that copying in a small number of big chunks is much more efficient than copying in a lot of smaller ones (say, it's faster to copy a 15 MB file as 15 chunks of 1 MB than as 15,000 chunks of 1 KB).

But the code works with any chunk size. So while it may be faster with 1MB chunks, if you design for a system where a lot of files are copied, it may be wise to catch OutOfMemoryError and reduce the chunk size until you succeed.
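The answer talks about catching OutOfMemoryError in Java; the same idea can be sketched in C, where `malloc` reports failure by returning NULL. The helper name `alloc_chunk` and its parameters are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: get the largest copy buffer available, starting at
 * `preferred` bytes and halving after each failed attempt, but never going
 * below `minimum`. The size actually obtained is stored in *got
 * (0 if nothing could be allocated). */
void *alloc_chunk(size_t preferred, size_t minimum, size_t *got) {
    assert(minimum > 0 && minimum <= preferred && got != NULL);
    for (size_t size = preferred; size >= minimum; size /= 2) {
        void *buf = malloc(size);
        if (buf != NULL) {
            *got = size;
            return buf;
        }
    }
    *got = 0;
    return NULL;
}
```

The copy loop then reads and writes `got` bytes at a time; the rest of the code does not need to know whether it ended up with 1 MB or 64 KB.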

Another place is a cache for objects stored in a database. You want to keep as many objects in the cache as possible, but you don't want to interfere with the rest of the application. Since these objects can be recreated, a smart way to conserve memory is to attach the cache to an out-of-memory handler that drops entries until the rest of the app has enough room to breathe again.

Lastly, for image manipulation, you want to load as much of the image into memory as possible. Again, an OOM-handler allows you to implement that without knowing in advance how much memory the user or OS will grant your code.

[EDIT] Note that I work under the assumption here that you've given the application a fixed amount of memory and this amount is smaller than the total available memory excluding swap space. If you can allocate so much memory that part of it has to be swapped out, several of my comments don't make sense anymore.

皓月长歌 2024-07-16 02:15:25

Users of MATLAB run out of memory all the time when performing arithmetic with large arrays. For example, if variable x fits in memory and they run "x+1", then MATLAB allocates space for the result and then fills it. If the allocation fails, MATLAB raises an error and the user can try something else. It would be a disaster if MATLAB exited whenever this use case came up.

内心激荡 2024-07-16 02:15:25

OOM should be recoverable because shutdown isn't the only strategy for recovering from OOM.

There is actually a pretty standard solution to the OOM problem at the application level.
As part of your application design, determine a safe minimum amount of memory required to recover from an out-of-memory condition (e.g. the memory required to auto-save documents, bring up warning dialogs, log shutdown data).

At the start of your application or at the start of a critical block, pre-allocate that amount of memory. If you detect an out-of-memory condition, release your guard memory and perform recovery. The strategy can still fail, but on the whole it gives great bang for the buck.
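The guard-memory idea can be sketched roughly as follows. All names here (`guard_reserve`, `guard_release`, `guarded_malloc`) are hypothetical, and the `memset` is an assumption: on systems that overcommit memory, an untouched block may not be backed by real pages, so the sketch writes to the reserve once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static void *guard_block = NULL;  /* emergency reserve (hypothetical) */

/* Reserve `bytes` of guard memory, e.g. at startup.
 * Returns false if even the reserve cannot be allocated. */
bool guard_reserve(size_t bytes) {
    assert(bytes > 0 && guard_block == NULL);
    guard_block = malloc(bytes);
    if (guard_block == NULL)
        return false;
    memset(guard_block, 0, bytes);  /* touch the pages so they are committed */
    return true;
}

/* Give the reserve back to the allocator so recovery code has room to run.
 * Returns true if there was a reserve to release. */
bool guard_release(void) {
    if (guard_block == NULL)
        return false;
    free(guard_block);
    guard_block = NULL;
    return true;
}

/* Allocate; on failure, spend the reserve once and retry.
 * *low_memory is set to true when the reserve was spent, which is the
 * caller's cue to start recovery (auto-save, warn the user, and so on). */
void *guarded_malloc(size_t size, bool *low_memory) {
    void *mem = malloc(size);
    if (mem == NULL && guard_release()) {
        *low_memory = true;
        mem = malloc(size);
    }
    return mem;
}
```

Once `*low_memory` is set, the application knows it is running without a safety net and should finish recovery before calling `guard_reserve` again.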

Note that the application need not shut down. It can display a modal dialog until the OOM condition has been resolved.

I'm not 100% certain but I'm pretty sure 'Code Complete' (required reading for any respectable software engineer) covers this.

P.S. You can extend your application framework to help with this strategy, but please don't implement such a policy in a library (good libraries do not make global decisions without the application's consent).

眼趣 2024-07-16 02:15:25

I think that like many things, it's a cost/benefit analysis. You can program in attempted recovery from a malloc() failure - although it may be difficult (your handler had better not fall foul of the same memory shortage it's meant to deal with).

You've already noted that the commonest case is to clean up and fail gracefully. In that case it's been decided that the cost of aborting gracefully is lower than the combination of development cost and performance cost in recovering.

I'm sure you can think of your own examples of situations where terminating the program is a very expensive option (life support machine, spaceship control, long-running and time-critical financial calculation etc.) - although the first line of defence is of course to ensure that the program has predictable memory usage and that the environment can supply that.

雨轻弹 2024-07-16 02:15:25

I'm working on a system that allocates memory for IO cache to increase performance. Then, on detecting OOM, it takes some of it back, so that the business logic could proceed, even if that means less IO cache and slightly lower write performance.

I also worked with an embedded Java application that attempted to manage OOM by forcing garbage collection and optionally releasing some non-critical objects, like pre-fetched or cached data.

The main problems with OOM handling are:

1) being able to re-try in the place where it happened or being able to roll back and re-try from a higher point. Most contemporary programs rely too much on the language to throw and don't really manage where they end up and how to re-try the operation. Usually the context of the operation will be lost, if it wasn't designed to be preserved

2) being able to actually release some memory. This means having a kind of resource manager that knows which objects are critical and which are not, and a system that is able to re-request the released objects if and when they later become critical

Another important issue is to be able to roll back without triggering yet another OOM situation. This is something that is hard to control in higher level languages.

Also, the underlying OS must behave predictably with regard to OOM. Linux, for example, will not, if memory overcommit is enabled. Many swap-enabled systems will die sooner than reporting the OOM to the offending application.

And, there's the case when it is not your process that created the situation, so releasing memory does not help if the offending process continues to leak.

Because of all this, it's often the big and embedded systems that employ these techniques, for they have the control over the OS and memory to enable them, and the discipline/motivation to implement them.

铃予 2024-07-16 02:15:25

It is recoverable only if you catch it and handle it correctly.

In some cases, for example, a request tries to allocate a lot of memory. That is quite predictable and you can handle it very well.

However, in many cases in a multi-threaded application, OOM may also happen on a background thread (including threads created by the system or a 3rd-party library).
It is almost impossible to predict, and you may be unable to recover the state of all your threads.

許願樹丅啲祈禱 2024-07-16 02:15:25

No.
An out-of-memory error from the GC should not generally be recoverable inside the current thread. (Recoverable thread (user or kernel) creation and termination should be supported, though.)

Regarding the counterexamples: I'm currently working on a D programming language project which uses NVIDIA's CUDA platform for GPU computing. Instead of manually managing GPU memory, I've created proxy objects to leverage D's GC. So when the GPU returns an out-of-memory error, I run a full collect and only raise an exception if it fails a second time. But this isn't really an example of out-of-memory recovery; it's more one of GC integration. The other examples of recovery (caches, free-lists, stacks/hashes without auto-shrinking, etc.) are all structures that have their own methods of collecting/compacting memory which are separate from the GC and tend not to be local to the allocating function.
So people might implement something like the following:

T new2(T)( lazy T old_new ) {
    T obj;
    try{
        obj = old_new;
    }catch(OutOfMemoryException oome) {
        foreach(compact; Global_List_Of_Delegates_From_Compatible_Objects)
            compact();
        obj = old_new;
    }
    return obj;
}

Which is a decent argument for adding support for registering/unregistering self-collecting/compacting objects to garbage collectors in general.

べ繥欢鉨o。 2024-07-16 02:15:25

The question is tagged "language-agnostic", but it's difficult to answer without considering the language and/or the underlying system.

If memory allocation is implicit, with no mechanism to detect whether a given allocation succeeded or not, then recovering from an out-of-memory condition may be difficult or impossible.

For example, if you call a function that attempts to allocate a huge array, most languages just don't define the behavior if the array can't be allocated. (In Ada this raises a Storage_Error exception, at least in principle, and it should be possible to handle that.)

On the other hand, if you have a mechanism that attempts to allocate memory and is able to report a failure to do so (like C's malloc() or C++'s new), then yes, it's certainly possible to recover from that failure. In at least the cases of malloc() and new, a failed allocation doesn't do anything other than report failure (it doesn't corrupt any internal data structures, for example).
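As a rough illustration of recovering from a reported `malloc()` failure, here is a sketch of an operation that either succeeds or leaves everything untouched. The function `repeat_text` is a made-up stand-in for a memory-hungry editor command, not something from the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory-hungry operation: build `copies` repetitions of `text`.
 * On success stores the new string in *out and returns true.
 * On failure returns false and leaves *out untouched, so the caller can
 * report the problem and carry on. */
bool repeat_text(const char *text, size_t copies, char **out) {
    assert(text != NULL && out != NULL);
    size_t len = strlen(text);
    /* Refuse requests whose size computation would overflow size_t. */
    if (len != 0 && copies > ((size_t)-1 - 1) / len)
        return false;
    char *buf = malloc(len * copies + 1);
    if (buf == NULL)
        return false;  /* allocation failed; nothing else was modified */
    for (size_t i = 0; i < copies; ++i)
        memcpy(buf + i * len, text, len);
    buf[len * copies] = '\0';
    *out = buf;
    return true;
}
```

The failed call has no side effects, so later, smaller requests can still succeed.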

Whether it makes sense to try to recover depends on the application. If the application just can't succeed after an allocation failure, then it should do whatever cleanup it can and terminate. But if the allocation failure merely means that one particular task cannot be performed, or if the task can still be performed more slowly with less memory, then it makes sense to continue operating.

A concrete example: Suppose I'm using a text editor. If I try to perform some operation within the editor that requires a lot of memory, and that operation can't be performed, I want the editor to tell me it can't do what I asked and let me keep editing. Terminating without saving my work would be an unacceptable response. Saving my work and terminating would be better, but is still unnecessarily user-hostile.

天荒地未老 2024-07-16 02:15:25

In the general case, it's not recoverable.

However, if your system includes some form of dynamic caching, an out-of-memory handler can often dump the oldest elements in the cache (or even the whole cache).

Of course, you have to make sure that the "dumping" process requires no new memory allocations :) Also, it can be tricky to recover the specific allocation that failed, unless you're able to plug your cache dumping code directly at the allocator level, so that the failure isn't propagated up to the caller.
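Plugging cache dumping in at the allocator level might look something like this sketch. The fixed-size `cache` array and the function names are assumptions for illustration; a real cache would track entry age rather than evicting by slot index. Note that eviction only calls `free`, so it needs no new allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8             /* hypothetical fixed-size cache */

static void *cache[CACHE_SLOTS];  /* NULL marks an empty slot */

/* Store a block in the first empty slot.
 * Returns 0 on success, -1 if the cache is full. */
int cache_put(void *block) {
    assert(block != NULL);
    for (size_t i = 0; i < CACHE_SLOTS; ++i) {
        if (cache[i] == NULL) {
            cache[i] = block;
            return 0;
        }
    }
    return -1;
}

/* Free one cached block (the lowest occupied slot).
 * Returns 1 if something was evicted, 0 if the cache was already empty. */
static int cache_evict_one(void) {
    for (size_t i = 0; i < CACHE_SLOTS; ++i) {
        if (cache[i] != NULL) {
            free(cache[i]);
            cache[i] = NULL;
            return 1;
        }
    }
    return 0;
}

/* Allocator wrapper: on failure, evict one cache entry and retry, until the
 * allocation succeeds or the cache is empty. The caller only sees NULL when
 * dumping the whole cache did not help. */
void *cache_aware_malloc(size_t size) {
    for (;;) {
        void *mem = malloc(size);
        if (mem != NULL || !cache_evict_one())
            return mem;
    }
}
```

Callers use `cache_aware_malloc` in place of `malloc`, so the failure is handled before it ever propagates up to them.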

他不在意 2024-07-16 02:15:25

It depends on what you mean by running out of memory.

When malloc() fails on most systems, it's because you've run out of address-space.

If most of that memory is taken by caching, or by mmap'd regions, you might be able to reclaim some of it by freeing your cache or unmapping. However, this really requires that you know what you're using that memory for, and as you've noticed, either most programs don't, or it doesn't make a difference.

If you used setrlimit() on yourself (to protect against unforeseen attacks, perhaps, or maybe root did it to you), you can relax the limit in your error handler. I do this very frequently, after prompting the user if possible and logging the event.
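Relaxing a self-imposed limit could be sketched like this, assuming the limit in question is `RLIMIT_AS` (the address-space limit); the answer does not say which resource it uses, and the function name is made up. An unprivileged process can only raise its soft limit as far as the hard limit.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft address-space limit to the hard limit.
 * Returns 0 on success, -1 on error (errno is set by getrlimit/setrlimit). */
int relax_address_space_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0)
        return -1;
    assert(lim.rlim_cur <= lim.rlim_max);  /* soft limit never exceeds hard */
    if (lim.rlim_cur == lim.rlim_max)
        return 0;                          /* nothing left to relax */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_AS, &lim);
}
```

An error handler would call this after a failed allocation and then retry the allocation once.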

On the other hand, catching stack overflow is a bit more difficult, and isn't portable. I wrote a posixish solution for ECL, and described a Windows implementation, if you're going this route. It was checked into ECL a few months ago, but I can dig up the original patches if you're interested.

巡山小妖精 2024-07-16 02:15:25

Especially in garbage-collected environments, it's quite likely that if you catch the OutOfMemory error at a high level of the application, lots of stuff has gone out of scope and can be reclaimed to give you back memory.

In the case of single excessive allocations, the app may be able to continue working flawlessly. Of course, if you have a gradual memory leak, you'll just run into the problem again (more likely sooner than later), but it's still a good idea to give the app a chance to go down gracefully, save unsaved changes in the case of a GUI app, etc.

强辩 2024-07-16 02:15:25

Yes, OOM is recoverable. As an extreme example, the Unix and Windows operating systems recover quite nicely from OOM conditions, most of the time. The applications fail, but the OS survives (assuming there is enough memory for the OS to properly start up in the first place).

I only cite this example to show that it can be done.

The problem of dealing with OOM is really dependent on your program and environment.

For example, in many cases the place where the OOM happens most likely is NOT the best place to actually recover from an OOM state.

Now, a custom allocator could possibly work as a central point within the code that can handle an OOM. The Java allocator will perform a full GC before it actually throws an OOM exception.

The more "application aware" your allocator is, the better suited it would be as a central handler and recovery agent for OOM. Using Java again, its allocator isn't particularly application aware.

This is where something like Java is readily frustrating. You can't override the allocator. So, while you could trap OOM exceptions in your own code, there's nothing saying that some library you're using is properly trapping, or even properly THROWING, an OOM exception. It's trivial to create a class that is forever ruined by an OOM exception, as some object gets set to null when "that never happens", and it's never recoverable.

So, yes, OOM is recoverable, but it can be VERY hard, particularly in modern environments like Java and its plethora of 3rd-party libraries of various quality.

苍风燃霜 2024-07-16 02:15:25

This is a difficult question. At first sight, having no more memory seems to mean "out of luck", but you must also see that one can get rid of a lot of memory-related stuff if one really insists. Take the function strtok, which is broken in other ways but has no problems with memory. Then take as its counterpart g_string_split from the GLib library, which depends heavily on memory allocation, as does nearly everything in GLib- or GObject-based programs. One can definitely say that memory allocation is used much more in dynamic languages than in more inflexible ones, especially C. But let us look at the alternatives. If you just end the program when you run out of memory, even carefully developed code may stop working. But if you have a recoverable error, you can do something about it. So the argument is that making it recoverable means one can choose to "handle" that situation differently (e.g. by putting aside a memory block for emergencies, or by degrading to a less memory-intensive mode).

So the most compelling reason is this: if you provide a way of recovering, one can try to recover; if you do not have the choice, everything depends on always getting enough memory...

Regards

〆凄凉。 2024-07-16 02:15:25

It's just puzzling me now.

At work, we have a bundle of applications working together, and memory is running low. While the fix is either to make the application bundle go 64-bit (and so be able to work beyond the 2 GB limit we have on a normal Win32 OS) and/or to reduce our use of memory, this problem of "how to recover from an OOM" won't leave my head.

Of course, I have no solution, but still play at searching for one for C++ (because of RAII and exceptions, mainly).

Perhaps a process supposed to recover gracefully should break down its processing in atomic/rollback-able tasks (i.e. using only functions/methods giving strong/nothrow exception guarantee), with a "buffer/pool of memory" reserved for recovering purposes.

Should one of the tasks fail, the C++ bad_alloc would unwind the stack and free some stack/heap memory through RAII. The recovery feature would then salvage as much as possible (saving the initial data of the task on disk, to use on a later try), and perhaps register the task data for a later try.

I do believe the use of C++ strong/nothrow guarantees can help a process survive in low-available-memory conditions, even if it would be akin to memory swapping (i.e. slow, somewhat unresponsive, etc.), but of course this is only theory. I just need to get smarter on the subject before trying to simulate this (i.e. creating a C++ program with a custom new/delete allocator with limited memory, and then trying to do some work under those stressful conditions).

Well...

×纯※雪 2024-07-16 02:15:25

Out of memory normally means you have to quit whatever you were doing. If you are careful about cleanup, though, it can leave the program itself operational and able to respond to other requests. It's better to have a program say "Sorry, not enough memory to do " than say "Sorry, out of memory, shutting down."

养猫人 2024-07-16 02:15:25

Out of memory can be caused either by free memory depletion or by trying to allocate an unreasonably big block (like one gig). In "depletion" cases memory shortage is global to the system and usually affects other applications and system services and the whole system might become unstable so it's wise to forget and reboot. In "unreasonably big block" cases no shortage actually occurs and it's safe to continue. The problem is you can't automatically detect which case you're in. So it's safer to make the error non-recoverable and find a workaround for each case you encounter this error - make your program use less memory or in some cases just fix bugs in code that invokes memory allocation.

魔法唧唧 2024-07-16 02:15:25

There are already many good answers here. But I'd like to contribute with another perspective.

Depletion of just about any reusable resource should be recoverable in general. The reasoning is that each and every part of a program is basically a subprogram. Just because one subprogram cannot run to its end at this very point in time does not mean that the entire state of the program is garbage. Just because the parking lot is full of cars does not mean that you trash your car. Either you wait a while for a space to be free, or you drive to a store further away to buy your cookies.

In most cases there is an alternative way. Making an out-of-memory error unrecoverable effectively removes a lot of options, and none of us like to have anyone decide for us what we can and cannot do.

The same applies to disk space. It's really the same reasoning. And contrary to your insinuation that stack overflow is unrecoverable, I would say that it's an arbitrary limitation. There is no good reason that you should not be able to throw an exception (popping a lot of frames) and then use another, less efficient approach to get the job done.

My two cents :-)

影子的影子 2024-07-16 02:15:25

If you are really out of memory, you are doomed, since you cannot free anything anymore.

If you are out of memory, but something like a garbage collector can kick in and free up some memory, you are not dead yet.

The other problem is fragmentation. Although you might not be out of memory (fragmented), you might still not be able to allocate the huge chunk you wanna have.
