How to estimate whether there is enough memory to run a given task in Java
I am developing an application that lets users set the maximum data set size they want their algorithm to be run against.
It has become apparent that arrays of around 20,000,000 elements cause an 'out of memory' error. Because I am invoking the algorithm via reflection, there is not a great deal I can do about this.
Is there any way to check or calculate the maximum array size from the user's heap space settings, and so validate the user's entry before running the application?
If not, are there any better solutions?
Use Case:
The user provides the data size they want to run their algorithm against, and we generate a range of input sizes to test, up to the limit they provided.
We record the time each run takes and use the measurements to work out the big-O notation.
We need to limit the user's input somehow so that it does not trigger this error. Ideally we want to measure n^2 algorithms on arrays as large as we can (the runtime could stretch to days), so we really don't want a run to go for 2 days and then fail, as that would be a waste of time.
Comments (1)
You can use the result of Runtime.freeMemory() to estimate the amount of available memory. Note that it only reports free space within the heap the JVM has claimed so far, and the heap may still grow up to Runtime.maxMemory(). In addition, a lot of memory may be occupied by unreachable objects that the GC will reclaim soon, so you might actually be able to use more memory than this figure suggests. You can try invoking the GC beforehand, but this is not guaranteed to do anything.
The second difficulty is estimating the amount of memory needed for the number given by the user. While it is easy to calculate the size of an ArrayList with that many entries, this might not be all. For example, which objects are stored in the list? I would expect at least one object per entry, so you need to add that memory too. Calculating the size of an arbitrary Java object is much more difficult (and in practice only possible if you know the data structures and algorithms behind the objects). On top of that, a lot of temporary objects may be created while the algorithm runs (for example boxed primitives, iterators, StringBuilders etc.).
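To make the first two points concrete, here is a minimal sketch of what such a pre-check could look like for a primitive array. It estimates the headroom as maxMemory() minus the memory currently in use, rather than relying on freeMemory() alone. The class name HeapEstimate, the 0.75 safety factor and the 16-byte header allowance are assumptions made up for this illustration, and the result is an optimistic guess only: it knows nothing about temporary objects, per-element objects, or whether a single contiguous block of that size is actually available.

```java
final class HeapEstimate {
    // Assumed safety margin: only count on 75% of the apparent headroom.
    private static final double USABLE_FRACTION = 0.75;
    // Assumed allowance for the array header; the real value is JVM-specific.
    private static final long ARRAY_HEADER_BYTES = 16;

    /** Bytes the heap could still provide: room to grow plus free space already claimed. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** Rough check whether a primitive array of the given shape is likely to fit. */
    static boolean probablyFits(long elementCount, int bytesPerElement) {
        // multiplyExact/addExact throw ArithmeticException instead of silently overflowing.
        long payload = Math.multiplyExact(elementCount, (long) bytesPerElement);
        long needed = Math.addExact(payload, ARRAY_HEADER_BYTES);
        return needed <= (long) (availableHeapBytes() * USABLE_FRACTION);
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        System.out.println("available bytes: " + availableHeapBytes());
        System.out.println("int[" + n + "] probably fits: " + probablyFits(n, Integer.BYTES));
    }
}
```

A check like this can reject obviously hopeless inputs early, but a "true" result is not a promise that the run will succeed.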
Third, even if the available memory is theoretically sufficient for running a given task, it might be insufficient in practice. Java programs can get very slow if the heap is repeatedly filled with objects, then some are freed, some new ones are created and so on, because the JVM ends up spending much of its time in garbage collection.
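If you want to see whether a timing run was dominated by garbage collection, one option is to compare the accumulated GC time before and after it. The sketch below uses the standard GarbageCollectorMXBean interface from java.lang.management; the class name GcOverhead and its two helper methods are hypothetical names for this example. Treat the ratio as a rough indicator only, since collectors that work concurrently report time that overlaps with the application's own running time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Sum of the approximate accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Runs the task and returns reported GC time divided by wall-clock time (0.0 for very short tasks). */
    static double gcFraction(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return elapsedMillis == 0 ? 0.0 : (double) gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        double fraction = gcFraction(() -> {
            // Deliberately wasteful work: many short-lived arrays.
            for (int i = 0; i < 200_000; i++) {
                int[] garbage = new int[256];
                garbage[0] = i;
            }
        });
        System.out.println("GC fraction of run: " + fraction);
    }
}
```

A timing that shows a high GC fraction says more about heap pressure than about the algorithm, so it is probably not a good data point for estimating complexity.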
So in practice, what you want to achieve is very difficult and probably next to impossible. I suggest simply running the algorithm and catching the OutOfMemoryError.
Usually, catching errors is something you should not do, but this seems like an occasion where it's OK (I do this in some similar cases). You should make sure that as soon as the OutOfMemoryError is thrown, some memory becomes reclaimable by the GC. This is usually not a problem, as the algorithm aborts, the call stack is unwound and some (hopefully a lot of) objects are no longer reachable. In your case, you should probably ensure that the large list is among the objects that immediately become unreachable in the case of an OOM. Then you have a good chance of being able to continue your application after the error.
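Since the question mentions reflection, here is a minimal sketch of that approach. One detail to be aware of: when a method called through Method.invoke throws, the throwable arrives wrapped in an InvocationTargetException, so the OutOfMemoryError has to be looked for in getCause() as well. The class GuardedRun and the method fillAndCount are hypothetical stand-ins for the real harness and the user's algorithm. The large array is a local variable of the invoked method, so it becomes unreachable as soon as the stack unwinds.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class GuardedRun {
    /** Hypothetical stand-in for a user algorithm: allocates and fills an int[] of the given size. */
    static int fillAndCount(int size) {
        int[] data = new int[size]; // may throw OutOfMemoryError
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        return data.length;
    }

    /** Looks up the stand-in algorithm reflectively. */
    static Method lookupAlgorithm() {
        try {
            return GuardedRun.class.getDeclaredMethod("fillAndCount", int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the run completed, false if it ran out of memory. */
    static boolean runGuarded(Method algorithm, int size) {
        try {
            algorithm.invoke(null, size); // null target because the method is static
            return true;
        } catch (InvocationTargetException e) {
            if (e.getCause() instanceof OutOfMemoryError) {
                return false; // the algorithm itself ran out of memory
            }
            throw new IllegalStateException("algorithm failed", e.getCause());
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        } catch (OutOfMemoryError e) {
            return false; // thrown outside the algorithm, e.g. by the reflective call itself
        }
    }

    public static void main(String[] args) {
        Method algorithm = lookupAlgorithm();
        System.out.println("size 1000 completed: " + runGuarded(algorithm, 1_000));
        System.out.println("size Integer.MAX_VALUE completed: " + runGuarded(algorithm, Integer.MAX_VALUE));
    }
}
```

A harness built this way could record the largest size that completed and stop scaling up after the first failure, instead of losing the whole run.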
However, note that this is not a guarantee. For example, if you have multiple threads working and consuming memory in parallel, the other threads might also receive an OutOfMemoryError and not be able to cope with it. The algorithm also needs to tolerate being interrupted at an arbitrary point, so it should make sure that the necessary cleanup actions are executed regardless (and of course you are in trouble if those need a lot of memory!).