Running out of memory when running Java code

Posted 2024-10-17 16:17:10 · 536 words · 3 views · 0 comments

I have a data set saved as a text file that contains vectors stored line by line. Each vector has 10k dimensions and I have 250 such vectors. Each vector entry is a double. Here's an example:

Vector 1 -> 0.0 0.0 0.0 0.439367 0.0 .....10k such entries

Vector 2 -> 0.0 0.0 0.0 0.439367 0.0 0.0 0.0 0.0 .....10k such entries

...

...

Vector 250 -> 0.0 1.203973 0.0 0.0 0.0 .....10k such entries

Now if I do the math, this should take up 10k × 16 bytes × 250 (assuming each vector entry is a double taking up 16 bytes), which is ~40 MB. However, the file size is shown as only 9.8 MB. Am I going wrong somewhere?

The thing is, I am using this data in my Java code. The space complexity of my algorithm is O(number of entries in the vector × number of entries). Even when I run my code with about 4 GB of memory allocated, I still run out of heap space. What am I missing?

Thanks.
Andy

Comments (5)

余厌 2024-10-24 16:17:10

After so many people guessing about the size, I ran three simple tests and used the Eclipse Memory Analyzer to determine the size. (Win7, 1.6.0_21 Java HotSpot(TM) 64-Bit Server VM)

  • double[][] = Size: 19.2 MB, Classes: 328, Objects: 2.7k
  • Double[][] = Size: 76.5 MB, Classes: 332, Objects: 2.5m
  • ArrayList<ArrayList<Double>> = Size: 79.6 MB, Classes: 330, Objects: 2.5m

256MB (java -Xmx256m Huge) was enough to run the tests.

So I guess the problem is not the size of the data. It could be one of two things:

  • there is a bug in the algorithm
  • the JVM is not actually running with a 4 GB heap
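One way to check the second possibility is to ask the JVM what heap limit it is really using. The following is only a minimal sketch: the class name `HeapCheck` and the helper `toMiB` are made up for illustration, while `Runtime.maxMemory()`, `totalMemory()` and `freeMemory()` are standard Java calls.

```java
// Hypothetical helper, not part of the original answer:
// prints the heap limits the running JVM reports.
class HeapCheck {

    /** Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper bound the heap may grow to (roughly the -Xmx value).
        System.out.println("Max heap:  " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is what the JVM has currently reserved for the heap.
        System.out.println("Committed: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory() is the unused part of the committed heap.
        System.out.println("Free:      " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it the same way as the real program (for example `java -Xmx4g HeapCheck`) should report a maximum close to the requested value. If the reported maximum is much smaller, the option is probably not reaching the JVM that runs the algorithm.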

If anybody is interested in the code:

import java.util.ArrayList;
import java.util.List;

public class Huge {

    private static final int NUMBER_OF_VECTORS = 250;
    private static final int VECTOR_SIZE = 10000;

    // Primitive arrays. Size: 19.2 MB, classes: 328, objects: 2.7k
    public static void doubleArray() {

        double[][] structure = new double[NUMBER_OF_VECTORS][];

        for (int i = 0; i < NUMBER_OF_VECTORS; i++) {
            structure[i] = new double[VECTOR_SIZE];
        }
    }

    // Arrays of boxed Double objects. Size: 76.5 MB, classes: 332, objects: 2.5m
    public static void doubleWrapperArray() {

        Double[][] structure = new Double[NUMBER_OF_VECTORS][];

        for (int i = 0; i < NUMBER_OF_VECTORS; i++) {
            structure[i] = new Double[VECTOR_SIZE];
            for (int k = 0; k < VECTOR_SIZE; k++) {
                structure[i][k] = Double.valueOf(Math.random());
            }
        }
    }

    // Lists of boxed Double objects. Size: 79.6 MB, classes: 330, objects: 2.5m
    public static void list() {

        List<List<Double>> structure = new ArrayList<List<Double>>();

        for (int i = 0; i < NUMBER_OF_VECTORS; i++) {
            List<Double> vector = new ArrayList<Double>();
            for (int k = 0; k < VECTOR_SIZE; k++) {
                vector.add(Double.valueOf(Math.random()));
            }
            structure.add(vector);
        }
    }

    // Entry point added so that "java -Xmx256m Huge" runs all three tests in sequence.
    // Each structure becomes unreachable as soon as its method returns.
    public static void main(String[] args) {
        doubleArray();
        doubleWrapperArray();
        list();
    }
}

病毒体 2024-10-24 16:17:10

Without seeing the code, I can't say for certain, but it sounds like you're over-allocating either a) when you read the data from the file or b) somewhere in your algorithm. I would advise using a tool such as VisualVM to review your object allocation. It will be able to show you what you're allocating and where the mistakes are.

泼猴你往哪里跑 2024-10-24 16:17:10

> Now if I do the math, this should take up 10k X 16bytes X 250 space (assuming each vector entry is a double taking up 16bytes of space) which is ~40MB of space. However I see that the file size is shown as 9.8MB only. Am I going wrong somewhere?

Where you're going wrong is the assumption that every double takes 16 bytes of space when saved as text. You seem to have lots of 0 values, which take only 4 bytes in string form (including separator).
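The arithmetic can be sketched in a few lines of Java. This is an illustration only: the class name `SizeEstimate` and its methods are hypothetical, and it assumes a one-byte-per-character text encoding and a file in which every entry is written as `0.0` followed by a single separator. Note that a primitive `double` in Java occupies 8 bytes, not 16.

```java
// Hypothetical back-of-the-envelope calculation, not code from the question.
class SizeEstimate {

    static final int VECTORS = 250;
    static final int DIMENSION = 10_000;

    /** Bytes for the raw values as primitive doubles (8 bytes each), ignoring array headers. */
    static long primitiveBytes() {
        return (long) VECTORS * DIMENSION * Double.BYTES;
    }

    /** Bytes on disk if every entry were the text "0.0" plus one separator character. */
    static long allZeroTextBytes() {
        return (long) VECTORS * DIMENSION * "0.0 ".length();
    }

    public static void main(String[] args) {
        System.out.println("In memory as double[][]: " + primitiveBytes() + " bytes");
        System.out.println("On disk, all zeros:      " + allZeroTextBytes() + " bytes");
    }
}
```

The in-memory figure is 20,000,000 bytes, which is in line with the roughly 19 MB measured for `double[][]` in the first answer. The all-zero text figure is 10,000,000 bytes, close to the 9.8 MB file; the exact file size depends on how the non-zero entries and separators are actually written.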

> Even when I run my code by allocating like 4GB of memory, I still run out of heap space. What am I missing?

That depends on your code. One reason might be that you're storing your data in an ArrayList<Double> or (worse) a TreeSet<Double>. The Double wrapper objects can easily cause a memory overhead of 200%, and the Set/Map structures are much worse.
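To avoid the wrapper objects, each line can be parsed straight into a primitive `double[]`. The sketch below assumes that every line of the file holds only whitespace-separated numbers (the "Vector 1 ->" labels in the question are taken to be annotations, not file content). The class `VectorParser` and the method `parseLine` are hypothetical names.

```java
// Hypothetical helper: turns one line of the text file into a primitive array.
class VectorParser {

    /** Parses a whitespace-separated line of numbers into a double[] without boxing. */
    static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0]; // blank line: no entries
        }
        String[] tokens = trimmed.split("\\s+");
        double[] vector = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // parseDouble returns a primitive, so no Double object is created per entry.
            vector[i] = Double.parseDouble(tokens[i]);
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = parseLine("0.0 0.0 0.0 0.439367 0.0");
        System.out.println("Parsed " + v.length + " entries, v[3] = " + v[3]);
    }
}
```

Collecting the 250 resulting arrays in a `double[][]` keeps the data at about 8 bytes per entry plus a small per-array header.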

感情旳空白 2024-10-24 16:17:10

Hard to say without seeing the code and VM arguments. But note that the variables in your algorithm also consume memory, and that memory usage relative to file size depends on how you construct your in-memory objects. For example, even a simple object that holds no double takes up space on its own.

Get a proper tool for benchmarking memory usage. Check out the TPTP Eclipse distribution.

Also, you might want to look into sparse matrices.
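Since most of the entries shown in the question are 0.0, a sparse representation stores only the non-zero entries. The following is a minimal sketch of one possible layout (sorted index array plus value array). The class `SparseVector` and its methods are illustrative names, not an existing library API.

```java
import java.util.Arrays;

// Hypothetical sparse vector: keeps only the non-zero entries of a dense vector.
class SparseVector {

    private final int dimension;   // logical length of the vector
    private final int[] indices;   // positions of the non-zero entries, ascending
    private final double[] values; // values stored at those positions

    private SparseVector(int dimension, int[] indices, double[] values) {
        this.dimension = dimension;
        this.indices = indices;
        this.values = values;
    }

    /** Builds a sparse copy of a dense array, dropping every entry equal to zero. */
    static SparseVector fromDense(double[] dense) {
        int count = 0;
        for (double d : dense) {
            if (d != 0.0) count++;
        }
        int[] idx = new int[count];
        double[] val = new double[count];
        int n = 0;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                idx[n] = i;
                val[n] = dense[i];
                n++;
            }
        }
        return new SparseVector(dense.length, idx, val);
    }

    /** Returns the entry at position i; positions that were not stored are zero. */
    double get(int i) {
        if (i < 0 || i >= dimension) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int pos = Arrays.binarySearch(indices, i);
        return pos >= 0 ? values[pos] : 0.0;
    }

    int nonZeroCount() { return values.length; }

    int dimension() { return dimension; }

    public static void main(String[] args) {
        SparseVector v = fromDense(new double[] {0.0, 1.203973, 0.0, 0.0, 0.0});
        System.out.println(v.nonZeroCount() + " of " + v.dimension() + " entries stored");
    }
}
```

Each stored entry costs 12 bytes (a 4-byte index and an 8-byte value), so this layout saves memory only when well under two thirds of the entries are non-zero. Whether it helps here depends on how sparse the real data is and on whether the algorithm can work with sparse vectors.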

感性不性感 2024-10-24 16:17:10

If we can't see the code (which is fair enough), all I can say is to use the -XX:+HeapDumpOnOutOfMemoryError command line option when you start your application, then analyse the resulting heap dump with jhat.
