OutOfMemoryError: Java heap space. How can I fix this error that happens in a recursive method?

Posted 2024-11-27 04:34:44

I have a Java application that parses pdf files in a directory and its subdirectories and creates a database using the information found in the files.

Everything was fine when I was using the program on around 900 files or so (which creates a SQLite database with multiple tables, some of which contain 150k rows).

Now I'm trying to run my program on a larger set of data (around 2000 files) and at some point I get "OutOfMemoryError: Java Heap space". I changed the following line in my jdev.conf file:

AddVMOption  -XX:MaxPermSize=256M

to 512M and I got the same error (though later, I think). I'm going to change it to something bigger again, but the thing is the computers this program will be used on are much older and thus don't have as much memory. Normally, the users are not going to add more than 30 files at a time, but I want to know how many files I should limit them to. Ideally, I'd like my program not to throw an error regardless of how many files are to be parsed.

At first, I thought it was my SQLite queries that were causing the error, but after reading up on Google, it's probably some recursive function. I isolated it (I think it's the correct one at least), to this function:

 public static void visitAllDirsAndFiles(File dir) {
      if(dir.isDirectory()) 
      {
        String[] children = dir.list();
        for (int i=0; i<children.length; i++) 
        {
          visitAllDirsAndFiles(new File(dir, children[i]));
        }
      }
      else
      {
        try
        {          
          BowlingFilesReader.readFile(dir);
        }
        catch(Exception exc)
        {
          exc.printStackTrace();
          System.out.println("Other Exception in file: " + dir);
        }
      }
  }

I think the problem might be that it recursively calls this function for each subsequent directory, but I'm really not sure that could be the problem. What do you think? If it might be, how can I make it so I don't get this error again? If you think it is impossible that this section alone causes the problem, I'll try to find which other part of the program can cause it.

The only other thing I can see causing that is that I connect to the database before calling the above method and I disconnect after it returns. The reason for that is that if I connect and disconnect after each file, my program takes a lot longer to parse the data, so I'd really like not to have to change that.

死开点丶别碍眼 2024-12-04 04:34:44

MaxPermSize only changes your permgen space. You are running out of heap space. Increase the maximum heap size with the -Xmx option.
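
One way to confirm that a heap setting actually reached the JVM is to print the limit the runtime reports. This is only a minimal sketch; the class name HeapLimit is made up for illustration and is not part of the question's program.

```java
/** Prints the maximum heap the JVM will attempt to use (the limit set by -Xmx). */
class HeapLimit {
    /** Returns the heap ceiling reported by the runtime, in whole mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Started as java -Xmx512m HeapLimit, it should report a figure close to (possibly a little below) 512, whereas changing -XX:MaxPermSize should leave the figure unchanged.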

余生一个溪 2024-12-04 04:34:44

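If recursion were the root of the problem, you would get an error related to the stack rather than the heap. It seems you have some kind of memory leak in BowlingFilesReader...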
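
To see the difference for yourself, here is a small sketch (the class name RecursionDepth is hypothetical) that recurses without a stopping condition. It ends in a StackOverflowError, not in "OutOfMemoryError: Java heap space".

```java
/** Shows that runaway recursion exhausts the thread stack, not the heap. */
class RecursionDepth {
    private static int depth;

    /** Recurses forever; each call adds one frame to the thread stack. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the stack is exhausted and returns the depth reached. */
    static int depthAtOverflow() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Deep recursion surfaces here, as a stack error rather than a heap error.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + depthAtOverflow() + " calls");
    }
}
```

The depth it reaches is typically in the thousands, while visitAllDirsAndFiles only goes as deep as the directories are nested.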

暗地喜欢 2024-12-04 04:34:44

I suggest you try increasing the heap space with something like

-mx1000m

If you have a 64-bit JVM you can use up to about 80% of the total memory of the machine. If you have a 32-bit JVM you may be limited to around 1200 to 1400 MB depending on the OS.

夜司空 2024-12-04 04:34:44

BowlingFilesReader.readFile(dir); is suspicious. How much is it loading into memory, and why? If it's loading all files in a rather large directory into memory, that's an issue.

You also might try

java -Xmx1G or more, depending on your RAM situation (note that there is no space between -Xmx and the size).

You could always try using a stack instead of a recursive function.

S = [ root ]
while( !S.isEmpty() ){
   item = S.pop()
   //operate on item
   S.push( all of item's children )
}
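
In Java, the stack-based loop above might look roughly like the following. It is a sketch under assumptions: the class name IterativeWalk and the method listFiles are invented here, and the place where the question's code would call BowlingFilesReader.readFile is only marked with a comment.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Walks a directory tree with an explicit stack instead of recursion. */
class IterativeWalk {
    /** Returns every regular file found under root (or root itself if it is a file). */
    static List<File> listFiles(File root) {
        List<File> files = new ArrayList<>();
        Deque<File> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            File current = stack.pop();
            if (current.isDirectory()) {
                File[] children = current.listFiles(); // null if the directory cannot be read
                if (children != null) {
                    for (File child : children) {
                        stack.push(child);
                    }
                }
            } else if (current.isFile()) {
                files.add(current);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : listFiles(root)) {
            System.out.println(f); // the question's code would call BowlingFilesReader.readFile(f) here
        }
    }
}
```

This removes the recursion and also guards against the null that File.list() and File.listFiles() return for unreadable directories, but it will not by itself cure a heap error if the memory is being held elsewhere.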

输什么也不输骨气 2024-12-04 04:34:44

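I think you should download a copy of the Memory Analyzer Tool (MAT). Once you have taken a heap dump, load it into MAT and run the Leak Suspects report; you should be able to find out what the problem is fairly quickly.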

陌路黄昏 2024-12-04 04:34:44

@Adam Smith to your question(s)

The same problem happened... I'm going to close my ResultSets, 
PreparedStatements and Statements now, but can you explain 
why I have to close them? Don't they get de-allocated when 
the method returns (thus they're no longer in the scope of any methods)?

Most Java IDEs have a built-in profiler or an available plugin (JProfiler, for example). Integrate it with your project, run with the profiler, and you'll see all the objects present at runtime; nothing complicated.

Then you have to close your resources. See a File I/O example and the JDBC Introduction (example at the bottom of the page). Also check for and avoid opening lots of Connections (not only JDBC ones): create one and reuse it, and when everything is done you can close that Connection too (opening a Connection is a heavy, slow action on both sides, on the PC and on the server). All stream objects must be closed in the finally block, because that block always runs.
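
As a rough illustration of closing in a finally block, here is a sketch with plain file I/O. The class name CloseInFinally and the method firstLine are made up for this example; nothing here comes from the asker's program.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

/** Reads one line from a file and always releases the file handle. */
class CloseInFinally {
    /** Returns the first line of the file, or null if the file is empty. */
    static String firstLine(File file) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(file));
            return reader.readLine();
        } finally {
            // Runs whether readLine() returned normally or threw.
            if (reader != null) {
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(firstLine(new File(args[0])));
        }
    }
}
```

ResultSet, Statement, PreparedStatement and Connection all have a close() method and can be handled the same way; on Java 7 and later, try-with-resources expresses the same thing more compactly.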

As I mentioned, these objects do not disappear from the JVM's used memory, and most of them are never GC'ed (for more details, search this forum). GC never runs immediately.

    Runtime runtime = Runtime.getRuntime();
    long total = runtime.totalMemory();
    long free = runtime.freeMemory();
    long max = runtime.maxMemory();
    long used = total - free;
    System.out.println(Math.round(max / 1e6) + " MB available before Cycle");
    System.out.println(Math.round(total / 1e6) + " MB allocated before Cycle");
    System.out.println(Math.round(free / 1e6) + " MB free before Cycle");
    System.out.println(Math.round(used / 1e6) + " MB used before Cycle");
    //.... your code with
    //.....
    // re-read the counters (the variables are already declared above)
    total = runtime.totalMemory();
    free = runtime.freeMemory();
    max = runtime.maxMemory();
    used = total - free;
    System.out.println(Math.round(max / 1e6) + " MB available past Cycle");
    System.out.println(Math.round(total / 1e6) + " MB allocated past Cycle");
    System.out.println(Math.round(free / 1e6) + " MB free past Cycle");
    System.out.println(Math.round(used / 1e6) + " MB used past Cycle");

    // only a request; the JVM decides when to collect
    runtime.gc();

    // delayed with some Timer ...
    total = runtime.totalMemory();
    free = runtime.freeMemory();
    max = runtime.maxMemory();
    used = total - free;
    System.out.println(Math.round(max / 1e6) + " MB available after GC");
    System.out.println(Math.round(total / 1e6) + " MB allocated after GC");
    System.out.println(Math.round(free / 1e6) + " MB free after GC");
    System.out.println(Math.round(used / 1e6) + " MB used after GC");

More info on this forum, and :-) described in English :-)
