识别 Perl/DBI 代码中的内存问题

发布于 2024-10-27 23:06:13 字数 1269 浏览 8 评论 0原文

首先 - 这不是我的代码 - 代码存在问题,我正在尝试找出如何调试该问题。如果有机会的话,我会对代码进行很多更改(过多的大括号、全局变量、使用 join 函数而不是 foreach 等等)。它充满了不好的做法,但这不是我需要帮助的。

这是 Perl 代码的片段(没有子例程,没有什么特别的 - 基本上打开一个文件来获取查询结果,执行查询,并将结果转储到文件中):

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

在 while 循环中的某个位置,Perl 脚本因内存不足!错误而崩溃(不是彻底的崩溃 - 它只是停止运行并显示该消息。)

在大多数运行中,此查询的量非常小。这次脚本崩溃时的查询结果要大得多(仍然不是很大):150,000 行,每行大约 1200 字节宽。

我想到的事情:

  1. DBI 的 fetchrow_array 函数足够聪明,不会将完整的数据集拉入内存,对吗?我的假设是数据位于数据库上,并且 fetchrow_array 一次检索一行,因此即使您有 100 亿行,也不应该出现内存问题 - 这是正确的吗?
  2. $output 变量调用 undef 将释放它正在使用的内存,对吗?如果没有,那可能是另一个可能存在内存问题的地方。
  3. 每次检索新行时,@row 变量正在使用的内存都会被重新使用(?),对吗?如果没有,我可以看到使用全局数组来存储每一行​​可能会耗尽内存。

我希望有一些明显的事情是我不理解的。如果通过查看代码没有明显的东西,我可以使用哪些技术来调试这个问题?

提前致谢!

To start out with - this is not my code - there was a problem with the code and I'm trying to find out how to debug the problem. There are plenty of changes I would make to the code if I was given the chance (excessive braces, global variables, use the join function instead of foreach, etc., etc. etc.). It is full of bad practice, but that is not what I need help with.

Here is a snippet of the Perl code (There are no subroutines, nothing special - basically open a file for the results of the query, execute the query, and dump the results to a file):

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

Somewhere in the while loop, the Perl script is crashing with an Out of Memory! error (not a clean crash - it just stops running with that message.)

In most runs, the volume on this query is very small. The results of the query this time when the script crashed is a lot bigger (still not huge): 150,000 rows, and each row is about 1200 bytes wide.

Things that I have thought of:

  1. The fetchrow_array function of DBI is smart enough to not pull the complete dataset into memory, correct? My assumption is that the data is on the database, and fetchrow_array retrieves one row at a time, so that even if you had 10 billion rows, you should not have a memory problem - is that correct?
  2. Calling undef on the $output variable will free the memory that it was using, correct? If it doesn't, that could be another place where a memory problem could exist.
  3. The memory the @row variable is using will get re-used(?) each time a new row is retrieved, correct? If not, I could see how using a global array to store each row could blow out the memory.

I am hoping there is something obvious that I am just not understanding. If there is not something obvious by looking at the code, what are some techniques I can use to debug this problem?

Thanks in advance!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

初与友歌 2024-11-03 23:06:13

可能是您(可能无意中)缓存了太多行。您可以通过检查$sth->{RowsInCache}了解已引入的数量。如果是 undef,则没有缓存,否则您将获得行数。

您还可以通过如下重写来摆脱与 $output 相关的麻烦:

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    print ROWOUT join('~', map { s/\s+$// } @this_row), "\n";
}

因此,假设缓存中有太多行,您可以通过以下方式限制它

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

: DBI 文档中,您可以通过使用如下值来控制缓存(假设您的驱动程序支持它):

 0 - Automatically determine a reasonable cache size for each C<SELECT>
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each C<SELECT>.

It might be that you're (perhaps inadvertently) caching too many rows. You can find out how many have been brought in by checking $sth->{RowsInCache}. If it's undef, then there is no cache, otherwise you'll be given the number of rows.

You can also get away from the gymnastics you're having to do with $output by rewriting it as follows:

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    print ROWOUT join('~', map { s/\s+$// } @this_row), "\n";
}

So, assuming you have too many rows in your cache, you can limit it via:

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

From the DBI documentation, you can control the cache (assuming your driver supports it) by using a value as follows:

 0 - Automatically determine a reasonable cache size for each C<SELECT>
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each C<SELECT>.
百思不得你姐 2024-11-03 23:06:13

提高跟踪级别,并在 Perl 和 GDB 调试器下运行代码。您需要找出流程到底在哪里失控。

如果您没有运行最新版本的相关模块和数据库,请考虑您可能发现了已修复的旧错误。

Increase trace level, and run the code under the Perl and GDB debuggers. You need to find out where exactly the process goes out of control.

If you are not running the latest version of the relevant modules and DB, consider the possibility that you have found an old bug that has already been fixed.

淡淡绿茶香 2024-11-03 23:06:13

就#1而言,我确实相信它将整个结果加载到内存中
编辑:我记得这是 DBI 中的一个选项

对于 #2 和 #3,您确实应该将变量本地化到它们使用的范围。

我怀疑您在执行后实际上耗尽了内存,尽管我知道您说过否则。您似乎不太可能在该循环中使用大量内存。当然,除非 ROWOUT 实际上是对内存中变量的引用,但如果您不提供完整的脚本,我们不知道这一点。

As far as #1 goes, I do believe it loads the entire result into memory
Edit: I recall this being an option in DBI

For #2 and #3, you should really be localizing your variables to the scope they are used in.

I suspect you are actually running out of memory after your execute, though I know you said otherwise. It seems unlikely you are using up much memory in that loop. Unless of course ROWOUT is actually a reference to a variable in memory, but we don't know that if you don't provide a complete script.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文