识别 Perl/DBI 代码中的内存问题

发布于 2024-10-27 23:06:13 字数 1269 浏览 8 评论 0原文

首先 - 这不是我的代码 - 代码存在问题，我正在尝试找出如何调试该问题。如果有机会的话，我会对代码进行很多更改（过多的大括号、全局变量、使用 join 函数而不是 foreach 等等）。它充满了不好的做法，但这不是我需要帮助的。

这是 Perl 代码的片段（没有子例程，没有什么特别的 - 基本上打开一个文件来获取查询结果，执行查询，并将结果转储到文件中）：

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

在 while 循环中的某个位置，Perl 脚本因内存不足！错误而崩溃（不是彻底的崩溃 - 它只是停止运行并显示该消息。）

在大多数运行中，此查询的量非常小。这次脚本崩溃时的查询结果要大得多（仍然不是很大）：150,000 行，每行大约 1200 字节宽。

我想到的事情：

DBI 的 fetchrow_array 函数足够聪明，不会将完整的数据集拉入内存，对吗？我的假设是数据位于数据库上，并且 fetchrow_array 一次检索一行，因此即使您有 100 亿行，也不应该出现内存问题 - 这是正确的吗？
对 $output 变量调用 undef 将释放它正在使用的内存，对吗？如果没有，那可能是另一个可能存在内存问题的地方。
每次检索新行时，@row 变量正在使用的内存都会被重新使用（？），对吗？如果没有，我可以看到使用全局数组来存储每一行可能会耗尽内存。

我希望有一些明显的事情是我不理解的。如果通过查看代码没有明显的东西，我可以使用哪些技术来调试这个问题？

提前致谢！

原文

To start out with - this is not my code - there was a problem with the code and I'm trying to find out how to debug the problem. There are plenty of changes I would make to the code if I was given the chance (excessive braces, global variables, use the join function instead of foreach, etc., etc. etc.). It is full of bad practice, but that is not what I need help with.

Here is a snippet of the Perl code (There are no subroutines, nothing special - basically open a file for the results of the query, execute the query, and dump the results to a file):

# earlier in the program, @row, $field, and $output are all declared globally, like this:
my @row;
my $field;
my $output;

# a file is opened for output, with filehandle ROWOUT
# a database statement handle (DBD::DB2) is executed

while ( @{row} = ${sth}->fetchrow_array ) {
    foreach ${field}( @{row} ) {
        ${field} =~ s/\s+$//;
        ${output} = "${output}\~${field}";
    }

    ${output} =~ s/\~//;
    print ROWOUT "${output}\n";
    undef ${output};
}

Somewhere in the while loop, the Perl script is crashing with an Out of Memory! error (not a clean crash - it just stops running with that message.)

In most runs, the volume on this query is very small. The results of the query this time when the script crashed is a lot bigger (still not huge): 150,000 rows, and each row is about 1200 bytes wide.

Things that I have thought of:

The fetchrow_array function of DBI is smart enough to not pull the complete dataset into memory, correct? My assumption is that the data is on the database, and fetchrow_array retrieves one row at a time, so that even if you had 10 billion rows, you should not have a memory problem - is that correct?
Calling undef on the $output variable will free the memory that it was using, correct? If it doesn't, that could be another place where a memory problem could exist.
The memory the @row variable is using will get re-used(?) each time a new row is retrieved, correct? If not, I could see how using a global array to store each row could blow out the memory.

I am hoping there is something obvious that I am just not understanding. If there is not something obvious by looking at the code, what are some techniques I can use to debug this problem?

Thanks in advance!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初与友歌 2024-11-03 23:06:13

可能是您（可能无意中）缓存了太多行。您可以通过检查$sth->{RowsInCache}了解已引入的数量。如果是 undef，则没有缓存，否则您将获得行数。

您还可以通过如下重写来摆脱与 $output 相关的麻烦：

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    print ROWOUT join('~', map { s/\s+$// } @this_row), "\n";
}

因此，假设缓存中有太多行，您可以通过以下方式限制它

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

： DBI 文档中，您可以通过使用如下值来控制缓存（假设您的驱动程序支持它）：

 0 - Automatically determine a reasonable cache size for each C<SELECT>
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each C<SELECT>.

It might be that you're (perhaps inadvertently) caching too many rows. You can find out how many have been brought in by checking $sth->{RowsInCache}. If it's undef, then there is no cache, otherwise you'll be given the number of rows.

You can also get away from the gymnastics you're having to do with $output by rewriting it as follows:

while ( my @this_row = $sth->fetchrow_array ) {
    # Get rid of this line once you figure out your memory problem.
    print STDERR "Using ", ($sth->{RowsInCache} || 0), " rows in cache\n";

    print ROWOUT join('~', map { s/\s+$// } @this_row), "\n";
}

So, assuming you have too many rows in your cache, you can limit it via:

my $dbh = DBI->connect($dsn, $user, $pass, { RowCacheSize => 20 })
    or die "Cannot connect to $dsn: $DBI::errstr\n";

From the DBI documentation, you can control the cache (assuming your driver supports it) by using a value as follows:

 0 - Automatically determine a reasonable cache size for each C<SELECT>
 1 - Disable the local row cache
>1 - Cache this many rows
<0 - Cache as many rows that will fit into this much memory for each C<SELECT>.

回复收藏 0 原文