Perl DBI alternative to LongReadLen

Posted 2024-12-20 06:36:58


I'd like to know the most memory-efficient way to pull arbitrarily large data fields from an Oracle db with Perl DBI. The method I know to use is to set the 'LongReadLen' attribute on the database handle to something sufficiently large. However, my application needs to pull several thousand records, so doing this arbitrarily is extremely memory inefficient.

The doc suggests doing a query upfront to find the largest potential value, and setting that.

$dbh->{LongReadLen} = $dbh->selectrow_array(qq{
    SELECT MAX(OCTET_LENGTH(long_column_name))
    FROM table WHERE ...
});
$sth = $dbh->prepare(qq{
    SELECT long_column_name, ... FROM table WHERE ...
});

However, this is still inefficient, since the outlying data is not representative of every record. The largest values are in excess of 1 MB, but the average record is less than 1 KB. I want to be able to pull all of the information (i.e., no truncation) while wasting as little memory on unused buffers as possible.

A method I've considered is to pull the data in chunks, say 50 records at a time, and set LongReadLen to the max length of records in that chunk. Another workaround, which could, but doesn't have to, build on the chunk idea, would be to fork a child process, retrieve the data, and then kill the child (taking the wasted memory with it). The most wonderful thing would be the ability to force-free the DBI buffers, but I don't think that's possible.
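The chunking idea above could be sketched roughly as follows. This is a sketch only, assuming a numeric key column `id` and the hypothetical names `my_table` / `long_column_name`; each batch sizes `LongReadLen` to its own maximum rather than the table-wide one:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details; adjust to your environment.
my $dbh = DBI->connect( "dbi:Oracle:mydb", "user", "pass",
                        { RaiseError => 1 } );

# Fetch the key list once, then process it in batches of 50.
my $ids = $dbh->selectcol_arrayref("SELECT id FROM my_table");

while ( my @batch = splice @$ids, 0, 50 ) {
    my $in = join ',', ('?') x @batch;

    # Size the buffer for this batch only, not for the whole table.
    ( $dbh->{LongReadLen} ) = $dbh->selectrow_array(
        "SELECT MAX(OCTET_LENGTH(long_column_name))
         FROM my_table WHERE id IN ($in)",
        undef, @batch );

    # LongReadLen must be set before prepare() for it to take effect.
    my $sth = $dbh->prepare(
        "SELECT long_column_name FROM my_table WHERE id IN ($in)" );
    $sth->execute(@batch);

    while ( my ($val) = $sth->fetchrow_array ) {
        # ... process $val ...
    }
}
```

This trades one extra MAX(OCTET_LENGTH(...)) round trip per batch for a buffer that only ever grows to the current batch's largest value.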

Has anyone addressed a similar problem with any success? Thanks for the help!

EDIT

Perl v5.8.8, DBI v1.52

To clarify: the memory inefficiency is coming from using 'LongReadLen' together with {ora_pers_lob => 1} in the prepare. Using this code:

use DBI;
use Carp;

my $sql = "select myclob from my_table where id = 68683";
my $dbh = DBI->connect( "dbi:Oracle:$db", $user, $pass ) or croak $DBI::errstr;

print "before";
readline( *STDIN );

$dbh->{'LongReadLen'} = 2 * 1024 * 1024;
my $sth = $dbh->prepare( $sql, { 'ora_pers_lob' => 1 } ) or croak $dbh->errstr;
$sth->execute() or croak( "Can't execute query " . $dbh->errstr . ' sql: ' . $sql );
my $row = $sth->fetchrow_hashref;

print "after";
readline( *STDIN );

Resident memory usage "before" is at 18MB and usage "after" is at 30MB. This is unacceptable over a large number of queries.


Comments (2)

不一样的天空 2024-12-27 06:36:58


Are your large-data columns LOBs (CLOBs or BLOBs)? If so, you don't need to use LongReadLen at all; DBD::Oracle provides a LOB streaming interface.

What you want to do is bind the column as type ORA_CLOB or ORA_BLOB, which will get you a "LOB locator" returned from the query instead of the text. Then you use ora_lob_read together with the LOB locator to fetch the data. Here's an example of code that's worked for me:

sub read_lob {
  my ( $dbh, $clob ) = @_;

  my $BLOCK_SIZE = 16384;

  my $out = '';    # start empty so an empty LOB returns '' rather than undef
  my $offset = 1;

  while ( my $data = $dbh->ora_lob_read( $clob, $offset, $BLOCK_SIZE ) ) {
    $out .= $data;
    $offset += $BLOCK_SIZE;
  }
  return $out;
}
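For the helper above to receive a locator rather than the already-materialized CLOB, the statement has to be prepared with the DBD::Oracle attribute `ora_auto_lob` switched off. A usage sketch, assuming the question's hypothetical `my_table` / `myclob` names:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( "dbi:Oracle:mydb", "user", "pass",
                        { RaiseError => 1 } );

# With ora_auto_lob => 0, fetches return LOB locators, not the data itself.
my $sth = $dbh->prepare( "SELECT myclob FROM my_table WHERE id = ?",
                         { ora_auto_lob => 0 } );
$sth->execute(68683);

my ($locator) = $sth->fetchrow_array;
my $data = read_lob( $dbh, $locator );   # streams the CLOB 16 KB at a time
```

Memory use then scales with the block size and the size of the one value you are assembling, not with a worst-case LongReadLen buffer held for every fetch.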
寻梦旅人 2024-12-27 06:36:58

我这样想:

use Parallel::ForkManager
use strict;

# Max 50 processes for parallel data retrieving
my $pm = new Parallel::ForkManager(50);

# while loop goes here
while (my @row = $sth->fetchrow_array) {

# do the fork
$pm->start and next;

#
# Data retreiving goes here
#

# do the exit in the child process
$pm->finish;
}
$pm->wait_all_children;

检查 Parallel ::ForkManager in CPAN 了解更多信息。

I think of it this way:

use strict;
use Parallel::ForkManager;

# At most 50 child processes retrieving data in parallel
my $pm = Parallel::ForkManager->new(50);

# while loop goes here
while (my @row = $sth->fetchrow_array) {

    # do the fork
    $pm->start and next;

    #
    # Data retrieving goes here
    #

    # exit the child process
    $pm->finish;
}
$pm->wait_all_children;

Check Parallel::ForkManager on CPAN for more information.
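One caveat with this approach: DBI handles are generally not safe to share across a fork, so each child should open its own connection rather than reuse the parent's `$sth`/`$dbh`. A sketch of that shape, again with hypothetical table and column names, where the parent only collects keys and each child fetches and discards its own buffer:

```perl
use strict;
use warnings;
use DBI;
use Parallel::ForkManager;

# Parent: fetch only the (small) key list up front.
my $parent_dbh = DBI->connect( "dbi:Oracle:mydb", "user", "pass",
                               { RaiseError => 1 } );
my $ids = $parent_dbh->selectcol_arrayref("SELECT id FROM my_table");

my $pm = Parallel::ForkManager->new(50);

for my $id (@$ids) {
    $pm->start and next;    # parent continues the loop

    # Child: a private connection, so no handle is shared across the fork.
    my $dbh = DBI->connect( "dbi:Oracle:mydb", "user", "pass",
                            { RaiseError => 1 } );
    $dbh->{LongReadLen} = 2 * 1024 * 1024;

    my ($val) = $dbh->selectrow_array(
        "SELECT long_column_name FROM my_table WHERE id = ?", undef, $id );
    # ... process $val; the oversized buffer dies with the child ...

    $dbh->disconnect;
    $pm->finish;            # child exits here
}
$pm->wait_all_children;
```

The per-child connection cost is real, so this only pays off if the buffer waste being reclaimed outweighs the connection overhead.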
