Perl DBI: fetching part of a database?

Posted 2024-09-13 05:54:26 · 3 views · 0 comments


I'm working on a database where the number of rows is somewhere above 1,000,000. I have my select statement, but if I fetchall to begin with, I run out of memory quickly. Here are my two questions:

  1. Since I don't know the exact size of the database to start, is there any way to find out the size of the database without doing a fetchall? The computer literally cannot support it.

  2. Is there any way to fetch, say, a certain chunk of the database, maybe 5,000 rows at a time to process, instead of doing an individual fetchrow for each and every line? I just finished running a test, and row by row it's looking to be almost 4 minutes per 1,000 rows worked on, and the boss isn't looking favorably on a program that is going to take almost 3 days to complete.

This is my code:

my $i = 0;
while ($i < $rows)
{
    # Process a full chunk, or whatever remains on the last pass.
    my $n = ($i + $chunkRows < $rows) ? $chunkRows : $rows - $i;
    for (my $j = 0; $j < $n; $j++)
    {
        my @array = $sth->fetchrow_array();
        ($nameOne, $numberOne, $numberTwo) =
            someFunction($lineCount, @array, $nameOne, $numberOne, $numberTwo);
    }
    $i += $n;
}


Comments (2)

○愚か者の日 2024-09-20 05:54:26


Show your fetchrow looping code; there may be ways to improve it, depending on how you are calling it and just what you are doing with the data.

I believe the database drivers for most databases will fetch multiple rows at once from the server; you are going to have to say what underlying type of database you are using to get good advice there. If indeed it is communicating with the server for each row, you are going to have to modify the SQL to get sets of rows at a time, but how to do that varies depending on what database you are using.
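Regardless of the driver, DBI itself can hand rows back in batches: `fetchall_arrayref` takes an optional maximum-row count, so you can pull 5,000 rows per call instead of one. A minimal sketch, assuming a DBI connection; the DSN, credentials, table, and column names here are all placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical connection; substitute your own DSN and credentials.
my $dbh = DBI->connect('dbi:DB2:mydb', 'user', 'pass', { RaiseError => 1 });

my $sth = $dbh->prepare('SELECT col1, col2, col3 FROM mytable');
$sth->execute;

# fetchall_arrayref(undef, $max_rows) returns at most $max_rows rows
# per call, so memory use is bounded by the chunk size.
my $chunk_rows = 5_000;
while (my $rows = $sth->fetchall_arrayref(undef, $chunk_rows)) {
    last unless @$rows;    # empty chunk: all rows consumed
    for my $row (@$rows) {
        # process @$row here
    }
}
$dbh->disconnect;
```

This bounds memory at one chunk while still letting the driver stream the full result set.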

Ah, DB2. I'm not sure, but I think you have to do something like this:

SELECT *
FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS RN FROM table) AS cols
WHERE RN BETWEEN 1 AND 10000;

and adjust the numbers for each query until you get an empty result. Obviously this
is more work on the database side, since it has to repeat the query multiple times; I don't
know if there are DB2 ways to optimize this (e.g., temporary tables).
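Driven from Perl, that paging query could look something like this (a sketch only; `$dbh` is assumed to be an open DBI handle and the column names are illustrative):

```perl
# Page through the table via ROW_NUMBER() windows, preparing the
# statement once and moving the window until a page comes back empty.
my $page_size = 10_000;
my $start     = 1;

my $sth = $dbh->prepare(q{
    SELECT *
    FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
          FROM mytable) AS cols
    WHERE rn BETWEEN ? AND ?
});

while (1) {
    $sth->execute($start, $start + $page_size - 1);
    my $rows = $sth->fetchall_arrayref;
    last unless @$rows;          # empty page: we're past the end
    for my $row (@$rows) {
        # process @$row
    }
    $start += $page_size;
}
```

Note that without an ORDER BY inside the subquery, DB2 makes no guarantee about which rows land in which window, so in practice you would order by a stable key.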

只是一片海 2024-09-20 05:54:26


To get the number of rows in a table, you can use

SELECT COUNT(*) FROM table

The syntax to limit the number of rows returned may be specific to your database. MySQL, for example, has a LIMIT keyword which lets you pull back only a certain number of rows.

That being said, if you are pulling back all rows, you may want to add some other questions here describing specifically what you are doing, because that's not a common thing in most applications.

If you don't have a limit available in your database, you can do things like flag a column with a boolean to indicate that a row was processed, and then re-run your query for a limited number of rows, skipping those that have been completed. Or record the last row id processed, and then limit your next query to rows with a greater id. There are a lot of ways around that.
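The last-row-id idea sketched above might look like this in DBI; this assumes an indexed numeric `id` column and MySQL-style LIMIT, and all table and column names are hypothetical:

```perl
# Keyset pagination: remember the last processed id and ask only for
# rows beyond it. COUNT(*) gives the total up front for progress reporting.
my ($total) = $dbh->selectrow_array('SELECT COUNT(*) FROM mytable');

my $last_id = 0;
my $batch   = 5_000;
my $sth = $dbh->prepare(
    'SELECT id, col1, col2 FROM mytable WHERE id > ? ORDER BY id LIMIT ?');

while (1) {
    $sth->execute($last_id, $batch);
    my $rows = $sth->fetchall_arrayref;
    last unless @$rows;
    for my $row (@$rows) {
        # process row; $row->[0] is the id
    }
    $last_id = $rows->[-1][0];   # resume after the last id seen
}
```

Because the query filters on an indexed column rather than numbering the whole table each time, each batch stays cheap even deep into the million-plus rows.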
