Perl DBI: fetching part of a database?

Posted 2024-09-13 05:54:26 · 3 views · 0 comments


I'm working on a database where the number of rows is somewhere above 1,000,000. I have my select statement, but if I fetchall to begin with, I run out of memory quickly. Here are my two questions:

  1. Since I don't know the exact size of the database to start, is there any way to find out the size of the database without doing a fetchall? The computer literally cannot support it.

  2. Is there any way to fetch, say, a certain chunk of the database, maybe 5,000 rows at a time to process, instead of doing an individual fetchrow for each and every line? I just finished running a test, and row by row it's looking to be almost 4 minutes per 1,000 rows worked on, and the boss isn't looking favorably on a program that is going to take almost 3 days to complete.

This is my code:

my $i = 0;
while ($i < $rows)
{
    # Process a full chunk, or whatever remains on the last pass.
    my $n = ($i + $chunkRows < $rows) ? $chunkRows : $rows - $i;
    for (my $j = 0; $j < $n; $j++)
    {
        my @array = $sth->fetchrow_array();
        ($nameOne, $numberOne, $numberTwo) =
            someFunction($lineCount, @array, $nameOne, $numberOne, $numberTwo);
    }
    $i += $n;
}


Comments (2)

○愚か者の日 2024-09-20 05:54:26


Show your fetchrow looping code; there may be ways to improve it, depending on how you are calling it and just what you are doing with the data.

I believe the database drivers for most databases will fetch multiple rows at once from the server; you are going to have to say what underlying type of database you are using to get good advice there. If indeed it is communicating with the server for each row, you are going to have to modify the SQL to get sets of rows at a time, but how to do that varies depending on what database you are using.
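Regardless of the driver, DBI itself can hand rows back in batches: `fetchall_arrayref` takes an optional maximum-row count, so you can pull 5,000 rows per call instead of one. A minimal sketch, assuming a DBI connection; the DSN, credentials, table, and column names here are all placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical connection; substitute your own DSN and credentials.
my $dbh = DBI->connect('dbi:DB2:mydb', 'user', 'pass', { RaiseError => 1 });

my $sth = $dbh->prepare('SELECT col1, col2, col3 FROM mytable');
$sth->execute;

# fetchall_arrayref(undef, $max_rows) returns at most $max_rows rows
# per call, so memory use is bounded by the chunk size.
my $chunk_rows = 5_000;
while (my $rows = $sth->fetchall_arrayref(undef, $chunk_rows)) {
    last unless @$rows;    # empty chunk: all rows consumed
    for my $row (@$rows) {
        # process @$row here
    }
}
$dbh->disconnect;
```

This bounds memory at one chunk while still letting the driver stream the full result set.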

Ah, DB2. I'm not sure, but I think you have to do something like this:

SELECT *
FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS RN FROM table) AS cols
WHERE RN BETWEEN 1 AND 10000;

and adjust the numbers for each query until you get an empty result. Obviously this
is more work on the database side, since it has to repeat the query multiple times; I don't
know if there are DB2 ways to optimize this (e.g., temporary tables).
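Driven from Perl, that paging query could look something like this (a sketch only; `$dbh` is assumed to be an open DBI handle and the column names are illustrative):

```perl
# Page through the table via ROW_NUMBER() windows, preparing the
# statement once and moving the window until a page comes back empty.
my $page_size = 10_000;
my $start     = 1;

my $sth = $dbh->prepare(q{
    SELECT *
    FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
          FROM mytable) AS cols
    WHERE rn BETWEEN ? AND ?
});

while (1) {
    $sth->execute($start, $start + $page_size - 1);
    my $rows = $sth->fetchall_arrayref;
    last unless @$rows;          # empty page: we're past the end
    for my $row (@$rows) {
        # process @$row
    }
    $start += $page_size;
}
```

Note that without an ORDER BY inside the subquery, DB2 makes no guarantee about which rows land in which window, so in practice you would order by a stable key.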

只是一片海 2024-09-20 05:54:26


To get the number of rows in a table, you can use

SELECT COUNT(*) FROM table

The syntax to limit the number of rows returned may be specific to your database. MySQL, for example, has a LIMIT keyword which lets you pull back only a certain number of rows.

That being said, if you are pulling back all rows, you may want to add some other questions here describing specifically what you are doing, because that's not a common thing in most applications.

If you don't have a limit available in your database, you can do things like flag a column with a boolean to indicate that a row was processed, and then re-run your query for a limited number of rows, skipping those that have been completed. Or record the last row id processed, and then limit your next query to rows with a greater id. There are a lot of ways around that.
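The last-row-id idea sketched above might look like this in DBI; this assumes an indexed numeric `id` column and MySQL-style LIMIT, and all table and column names are hypothetical:

```perl
# Keyset pagination: remember the last processed id and ask only for
# rows beyond it. COUNT(*) gives the total up front for progress reporting.
my ($total) = $dbh->selectrow_array('SELECT COUNT(*) FROM mytable');

my $last_id = 0;
my $batch   = 5_000;
my $sth = $dbh->prepare(
    'SELECT id, col1, col2 FROM mytable WHERE id > ? ORDER BY id LIMIT ?');

while (1) {
    $sth->execute($last_id, $batch);
    my $rows = $sth->fetchall_arrayref;
    last unless @$rows;
    for my $row (@$rows) {
        # process row; $row->[0] is the id
    }
    $last_id = $rows->[-1][0];   # resume after the last id seen
}
```

Because the query filters on an indexed column rather than numbering the whole table each time, each batch stays cheap even deep into the million-plus rows.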
