Python is slow when iterating over a large list
I am currently selecting a large number of rows from a database using pyodbc. The result is then copied to a large list, and then I am trying to iterate over that list. Before I abandon Python and try to create this in C#, I wanted to know whether there is something I am doing wrong.
clientItemsCursor.execute("SELECT ids FROM largetable WHERE year = ?", year)
allIDRows = clientItemsCursor.fetchall()  # takes maybe 8 seconds
count = 0
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with aID -- removed because I was trying to determine what was slow
    count = count + 1
Some more information:
- The for loop currently runs at about 5 iterations per second, which seems insanely slow to me.
- The total number of rows selected is ~489,000.
- The machine it's running on has plenty of RAM and CPU. It only seems to use one or two cores, and memory usage is 1.72 GB of 4 GB.
Can anyone tell me what's wrong? Do scripts just run this slowly?
Thanks
This should not be slow with Python native lists - but maybe the ODBC driver is returning a "lazy" object that tries to be smart and just ends up slow. Try doing
allIDRows = list(clientItemsCursor.fetchall())
in your code and post further benchmarks.
(Python lists can get slow if you start inserting things in the middle, but just iterating over a large list should be fast.)
It's probably slow because you load all the results into memory first and then perform the iteration over a list. Try iterating over the cursor instead.
And no, scripts shouldn't be that slow.
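The cursor-iteration idea can be sketched as follows. This uses sqlite3 as a stand-in for pyodbc (both follow the DB-API 2.0 interface), and the table contents are made up for illustration:

```python
import sqlite3

# In-memory database standing in for the real largetable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2010) for i in range(1000)])

cursor = conn.cursor()
cursor.execute("SELECT ids FROM largetable WHERE year = ?", (2010,))

# Iterate the cursor directly instead of materializing fetchall() first;
# rows are streamed one at a time, so memory use stays flat.
count = 0
for row in cursor:
    aID = str(row[0])
    count += 1

print(count)  # 1000
```

With pyodbc the loop looks the same: its cursor is also iterable, so `fetchall()` can simply be dropped.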
More investigation is needed here... consider the following script:
This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.
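The script itself did not survive the page extraction; based on the description (the same loop, minus the database), a hedged reconstruction of that kind of benchmark might look like this:

```python
import time

# Simulate the ~489,000 rows the question's query returns.
allIDRows = [(i,) for i in range(489000)]

count = 0
start = time.time()
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    count = count + 1
elapsed = time.time() - start

print(count)  # 489000
# On a modern machine this loop finishes in a fraction of a second,
# which is the answer's point: the plain-Python loop is not the bottleneck.
```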
When you connect to your database directly (I mean at an SQL prompt), how many seconds does this query take to run?
When the query ends, you get a message like this:
So, if that time is large and the query is just as slow run "natively", you may have to create an index on that table.
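If the query itself turns out to be the bottleneck, an index on the filter column is the usual fix. A sketch using sqlite3 as a stand-in (the table and column names come from the question; the index name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2000 + i % 20) for i in range(10000)])

# An index on the column in the WHERE clause lets the engine seek
# matching rows instead of scanning the whole table.
conn.execute("CREATE INDEX idx_largetable_year ON largetable (year)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ids FROM largetable WHERE year = ?", (2010,)
).fetchone()
print(plan[-1])  # the plan's detail column now mentions idx_largetable_year
```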
This is slow because you are
If execute gives you back a cursor, then use the cursor to its advantage: start counting as you get rows back, and save the time and memory of building the whole list first.
Other hints:
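The counting-as-you-fetch approach can also be done in batches with DB-API `fetchmany`, which bounds memory without a per-row round trip. A sketch, again with sqlite3 standing in for pyodbc and made-up table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2010) for i in range(5000)])

cursor = conn.execute("SELECT ids FROM largetable WHERE year = ?", (2010,))

# Count while fetching in fixed-size batches instead of building
# one huge list with fetchall() first.
count = 0
while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    for row in batch:
        aID = str(row[0])
        count += 1

print(count)  # 5000
```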