Python is slow when iterating over a large list
I am currently selecting a large number of rows from a database using pyodbc. The result is then copied to a large list, and then I am trying to iterate over that list. Before I abandon Python and try to create this in C#, I wanted to know whether there is something I am doing wrong.
clientItemsCursor.execute("SELECT ids FROM largetable WHERE year = ?", year)
allIDRows = clientItemsCursor.fetchall()  # takes maybe 8 seconds
count = 0
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with aID -- removed because I was trying to determine what was slow
    count = count + 1
Some more information:
- The for loop currently runs at about 5 iterations per second, which seems insanely slow to me.
- The total number of rows selected is ~489,000.
- The machine it's running on has plenty of RAM and CPU. It only seems to use one or two cores, and memory usage is 1.72 GB of 4 GB.
Can anyone tell me what's wrong? Do scripts just run this slowly?
Thanks
This should not be slow with Python native lists - but maybe the ODBC driver is returning a "lazy" object that tries to be smart and just ends up slow. Try doing
allIDRows = list(clientItemsCursor.fetchall())
in your code and post further benchmarks.
(Python lists can get slow if you start inserting things in the middle, but just iterating over a large list should be fast.)
It's probably slow because you load all the results into memory first and then perform the iteration over a list. Try iterating over the cursor instead.
And no, scripts shouldn't be that slow.
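The cursor-iteration idea can be sketched as follows. This uses sqlite3 as a stand-in for pyodbc (both follow the DB-API 2.0 interface), and the table contents are made up for illustration:

```python
import sqlite3

# In-memory database standing in for the real largetable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2010) for i in range(1000)])

cursor = conn.cursor()
cursor.execute("SELECT ids FROM largetable WHERE year = ?", (2010,))

# Iterate the cursor directly instead of materializing fetchall() first;
# rows are streamed one at a time, so memory use stays flat.
count = 0
for row in cursor:
    aID = str(row[0])
    count += 1

print(count)  # 1000
```

With pyodbc the loop looks the same: its cursor is also iterable, so `fetchall()` can simply be dropped.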
More investigation is needed here... consider the following script:
This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.
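The script itself did not survive the page extraction; based on the description (the same loop, minus the database), a hedged reconstruction of that kind of benchmark might look like this:

```python
import time

# Simulate the ~489,000 rows the question's query returns.
allIDRows = [(i,) for i in range(489000)]

count = 0
start = time.time()
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    count = count + 1
elapsed = time.time() - start

print(count)  # 489000
# On a modern machine this loop finishes in a fraction of a second,
# which is the answer's point: the plain-Python loop is not the bottleneck.
```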
When you connect to your database directly (I mean at an SQL prompt), how many seconds does this query take to run?
When the query ends, you get a message like this:
So, if that time is large and the query is just as slow run "natively", you may have to create an index on that table.
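If the query itself turns out to be the bottleneck, an index on the filter column is the usual fix. A sketch using sqlite3 as a stand-in (the table and column names come from the question; the index name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2000 + i % 20) for i in range(10000)])

# An index on the column in the WHERE clause lets the engine seek
# matching rows instead of scanning the whole table.
conn.execute("CREATE INDEX idx_largetable_year ON largetable (year)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ids FROM largetable WHERE year = ?", (2010,)
).fetchone()
print(plan[-1])  # the plan's detail column now mentions idx_largetable_year
```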
This is slow because you are
If execute gives you back a cursor, then use the cursor to its advantage: start counting as you get rows back, and save the time and memory of building the whole list first.
Other hints:
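The counting-as-you-fetch approach can also be done in batches with DB-API `fetchmany`, which bounds memory without a per-row round trip. A sketch, again with sqlite3 standing in for pyodbc and made-up table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(i, 2010) for i in range(5000)])

cursor = conn.execute("SELECT ids FROM largetable WHERE year = ?", (2010,))

# Count while fetching in fixed-size batches instead of building
# one huge list with fetchall() first.
count = 0
while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    for row in batch:
        aID = str(row[0])
        count += 1

print(count)  # 5000
```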