Database query optimization

Published 2024-09-03 17:38:36

Ok my Giant friends once again I seek a little space in your shoulders :P

Here is the issue, I have a python script that is fixing some database issues but it is taking way too long, the main update statement is this:

cursor.execute("UPDATE jiveuser SET username = '%s' WHERE userid = %d" % (newName,userId))

That is getting called about 9500 times with different newName and userid pairs...

Any suggestions on how to speed up the process? Maybe somehow a way where I can do all updates with just one query?

Any help will be much appreciated!

PS: Postgres is the db being used.

Comments (8)

不醒的梦 2024-09-10 17:38:36

Insert all the data into another empty table (called userchanges, say) then UPDATE in a single batch:

UPDATE jiveuser
SET username = userchanges.username
FROM userchanges
WHERE userchanges.userid = jiveuser.userid
    AND userchanges.username <> jiveuser.username

See the documentation on the COPY command for bulk loading your data.

There are also tips for improving performance when populating a database.
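The staging approach above can be sketched in Python. This is a hedged sketch, not the answerer's code: it assumes psycopg2 (whose `cursor.copy_from` implements COPY FROM STDIN) and a temp table named `userchanges` as in the answer; the buffer-building helper is plain string formatting.

```python
import io

def make_copy_buffer(pairs):
    """Tab-separated rows in the format COPY ... FROM STDIN expects."""
    buf = io.StringIO()
    for userid, username in pairs:
        buf.write(f"{userid}\t{username}\n")
    buf.seek(0)
    return buf

def bulk_rename(conn, pairs):
    """Stage all (userid, username) pairs with COPY, then one set-based UPDATE."""
    with conn.cursor() as cur:
        cur.execute("CREATE TEMP TABLE userchanges (userid integer, username text)")
        cur.copy_from(make_copy_buffer(pairs), "userchanges",
                      columns=("userid", "username"))
        cur.execute("""
            UPDATE jiveuser
            SET username = userchanges.username
            FROM userchanges
            WHERE userchanges.userid = jiveuser.userid
              AND userchanges.username <> jiveuser.username
        """)
    conn.commit()
```

The single UPDATE touches each target row once, so the 9500 per-statement round trips and per-row commits disappear.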

做个ˇ局外人 2024-09-10 17:38:36

First of all, do not use the % operator to construct your SQL. Instead, pass your tuple of arguments as the second parameter to cursor.execute, which also negates the need to quote your argument and allows you to use %s for everything:

cursor.execute("UPDATE jiveuser SET username = %s WHERE userid = %s", (newName, userId))

This is important to prevent SQL Injection attacks.

To answer your question, you can speed up these updates by creating an index on the userid column, which lets the database find each row in O(log n) time via the index rather than scanning the entire table, which is O(n). Since you're using PostgreSQL, here's the syntax to create your index:

CREATE INDEX username_lookup ON jiveuser (userid);

EDIT: Since your comment reveals that you already have an index on the userid column, there's not much you could possibly do to speed up that query. So your main choices are either living with the slowness, since this sounds like a one-time fix-something-broken thing, or following VeeArr's advice and testing whether cursor.executemany will give you a sufficient boost.

夏至、离别 2024-09-10 17:38:36

The reason it's taking so long is probably that you've got autocommit enabled and each update gets done in its own transaction.

This is slow because even if you have a battery-backed raid controller (which you should definitely have on all database servers, of course), it still needs to do a write into that device for every transaction commit to ensure durability.

The solution is to do more than one row per transaction. But don't make transactions TOO big or you run into problems too. Try committing every 10,000 rows of changes as a rough guess.
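A minimal sketch of the batching idea, assuming a DB-API driver such as psycopg2 (the `%s` placeholders are psycopg2's style); `chunked` is a plain helper, and the 10,000-row batch size is the rough guess from the answer:

```python
def chunked(items, batch_size):
    """Yield successive batch_size-sized slices of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def update_in_batches(conn, pairs, batch_size=10_000):
    """Many updates per transaction; one commit (one fsync) per batch."""
    cur = conn.cursor()
    for batch in chunked(pairs, batch_size):
        for new_name, user_id in batch:
            cur.execute("UPDATE jiveuser SET username = %s WHERE userid = %s",
                        (new_name, user_id))
        conn.commit()
```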

Smile简单爱 2024-09-10 17:38:36

You might want to look into executemany().
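A small sketch of what an `executemany`-based rewrite could look like. The placeholder style differs per DB-API driver (`%s` for psycopg2, `?` for sqlite3), so it is passed in rather than hard-coded; the table and column names come from the question.

```python
def rename_all(conn, pairs, placeholder="%s"):
    """Run one parameterized UPDATE per (new_name, user_id) tuple."""
    sql = "UPDATE jiveuser SET username = {p} WHERE userid = {p}".format(p=placeholder)
    cur = conn.cursor()
    cur.executemany(sql, pairs)
    conn.commit()
```

With psycopg2 specifically, `psycopg2.extras.execute_batch` is usually faster still, since it groups statements into fewer server round trips.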

浅听莫相离 2024-09-10 17:38:36

Perhaps you can create an index on userid to speed things up.

2024-09-10 17:38:36

I'd do an explain on this. If it's doing an indexed lookup to find the record -- which it should if you have an index on userid -- then I don't see what you could do to improve performance. If it's not using the index, then the trick is figuring out why not and fixing it.

Oh, you could try using a prepared statement. With 9500 updates, that should help.
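The EXPLAIN check can be scripted from the same Python session. A hedged sketch: the helper below just string-matches EXPLAIN's output lines for an index scan; the cursor call assumes a psycopg2-style driver, and `uses_index` itself is pure.

```python
def uses_index(plan_lines):
    """True if any EXPLAIN output line mentions an index-based scan."""
    plan = "\n".join(plan_lines)
    return "Index Scan" in plan or "Bitmap Index Scan" in plan

def update_uses_index(cur, probe_userid=42):
    # EXPLAIN does not run the update; it only returns the plan rows.
    cur.execute(
        "EXPLAIN UPDATE jiveuser SET username = %s WHERE userid = %s",
        ("probe", probe_userid),
    )
    return uses_index(row[0] for row in cur.fetchall())
```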

寻梦旅人 2024-09-10 17:38:36

Move this to a stored procedure and execute it from the database itself.

何处潇湘 2024-09-10 17:38:36

First ensure you have an index on 'userid', this will ensure the dbms doesn't have to do a table scan each time

CREATE INDEX jiveuser_userid ON jiveuser (userid);

Next try preparing the statement, and then calling execute on it. This will stop the optimizer from having to examine the query each time

PREPARE update_username(text, integer) AS UPDATE jiveuser SET username = $1 WHERE userid = $2;
EXECUTE update_username('New Name', 123);

Finally, a bit more performance could be squeezed out by turning off autocommit

\set AUTOCOMMIT off