如何避免使用 python 使用 MySQLdb 进行查询的多次往返?

发布于 2025-01-04 03:26:37 字数 1359 浏览 2 评论 0原文

我正在读取一些原始文件并将它们插入数据库中。这将涉及数百万条记录,对于每条记录,我都有多个插入(许多表)。当我在本地测试时,速度很快,但对于整个数据集,我需要使用远程数据库。这种方式非常慢,我认为这是因为所有通过网络进行删除/插入的行程。

我正在使用 MySQLdb 模块(python),目前,我有如下内容:

# setup connection
con = mdb.connect('remote.host', 'database_user', '123456789', 'database_name');

... read files, loop through records, etc...

# clear out data related to current record
cur.execute("DELETE FROM articles WHERE article_id = %s", article.id)
cur.execute("DELETE FROM authors WHERE article_id = %s", article.id)
cur.execute("DELETE FROM addresses WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_references WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_patents WHERE article_id = %s", article.id)

# insert the article
cur.execute("INSERT INTO articles (article_id, doctype, keywords, language, title) VALUES (%s, %s, %s, %s, %s, %s)" , (article.id, article.doctype, ';'.join(article.keywords), article.language, article.title))

# insert all the authors
for au in article.authors:
    cur.execute("INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)", (article.id, au.first_name, au.last_name, au.email))

... other loops like the authors to insert 10-20 citations per article, multiple addresses, etc ...

据我所知,MySQLdb 不允许我一次发送多个查询。我一定有办法避免网络延迟。有什么想法吗?

I am reading in some raw files and inserting them into a DB. This will involve MILLIONS of records, and for each record, I have multiple inserts (many tables). When I was testing locally, it was going quickly, but for the whole dataset, I need to work with a remote database. It is painfully slow this way, which I assume is because of all the trips over the network to delete/insert.

I am using the MySQLdb module (python) and at the moment, I have things such as the following:

# setup connection
con = mdb.connect('remote.host', 'database_user', '123456789', 'database_name');

... read files, loop through records, etc...

# clear out data related to current record
cur.execute("DELETE FROM articles WHERE article_id = %s", article.id)
cur.execute("DELETE FROM authors WHERE article_id = %s", article.id)
cur.execute("DELETE FROM addresses WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_references WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_patents WHERE article_id = %s", article.id)

# insert the article
cur.execute("INSERT INTO articles (article_id, doctype, keywords, language, title) VALUES (%s, %s, %s, %s, %s, %s)" , (article.id, article.doctype, ';'.join(article.keywords), article.language, article.title))

# insert all the authors
for au in article.authors:
    cur.execute("INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)", (article.id, au.first_name, au.last_name, au.email))

... other loops like the authors to insert 10-20 citations per article, multiple addresses, etc ...

From what I can tell, MySQLdb does not allow me to send multiple queries at once. There must be a way for me to avoid the network delays. Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

执手闯天涯 2025-01-11 03:26:37

至少MySQLdb 1.2.3似乎允许开箱即用的多个查询,您只需调用cursor.nextset()即可循环返回的结果集。

db = conn.cursor()
db.execute('SELECT 1; SELECT 2;')

more = True
while more:
    print db.fetchall()
    more = db.nextset()

如果您想绝对确定对此的支持已启用,和/或禁用该支持,您可以使用类似以下内容:

MYSQL_OPTION_MULTI_STATEMENTS_ON = 0
MYSQL_OPTION_MULTI_STATEMENTS_OFF = 1

conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_ON)
# Multiple statement execution here...
conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_OFF)

如果在执行其中一个查询时出现错误,则 mysql 将不会在该点之后执行任何查询。如果它源自第一个查询,则 db.execute() 调用将抛出异常,否则相应的 db.nextset() 将会执行此操作,因此您可以获取获取异常之前成功查询的结果集。

At least MySQLdb 1.2.3 seems to allow multiple queries out of the box, you just have to call cursor.nextset() to cycle through the returned result sets.

db = conn.cursor()
db.execute('SELECT 1; SELECT 2;')

more = True
while more:
    print db.fetchall()
    more = db.nextset()

If you want to be absolutely sure the support for this is enabled, and/or disable the support, you can use something like this:

MYSQL_OPTION_MULTI_STATEMENTS_ON = 0
MYSQL_OPTION_MULTI_STATEMENTS_OFF = 1

conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_ON)
# Multiple statement execution here...
conn.set_server_option(MYSQL_OPTION_MULTI_STATEMENTS_OFF)

If there's an error while executing one of the queries, mysql won't execute any queries after that point. The db.execute() call will throw the exception if it originated from the first query, otherwise the appropriate db.nextset() will do it, so you can fetch the result sets from the successfull queries before getting the exception.

七秒鱼° 2025-01-11 03:26:37

使用executemany。这是 手册 中的一个示例:

c.executemany(
      """INSERT INTO breakfast (name, spam, eggs, sausage, price)
      VALUES (%s, %s, %s, %s, %s)""",
      [
      ("Spam and Sausage Lover's Plate", 5, 1, 8, 7.95 ),
      ("Not So Much Spam Plate", 3, 2, 0, 3.95 ),
      ("Don't Wany ANY SPAM! Plate", 0, 4, 3, 5.95 )
      ] )

在您的情况下,它看起来像这样

sql = "INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)"
params = [(article.id, au.first_name, au.last_name, au.email) for au in article.authors]
cur.executemany(sql, params)

executemany 的文档:

此方法提高了多行 INSERT 和
代替。否则它相当于循环 args
执行()。

Use executemany. Here's an example from the manual:

c.executemany(
      """INSERT INTO breakfast (name, spam, eggs, sausage, price)
      VALUES (%s, %s, %s, %s, %s)""",
      [
      ("Spam and Sausage Lover's Plate", 5, 1, 8, 7.95 ),
      ("Not So Much Spam Plate", 3, 2, 0, 3.95 ),
      ("Don't Wany ANY SPAM! Plate", 0, 4, 3, 5.95 )
      ] )

In your case it would look something like this:

sql = "INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)"
params = [(article.id, au.first_name, au.last_name, au.email) for au in article.authors]
cur.executemany(sql, params)

From the documentation of executemany:

This method improves performance on multiple-row INSERT and
REPLACE. Otherwise it is equivalent to looping over args with
execute().

情绪 2025-01-11 03:26:37

mySQL INSERT 语法确实允许这样做。比较 1) 和 2)

1. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);
2. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

在第二种情况下,您一次插入三行。

http://dev.mysql.com/doc/refman/5.5/en /insert.html

希望它能给您带来想法。

PS:这是与语言无关的方式

mySQL INSERT syntax does allow it. Compare 1) and 2)

1. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);
2. INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

In the second case you're inserting three rows at once.

http://dev.mysql.com/doc/refman/5.5/en/insert.html

Hopefully it will give you ideas.

PS: this is language-independent way

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文