如何避免使用 python 使用 MySQLdb 进行查询的多次往返?
我正在读取一些原始文件并将它们插入数据库中。这将涉及数百万条记录,对于每条记录,我都有多个插入(许多表)。当我在本地测试时,速度很快,但对于整个数据集,我需要使用远程数据库。这种方式非常慢,我认为这是因为所有通过网络进行删除/插入的行程。
我正在使用 MySQLdb 模块(python),目前,我有如下内容:
# setup connection
con = mdb.connect('remote.host', 'database_user', '123456789', 'database_name');
... read files, loop through records, etc...
# clear out data related to current record
cur.execute("DELETE FROM articles WHERE article_id = %s", article.id)
cur.execute("DELETE FROM authors WHERE article_id = %s", article.id)
cur.execute("DELETE FROM addresses WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_references WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_patents WHERE article_id = %s", article.id)
# insert the article
cur.execute("INSERT INTO articles (article_id, doctype, keywords, language, title) VALUES (%s, %s, %s, %s, %s, %s)" , (article.id, article.doctype, ';'.join(article.keywords), article.language, article.title))
# insert all the authors
for au in article.authors:
cur.execute("INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)", (article.id, au.first_name, au.last_name, au.email))
... other loops like the authors to insert 10-20 citations per article, multiple addresses, etc ...
据我所知,MySQLdb 不允许我一次发送多个查询。我一定有办法避免网络延迟。有什么想法吗?
I am reading in some raw files and inserting them into a DB. This will involve MILLIONS of records, and for each record, I have multiple inserts (many tables). When I was testing locally, it was going quickly, but for the whole dataset, I need to work with a remote database. It is painfully slow this way, which I assume is because of all the trips over the network to delete/insert.
I am using the MySQLdb module (python) and at the moment, I have things such as the following:
# setup connection
con = mdb.connect('remote.host', 'database_user', '123456789', 'database_name');
... read files, loop through records, etc...
# clear out data related to current record
cur.execute("DELETE FROM articles WHERE article_id = %s", article.id)
cur.execute("DELETE FROM authors WHERE article_id = %s", article.id)
cur.execute("DELETE FROM addresses WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_references WHERE article_id = %s", article.id)
cur.execute("DELETE FROM citation_patents WHERE article_id = %s", article.id)
# insert the article
cur.execute("INSERT INTO articles (article_id, doctype, keywords, language, title) VALUES (%s, %s, %s, %s, %s, %s)" , (article.id, article.doctype, ';'.join(article.keywords), article.language, article.title))
# insert all the authors
for au in article.authors:
cur.execute("INSERT INTO isi_authors (article_id, name_first, name_last, email) VALUES (%s, %s, %s, %s)", (article.id, au.first_name, au.last_name, au.email))
... other loops like the authors to insert 10-20 citations per article, multiple addresses, etc ...
From what I can tell, MySQLdb does not allow me to send multiple queries at once. There must be a way for me to avoid the network delays. Any ideas?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
至少MySQLdb 1.2.3似乎允许开箱即用的多个查询,您只需调用cursor.nextset()即可循环返回的结果集。
如果您想绝对确定对此的支持已启用,和/或禁用该支持,您可以使用类似以下内容:
如果在执行其中一个查询时出现错误,则 mysql 将不会在该点之后执行任何查询。如果它源自第一个查询,则 db.execute() 调用将抛出异常,否则相应的 db.nextset() 将会执行此操作,因此您可以获取获取异常之前成功查询的结果集。
At least
MySQLdb
1.2.3 seems to allow multiple queries out of the box, you just have to callcursor.nextset()
to cycle through the returned result sets.If you want to be absolutely sure the support for this is enabled, and/or disable the support, you can use something like this:
If there's an error while executing one of the queries, mysql won't execute any queries after that point. The
db.execute()
call will throw the exception if it originated from the first query, otherwise the appropriatedb.nextset()
will do it, so you can fetch the result sets from the successfull queries before getting the exception.使用
executemany
。这是 手册 中的一个示例:在您的情况下,它看起来像这样
:
executemany
的文档:Use
executemany
. Here's an example from the manual:In your case it would look something like this:
From the documentation of
executemany
:mySQL INSERT 语法确实允许这样做。比较 1) 和 2)
在第二种情况下,您一次插入三行。
http://dev.mysql.com/doc/refman/5.5/en /insert.html
希望它能给您带来想法。
PS:这是与语言无关的方式
mySQL INSERT syntax does allow it. Compare 1) and 2)
In the second case you're inserting three rows at once.
http://dev.mysql.com/doc/refman/5.5/en/insert.html
Hopefully it will give you ideas.
PS: this is language-independent way