Efficient way of running select queries against millions of rows
I want to run various SELECT queries 100 million times, and I have approximately 1 million rows in a table. Therefore, I am looking for the fastest method to run all these SELECT queries.
So far I have tried three different methods, and the results were similar.
The following three methods do nothing useful, of course; they are purely for comparing performance.
First method:

for i in range(100000000):
    cur.execute("select id from testTable where name = 'aaa';")
Second method:

cur.execute("""PREPARE selectPlan AS
    SELECT id FROM testTable WHERE name = 'aaa';""")
for i in range(100000000):
    cur.execute("EXECUTE selectPlan;")
Third method:

def _data(n):
    for i in range(n):
        yield ('aaa',)

sql = "SELECT id FROM testTable WHERE name = %s;"
cur.executemany(sql, _data(100000000))
And the table is created like this:
cur.execute("""CREATE TABLE testTable ( id int, name varchar(1000) );""")
cur.execute("""CREATE INDEX indx_testTable ON testTable(name)""")
I thought that using the prepared-statement functionality would really speed up the queries, but since that does not seem to be the case, I was hoping you could give me some hints on other ways of doing this.
3 Answers
This sort of benchmark is unlikely to produce any useful data, but the second method should be the fastest: once the statement is prepared, it is stored in memory by the database server. Further calls to repeat the query do not require the query text to be transmitted, saving a small amount of time.
This is likely to be moot, as the query is very small (probably the same number of packets over the wire as resending the query text each time), and the query cache will serve the same data for every request.
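If you do want to keep testing prepared statements, here is a minimal sketch of how a server-side plan is typically combined with a bound parameter in psycopg2; the connection string and loop count are illustrative, while PREPARE (text) ... $1, EXECUTE, and DEALLOCATE are standard PostgreSQL syntax:

import psycopg2

conn = psycopg2.connect("dbname=test")  # connection details are illustrative
cur = conn.cursor()

# Prepare the plan once on the server; $1 is PostgreSQL's placeholder syntax.
cur.execute("PREPARE selectPlan (text) AS SELECT id FROM testTable WHERE name = $1;")

# Re-execute the stored plan, letting psycopg2 bind the parameter each time.
for _ in range(100000000):
    cur.execute("EXECUTE selectPlan (%s);", ('aaa',))

# Release the server-side plan when finished.
cur.execute("DEALLOCATE selectPlan;")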
What is the purpose of retrieving such an amount of data at once? I don't know your situation, but I would definitely page the results using LIMIT and OFFSET. Take a look at:
7.6. LIMIT and OFFSET
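A minimal paging sketch along those lines (the page size and the ORDER BY column are assumptions, not something from the question) could be:

page_size = 10000  # illustrative page size
offset = 0
while True:
    cur.execute(
        "SELECT id FROM testTable WHERE name = %s ORDER BY id LIMIT %s OFFSET %s;",
        ('aaa', page_size, offset))
    rows = cur.fetchall()
    if not rows:
        break
    # ... process this page of rows ...
    offset += page_size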
If you just want to benchmark the SQL on its own, without mixing Python into the equation, try pgbench:
http://developer.postgresql.org/pgdocs/postgres/pgbench.html
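For example, you could drop the query into a script file and replay it with pgbench's custom-script mode (the file name, database name, and client/transaction counts below are only placeholders):

# select_test.sql contains: SELECT id FROM testTable WHERE name = 'aaa';
pgbench -n -c 10 -t 100000 -f select_test.sql testdb
# -n skips vacuuming pgbench's own tables, -c sets the client count,
# -t sets transactions per client, -f points at the custom script.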
Also, what is your goal here?