Efficient way to run select queries against millions of rows

Posted 2024-10-31 08:05:47

I want to run various select queries 100 million times, and I have approx. 1 million rows in a table. Therefore, I am looking for the fastest method to run all these select queries.

So far I have tried three different methods, and the results were similar.

The following three methods do not, of course, do anything useful; they are purely for comparing performance.

First method:

for i in range(100000000):
    cur.execute("select id from testTable where name = 'aaa';")

Second method:

cur.execute("""PREPARE selectPlan  AS
    SELECT id FROM testTable WHERE name = 'aaa' ;""")

for i in range(10000000):
    cur.execute("""EXECUTE selectPlan ;""")

Third method:

def _data(n):
    cur = conn.cursor()
    for i in range(n):
        yield (i, 'test')

sql = """SELECT id FROM testTable WHERE name = 'aaa' ;"""
cur.executemany(sql, _data(10000000))



And the table is created like this: 

cur.execute("""CREATE TABLE testTable ( id int, name varchar(1000) );""")
cur.execute("""CREATE INDEX indx_testTable ON testTable(name)""")

I thought that using the prepared statement functionality would really speed up the queries, but since that does not seem to be happening, I was hoping you could give me a hint about other ways of doing this.

Comments (3)

倾城泪 2024-11-07 08:05:47

This sort of benchmark is unlikely to produce any useful data, but the second method should be the fastest, as once the statement is prepared it is stored in memory by the database server. Further calls to repeat the query do not require the text of the query to be transmitted, thus saving a small amount of time.

This is likely to be moot, as the query is very small (likely the same number of packets over the wire as repeatedly sending the query text), and the query cache will serve the same data for every request.
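
As an illustration, a parameterized server-side prepared statement lets only the bound value travel on each call. A minimal sketch, assuming psycopg2 and an already open connection conn; the plan name and the lookup values are illustrative:

    cur = conn.cursor()
    # Prepare the plan once; the statement is parsed on the server and kept for the session.
    cur.execute("PREPARE selectPlan (text) AS SELECT id FROM testTable WHERE name = $1;")
    for name in ('aaa', 'bbb', 'ccc'):
        # psycopg2 substitutes %s, so this sends e.g. EXECUTE selectPlan ('aaa');
        cur.execute("EXECUTE selectPlan (%s);", (name,))
        rows = cur.fetchall()
    # Drop the plan when finished with it.
    cur.execute("DEALLOCATE selectPlan;")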

翻了热茶 2024-11-07 08:05:47

What's the purpose of retrieving such an amount of data at once? I don't know your situation, but I'd definitely page the results using LIMIT and OFFSET. Take a look at:
7.6. LIMIT and OFFSET
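
For instance, paging could look roughly like this. A sketch assuming psycopg2, an open cursor cur, and an illustrative page size, with ORDER BY added so the pages are stable:

    page_size = 10000
    offset = 0
    while True:
        # Fetch one page at a time instead of re-running the same unbounded query.
        cur.execute(
            "SELECT id FROM testTable WHERE name = %s ORDER BY id LIMIT %s OFFSET %s;",
            ('aaa', page_size, offset))
        rows = cur.fetchall()
        if not rows:
            break
        # ... process this page of rows ...
        offset += page_size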

怂人 2024-11-07 08:05:47

If you just want to benchmark the SQL on its own and not mix Python into the equation, try pgbench.

http://developer.postgresql.org/pgdocs/postgres/pgbench.html
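
For example, pgbench can replay a custom script against the table. A rough sketch, where the file name, client count, duration, and database name are all illustrative:

    -- bench.sql: the statement to benchmark
    SELECT id FROM testTable WHERE name = 'aaa';

    # run bench.sql with 4 client connections for 60 seconds against database testdb;
    # -n skips the vacuum of pgbench's built-in sample tables, which do not exist here
    pgbench -n -f bench.sql -c 4 -T 60 testdb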

Also what is your goal here?
