准备好的语句中 where = ?, where in (?) 子句的最佳实践?
使用其中有一个问号的准备语句五十次,还是使用其中有五十个问号的准备语句一次,性能更高?
本质上是 Where Person = ?
还是 Where Person IN (?, ?, ?, ...)
更好?
示例
假设您有一个表,其中包含国家/地区列,然后通过几个关系表您可以了解该国家/地区的人口。
给定 1000 个国家/地区的列表,获取人口的最佳方法是什么?
请记住,这是一个假设的示例,维基百科将国家/地区数量列为 223 ,让我们假设这个例子要大得多。
创建一个接受 a 的语句 国家参数并返回人口。 示例:
Where Country = ?
创建准备好的语句 动态地添加一个 ?对于每个 使用
Where in 的国家/地区 (?,?,etc)
子句。示例:Where Country = (?, ?, ...)
创建一个 像选项中的简单语句 一,但循环并重用 每个参数一个准备语句 国家。
最好的方法是什么?
Is it more performant to use a Prepared Statement with one question mark in it fifty times, or to use a Prepared Statement with fifty question marks in it once?
Essentially is Where Person = ?
or Where Person IN (?, ?, ?, ...)
better?
Example
Say you have a table with a column, country, and then a few relational tables away you have the population for that country.
Given a list of 1000 countries, what is the best way to go about getting the population?
Keep in mind this is a hypothetical example, Wikipedia puts the number of countries at 223, let's assume for this example it is much larger.
Create a statement that takes in a
country parameter and returns a population.
Example:Where Country = ?
Create a Prepared Statement
dynamically, adding a ? for each
country using aWhere in
clause. Example:
(?,?,etc)Where Country = (?, ?, ...)
Create a
simple statement like in option
one, but loop through and reuse the
one parameter Prepared Statement for each
country.
What is the preferable method?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我在项目中达到了能够使用一些真实数据进行测试的程度。
基于 1435 个项目,选项 1 需要约 8 分钟,选项 2 需要约 15 秒,选项 3 需要约 3 分钟。
就性能而言,选项 2 显然是赢家。编码有点困难,但性能差异太大,不容忽视。
我认为来回数据库是瓶颈,但我确信此处列出的结果会根据网络、数据库引擎、数据库计算机规格和其他环境因素而有所不同。
I reached a point in my project were I was able to test with some real data.
Based on 1435 items, Option 1 takes ~8 minutes, Option 2 takes ~15 seconds, and Option 3 takes ~3 minutes.
Option 2 is the clear winner in terms of performance. It is a little harder to code around, but the performance difference is too great to ignore.
I makes sense that going back and forth to the database is the bottleneck, though I'm sure the results listed here would vary based on network, database engine, database machine specs, and other environmental factors.
正如人们常说的,“这取决于情况”。如果您只是寻找单个国家/地区的人口,我会使用方法 1。我会避免使用方法 2,因为我不喜欢使用动态构造的 SQL,除非它是完成工作的唯一方法(有效),而这似乎不是其中之一。我对第三点也不感兴趣,因为我认为如果你需要获取所有不同国家的人口,循环效率会很低。
我们添加#4如何:返回所有国家/地区的人口的单个语句,例如
围绕该方法构建一个方法,如果您需要立即获取所有国家/地区的人口,则让它返回国家/地区到人口的地图。
分享并享受。
As is often said, "That depends". If you're just looking for the population of a single country I'd go with method 1. I'd avoid #2 because I don't like to use dynamically-constructed SQL unless it's the only way to get the job done (efficiently), and this doesn't appear to be one of those cases. I'm not big on #3 either because I think that the loop will be inefficient if you need to fetch the population of all the different countries.
How about we add #4: a single statement that returns the population of all the countries, something like
Build a method around that and have it return a Map of country to population if you need to obtain the population of all the countries at once.
Share and enjoy.
内存很便宜。将整个列表加载到缓存的哈希表中并以内存速度工作
如果性能是一个问题,请使用 RAM。您可能需要花费数天或数周的时间来尝试优化适合价值 100 美元 RAM 的内容
RAM is cheap. Load the whole list into a cached hash table and work at memory speed
If performance is an issue use RAM. You could spend days or weeks trying to optimise something that could fit into $100 worth of RAM
执行查询分为两个步骤:
1. 创建执行计划。
2.执行计划。
准备好的语句与步骤 1 相关。在给出的示例中,我认为最多的执行时间将在步骤 2 中,因此我会选择提供最佳执行效果的替代方案。使数据库引擎能够优化的一般规则是为其提供范围问题,而不是在客户端中循环发出几个小问题。可用索引和客户端-服务器延迟当然会影响差异的大小,但我认为您的选项#2(动态创建准备好的语句)通常是最好的选择。
您对不同的替代方案做过测试吗?如果有,它们会显示什么?
There are two steps in executing a query:
1. Create the execution plan.
2. Execute the plan.
Prepared statements are related to step 1. In the example given I think that the most execution time will be in step 2, so I'd pick the alternative that gives best execution. A general rule to enable the DB engine to optimize is to give it range questions rather than looping in the client issuing several small questions. Available indexes and client-server latency of course affects how large the difference is, but I think that your option #2, to create a prepared statement dynamically often is the best alternative.
Have you done any tests of the different alternatives? If you have, what do they show?
正如其他人所说,这取决于参数的数量和数据的大小。根据您在评论中所述,源表可能包含数十万行。如果是这样的话,问题就归结为允许的过滤输入的数量。您的查询是否只允许一小部分输入,或者是否需要允许过滤一千个国家/地区?如果是后者,那么我建议将选择存储到中间表中并将其连接起来。类似于:
在选择时,您将填充此表,然后从中查询,这样
您可以使用 RNGCryptoServiceProvider 生成随机数(如果同一“会话”可能以不同方式并行调用该随机数)。此设置的问题是您需要定期清除选择表。
如果所涉及的实体在某种程度上是不可变的(例如国家/地区、城市等),那么将缓存策略与查询策略结合使用也会有所帮助。
顺便说一句,另一个同样的解决方案是使用临时表。但是,如果您这样做,则需要小心使用完全相同的连接来创建临时表、临时表的填充及其使用。
As others have stated, it depends on the number of parameters and the size of the data. From what you have stated in the comments, the source table could be something that has hundreds of thousands of rows. If that's the case, the question comes down to the number of allowed filtering inputs. Is your query only going to allow for a small set of inputs or does it need to allow for filtering for a thousand countries? If the later, then I'd recommend storing the selections into an intermediate table and joining off that. Something like:
On selection, you would populate this table and then query from it like so
You can use the RNGCryptoServiceProvider to generate a random number if this might be called multiple times in different ways by the same "session" in parallel. The catch to this setup is that you need to clear out the selections table periodically.
If the entities in question are somewhat immutable (e.g. a Country, a City etc.) then using a caching strategy in conjunction with your querying strategy would also help.
BTW, another solution along the same lines is to use a temp table. However, if you do that you need to be careful to use the exact same connection for creation of the temp table, the population of the temp table and its use.
根据所使用的数据库引擎,可能还有另一种选择。
例如,对于 MS SQL,您可以使用 CSV->Table 函数,例如:
http://www.nigelrivett.net/SQLTsql/ParseCSVString.html
然后你可以为您的查询提供以逗号分隔的值字符串,然后加入表:
在这种情况下,将有两个循环:构建 CSV 字符串(如果您还没有这种格式的字符串)并将 CSV 拆分到表中。
但它可能比多次执行 join 以及使用 IN 提供更好的性能(根据我的经验,IN 的性能相当糟糕)。如果性能确实是一个问题,那么您当然应该进行测试。
结果也可能因网络开销等而异......
Depending on database engine used, there might be another alternative.
For MS SQL for example, you could use a CSV->Table function, for example:
http://www.nigelrivett.net/SQLTsql/ParseCSVString.html
Then you can provide your query with a comma separated string of values instead and join the table:
In this case, there will be two loops: building the CSV string (if you do not already have it in this format) and splitting the CSV into a table.
But it may provide better performance than executing join several times as well as using IN (which in my experience has pretty bad performance). If performance is really an issue, you should test of course.
Results may also vary depending on network overhead, etc...