SQLite select语句优化建议
我有一个带有结构的 SQLite 表“详细信息”:
ID Name Category
---------------------
1 Matt 0
2 Shervin 0
3 Bob 0
4 Lee 0
5 Rick 0
6 Suraya 0
7 Susan 0
8 Adam 0
9 Jon 1
10 Lorna 1
... and so on .......
我想随机选择一行,然后从三个不同行中选择三个名称(最好还是随机选择)。我希望这一切都从一个 SQLite 语句返回。例如,
ID Name Category Name1 Name2 Name 3
----------------------------------------
3 Bob 0 Matt Lee Susan
我对此的尝试如下所示,但它有两个问题:
- 三个额外的名称不一定总是不同 - 我似乎无法排除先前选择的名称,因为变量 b/c/d 不在范围除了它们自己的 COALESCE 函数之外。
- 由于每个嵌套选择都使用 Random() 函数,因此效率不是很高。
谁能建议另一种方法来选择我需要的数据(使用 SQLite 数据库)?欢迎任何帮助/建议 - 希望我清楚我想要实现的目标,请随时要求任何澄清。
我目前的尝试:
SELECT a.Id,
a.Name,
a.Category,
COALESCE((SELECT b.Name
FROM Details b
WHERE b.Id NOT IN (a.Id)
AND b.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name1",
COALESCE((SELECT c.Name
FROM Details c
WHERE c.Id NOT IN (a.Id)
AND c.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name2",
COALESCE((SELECT d.Name
FROM Details d
WHERE d.Id NOT IN (a.Id)
AND d.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name3"
FROM Details a
AND a.Category IN (0)
ORDER BY Random()
LIMIT 1
I have a SQLite table 'Details' with structure:
ID Name Category
---------------------
1 Matt 0
2 Shervin 0
3 Bob 0
4 Lee 0
5 Rick 0
6 Suraya 0
7 Susan 0
8 Adam 0
9 Jon 1
10 Lorna 1
... and so on .......
I want to select a row at random, and then three names from three different rows (again preferably at random). I would like this to all be returned from one SQLite statement. E.g.
ID Name Category Name1 Name2 Name 3
----------------------------------------
3 Bob 0 Matt Lee Susan
My attempt at this can be seen below, but it has two problems:
- The three extra names are not necessarily always different - I can't seem to exclude a name that has been previously selected because variables b/c/d are not in scope apart from their own COALESCE function.
- As each nested select uses the Random() function it’s not very efficient.
Can anyone suggest another way to select the data I need (using SQLite database)? Any help/advice is welcome - hope it is clear what I am trying to achieve, feel free to ask for any clarifications.
My current attempt:
SELECT a.Id,
a.Name,
a.Category,
COALESCE((SELECT b.Name
FROM Details b
WHERE b.Id NOT IN (a.Id)
AND b.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name1",
COALESCE((SELECT c.Name
FROM Details c
WHERE c.Id NOT IN (a.Id)
AND c.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name2",
COALESCE((SELECT d.Name
FROM Details d
WHERE d.Id NOT IN (a.Id)
AND d.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name3"
FROM Details a
AND a.Category IN (0)
ORDER BY Random()
LIMIT 1
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我和神经病在一起。您没有说为什么需要将随机选择的四个名称放入一行,以及为什么必须在后端完成此操作。
如果您担心性能,请在客户端中生成随机整数(范围 >= min(pkcol) 和 <= max(pkcol) ),直到找到四个不同的行(即实体/名称)。有可能不存在具有生成的 ID 之一的行,但这只需几毫秒即可找到。采用随机密钥方法,您可以避免订单。即使对于具有数十亿行的表,该方法也能快速工作。
PS(发现这是一个iPhone应用程序后)
您需要一次调用来获取最小和最大 ID 值(这是 PK,因此使用索引)。然后,您至少需要再次调用数据库(同样是索引辅助),以使用随机生成的 PK 值获取四个不同的行 [其中 ID in (a, b, c , d) ] 最大调用次数未知;有多少取决于主键序列的密度。我不认为这会产生过多的 I/O,而且它比通过 Random() 进行的排序占用的资源要少得多——尤其是在表有很多行的情况下。您始终可以随机生成 8、12、16 个 id 的 ID 列表,并让您的客户端在返回超过 4 行时仅剔除所需的 4 行。
PPS 通常,数据库连接的实例化是昂贵的,并且您不希望在循环中或比需要的频率更频繁地执行此操作。但是您可以打开一个连接,运行两个或三个有效的选择,每个选择返回几行,然后在完成手头的任务后关闭。
I'm with neurino here. You have not said why you need to put the four names chosen at random into a single row, and why this has to be done in the back-end.
If you're concerned about performance, generate random integers in your client (range >= min(pkcol) and <= max(pkcol) ) until you've found four distinct rows (i.e. entities/names). There's a chance that no row exists with one of the generated ids, but that takes mere milliseconds to find out. Taking that random-key approach you could avoid an order by. The approach would work quickly even for tables with billions of rows.
P.S. (After finding out it was an iPhone app)
You need one call to get the min and max ID values (it's the PK so that uses an index). Then you need at least another call to the DB (again, index-assisted) to get the four distinct rows using your randomly generated PK values [where ID in (a, b, c , d) ] The maximum number of calls is unknown; how many will depend on the density of your primary key sequence. I do not believe this would be an inordinate amount of I/O and it would be considerably less resource-intensive than an order by Random()--especially if the table has many rows. You could always generate an ID list of 8, 12, 16 ids at random and have your client cull only the 4 rows required if more than 4 are returned.
P.P.S. Typically it is the instantiation of the database connection that is expensive, and you don't want to do that in a loop or any more often than you need to. But you can open a connection, run two or three efficient selects that return a few rows each, and then close if you're done with the task at hand.
多语句解决方案,使用临时表:
A multi-statement solution, which uses a temporary table:
做一个完整的外连接x3,然后简单地随机选择一行名称不相等的行怎么样?
How about doing a full outer join x3, then simply choosing a row at random where the names are not equal?
您还可以通过嵌套查询以将名称作为返回值来实现您想要的目的。您基本上首先获得第四个值,然后获得第三个值,依此类推。始终确保它们不匹配。我应该通过 Id 字段并检查 Id 是否不冲突(而不是名称),但这种方式意味着唯一的名称。
除了在代码中生成随机 ID 之外,我没有看到解决 Random() 函数及其生成速度缓慢的方法,但这还有其他问题。
You could also achieve what you want by nesting the queries to have the names as a returned value. You essentially get the fourth value first and then the third and so on. All the while ensuring that they don't match. I should have passed through the Id field and checked that the Id's don't conflict rather than the names, but this way means unique names.
I don't see a way around the Random() function and the slowness it generates other than generating the random Id's in code, but that has other problems.
有两种主要方法可以通过 random() 语句优化 order。
第一个是完全删除整个表步骤的排序,但它不适用于所有平台:
limit 1 offset random()
,而不是order by random() limit 1
。另一个适用于所有平台,但要求您的主键相当密集(一个自动递增的整数,如果有任何删除,则确保它们如此)。从随机起点开始预取一组较小的 ID,并在子查询中使用它们:
There are two main methods to optimize an order by random() statement.
The first is to remove the sorting of the whole table step altogether, but it doesn't work on all platforms:
limit 1 offset random()
, rather thanorder by random() limit 1
.The other works on all platforms but requires that your primary keys be reasonably dense (an auto-incrementing integer with no little if any deletes ensures they are). Pre-fetch a smaller set of IDs starting at a random starting point, and to use them in a subquery: