SQL: Tuple comparison
In my current application, I need to be able to do this type of query:
SELECT MIN((colA, colB, colC))
FROM mytable
WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')
and get the answer of (333, 'B', 'B'), given this data:
+------+------+------+
| colA | colB | colC |
+------+------+------+
| 99 | A | A |
| 200 | A | Z |
| 200 | B | B |
| 333 | B | B |
| 333 | C | D |
| 333 | C | E |
| 333 | D | C |
| 1000 | E | G |
| 1000 | F | A |
+------+------+------+
What is the most efficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of millions of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.
If the solution is easily extensible to more or fewer columns, that's even better.
Tuple Comparison:
Several people have asked, so I should put this in the question. Tuples are ordered lexicographically, meaning that two tuples compare the same way as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.
It's worth noting that SQL (or at least MySQL) implements this correctly:
mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
+--------------------------------------------------------------------------+
| (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
+--------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)
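The lexicographic rule can also be spelled out by hand. The following is a small sketch, not part of the original session, that puts a row comparison next to its expanded OR/AND form; the column aliases are made up for illustration, and both columns should come back as 1:

```sql
-- Row comparison next to its hand-expanded lexicographic equivalent.
-- Aliases row_form and expanded_form are illustrative names only.
SELECT (200, 'B', 'C') < (333, 'B', 'B') AS row_form,
       (200 < 333
        OR (200 = 333 AND 'B' < 'B')
        OR (200 = 333 AND 'B' = 'B' AND 'C' < 'B')) AS expanded_form;
```

The first branch (200 < 333) already decides the result, which is why the later elements of the tuples do not matter here.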
Here's the necessary SQL to create the example:
create table mytable select 333 colA, 'B' colB, 'B' colC;
insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'),
(200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
(99, 'A', 'A');
alter table mytable add unique index myindex (colA, colB, colC);
Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.
Comments (2)
Just do:
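A minimal sketch of such a query, assuming the question's mytable and bounds. It relies on MySQL accepting row comparisons with <= (as the check in the question suggests for <) and uses ORDER BY with LIMIT 1 in place of MIN over a tuple:

```sql
-- Smallest tuple inside the inclusive range, by lexicographic order.
SELECT colA, colB, colC
FROM mytable
WHERE (200, 'B', 'C') <= (colA, colB, colC)
  AND (colA, colB, colC) <= (1000, 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1;
```

With the sample data from the question this should return (333, 'B', 'B').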
It works just fine. And I suspect it should be pretty fast, too.
This is equivalent but it may have better performance, depending on your tables:
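One way to write an equivalent form is to expand each row comparison into nested OR/AND conditions on the individual columns. This is a sketch under the same assumptions (the question's mytable and bounds); whether the optimizer handles it better than the row form depends on the MySQL version and the indexes, so it is worth checking with EXPLAIN:

```sql
-- Same range filter with the lexicographic comparisons written out by hand.
SELECT colA, colB, colC
FROM mytable
WHERE (colA > 200
       OR (colA = 200 AND (colB > 'B'
                           OR (colB = 'B' AND colC >= 'C'))))
  AND (colA < 1000
       OR (colA = 1000 AND (colB < 'E'
                            OR (colB = 'E' AND colC <= 'F'))))
ORDER BY colA, colB, colC
LIMIT 1;
```

Each additional column adds one more level of nesting to both bounds.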
---EDIT---: (Previous wrong trials removed)
2nd try (not really relational algebra).
This works but only when the fields are char(1):
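One trick with exactly this limitation is comparing concatenated strings: string order matches tuple order only when every column has the same fixed width. The sketch below is an assumption about what such a query could look like, not necessarily the query meant here, and it uses a hypothetical table named letters with CHAR(1) columns:

```sql
-- Hypothetical demo table with strictly one-character columns.
CREATE TABLE letters (colA CHAR(1), colB CHAR(1), colC CHAR(1));
INSERT INTO letters VALUES
  ('A', 'A', 'Z'), ('B', 'A', 'A'), ('C', 'Z', 'Z'), ('D', 'E', 'G');

-- Compare the concatenated columns against the concatenated bounds.
SELECT colA, colB, colC
FROM letters
WHERE CONCAT(colA, colB, colC)
      BETWEEN CONCAT('A', 'B', 'C') AND CONCAT('D', 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1;
```

On this demo data the query should return ('B', 'A', 'A'). Wrapping the columns in CONCAT will likely prevent an index on (colA, colB, colC) from being used for the range.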
I thought that a view that shows all combinations of tuples from mytable that are less than or equal to tuples of the same table might be helpful, as it can be used for other comparisons.
A similar technique solves the question. It works with any kind of field (int, float, char of any length). It's going to be kind of awkward and complicated, though, if one tries to add more fields.
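A view of this kind might be sketched as a self-join with the comparison expanded column by column. The view name mytable_le and the lo/hi column aliases are hypothetical, and the view pairs every row with every row that is greater than or equal to it, so it is only practical on small tables:

```sql
-- Hypothetical view: each (lo, hi) pair of rows from mytable with lo <= hi
-- in lexicographic order.
CREATE VIEW mytable_le AS
SELECT lo.colA AS loA, lo.colB AS loB, lo.colC AS loC,
       hi.colA AS hiA, hi.colB AS hiB, hi.colC AS hiC
FROM mytable AS lo
JOIN mytable AS hi
  ON  lo.colA < hi.colA
  OR (lo.colA = hi.colA AND lo.colB < hi.colB)
  OR (lo.colA = hi.colA AND lo.colB = hi.colB AND lo.colC <= hi.colC);

-- Example use: all rows at or above the existing row (333, 'C', 'D').
SELECT hiA, hiB, hiC
FROM mytable_le
WHERE loA = 333 AND loB = 'C' AND loC = 'D'
ORDER BY hiA, hiB, hiC;
```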
One can also define a function and use it to solve the same or similar problems. This solves the question again. The query is elegant, but a new function has to be created if the type or number of fields changes.
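A function along these lines could look like the sketch below. The name tuple_le and its parameter types are assumptions chosen to fit the question's sample table (one integer and two single-character columns); a body that consists of a single RETURN statement needs no BEGIN ... END block:

```sql
-- Hypothetical helper: true when (a1, b1, c1) <= (a2, b2, c2) lexicographically.
CREATE FUNCTION tuple_le(a1 INT, b1 CHAR(1), c1 CHAR(1),
                         a2 INT, b2 CHAR(1), c2 CHAR(1))
RETURNS BOOLEAN DETERMINISTIC
RETURN a1 < a2
    OR (a1 = a2 AND (b1 < b2 OR (b1 = b2 AND c1 <= c2)));

-- Smallest tuple inside the inclusive range, using the helper for both bounds.
SELECT colA, colB, colC
FROM mytable
WHERE tuple_le(200, 'B', 'C', colA, colB, colC)
  AND tuple_le(colA, colB, colC, 1000, 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1;
```

With the question's sample data this should again return (333, 'B', 'B'). Because the columns are passed through a stored function, the filter probably cannot use the index for the range, so on very large tables this form is likely the slowest of the options.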
Until then, because the condition (colA, colB, colC) BETWEEN ('A', 'B', 'C') AND ('D', 'E', 'F') was not allowed in MySQL, I thought that ('A', 'B', 'C') <= (colA, colB, colC) was not allowed either. But I was wrong.