How do I group the results of a query based on a one-to-many association by some criterion in the "many"?
Please forgive the awkward title. I had a hard time distilling my question into one phrase. If anyone can come up with a better one, feel free.
I have the following simplified schema:
vendors
    INT id
locations
    INT id
    INT vendor_id
    FLOAT latitude
    FLOAT longitude
I am perfectly capable of returning a list of the nearest vendors, sorted by proximity and limited by an approximation of radius:
SELECT * FROM locations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
AND ABS(latitude - 30) + ABS(longitude - 30) < 50
ORDER BY ABS(latitude - 30) + ABS(longitude - 30) ASC
I can't at this moment find my way around the repetition of the order/limit term. I initially attempted aliasing it as "distance" among the SELECT fields, but psql told me that this alias wasn't available in the WHERE clause. Fine. If there's some fancy pants way around this, I'm all ears, but on to my main question:
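On the repeated distance expression, one possible way around it is to compute the value once in a derived table and filter on the alias from the outer query. This is only a sketch against the simplified schema above; the derived-table name nearby is made up for illustration.

```sql
-- Sketch: compute the distance once in a derived table so the outer query
-- can use the alias in both WHERE and ORDER BY.
-- "nearby" is an illustrative name, not something from the question.
SELECT nearby.*
FROM (
    SELECT locations.*,
           ABS(latitude - 30) + ABS(longitude - 30) AS distance
    FROM locations
    WHERE latitude IS NOT NULL
      AND longitude IS NOT NULL
) AS nearby
WHERE nearby.distance < 50
ORDER BY nearby.distance ASC;
```

PostgreSQL does accept an output-column alias on its own in ORDER BY, so only the WHERE reference strictly needs the extra query level.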
What I'd like to do is to return a list of vendors, each joined with the closest of its locations, and have this list ordered by proximity and limited by radius.
So suppose I have two vendors, each with two locations. If the radius is such that only one of the four locations falls within it, I want the query to return that location alongside its associated vendor. If the radius encompasses all the locations, I'd want vendor 1 presented with the closest of its locations and vendor 2 with the closest of its locations, ultimately ordering vendors 1 and 2 by the proximity of their closest location.
In MySQL, I managed to get the closest location in each vendor's row by using GROUP BY and then MIN(distance). But PostgreSQL seems to be stricter on the usage of GROUP BY.
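For reference, a minimal sketch of what the stricter GROUP BY still allows, assuming the simplified schema above: every selected column is either grouped or aggregated, so the query yields each vendor's smallest distance but not the rest of the matching location row.

```sql
-- Sketch: strict GROUP BY form. Only vendor_id and the aggregate are selected,
-- so this gives the minimum distance per vendor, not the location itself.
SELECT vendor_id,
       MIN(ABS(latitude - 30) + ABS(longitude - 30)) AS distance
FROM locations
WHERE latitude IS NOT NULL
  AND longitude IS NOT NULL
GROUP BY vendor_id
-- HAVING cannot see the alias, so the expression is repeated here.
HAVING MIN(ABS(latitude - 30) + ABS(longitude - 30)) < 50
ORDER BY distance ASC;
```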
I'd like to, if possible, avoid meddling with the SELECT clause. I'd also like to, if possible, reuse the WHERE and ORDER BY parts of the above query. But these are by no means absolute requirements.
I have made hackneyed attempts at DISTINCT ON and GROUP BY, but these gave me a fair bit of trouble, mostly in terms of me missing mirrored statements elsewhere, which I won't elaborate on in great detail now.
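For what it's worth, here is a rough sketch of how a DISTINCT ON attempt could be shaped, assuming the simplified schema above. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which is likely the "mirrored statements" requirement, so the overall ordering by distance has to happen one level up. The name closest is illustrative.

```sql
-- Sketch: DISTINCT ON keeps the first row per vendor_id according to the
-- inner ORDER BY, i.e. each vendor's closest location.
SELECT closest.*
FROM (
    SELECT DISTINCT ON (vendor_id)
           locations.*,
           ABS(latitude - 30) + ABS(longitude - 30) AS distance
    FROM locations
    WHERE latitude IS NOT NULL
      AND longitude IS NOT NULL
    -- The leading ORDER BY expression must match the DISTINCT ON expression.
    ORDER BY vendor_id, ABS(latitude - 30) + ABS(longitude - 30) ASC
) AS closest
WHERE closest.distance < 50
ORDER BY closest.distance ASC;
```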
Solution
I ended up adopting a solution based on OMG Ponies' excellent answer.
SELECT vendors.* FROM (
    -- Rank each vendor's locations by distance from the point (2.1, 2.1)
    SELECT locations.*,
           ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) AS distance,
           ROW_NUMBER() OVER(PARTITION BY locations.locatable_id, locations.locatable_type
                             ORDER BY ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) ASC) AS rank
    FROM locations
    WHERE locations.latitude IS NOT NULL
      AND locations.longitude IS NOT NULL
      AND locations.locatable_type = 'Vendor'
) ranked_locations
INNER JOIN vendors ON vendors.id = ranked_locations.locatable_id
-- Keep only each vendor's closest location, and only if it is within the radius
WHERE (ranked_locations.rank = 1)
  AND (ranked_locations.distance <= 0.5)
ORDER BY ranked_locations.distance;
Some deviations from OMG Ponies' solution:
- Locations are now polymorphically associated via _type. A bit of a premise change.
- I moved the join outside the subquery. I don't know if there are performance implications, but it made sense in my mind to see the subquery as a getting of locations and partitioned rankings and then the larger query as an act of bringing it all together.
- Minor: took away table name aliasing. Although I'm plenty used to aliasing, it just made it harder for me to follow along. I'll wait until I'm more experienced with PostgreSQL before working in that flair.
Comments (2)
For PostgreSQL 8.4+, you can use analytics like ROW_NUMBER:
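The query that accompanied this answer is not present in this copy. What follows is not the answerer's code, only a hedged sketch of the general ROW_NUMBER approach against the question's simplified schema; the aliases v, l, x and the column name rn are assumptions made for illustration.

```sql
-- Sketch (not the original answer's query): number each vendor's locations
-- by distance, then keep the closest one per vendor.
SELECT v.*, x.latitude, x.longitude, x.distance
FROM vendors v
JOIN (
    SELECT l.vendor_id,
           l.latitude,
           l.longitude,
           ABS(l.latitude - 30) + ABS(l.longitude - 30) AS distance,
           ROW_NUMBER() OVER (PARTITION BY l.vendor_id
                              ORDER BY ABS(l.latitude - 30) + ABS(l.longitude - 30)) AS rn
    FROM locations l
    WHERE l.latitude IS NOT NULL
      AND l.longitude IS NOT NULL
) x ON x.vendor_id = v.id
WHERE x.rn = 1          -- closest location per vendor
  AND x.distance < 50   -- radius approximation from the question
ORDER BY x.distance;
```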
I left the filter on distance in place, so a vendor whose top-ranked location is more than 50 away will not appear. Remove the check that the distance is less than 50 if you don't want this to happen.
ROW_NUMBER will return a distinct sequential value that resets for every vendor in this example. If you want duplicates (tied rows sharing the same value), you'd need to look at using DENSE_RANK.
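To make the difference concrete, here is a small self-contained sketch using inline sample values rather than the real tables. The data and the column names rn, rnk and drnk are made up for illustration.

```sql
-- Sketch: compare the three numbering functions on tied distances.
SELECT vendor_id,
       distance,
       ROW_NUMBER() OVER w AS rn,    -- always unique within a vendor
       RANK()       OVER w AS rnk,   -- ties share a value, gaps follow
       DENSE_RANK() OVER w AS drnk   -- ties share a value, no gaps
FROM (VALUES (1, 0.5), (1, 0.5), (1, 2.0), (2, 1.0)) AS t(vendor_id, distance)
WINDOW w AS (PARTITION BY vendor_id ORDER BY distance);
```

For vendor 1, the two rows at distance 0.5 get rn values 1 and 2 in an unspecified order, while both get rnk 1 and drnk 1; the row at 2.0 gets rn 3, rnk 3 and drnk 2. Filtering on rn = 1 therefore returns exactly one row per vendor, whereas filtering on drnk = 1 returns every tied closest location.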
See this article for emulating ROW_NUMBER on PostgreSQL pre-8.4.
MySQL extends GROUP BY so that not every selected column has to be aggregated or listed in the GROUP BY clause. http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
I have seen many questions here with the same issue. The trick is to get the necessary columns in a subquery and then self-join it in the outer query:
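The query from this answer is missing in this copy, so the following is only a sketch of the pattern described, written against the question's simplified schema. The aliases l and m and the column name min_distance are assumptions.

```sql
-- Sketch: find each vendor's minimum distance in a grouped subquery,
-- then join back to locations to recover the full matching row.
SELECT l.*, m.min_distance
FROM locations l
JOIN (
    SELECT vendor_id,
           MIN(ABS(latitude - 30) + ABS(longitude - 30)) AS min_distance
    FROM locations
    WHERE latitude IS NOT NULL
      AND longitude IS NOT NULL
    GROUP BY vendor_id
) m ON m.vendor_id = l.vendor_id
   AND ABS(l.latitude - 30) + ABS(l.longitude - 30) = m.min_distance
WHERE m.min_distance < 50
ORDER BY m.min_distance;
```

Unlike the ROW_NUMBER approach, this returns more than one row for a vendor when two of its locations tie for the minimum distance.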