How to group results of a query based on a one-to-many association by some criterion in the "many"?

Posted on 2024-10-16 13:52:03

Please forgive the awkward title. I had a hard time distilling my question into one phrase. If anyone can come up with a better one, feel free.

I have the following simplified schema:

vendors
  INT id

locations
  INT id
  INT vendor_id
  FLOAT latitude
  FLOAT longitude

I am perfectly capable of returning a list of the nearest locations, sorted by proximity and limited by an approximation of radius:

SELECT * FROM locations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
  AND ABS(latitude - 30) + ABS(longitude - 30) < 50
ORDER BY ABS(latitude - 30) + ABS(longitude - 30) ASC

I can't at this moment find my way around the repetition of the order/limit term. I initially attempted aliasing it as "distance" among the SELECT fields, but psql told me that this alias wasn't available in the WHERE clause. Fine. If there's some fancy pants way around this, I'm all ears, but on to my main question:

What I'd like to do is to return a list of vendors, each joined with the closest of its locations, and have this list ordered by proximity and limited by radius.

So suppose I have two vendors, each with two locations. If the radius is such that only one of the four locations falls within it, I want the query to return that location alongside its associated vendor. If the radius encompasses all the locations, I'd want vendor 1 presented with the closest of its locations and vendor 2 with the closest of its locations, ultimately ordering vendors 1 and 2 by the proximity of their closest location.

In MySQL, I managed to get the closest location in each vendor's row by using GROUP BY and then MIN(distance). But PostgreSQL seems to be stricter on the usage of GROUP BY.

If possible, I'd like to avoid meddling with the SELECT clause and to reuse the WHERE and ORDER BY parts of the above query. But these are by no means absolute requirements.

I have made hackneyed attempts at DISTINCT ON and GROUP BY, but these gave me a fair bit of trouble, mostly because I was missing mirrored statements elsewhere in the query, which I won't elaborate on now.
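
On the side question of the repeated distance expression, one common workaround is to compute it once in a derived table and filter on the alias from the outer query. A minimal sketch against the simplified schema above; the derived-table name measured is just an illustrative choice:

```sql
-- Sketch only: "measured" is a hypothetical name for the derived table.
SELECT *
FROM (
  -- Compute the distance approximation once and expose it as a column.
  SELECT locations.*,
         ABS(latitude - 30) + ABS(longitude - 30) AS distance
  FROM locations
  WHERE latitude IS NOT NULL
    AND longitude IS NOT NULL
) AS measured
-- The alias is visible here because it is a real column of the derived table.
WHERE distance < 50
ORDER BY distance;
```

The alias is not available in the inner WHERE clause because WHERE is evaluated before the SELECT list; wrapping the query moves the filter to a level where distance already exists.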

Solution

I ended up adopting a solution based on OMG Ponies' excellent answer.

SELECT vendors.* FROM (
  SELECT locations.*, 
    ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) AS distance,
    ROW_NUMBER() OVER(PARTITION BY locations.locatable_id, locations.locatable_type
      ORDER BY ABS(locations.latitude - 2.1) + ABS(locations.longitude - 2.1) ASC) AS rank
    FROM locations
    WHERE locations.latitude IS NOT NULL
    AND locations.longitude IS NOT NULL
    AND locations.locatable_type = 'Vendor'
  ) ranked_locations
INNER JOIN vendors ON vendors.id = ranked_locations.locatable_id
WHERE (ranked_locations.rank = 1)
  AND (ranked_locations.distance <= 0.5)
ORDER BY ranked_locations.distance;

Some deviations from OMG Ponies' solution:

Locations are now polymorphically associated via a _type column (locatable_type), which is a bit of a premise change.
I moved the join outside the subquery. I don't know whether there are performance implications, but it made sense to me to treat the subquery as fetching locations with their partitioned rankings, and the outer query as bringing it all together.
Minor: I took away the table name aliasing. Although I'm plenty used to aliasing, it made the query harder for me to follow. I'll wait until I'm more experienced with PostgreSQL before working in that flair.
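
For comparison, the same "closest location per vendor" result can probably also be expressed with PostgreSQL's DISTINCT ON, which the question mentions attempting. This is only a sketch: it is written against the simplified vendors/locations schema from the question (locations.vendor_id) rather than the polymorphic columns used above, and the names nearest and location_id are illustrative.

```sql
-- Sketch only: assumes the simplified schema (locations.vendor_id -> vendors.id).
SELECT vendors.*, nearest.location_id, nearest.distance
FROM (
  -- DISTINCT ON keeps the first row per vendor_id according to the ORDER BY,
  -- i.e. the location with the smallest distance (ties broken by location id).
  SELECT DISTINCT ON (locations.vendor_id)
         locations.vendor_id,
         locations.id AS location_id,
         ABS(locations.latitude - 30) + ABS(locations.longitude - 30) AS distance
  FROM locations
  WHERE locations.latitude IS NOT NULL
    AND locations.longitude IS NOT NULL
  ORDER BY locations.vendor_id, distance, locations.id
) AS nearest
INNER JOIN vendors ON vendors.id = nearest.vendor_id
-- Radius limit and final ordering are applied to the one row kept per vendor.
WHERE nearest.distance < 50
ORDER BY nearest.distance;
```

The leading ORDER BY expression has to match the DISTINCT ON expression, which is why the per-vendor pick happens in the subquery and the ordering by distance happens outside it.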

初雪 2024-10-23 13:52:03

For PostgreSQL 8.4+, you can use analytic (window) functions such as ROW_NUMBER:

SELECT x.*
  FROM (SELECT v.*,
               t.*,
               ABS(t.latitude - 30) + ABS(t.longitude - 30) AS distance,
               ROW_NUMBER() OVER(PARTITION BY v.id
                                     ORDER BY ABS(t.latitude - 30) + ABS(t.longitude - 30)) AS rank
          FROM VENDORS v
          JOIN LOCATIONS t ON t.vendor_id = v.id
         WHERE t.latitude IS NOT NULL 
           AND t.longitude IS NOT NULL) x
  WHERE x.rank = 1
    AND x.distance < 50
ORDER BY x.distance

I kept the filter on distance, so a vendor whose closest location is 50 or more away does not appear at all. Remove the x.distance < 50 check if you want every vendor returned regardless of distance.

ROW_NUMBER returns a distinct sequential value that resets for every vendor in this example, so exactly one location per vendor gets rank 1. If you want tied locations to share the same rank value, you'd need to look at using DENSE_RANK.
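
The difference only shows up on ties. A small self-contained sketch using made-up rows (the VALUES list and the aliases rn, rnk and dense_rnk are illustrative, not from the question's data):

```sql
-- Sketch only: sample rows are invented to show how ties are numbered.
SELECT vendor_id,
       distance,
       ROW_NUMBER() OVER w AS rn,        -- always unique within a vendor
       RANK()       OVER w AS rnk,       -- ties share a value, gaps follow
       DENSE_RANK() OVER w AS dense_rnk  -- ties share a value, no gaps
FROM (VALUES (1, 5), (1, 5), (1, 20), (2, 25)) AS t(vendor_id, distance)
WINDOW w AS (PARTITION BY vendor_id ORDER BY distance)
ORDER BY vendor_id, distance, rn;
```

For vendor 1, the two rows at distance 5 get rn 1 and 2 but share rnk 1 and dense_rnk 1; the row at distance 20 gets rn 3, rnk 3 and dense_rnk 2. Filtering on rn = 1 therefore returns one location per vendor, while filtering on dense_rnk = 1 would return both tied locations.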

See this article for emulating ROW_NUMBER on PostgreSQL pre-8.4.

动听の歌 2024-10-23 13:52:03

MySQL extends GROUP BY so that selected columns need not all be aggregated or listed in the GROUP BY clause: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

I have seen many questions here with the same issue. The trick is to get the necessary columns in a subquery and then self-join it in the outer query:

create temp table locations (id int, vendor_id int, latitude int, longitude int);
CREATE TABLE
insert into locations values
        (1, 1, 50, 50),
        (2, 1, 35, 30),
        (3, 2, 5, 30)
;
SELECT
     locations.*, distance
     FROM
     (
          SELECT
              vendor_id,
              MIN(ABS(latitude - 30) + ABS(longitude - 30)) as distance
              FROM locations
              WHERE latitude IS NOT NULL AND longitude IS NOT NULL
              GROUP BY vendor_id
     ) AS min_locations
     JOIN locations ON
          ABS(latitude - 30) + ABS(longitude - 30) = distance
          AND min_locations.vendor_id = locations.vendor_id
     WHERE distance < 50
     ORDER BY distance
;
 id | vendor_id | latitude | longitude | distance 
----+-----------+----------+-----------+----------
  2 |         1 |       35 |        30 |        5
  3 |         2 |        5 |        30 |       25
