SQL query for the total number of points within a radius of a location

Posted 2024-08-11 03:27:32

I have a database table of all zipcodes in the US that includes city,state,latitude & longitude for each zipcode. I also have a database table of points that each have a latitude & longitude associated with them. I'd like to be able to use 1 MySQL query to provide me with a list of all unique city/state combinations from the zipcodes table with the total number of points within a given radius of that city/state. I can get the unique city/state list using the following query:

select city,state,latitude,longitude
from zipcodes 
group by city,state order by state,city;

I can get the number of points within a 100 mile radius of a specific city with latitude '$lat' and longitude '$lon' using the following query:

select count(*) 
from points 
where (3959 * acos(cos(radians($lat)) * cos(radians(latitude)) * cos(radians(longitude) - radians($lon)) + sin(radians($lat)) * sin(radians(latitude)))) < 100;
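For reference, that WHERE clause is the spherical law of cosines with an Earth radius of 3959 miles. The Python sketch below mirrors it term by term so distances can be sanity-checked outside the database; the function names are illustrative only. The clamp before acos() is an addition: floating-point rounding can push the argument just past 1, and MySQL's ACOS() returns NULL for arguments outside -1 to 1, which would silently drop such a row.

```python
import math

EARTH_RADIUS_MI = 3959  # radius used in the MySQL query above


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, written like the MySQL expression."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Rounding can leave cos_angle slightly outside [-1, 1]; clamp before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MI * math.acos(cos_angle)


def count_within(center_lat, center_lon, points, radius_mi=100):
    """Python analogue of SELECT COUNT(*) ... WHERE distance < radius_mi."""
    return sum(
        1 for lat, lon in points
        if great_circle_miles(center_lat, center_lon, lat, lon) < radius_mi
    )


# Hypothetical sample points: 0, ~69 and ~138 miles north of (40, -90).
print(count_within(40, -90, [(40, -90), (41, -90), (42, -90)]))  # 2
```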

What I haven't been able to do is figure out how to combine these queries in a way that doesn't kill my database. Here is one of my sad attempts:

select city,state,latitude,longitude,
    (select count(*) from points
     where status="A" AND 
          (3959 * acos(cos(radians(zipcodes.latitude)) * cos(radians(latitude)) * cos(radians(longitude) - radians(zipcodes.longitude)) + sin(radians(zipcodes.latitude)) * sin(radians(latitude)))) < 100) as 'points' 
from zipcodes 
group by city,state order by state,city;

The tables currently have the following indexes:

Zipcodes - `zip` (zip)
Zipcodes - `location` (state,city)
Points - `status_length_location` (status,length,longitude,latitude)

When I run EXPLAIN on the previous MySQL query, here is the output:

+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| id | select_type        | table    | type | possible_keys          | key                    | key_len | ref   | rows  | Extra                           |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
|  1 | PRIMARY            | zipcodes | ALL  | NULL                   | NULL                   | NULL    | NULL  | 43187 | Using temporary; Using filesort | 
|  2 | DEPENDENT SUBQUERY | points   | ref  | status_length_location | status_length_location | 2       | const | 16473 | Using where; Using index        | 
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+

I know I could loop through all the zipcodes and calculate the number of matching points within a given radius but the points table will be growing all the time and I'd rather not have stale point totals in the zipcodes database. I'm hoping a MySQL guru out there can show me the error of my ways. Thanks in advance for your help!

土豪我们做朋友吧 2024-08-18 03:27:32

MySQL Guru or not, the problem is that unless you find a way of filtering out various rows, the distance needs to be calculated between each point and each city...

There are two general approaches that may help the situation:

- make the distance formula simpler
- filter out unlikely candidates for the 100-mile radius around a given city

Before going into these two avenues of improvement, you should decide on the level of precision you need for this 100-mile distance, and you should also indicate which geographic area the database covers (is it just the continental USA, etc.?).

The reason is that the Great Circle formula, while numerically more precise, is computationally expensive. Another avenue of performance improvement would be to store "grid coordinates" of sorts in addition to (or instead of) the lat/long coordinates.

Edit:
A few ideas about a simpler (but less precise) formula:
Since we're dealing with relatively small distances (and, I'm guessing, latitudes between 30 and 48 degrees north), we can use the Euclidean distance (or better yet, the square of the Euclidean distance) rather than the more complicated spherical trigonometry formulas.
Depending on the level of precision expected, it may even be acceptable to use one single parameter for the linear distance of a full degree of longitude, taking a representative value for the area considered (say circa 46 statute miles, which is the figure near 48°N; at 30°N a degree of longitude is closer to 60 miles). The formula would then become:

  LatDegInMi = 69.0
  LongDegInMi = 46.0
  DistSquared = ((Lat1 - Lat2) * LatDegInMi) ^2 + ((Long1 - Long2) * LongDegInMi) ^2
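The same approximation as a minimal Python sketch, using the answer's constants (69.0 miles per degree of latitude, 46.0 per degree of longitude). Comparing squared distances avoids a square root. If you port the formula to MySQL, note that ^ is bitwise XOR there, so squares need POW(x, 2) or x * x.

```python
LAT_DEG_IN_MI = 69.0   # miles per degree of latitude
LONG_DEG_IN_MI = 46.0  # the answer's single value for a degree of longitude


def dist_squared(lat1, lon1, lat2, lon2):
    """Squared planar distance in square miles (flat-Earth approximation)."""
    dy = (lat1 - lat2) * LAT_DEG_IN_MI
    dx = (lon1 - lon2) * LONG_DEG_IN_MI
    return dx * dx + dy * dy


def within_radius(lat1, lon1, lat2, lon2, radius_mi=100.0):
    """Compare squared values so no square root is needed."""
    return dist_squared(lat1, lon1, lat2, lon2) < radius_mi * radius_mi


print(dist_squared(40, -90, 41, -91))   # 6877.0, i.e. about 82.9 miles
print(within_radius(40, -90, 41, -91))  # True
```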

On the idea of columns holding grid information, used to limit the number of rows considered for the distance calculation:
Each "point" in the system, be it a city or another point (delivery locations? store locations? whatever), is assigned two integer coordinates that identify the square, say 25 miles by 25 miles, in which the point lies. The coordinates of any point within 100 miles of the reference point (a given city) will then be at most +/- 4 in the x direction and +/- 4 in the y direction. So we can write a query similar to the following:

SET @LatDegInMi = 69.0;
SET @LongDegInMi = 46.0;

-- GridX/GridY: integer grid-cell columns stored on both tables
SELECT Z.city, Z.state, Z.latitude, Z.longitude, COUNT(*) AS point_count
FROM zipcodes Z
JOIN points P
  ON P.GridX IN (Z.GridX - 4, Z.GridX - 3, Z.GridX - 2, Z.GridX - 1, Z.GridX,
                 Z.GridX + 1, Z.GridX + 2, Z.GridX + 3, Z.GridX + 4)
  AND
   P.GridY IN (Z.GridY - 4, Z.GridY - 3, Z.GridY - 2, Z.GridY - 1, Z.GridY,
               Z.GridY + 1, Z.GridY + 2, Z.GridY + 3, Z.GridY + 4)
WHERE P.status = 'A'
   -- POW() rather than ^, which is bitwise XOR in MySQL
   AND POW((Z.latitude - P.latitude) * @LatDegInMi, 2)
     + POW((Z.longitude - P.longitude) * @LongDegInMi, 2) < POW(100, 2)
GROUP BY Z.city, Z.state, Z.latitude, Z.longitude;

Note that LongDegInMi could either be hardcoded (the same for all locations within the continental USA) or come from the corresponding record in the zipcodes table. Similarly, LatDegInMi could be hardcoded; there is little need to make it vary since, unlike a degree of longitude, a degree of latitude is nearly constant.
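If LongDegInMi comes from the zipcodes record, it can be derived from that zipcode's latitude, because a degree of longitude shrinks with the cosine of the latitude. A minimal sketch, assuming a spherical Earth with the same 3959-mile radius as the question's query; storing the result in a per-zipcode column is an assumption about your schema, not something the answer specifies.

```python
import math

EARTH_RADIUS_MI = 3959  # same radius as the question's great-circle query

# Length of one degree along a great circle: 2 * pi * R / 360, about 69.1 miles.
MI_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_MI / 360


def long_deg_in_mi(latitude):
    """Miles per degree of longitude at the given latitude (spherical Earth)."""
    return MI_PER_DEGREE * math.cos(math.radians(latitude))


for lat in (30, 40, 48):
    print(lat, round(long_deg_in_mi(lat), 1))
# prints: 30 59.8, then 40 52.9, then 48 46.2
```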

The reason this is faster is that for most records in the Cartesian product of the zipcodes table and the points table, we do not calculate the distance at all: we eliminate them on the basis of an index value (GridX and GridY).
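A minimal sketch of how the GridX/GridY values might be computed and used as a pre-filter. The 25-mile cell size and the 69.0/46.0 constants come from the answer; flooring the scaled coordinates to get the cell index is an assumption, since the answer does not say how the grid is laid out. Two points less than 100 miles apart (under the same flat approximation) differ by less than four cell widths in each direction, so their floored indices differ by at most four, which is why the +/- 4 window does not drop true matches.

```python
import math

CELL_MI = 25.0         # cell edge length in miles, as suggested in the answer
LAT_DEG_IN_MI = 69.0   # same constants as the flat distance formula
LONG_DEG_IN_MI = 46.0


def grid_cell(lat, lon):
    """Hypothetical (GridX, GridY): integer index of the 25 x 25 mile cell."""
    grid_x = math.floor(lon * LONG_DEG_IN_MI / CELL_MI)
    grid_y = math.floor(lat * LAT_DEG_IN_MI / CELL_MI)
    return grid_x, grid_y


def cells_close_enough(cell_a, cell_b, radius_mi=100.0):
    """Cheap index-style test: only pairs passing it need a distance check."""
    reach = math.ceil(radius_mi / CELL_MI)  # 100 / 25 -> 4 cells each way
    return (abs(cell_a[0] - cell_b[0]) <= reach
            and abs(cell_a[1] - cell_b[1]) <= reach)


city = grid_cell(40.0, -90.0)
print(city)                                              # (-166, 110)
print(cells_close_enough(city, grid_cell(41.0, -91.0)))  # True
print(cells_close_enough(city, grid_cell(45.0, -90.0)))  # False
```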

This brings us to the question of which SQL indexes to create. We would probably want:
- GridX + GridY + Status (on the points table)
- GridY + GridX + Status (possibly)
- City + State + latitude + longitude + GridX + GridY (on the zipcodes table)

An alternative to the grid is to "bound" the range of latitude and longitude we'll consider, based on the latitude and longitude of the given city, i.e. the JOIN condition becomes a range rather than an IN:

JOIN points P 
  ON    P.latitude > (Z.Latitude - (100 / LatDegInMi)) 
    AND P.latitude < (Z.Latitude + (100 / LatDegInMi)) 
    AND P.longitude > (Z.longitude - (100 / LongDegInMi)) 
    AND P.longitude < (Z.longitude + (100 / LongDegInMi))
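To see the range JOIN end to end, here is a small runnable sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts and sample rows are hypothetical. The query combines the bounding-box condition with the squared flat-distance check (written as x * x, since ^ is not a power operator in SQL), and a LEFT JOIN with COUNT(P.id) keeps zipcodes that have no nearby points.

```python
import sqlite3

QUERY = """
SELECT Z.zip, COUNT(P.id) AS point_count
FROM zipcodes Z
LEFT JOIN points P
  ON    P.latitude  > Z.latitude  - :radius / :lat_mi
    AND P.latitude  < Z.latitude  + :radius / :lat_mi
    AND P.longitude > Z.longitude - :radius / :long_mi
    AND P.longitude < Z.longitude + :radius / :long_mi
    -- flat distance check, only for rows that survived the box
    AND ((Z.latitude - P.latitude) * :lat_mi) * ((Z.latitude - P.latitude) * :lat_mi)
      + ((Z.longitude - P.longitude) * :long_mi) * ((Z.longitude - P.longitude) * :long_mi)
      < :radius * :radius
GROUP BY Z.zip
ORDER BY Z.zip
"""


def build_demo_db():
    """In-memory database with hypothetical zipcodes and points."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE zipcodes (zip TEXT PRIMARY KEY, latitude REAL, longitude REAL);
        CREATE TABLE points (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
        CREATE INDEX points_lat_long ON points (latitude, longitude);
    """)
    conn.executemany("INSERT INTO zipcodes VALUES (?, ?, ?)",
                     [("A", 40.0, -90.0), ("B", 45.0, -100.0)])
    conn.executemany("INSERT INTO points (latitude, longitude) VALUES (?, ?)",
                     [(40.5, -90.5), (41.0, -91.0), (42.0, -90.0)])
    return conn


def count_points_near_zips(conn, radius=100.0, lat_mi=69.0, long_mi=46.0):
    """Run the range JOIN with the given radius and degree-to-mile constants."""
    params = {"radius": radius, "lat_mi": lat_mi, "long_mi": long_mi}
    return conn.execute(QUERY, params).fetchall()


print(count_points_near_zips(build_demo_db()))  # [('A', 2), ('B', 0)]
```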

对不⑦ 2024-08-18 03:27:32

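When I do these kinds of searches, my needs allow for some approximation. So I use the formula from your second query to first calculate the "bounds": the four lat/long values at the extremes of the allowed radius. Then I take those bounds and do a simple query to find the matches within them (less than the max lat and long, greater than the min lat and long). So what I end up with is everything inside a square drawn around the circle defined by the radius, which means a few points near the corners lie slightly beyond the radius.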
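A minimal sketch of the bounds calculation, assuming a spherical Earth with the question's 3959-mile radius; it is not necessarily how the answerer computes them. The latitude half-width is the angular radius converted to degrees. For longitude, asin(sin(d) / cos(lat)) gives the meridians tangent to the circle, slightly wider than the naive d / cos(lat). It assumes the circle contains no pole and does not cross the 180th meridian, which holds for a 100-mile radius in the contiguous US.

```python
import math

EARTH_RADIUS_MI = 3959  # radius used in the question's query


def bounding_box(lat, lon, radius_mi):
    """Return (min_lat, max_lat, min_lon, max_lon) of a box enclosing the circle.

    Assumes the circle does not reach a pole or cross the 180th meridian.
    """
    d = radius_mi / EARTH_RADIUS_MI  # angular radius in radians
    dlat = math.degrees(d)
    dlon = math.degrees(math.asin(math.sin(d) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


min_lat, max_lat, min_lon, max_lon = bounding_box(40.0, -90.0, 100)
print(round(min_lat, 3), round(max_lat, 3), round(min_lon, 3), round(max_lon, 3))
```

The four values then go into a plain range filter such as WHERE latitude BETWEEN min_lat AND max_lat AND longitude BETWEEN min_lon AND max_lon.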

爱你不解释 2024-08-18 03:27:32

SELECT * FROM tblLocation 
    WHERE 2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)

where the 2 in "2 >" is the radius measured in degrees (roughly two parallels away), and 40 and -90 are the latitude and longitude of the test point.

Sorry I didn't use your table names or structures; I just copied this out of one of the stored procedures in one of my databases.
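Because that WHERE clause measures distance in raw degrees, the "circle" it selects is an ellipse on the ground: a degree of latitude is about 69 miles everywhere, while a degree of longitude shrinks with the cosine of the latitude. A small Python sketch of the same test, plus a hypothetical helper that converts the degree radius into north-south and east-west miles, assuming a spherical Earth of radius 3959 miles:

```python
import math

MI_PER_DEGREE = 2 * math.pi * 3959 / 360  # about 69.1 miles (spherical Earth)


def within_degrees(lat, lon, center_lat, center_lon, max_deg=2.0):
    """Same test as the WHERE clause: Euclidean distance in raw degrees."""
    return math.hypot(lat - center_lat, lon - center_lon) < max_deg


def degree_radius_in_miles(max_deg, center_lat):
    """Ground extent of a degree radius: (north-south miles, east-west miles)."""
    north_south = max_deg * MI_PER_DEGREE
    east_west = north_south * math.cos(math.radians(center_lat))
    return north_south, east_west


ns, ew = degree_radius_in_miles(2.0, 40.0)
print(round(ns), round(ew))  # 138 106
```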

If I wanted to see the number of points in a zip code I suppose I would do something like this:

SELECT 
    ParcelZip, COUNT(LocationID) AS LocCount 
FROM 
    tblLocation 
WHERE 
    2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
GROUP BY 
    ParcelZip

Getting the total count of all locations in the range would look like this:

SELECT 
    COUNT(LocationID) AS LocCount 
FROM 
    tblLocation 
WHERE 
    2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)

A cross join may be inefficient here, since it tests every zipcode/point pair and we are talking about a large number of records, but this should do the job in a single query:

SELECT 
    ZipCodes.ZipCode, COUNT(PointID) AS LocCount 
FROM
    Points
CROSS JOIN 
    ZipCodes
WHERE 
    2 > POWER(POWER(Points.Latitude - ZipCodes.Latitude, 2) + POWER(Points.Longitude - ZipCodes.Longitude, 2), .5)
GROUP BY 
    ZipCodes.ZipCode
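In memory, the cross join amounts to a nested loop over every zipcode/point pair, which is why it slows down as either table grows: the number of distance tests is the product of the two row counts. A minimal Python sketch with hypothetical sample data; as with the SQL, zipcodes with no match do not appear in the result.

```python
import math
from collections import Counter


def count_per_zip(zipcodes, points, max_deg=2.0):
    """Nested-loop equivalent of the CROSS JOIN + WHERE + GROUP BY query."""
    counts = Counter()
    tests = 0
    for zip_code, z_lat, z_lon in zipcodes:
        for p_lat, p_lon in points:
            tests += 1
            if math.hypot(p_lat - z_lat, p_lon - z_lon) < max_deg:
                counts[zip_code] += 1
    return counts, tests


zips = [("A", 40.0, -90.0), ("B", 30.0, -100.0)]  # hypothetical rows
pts = [(40.5, -90.5), (41.0, -91.0), (35.0, -80.0)]
counts, tests = count_per_zip(zips, pts)
print(dict(counts), tests)  # {'A': 2} 6
```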
