MySQL selects wrong column values in GROUP BY query for a store locator (Google Maps) app
First I must declare that I'm an extreme newbie. I have only been using PHP and MySQL for about 4 weeks. Please accept my apologies in advance if I haven't formatted this question correctly or am not using the proper terms of art.
I am building a store locator application. For testing, I have a table named 'locations' that contains name, address and latitude/longitude data for 5 different restaurant chains with 1500 total (location) records.
I got the app running fine as a standard store locator where the user inputs their address and a distance in miles to search. The code below properly returns those results when the GROUP BY statement is removed. For example, when the user inputs their address and distance to search, the SELECT statement returns ALL of the restaurants within that distance just fine.
My app requires that only the nearest location of each restaurant chain within the user-specified distance be returned and displayed. I added the GROUP BY statement to accomplish this. The correct number of records is returned, with the correct loc_name and distance from the user. However, the other fields are never correct. They seem to be randomly selected from other records, not from the record that has the MIN value. For example, the 1st record returned is for DAIRY QUEEN at a distance of 4.38 miles, and that's correct. However, the address, state, city, etc. shown for the DAIRY QUEEN at 4.38 miles are not correct.
I have read extensively about issues with GROUP BY and the possible need for an INNER JOIN to solve my problem. A recent question and answer on Stack Overflow addresses this very specifically, see MySQL Selecting wrong column value in Group By query. All the solutions I've read so far would have me using the computed distance as a key to perform the JOIN, and I don't see how that's possible.
Question 1: How do I construct the SELECT statement to get my desired result: the complete row of data fields for only one each of the restaurant chains in the locations table?
Notes about my code, which is not as scary as it looks and which you don't need to understand to solve my problem:
The trig formula within MIN() computes the distance in miles between the user's address (translated to latitude/longitude) and the lat/lon of each location record. Trust me, this works OK.
ORDER BY 13 statement: this means to order by the 13th field listed in the SELECT, which in this case is the alias 'distance'. I'm mentioning this because I've noticed that this syntax is not well known.
The code following the WHERE statement checks whether each location's lat/lon lies inside a box around the user's address whose edges are the user-specified search distance away. This is called a 'Bounding Box'. It's used to optimize the search time. One could simply test whether 'distance' is <= the user's input distance, but that would require reading the entire locations table. The production version will contain approximately a million records. An index is in place for the locations table: (loc_lat,loc_lon,loc_id). My understanding is that using the Bounding Box in the WHERE statement will limit the range of the index that needs to be read.
Question 2: Is this true the way I have it implemented, and will it be processed as I described? Will the solution to Question 1 retain the optimization?
Thank all of you in advance for your help. I'm really just 4 weeks into MySQL and PHP and, as you can see, I'm in over my head.
My question boils down to this: how should this SELECT be modified to return exactly 1 locations table record, with all of its corresponding fields, for each loc_name, namely the record at the minimum distance from the user's input address?
SELECT loc_id, loc_name, loc_address_1, loc_address_2, loc_city,
       loc_state, loc_postal_code, loc_phone, loc_fax,
       loc_lat, loc_lon, loc_geocoded_status,
       MIN( ((ACOS( SIN( $lat * PI() / 180 ) * SIN( loc_lat * PI() / 180 ) +
                    COS( $lat * PI() / 180 ) * COS( loc_lat * PI() / 180 ) *
                    COS( ($long - loc_lon) * PI() / 180 ) ) * 180 / PI()) * 60 * 1.1515) )
           AS distance
FROM locations
WHERE (loc_lat BETWEEN $lat1 AND $lat2
       AND loc_lon BETWEEN $lon1 AND $lon2)
  AND loc_geocoded_status = 1
GROUP BY loc_name
ORDER BY 13
You've come a long way in four weeks. It helps to include minimal DDL and INSERT statements to encourage more people to respond.
Yes, that's normal for MySQL. The article MySQL Standard Group By explains that behavior.
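One way to see this behavior for yourself is to ask MySQL to reject such queries. The sketch below assumes the `locations` table from the question and a server that supports the `ONLY_FULL_GROUP_BY` SQL mode (on by default from MySQL 5.7.5). With the mode enabled, selecting a non-aggregated column that is not in the GROUP BY raises an error instead of returning a value from an arbitrary row of the group.

```sql
-- Strict GROUP BY checking for this session only.
-- Note: this replaces any other SQL modes for the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- loc_address_1 is neither aggregated nor listed in GROUP BY,
-- so with the mode enabled MySQL rejects this query (error 1055)
-- rather than picking an address from an arbitrary row in each group.
SELECT loc_name, loc_address_1, MIN(loc_lat) AS min_lat
FROM locations
GROUP BY loc_name;
```

With the mode disabled, the same query runs, and `loc_address_1` is not guaranteed to come from the row that produced `min_lat`.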
You need a determinate result set, not an indeterminate one. This statement should give you a two-column result set containing one row per location name.
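A minimal sketch of such a statement, assuming the `locations` table from the question. The user variables `@lat` and `@lon` are hypothetical stand-ins for the PHP values `$lat` and `$long`, and `RADIANS()` is used in place of the question's `* PI() / 180`; the bounding box is left out here.

```sql
-- Hypothetical search point (stand-ins for $lat and $long).
SET @lat = 40.0, @lon = -75.0;

-- One row per chain: its name and the distance in miles
-- to its nearest geocoded location.
SELECT loc_name,
       MIN(ACOS(SIN(RADIANS(@lat)) * SIN(RADIANS(loc_lat)) +
                COS(RADIANS(@lat)) * COS(RADIANS(loc_lat)) *
                COS(RADIANS(@lon - loc_lon)))
           * 180 / PI() * 60 * 1.1515) AS distance
FROM locations
WHERE loc_geocoded_status = 1
GROUP BY loc_name;
```

Every selected column is either in the GROUP BY or inside an aggregate, so the result does not depend on which row MySQL happens to pick.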
And you should be able to use that statement and a JOIN expression on the location name and distance to get the other columns you need.
I wrapped the arithmetic in a function named "distance", then used it in the query.
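A sketch of what that could look like. The function signature, the `LEAST()` clamp, the alias names, and the `@lat`/`@lon` variables are all assumptions for illustration, not the original code; the arithmetic otherwise follows the question.

```sql
-- Hypothetical helper: miles between two points given in degrees.
-- LEAST(1.0, ...) guards against rounding pushing the ACOS argument
-- just above 1, in which case ACOS would return NULL.
CREATE FUNCTION distance(lat1 DOUBLE, lon1 DOUBLE, lat2 DOUBLE, lon2 DOUBLE)
RETURNS DOUBLE
DETERMINISTIC
NO SQL
RETURN ACOS(LEAST(1.0,
                  SIN(RADIANS(lat1)) * SIN(RADIANS(lat2)) +
                  COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
                  COS(RADIANS(lon1 - lon2))))
       * 180 / PI() * 60 * 1.1515;

-- Hypothetical search point (stand-ins for $lat and $long).
SET @lat = 40.0, @lon = -75.0;

-- The inner SELECT finds the minimum distance per chain; the JOIN
-- then pulls the full row whose own distance equals that minimum.
SELECT l.loc_id, l.loc_name, l.loc_address_1, l.loc_address_2, l.loc_city,
       l.loc_state, l.loc_postal_code, l.loc_phone, l.loc_fax,
       l.loc_lat, l.loc_lon, l.loc_geocoded_status,
       nearest.min_distance AS distance
FROM locations AS l
INNER JOIN (
    SELECT loc_name,
           MIN(distance(@lat, @lon, loc_lat, loc_lon)) AS min_distance
    FROM locations
    WHERE loc_geocoded_status = 1
    GROUP BY loc_name
) AS nearest
   ON nearest.loc_name = l.loc_name
  AND nearest.min_distance = distance(@lat, @lon, l.loc_lat, l.loc_lon)
WHERE l.loc_geocoded_status = 1
ORDER BY nearest.min_distance;
```

The computed distance works as a join key here because both sides evaluate the same function on the same stored values. If two locations of one chain are at exactly the same distance, both rows come back, so a tie-breaker (for example the lowest `loc_id`) may be needed.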
You'll need to add your bounding box information. I left it out while I was trying to make sure the JOIN worked right. I had an unnecessary ORDER BY in the inner SELECT clause, but that was a pre-caffeine clause, so I deleted it.
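A sketch of the same idea with the bounding box put back in. It assumes MySQL 8.0 or later for the `WITH` clause (a swapped-in technique, so the box and the trig expression are written only once); on older versions the same WHERE conditions would go in both the inner and the outer SELECT of the JOIN. The variables, the radius, and the way the box corners are derived are hypothetical stand-ins for the PHP values in the question.

```sql
-- Hypothetical search point and radius in miles.
SET @lat = 40.0, @lon = -75.0, @radius = 10.0;

-- In the question's formula one degree of latitude is
-- 60 * 1.1515 = 69.09 miles; a degree of longitude shrinks by COS(latitude).
-- (This approximation breaks down near the poles.)
SET @lat1 = @lat - @radius / 69.09,
    @lat2 = @lat + @radius / 69.09,
    @lon1 = @lon - @radius / (69.09 * COS(RADIANS(@lat))),
    @lon2 = @lon + @radius / (69.09 * COS(RADIANS(@lat)));

WITH candidates AS (
    -- Only rows inside the box are read and measured.
    SELECT loc_id, loc_name, loc_address_1, loc_address_2, loc_city,
           loc_state, loc_postal_code, loc_phone, loc_fax,
           loc_lat, loc_lon, loc_geocoded_status,
           ACOS(SIN(RADIANS(@lat)) * SIN(RADIANS(loc_lat)) +
                COS(RADIANS(@lat)) * COS(RADIANS(loc_lat)) *
                COS(RADIANS(@lon - loc_lon)))
           * 180 / PI() * 60 * 1.1515 AS distance
    FROM locations
    WHERE loc_lat BETWEEN @lat1 AND @lat2
      AND loc_lon BETWEEN @lon1 AND @lon2
      AND loc_geocoded_status = 1
)
SELECT c.*
FROM candidates AS c
INNER JOIN (
    SELECT loc_name, MIN(distance) AS min_distance
    FROM candidates
    GROUP BY loc_name
) AS nearest
   ON nearest.loc_name = c.loc_name
  AND nearest.min_distance = c.distance
WHERE c.distance <= @radius   -- optional: trims the corners of the box
ORDER BY c.distance;
```

Because the box filter sits inside `candidates`, both the per-chain minimum and the final rows are computed only from locations inside the box, which is what keeps the optimization from the question.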
You will probably need an index on loc_name, too, because it's used in the GROUP BY. See MySQL's docs for EXPLAIN syntax.
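A short sketch of both steps, assuming the `locations` table and the existing (loc_lat, loc_lon, loc_id) index from the question. The index name and the literal box bounds are hypothetical.

```sql
-- Hypothetical index name; supports the GROUP BY on loc_name.
CREATE INDEX idx_locations_loc_name ON locations (loc_name);

-- Ask MySQL how it would execute the bounding-box search.
-- The bounds are made-up values for a box of roughly 10 miles
-- around latitude 40.0, longitude -75.0.
EXPLAIN
SELECT loc_name, loc_lat, loc_lon
FROM locations
WHERE loc_lat BETWEEN 39.85 AND 40.15
  AND loc_lon BETWEEN -75.19 AND -74.81
  AND loc_geocoded_status = 1;
```

In the output, the `key` column shows which index was chosen and `rows` estimates how many entries will be examined. With a B-tree index on (loc_lat, loc_lon, loc_id), the range on loc_lat bounds the part of the index that is scanned; the loc_lon condition is then checked against the entries in that latitude band rather than narrowing the range further, so it is worth checking the plan against production-sized data.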