Help with SQL query speed and performance
I've got a classified listings website for cars, built in PHP, which I'm in the process of developing. The user enters the details of the car they are looking for using drop-down option boxes on the homepage. When they click submit they are taken through to the results page, and this is where I'm having problems.
The way it is set up at the moment is:
- The database is queried for any results matching the car they are looking for. The query returns the ID of the car and the Postcode of the advert;
- Each advert is then checked for the distance between the user's postcode and the postcode of the advert. This itself requires a database query to find the coordinates of each advert's postcode, which is quite time-consuming for what can be in excess of 350 results at times;
- An if statement is then used to determine whether the distance is less than or equal to the distance the user entered on the homepage;
- If the advert is within the allowed distance, its ID is added to an array;
- The total number of adverts in the array is then calculated and used to determine a variable dependent on the number of adverts and the number of adverts to be displayed on the page;
- A second query of the advert table is then executed using a WHERE statement and the IDs in the array, e.g. SELECT * FROM adverts WHERE ID=1 OR ID=4 OR ID=23 ... The total number of IDs used in the query depends on the variable mentioned in point 5. When the user then clicks next page, the query is re-created starting from the position in the array where it left off and executed again.
The problem I'm having is that it is taking ages to complete, and I am looking for a more resource- and time-efficient way of doing it.
It was originally designed so that a query would execute with WHERE clauses for each of the user's specific requirements for the car, and the distance was then checked with an if statement before each advert was output to the page. This caused problems with the page numbering, because it was impossible to determine how many of the adverts returned by the query would match the distance requirement. Hence it is done this way, with the distance condition being checked before the full adverts are collected, so that the exact number of adverts to be displayed can be calculated.
Sorry it's a little long, I hope it makes sense. I haven't included any code because it would make it longer, and it's a problem with the logic as opposed to the actual code.
Thanks for any suggestions you are able to make.
Someone has requested the table layout and SQL. Here goes:
Advert Table
ID, Make, Model, Colour, Mileage, Engine, Year, Postcode
Postcode Table
ID, Postcode, GridN, GridE, Longitude, Latitude
SQL for first query to get the ID and Postcode
SELECT ID, Postcode FROM adverts WHERE Make = '$subMake' AND Model = '$subModel'
etc
SQL for the second query to get the advert details using the IDs that match the distance requirements:
SELECT Make, Model, Year, Engine, Colour FROM adverts WHERE ID IN(1,2,6,90,112,898)
(Sorry if it's not syntactically correct. It does work; that SQL is just a rough outline of the many lines of the query strings.)
Comments (4)
The biggest optimization would be to query the postcode table and store the grid references in the adverts table at the time you insert the advert row.
This would drastically reduce the number of accesses to the postcode table.
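A minimal sketch of that idea, written in Python with the standard `sqlite3` module purely so it can be run as-is (the site itself is PHP, and the same SQL would apply there). It assumes the adverts table gains `GridN` and `GridE` columns; the helper name `insert_advert` and the postcode value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postcodes (ID INTEGER PRIMARY KEY, Postcode TEXT UNIQUE,
                        GridN INTEGER, GridE INTEGER);
-- Assumption: adverts is extended with GridN/GridE copied from postcodes.
CREATE TABLE adverts (ID INTEGER PRIMARY KEY, Make TEXT, Model TEXT,
                      Postcode TEXT, GridN INTEGER, GridE INTEGER);
INSERT INTO postcodes (Postcode, GridN, GridE) VALUES ('AA1 1AA', 180000, 530000);
""")

def insert_advert(conn, make, model, postcode):
    """Look the postcode up once, at insert time, and store its grid reference."""
    grid = conn.execute(
        "SELECT GridN, GridE FROM postcodes WHERE Postcode = ?", (postcode,)
    ).fetchone()
    if grid is None:
        raise ValueError(f"unknown postcode: {postcode}")
    cur = conn.execute(
        "INSERT INTO adverts (Make, Model, Postcode, GridN, GridE) "
        "VALUES (?, ?, ?, ?, ?)",
        (make, model, postcode, grid[0], grid[1]),
    )
    return cur.lastrowid

advert_id = insert_advert(conn, "Ford", "Focus", "AA1 1AA")
row = conn.execute(
    "SELECT GridN, GridE FROM adverts WHERE ID = ?", (advert_id,)
).fetchone()
```

After this, a search never needs to touch the postcode table for the adverts themselves, only once for the searching user's postcode.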
You could also reduce the number of calculations with some simple filtering on the advert table, as follows.
Get the user's GridN and GridE values from the postcode table.
Calculate minN as GridN - maxDistance, maxN as GridN + maxDistance, minE as GridE - maxDistance, and maxE as GridE + maxDistance.
You can then query on the advert table like so:
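A minimal sketch of such a bounding-box query, again using Python's `sqlite3` only to make it runnable. It assumes `GridN` and `GridE` have been copied onto the adverts table as suggested above, and that the grid values and `max_distance` are in the same unit (metres here, which is an assumption). The function name and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adverts (ID INTEGER PRIMARY KEY, Make TEXT, Model TEXT,
                      GridN INTEGER, GridE INTEGER);
INSERT INTO adverts (ID, Make, Model, GridN, GridE) VALUES
  (1, 'Ford', 'Focus', 181000, 531000),   -- close to the user
  (2, 'Ford', 'Focus', 250000, 530000),   -- far to the north
  (3, 'Ford', 'Focus', 189000, 539000),   -- in the corner of the box
  (4, 'Audi', 'A4',    180500, 530500);   -- close, but wrong car
""")

def bounding_box_candidates(conn, make, model, user_n, user_e, max_distance):
    """Return adverts inside the square of side 2 * max_distance around the user."""
    min_n, max_n = user_n - max_distance, user_n + max_distance
    min_e, max_e = user_e - max_distance, user_e + max_distance
    return conn.execute(
        """SELECT ID, GridN, GridE FROM adverts
           WHERE Make = ? AND Model = ?
             AND GridN BETWEEN ? AND ?
             AND GridE BETWEEN ? AND ?
           ORDER BY ID""",
        (make, model, min_n, max_n, min_e, max_e),
    ).fetchall()

candidates = bounding_box_candidates(conn, "Ford", "Focus", 180000, 530000, 10000)
```

Advert 3 is returned even though it lies in a corner of the square, further away than `max_distance` in a straight line; the box is only a cheap first pass.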
To further speed this up you can add indexes to GridN and GridE.
Once you have selected the rows you can calculate the "real" distance and reject the few rows that fall outside the limit.
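That final step might look like the sketch below. It treats the grid references as flat coordinates in metres (an assumption) and uses straight-line distance; the function names are hypothetical. Because the filtered ID list is complete before anything is displayed, the page count is exact.

```python
import math

def within_distance(candidates, user_n, user_e, max_distance):
    """Keep IDs of (id, grid_n, grid_e) rows whose straight-line distance is in range."""
    return [
        advert_id
        for advert_id, grid_n, grid_e in candidates
        if math.hypot(grid_n - user_n, grid_e - user_e) <= max_distance
    ]

def page_of(ids, page, per_page):
    """Return the slice of IDs shown on a 1-based page number."""
    start = (page - 1) * per_page
    return ids[start:start + per_page]

# Rows as they might come back from the bounding-box query.
candidates = [(1, 181000, 531000), (3, 189000, 539000), (7, 180000, 540000)]
matching_ids = within_distance(candidates, 180000, 530000, 10000)
total_pages = math.ceil(len(matching_ids) / 10)
```

The IDs returned by `page_of` can then go straight into the existing `WHERE ID IN (...)` query.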
Amend the query so that it includes the distance between the postcodes and is restricted to those adverts within the specified distance range.
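One way this could be done with the table layout from the question is sketched below, using Python's `sqlite3` only so the SQL can be run. It joins the postcode table twice (once for the advert, once for the user) and compares squared grid distances so no square-root function is needed. The postcodes, the metre units and the `search` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postcodes (ID INTEGER PRIMARY KEY, Postcode TEXT UNIQUE,
                        GridN INTEGER, GridE INTEGER);
CREATE TABLE adverts (ID INTEGER PRIMARY KEY, Make TEXT, Model TEXT,
                      Year INTEGER, Postcode TEXT);
INSERT INTO postcodes (Postcode, GridN, GridE) VALUES
  ('AA1 1AA', 180000, 530000), ('BB2 2BB', 183000, 534000),
  ('CC3 3CC', 280000, 530000);
INSERT INTO adverts (ID, Make, Model, Year, Postcode) VALUES
  (1, 'Ford', 'Focus', 2008, 'BB2 2BB'),
  (2, 'Ford', 'Focus', 2006, 'CC3 3CC'),
  (3, 'Ford', 'Focus', 2009, 'AA1 1AA');
""")

# Shared FROM/WHERE so the count and the page query can never disagree.
DISTANCE_FILTER = """
FROM adverts a
JOIN postcodes ap ON ap.Postcode = a.Postcode
JOIN postcodes up ON up.Postcode = :user_postcode
WHERE a.Make = :make AND a.Model = :model
  AND (ap.GridN - up.GridN) * (ap.GridN - up.GridN)
    + (ap.GridE - up.GridE) * (ap.GridE - up.GridE)
      <= :max_distance * :max_distance
"""

def search(conn, make, model, user_postcode, max_distance, page, per_page):
    """Return (total matching adverts, rows for the requested 1-based page)."""
    params = {"make": make, "model": model,
              "user_postcode": user_postcode, "max_distance": max_distance}
    total = conn.execute("SELECT COUNT(*) " + DISTANCE_FILTER, params).fetchone()[0]
    rows = conn.execute(
        "SELECT a.ID, a.Make, a.Model, a.Year " + DISTANCE_FILTER
        + " ORDER BY a.ID LIMIT :limit OFFSET :offset",
        {**params, "limit": per_page, "offset": (page - 1) * per_page},
    ).fetchall()
    return total, rows

total, first_page = search(conn, "Ford", "Focus", "AA1 1AA", 5000, 1, 1)
```

With the distance condition inside the query, the total for page numbering comes from `COUNT(*)` and each page is fetched with `LIMIT`/`OFFSET`, so no ID array has to be carried between pages.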
You should change your distance function into a view with all possible combinations of postcodes; you can then join on that within your queries rather than calling the function. Alternatively, you can calculate the latitudes and longitudes that are 50km from your user's postcode.
Further to that, if you provide fixed options (most of these websites offer only 5, 10, 25, 50 and 100 as the distance options), you can pre-compute these distance calculations. To go even further, you could map each postcode to all the ones nearby, if you really wanted. You would only need to calculate it 5 times (5 distances) for each postcode, and you could exclude the results from the previous value, e.g. leave the 5km results out of the 10km set, because the query just looks for distance <= 10km.
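A rough sketch of that pre-computation, assuming grid references in metres and the five distance options listed above. Each pair of postcodes is stored once with the smallest band it falls into, so a 10km search simply selects pairs whose band is <= 10. The names `build_band_table` and `postcodes_within`, and the shortened postcodes, are invented for illustration.

```python
import math
from itertools import combinations

BANDS_KM = (5, 10, 25, 50, 100)

def smallest_band(distance_km):
    """Return the smallest distance option that covers distance_km, or None."""
    for band in BANDS_KM:
        if distance_km <= band:
            return band
    return None

def build_band_table(postcodes):
    """Map (postcode_a, postcode_b) -> smallest band, for pairs within 100km.

    postcodes: dict of postcode -> (grid_n, grid_e) in metres (assumed units).
    """
    table = {}
    for (pc_a, (n_a, e_a)), (pc_b, (n_b, e_b)) in combinations(sorted(postcodes.items()), 2):
        band = smallest_band(math.hypot(n_a - n_b, e_a - e_b) / 1000)
        if band is not None:
            table[(pc_a, pc_b)] = band
            table[(pc_b, pc_a)] = band
    return table

def postcodes_within(table, postcode, limit_km):
    """Return postcodes whose stored band is within the chosen distance option."""
    return sorted(other for (src, other), band in table.items()
                  if src == postcode and band <= limit_km)

postcodes = {"AA1": (180000, 530000), "BB2": (183000, 534000),
             "CC3": (180000, 550000), "DD4": (480000, 530000)}
table = build_band_table(postcodes)
near_10 = postcodes_within(table, "AA1", 10)
near_25 = postcodes_within(table, "AA1", 25)
```

The number of pairs grows quadratically with the number of postcodes, so in practice this would probably be stored as a database table and limited to pairs within the largest band, as the sketch does.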
Depending on your database maybe use something like PostGIS?
Set up a column in the Adverts table for a LonLat datatype and then run built-in functions such as ST_DWithin to find all adverts with a LonLat within a specified distance of the target record.
Another issue I find with using a static postcode database is that it quickly goes out of date (especially for new builds). You might also want to use something like Mapstraction to return a geocoded result from Google / Yahoo etc. and save that LonLat instead, although you might need more error checking on the postcode input and should restrict the returned results to exact matches.