Finding the nearest UK postcode to another UK postcode by matching the postcode string
Here is a question that has kept me awake for several days now. The only conclusion I have reached so far is that Red Bull does not usually help coders.
I have a scenario in my application where I have a number of jobs (1 to 50). Each job has an address, and for each address I have the following properties: Postcode, Latitude, and Longitude.
I also have a table of workers, and they have addresses too. When jobs or workers are created through the screens, I use Google Maps queries to make sure the provided postcode is valid and in the UK, so all the addresses are verified.
I am using a scheduler control to display workers on the y-axis and a timeline on the x-axis. Every job has a date and can only move vertically on the scheduler, on the job's date. The user selects a number of jobs, and they are displayed in a basket next to the scheduler. The user can then drag and drop jobs onto workers. All of this is manual, so it works.
My task is to automate this so that the user does little more than verify and allot the jobs.
Every worker has a property called WillingMaximumDistanceTravel, an integer giving the number of miles the worker is willing to travel for a job.
Now here is the headache: I have over 1500 workers. I have a utility function that uses Newtonsoft's JsonConvert to deserialize the response stream from Google Maps. I need to feed it Postcode A and Postcode B.
I also plan to add a new table to the database that stores each distance lookup as Postcode A, Postcode B, and Distance. That way, if I find myself comparing the same postcodes again, I will just retrieve the result from the database, and gradually I will no longer need to bother Google at all, because the table will become very comprehensive.
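A minimal sketch of that cache table, written in Python with the standard `sqlite3` module rather than in .NET. The table and column names and the `fetch` callback (standing in for the Google Maps lookup) are hypothetical. Pairs are stored in the direction they were requested, since a road route from A to B can differ from B to A because of one-way systems.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table of previously looked-up distances."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS postcode_distance (
               postcode_a TEXT NOT NULL,
               postcode_b TEXT NOT NULL,
               distance_miles REAL NOT NULL,
               PRIMARY KEY (postcode_a, postcode_b)
           )"""
    )
    return conn


def normalise(postcode: str) -> str:
    """Strip spaces and upper-case, so 'sw1w 9tq' and 'SW1W9TQ' share one row."""
    return postcode.replace(" ", "").upper()


def cached_distance(conn: sqlite3.Connection, a: str, b: str,
                    fetch: Callable[[str, str], float]) -> float:
    """Return the stored distance for (a, b), calling `fetch` only on a miss."""
    a, b = normalise(a), normalise(b)
    row = conn.execute(
        "SELECT distance_miles FROM postcode_distance "
        "WHERE postcode_a = ? AND postcode_b = ?",
        (a, b),
    ).fetchone()
    if row is not None:
        return row[0]
    distance = fetch(a, b)  # hypothetical stand-in for the Google Maps request
    conn.execute(
        "INSERT INTO postcode_distance (postcode_a, postcode_b, distance_miles) "
        "VALUES (?, ?, ?)",
        (a, b, distance),
    )
    conn.commit()
    return distance
```

On a miss the lookup is made once and stored; later calls for the same pair, whatever the spacing or case of the postcodes, are answered from the table.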
I cannot use the simple Haversine formula, because I need the actual travel distance, not the as-the-crow-flies distance. The pain is that calculating all these distances takes a lot of time. Some workers can travel over 10 miles, while others range from 15 to 80. I have to take the first job from the list and run it against every applicable worker on the system! I was wondering whether UK postcodes follow a pattern. If we sort a list of UK postcodes, can we roughly estimate, from the alphanumeric pattern, where we will hit the 100-mile mark, the 200-mile mark, and so on?
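Although the crow-fly distance is not the answer here, it can still prune most of the Google requests: a road route can never be shorter than the straight-line distance between its ends, so a worker whose Haversine distance to the job already exceeds their WillingMaximumDistanceTravel can be dropped without any lookup. A minimal Python sketch, assuming each worker's latitude and longitude are available (they are stored with the address); the `Worker` class is hypothetical, and the 1% margin is an assumed allowance for treating the Earth as a sphere.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Worker:  # hypothetical shape: only the fields this filter needs
    name: str
    lat: float
    lon: float
    willing_maximum_distance_travel: int  # miles, like WillingMaximumDistanceTravel


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ("crow-fly") distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def candidate_workers(job_lat: float, job_lon: float, workers: list,
                      margin: float = 0.01) -> list:
    """Keep only workers who *might* be within range by road.

    Road distance >= straight-line distance, so anyone whose crow-fly
    distance already exceeds their limit (plus a small margin for the
    spherical-Earth approximation) cannot qualify and needs no lookup.
    """
    return [
        w for w in workers
        if haversine_miles(job_lat, job_lon, w.lat, w.lon)
        <= w.willing_maximum_distance_travel * (1 + margin)
    ]
```

Only the workers that survive this filter need a real road-distance lookup.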
If anyone is interested in the code, please drop a line and I will paste it.
(I work for Google, but I'm not speaking on behalf of Google. I have nothing to do with the maps API.)
I suspect this isn't a great situation for using the Google Maps API, simply because you're pushing so much data through. You really don't want to make that many requests, even if you could do so under the directions limits.
When I tackled something similar in a previous job, we bought into a locally hosted maps API, but even that wasn't fast enough for this sort of work. We ended up precomputing the travel time from the centroid of each postcode "area" (probably the wrong name for it, but I mean the first part of the postcode followed by the first digit of the remainder, e.g. "SW1W 9" for "SW1W 9TQ") to every other area, storing the results in a giant table. I think we only did it for pairs of areas within 100 miles of each other or thereabouts, to cut down on the amount of preprocessing.
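Reducing a full postcode to that coarser key is a pure string operation, because the inward code (the part after the space) is always one digit followed by two letters. Royal Mail calls "SW1W 9" a postcode sector. A minimal Python sketch; the function name is hypothetical, and it only checks the shape loosely rather than validating against a real postcode list:

```python
def postcode_sector(postcode: str) -> str:
    """Reduce a full UK postcode to outward code + first digit of the inward code.

    e.g. "SW1W 9TQ" -> "SW1W 9". Relies on the inward code always being
    three characters: one digit followed by two letters.
    """
    compact = postcode.replace(" ", "").upper()
    # Shortest UK postcodes (e.g. "M1 1AE") are 5 characters without the space.
    if len(compact) < 5 or not compact[-3].isdigit():
        raise ValueError(f"not a UK postcode: {postcode!r}")
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"
```

Both ends of a job/worker pair can then be mapped to sectors and used as the key into the precomputed table.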
Even then, a simple DB wasn't quite as fast as we wanted - so we stored the results in a giant file, with a single byte per source/destination pair. (We had a fixed sequence of source postcodes and target postcodes, so we didn't need to specify those.) At that point, computing a travel time consisted of:
The bytes were on a sliding scale of accuracy, so for the first 60 minutes it was on a per-minute basis, then each extra value meant an extra 2 minutes, then 5 etc. (Those aren't the exact values, but it was something like that.)
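A sketch of that file layout in Python, assuming a row-major matrix (one byte per source/target pair, so the byte for a pair sits at offset `source_index * n_targets + target_index`) and a made-up version of the sliding scale, since the exact bands weren't given: 1-minute steps up to 60, 2-minute steps up to 180, 5-minute steps after that, with 255 reserved for "not precomputed". All names here are hypothetical. Encoding rounds down within a band, so a lookup errs on the side of keeping a candidate.

```python
from typing import Optional

NO_DATA = 255  # assumed marker for "not precomputed / out of range"


def encode_minutes(minutes: int) -> int:
    """Map a travel time to one byte, rounding down within each band."""
    if minutes <= 60:
        code = minutes                          # 1-minute steps
    elif minutes <= 180:
        code = 60 + (minutes - 60) // 2         # 2-minute steps
    else:
        code = 120 + (minutes - 180) // 5       # 5-minute steps
    return min(code, NO_DATA - 1)               # 254 means "850 minutes or more"


def decode_minutes(code: int) -> Optional[int]:
    """Inverse of encode_minutes (lower edge of the band); None for NO_DATA."""
    if code == NO_DATA:
        return None
    if code <= 60:
        return code
    if code <= 120:
        return 60 + (code - 60) * 2
    return 180 + (code - 120) * 5


def write_matrix(path: str, minutes_rows: list) -> None:
    """Write rows of travel minutes (or None) as a one-byte-per-pair file."""
    with open(path, "wb") as f:
        for row in minutes_rows:
            f.write(bytes(NO_DATA if m is None else encode_minutes(m) for m in row))


def lookup_minutes(path: str, source_index: int, target_index: int,
                   n_targets: int) -> Optional[int]:
    """Seek straight to the pair's byte and decode it; no index or query needed."""
    with open(path, "rb") as f:
        f.seek(source_index * n_targets + target_index)
        return decode_minutes(f.read(1)[0])
```

Because the source and target sequences are fixed, the position alone identifies the pair, which is what lets each entry shrink to a single byte.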
When you've worked out "good candidates" you can ask an on-site API or the Google Maps API for more accurate directions for your exact postcodes, of course.
You want to look for a spatial index or a space-filling curve. A spatial index reduces the 2D problem to a 1D problem and recursively subdivides the surface into smaller tiles, but it is basically a reordering of the tiles. You can address the subdivided surface either with a numeric index or with a string built from 4 characters. The latter can be useful to you because it lets you query the string, with all the string operations hidden in the database engine. Look for Nick's blog post on spatial indexing with quadtrees and Hilbert curves.
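A minimal sketch of the 4-character string idea in Python: each level splits the current tile into four quadrants labelled '0' to '3', so a point's key has one character per level, and every point inside a tile shares that tile's key as a prefix. This uses plain quadrant (Z-order) labelling, not the Hilbert ordering the blog describes, and the UK bounding box is an approximate assumption.

```python
def quadkey(lat: float, lon: float, depth: int = 16,
            bounds: tuple = (49.8, -8.7, 60.9, 1.8)) -> str:
    """Return a quadtree key for (lat, lon) with one character per level.

    bounds = (south, west, north, east); the default is a rough UK box.
    Quadrants: '0' south-west, '1' south-east, '2' north-west, '3' north-east.
    A shorter prefix of the key identifies a larger enclosing tile.
    """
    south, west, north, east = bounds
    if not (south <= lat <= north and west <= lon <= east):
        raise ValueError("point outside bounds")
    key = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lon >= mid_lon:      # east half
            quadrant += 1
            west = mid_lon
        else:                   # west half
            east = mid_lon
        if lat >= mid_lat:      # north half
            quadrant += 2
            south = mid_lat
        else:                   # south half
            north = mid_lat
        key.append(str(quadrant))
    return "".join(key)
```

Stored in a column alongside each address, the keys turn "everything in this tile" into a string-prefix query. A radius around a job still needs the neighbouring tiles checked, because two nearby points can fall on either side of a tile boundary.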