Determining which locations are mentioned in a short text (500 to 1000 words) with PHP
I'd like to find a way to take a piece of user supplied text and determine what addresses on the map are mentioned within the text. I'd be happy to use a free web service if it exists or use a script which will not consume too many resources.
One way I can imagine doing this is to take a gigantic database of addresses and search the text for each of them individually, but this does not seem efficient. Is there a better algorithm or technique anyone can suggest?
My basic idea is to take the location information and turn it into markers on a Google Map. If it is too difficult or CPU-intensive to determine the locations automatically, I could require users to add information in a location field, but I would prefer not to, as some of the users are going to be quite young students.
This needs to be done in PHP, as that is the scripting language available on my school-hosted server.
Note that this whole setup will happen within the context of a Drupal node, and I plan to use a filter to collect the necessary location information from each node, so this parsing would only happen once (when the new text enters the database).
Comments (3)
You could use something like OpenCalais to tag your text. One of the categories it returns is "city"; you could then use another third-party module to show the location of the city.
If you did have a gigantic list of locations in a relational database, and you're only concerned with 500 to 1000 words, then you could definitely just run a SQL query that looks for matches to those 500-1000 words, and it would be quite efficient.
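A minimal PHP sketch of that idea, assuming a hypothetical `locations` table with `name`, `lat` and `lng` columns reachable through PDO. It collects every word and every run of up to three words (so multi-word names like "New York" can match; this n-gram step is an addition of the sketch, not part of the answer), then looks them all up with parameterized `IN (...)` queries. The phrases are sent in chunks because databases cap the number of bound parameters per statement (SQLite builds older than 3.32.0 allow only 999).

```php
<?php
// Sketch: match a short text against a location table with a few
// parameterized SQL queries instead of searching the text once per location.
// Hypothetical schema: locations(name TEXT, lat REAL, lng REAL).

/**
 * Return the distinct lowercase words and word runs (up to $maxWords long)
 * found in $text, so that multi-word names like "new york" are candidates too.
 */
function candidatePhrases(string $text, int $maxWords = 3): array
{
    // strtolower() only folds ASCII letters; mb_strtolower() handles more, if mbstring is installed.
    preg_match_all("/[\\p{L}\\p{N}'-]+/u", strtolower($text), $m);
    $words = $m[0];
    $count = count($words);
    $phrases = [];
    for ($i = 0; $i < $count; $i++) {
        for ($n = 1; $n <= $maxWords && $i + $n <= $count; $n++) {
            // Array keys act as a set, removing duplicate phrases.
            $phrases[implode(' ', array_slice($words, $i, $n))] = true;
        }
    }
    // Numeric keys such as "1999" become ints in PHP arrays; cast them back.
    return array_map('strval', array_keys($phrases));
}

/** Build one "... WHERE LOWER(name) IN (?, ?, ...)" query; returns [sql, params]. */
function buildLocationQuery(array $phrases): array
{
    $placeholders = implode(', ', array_fill(0, count($phrases), '?'));
    // LOWER(name) keeps the match case-insensitive but prevents use of a plain
    // index on `name`; a stored lowercase, indexed column would be faster.
    $sql = "SELECT name, lat, lng FROM locations WHERE LOWER(name) IN ($placeholders)";
    return [$sql, $phrases];
}

/** Look up all candidate phrases of $text, chunked to stay under bound-parameter limits. */
function findLocations(PDO $pdo, string $text, int $chunkSize = 500): array
{
    $found = [];
    foreach (array_chunk(candidatePhrases($text), $chunkSize) as $chunk) {
        [$sql, $params] = buildLocationQuery($chunk);
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $found[$row['name']] = $row; // de-duplicate by location name
        }
    }
    return array_values($found);
}
```

The database does one indexed lookup per candidate phrase, so the cost grows with the length of the text rather than with the size of the location list, which is the point the answer makes.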
But even if you did have to call a slow API, you could feasibly request the 500 words one by one. If you kept a cache of the matches, it would probably fill up quickly with all the stop words (you know, like "the", "if", "and"), and with that cache in place you would likely be looking up far fewer than 500 words each time.
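A rough PHP sketch of the caching idea. `CachedWordLookup` and the `$lookup` callable are hypothetical names: the callable stands in for whatever slow geocoding API you use and is assumed to return `null` for words that are not places. This cache lives only for one PHP request; to get the effect described above across many submissions, it would have to be persisted, for example in a database table.

```php
<?php
// Sketch: memoize per-word lookups so that repeated words (especially stop
// words like "the", "if", "and") hit a slow external service only once.
// The $lookup callable is a hypothetical placeholder, not a real API client.

final class CachedWordLookup
{
    /** @var array<string, mixed> lowercase word => lookup result (null = not a place) */
    private array $cache = [];

    /** @var callable(string): mixed */
    private $lookup;

    public function __construct(callable $lookup)
    {
        $this->lookup = $lookup;
    }

    /** Return the cached result for $word, calling the API only on a cache miss. */
    public function get(string $word)
    {
        $key = strtolower($word);
        // array_key_exists (not isset) so cached null results also count as hits.
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = ($this->lookup)($key);
        }
        return $this->cache[$key];
    }

    /** Look up every word of $text; return only the words that resolved to a place. */
    public function findPlaces(string $text): array
    {
        preg_match_all("/[\\p{L}\\p{N}'-]+/u", $text, $m);
        $places = [];
        foreach ($m[0] as $word) {
            $result = $this->get($word);
            if ($result !== null) {
                $places[strtolower($word)] = $result;
            }
        }
        return $places;
    }
}
```

Because negative results (`null`) are cached as well, a common word like "the" costs one API call in total rather than one per occurrence.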
I think you might be surprised at how fast the brute force approach would work.
For future reference, I'd just like to mention the Yahoo API called Placemaker and the service GeoMaker, which is built on top of it.
Those tools can be used to parse locations out of a text, as requested here. Unfortunately, no Drupal module seems to exist right now, but a custom solution seems easy to code.