How do I extract hierarchical city/state/country data from an OSM XML planet file?
I want to write a script that parses OpenStreetMap (OSM) XML files and builds a database of towns and cities in a hierarchical fashion. I want the resulting data set to have a hierarchy that might look like this in the US:
USA -> California -> San Francisco County -> San Francisco
and maybe like this in the UK:
United Kingdom -> England -> Middlesex -> London -> Soho
The output will be a JSON document that describes a hierarchy for all cities in the OSM file, with a structure like the examples above.
I'm using Python and the "imposm" parser library, and I can load and parse the file without a problem. My issue is that I don't understand how the OSM data is structured: I don't know how to determine the parent/child relationships between nodes in OSM's data. For example, if I locate the node for "Soho", how can I tie it back to the nodes for "City of Westminster", "Greater London", "Middlesex" and "England"?
I know that some nodes have an "is_in" tag that might give some of this information, but
- A) it is used inconsistently, and
- B) it seems to be a free-form text field, not a link to an OSM node (i.e. is_in: "City of Westminster" does not give me any link to the Westminster node).
Please let me know if you have any suggestions for how to link these nodes hierarchically.
Comments (1)
Basically everything is "free-form" in OSM. There are conventions on tagging, but there is no guarantee people will stick to them. So you will need to do some data cleaning and postprocessing to get anything consistent.
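As for parent-child relationships, there are no hard-wired relationships in OSM other than the structural ones in the data model itself: ways reference the nodes they consist of, and relations reference their member nodes, ways and other relations.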
OSM relations can be used to define hierarchical relationships, but the way these are defined is very generic. The semantics are based on conventions (usually described on OSM Wiki pages), not enforced by the data format.
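One such convention is tagging boundary relations with boundary=administrative and an admin_level value. Below is a minimal sketch, using only the standard library, of how such relations could be picked out of OSM XML. The sample document and the function name admin_boundaries are made up for illustration, and parsing a whole string with ElementTree would not scale to a planet file; with a streaming parser the same tag check would go into whatever relation handler that parser offers.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML form (illustrative, not real data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="51.5136" lon="-0.1365">
    <tag k="place" v="suburb"/>
    <tag k="name" v="Soho"/>
  </node>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="4"/>
    <tag k="name" v="England"/>
  </relation>
  <relation id="200">
    <member type="way" ref="20" role="outer"/>
    <tag k="type" v="route"/>
    <tag k="name" v="Some bus route"/>
  </relation>
</osm>"""


def admin_boundaries(osm_xml):
    """Return the administrative boundary relations in an OSM XML string."""
    root = ET.fromstring(osm_xml)
    found = []
    for rel in root.iter("relation"):
        # Flatten the <tag k=... v=...> children into a dict.
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        if tags.get("boundary") != "administrative":
            continue  # skip routes, multipolygons, and other relation types
        found.append({
            "id": int(rel.get("id")),
            "name": tags.get("name"),
            # Kept as a string: nothing guarantees the tag holds a number.
            "admin_level": tags.get("admin_level"),
            "members": [
                (m.get("type"), int(m.get("ref")), m.get("role"))
                for m in rel.findall("member")
            ],
        })
    return found


boundaries = admin_boundaries(SAMPLE_OSM)
for b in boundaries:
    print(b["admin_level"], b["name"], b["members"])
```

Note that this only tells you which boundaries exist and at what level. The relation does not say which places lie inside it, so the tags alone still do not give the Soho to Westminster link.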
If you're looking for an "is_in" relationship, I think you will need to establish it using geometric methods. You cannot really rely just on OSM tagging for this, unfortunately.
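To make the geometric approach concrete, here is a rough sketch: test which boundary polygons contain the place's coordinates, then order the matches from the largest area to the smallest. Everything in it is an assumption for illustration. The boundaries are crude rectangles rather than real OSM geometry, the admin_level values are only indicative, and the helper names are hypothetical. Real data would first need the member ways of each boundary relation assembled into rings, would have to handle holes and multi-part areas, and would probably call for a geometry library and a spatial index rather than this hand-written test.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def rect(west, south, east, north):
    """Build a rectangular ring; a stand-in for a real boundary polygon."""
    return [(west, south), (east, south), (east, north), (west, north)]


# Hypothetical, heavily simplified boundaries (not real OSM geometry).
BOUNDARIES = [
    {"name": "City of Westminster", "admin_level": 8,
     "ring": rect(-0.22, 51.48, -0.11, 51.54)},
    {"name": "United Kingdom", "admin_level": 2,
     "ring": rect(-8.0, 49.5, 2.0, 61.0)},
    {"name": "France", "admin_level": 2,
     "ring": rect(-5.0, 42.0, 8.0, 51.0)},
    {"name": "Greater London", "admin_level": 5,
     "ring": rect(-0.51, 51.28, 0.33, 51.69)},
    {"name": "England", "admin_level": 4,
     "ring": rect(-6.0, 49.8, 1.8, 55.8)},
]


def hierarchy_for(lon, lat, boundaries):
    """Names of all boundaries containing the point, outermost first."""
    containing = [b for b in boundaries
                  if point_in_polygon(lon, lat, b["ring"])]
    # Lower admin_level means a larger administrative unit.
    containing.sort(key=lambda b: b["admin_level"])
    return [b["name"] for b in containing]


# Approximate coordinates for Soho, as a place node might carry them.
soho_path = hierarchy_for(-0.1365, 51.5136, BOUNDARIES)
print(" -> ".join(soho_path + ["Soho"]))
```

Sorting by admin_level assumes the tag is present and numeric on every boundary, which ties back to the data cleaning mentioned above. Sorting the containing polygons by area would be an alternative when the tag is missing or unreliable.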