How do I extract hierarchical city/state/country data from an OSM XML planet file?

Posted on 2024-12-04 19:06:24

I want to write a script that parses OpenStreetMap (OSM) XML files and builds a database of towns and cities in a hierarchical fashion. I want the resulting data set to have a hierarchy that might look like this in the US:

USA -> California -> San Francisco County -> San Francisco

and maybe like this in the UK:

United Kingdom -> England -> Middlesex -> London -> Soho

The output will be a JSON document that describes a hierarchy for all cities in the OSM file, with a structure like the examples above.

I'm using Python and the "imposm" parser library, and I can load and parse the file without a problem. My issue is that I don't understand how the OSM data is structured: I don't know how to determine the parent/child relationships between nodes in OSM's data. For example, if I locate the node for "Soho", how can I tie it back to the nodes for "City of Westminster", "Greater London", "Middlesex" and "England"?

I know that some nodes have an "is_in" tag that might give some of this information, but:

  • A) it is applied inconsistently, and
  • B) it seems to be a free-form text field, not a link to an OSM node (i.e. is_in: "City of Westminster" does not give me any link to the Westminster node).

Please let me know if you have any suggestions for how to link these nodes hierarchically.


Comments (1)

绝對不後悔。 2024-12-11 19:06:24

Basically everything is "free-form" in OSM. There are conventions on tagging, but there is no guarantee people will stick to them. So you will need to do some data cleaning and postprocessing to get anything consistent.

As for parent-child relationships, there are no hard-wired relationships in OSM other than:

  • A node can be used by one or more ways
  • A node can be a member of one or more relations
  • A way can be a member of one or more relations
  • A relation can be a member of one or more relations

OSM relations can be used to define hierarchical relationships, but the mechanism itself is very generic: a relation is just a tagged list of members. Its semantics are based on conventions (usually described on OSM Wiki pages).
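To make the membership links above concrete, here is a minimal sketch that reads them straight from OSM XML with Python's standard `xml.etree.ElementTree`. It assumes the boundaries you care about follow the common `boundary=administrative` / `admin_level` tagging convention, which, like everything else in OSM, is not guaranteed. The XML fragment, IDs and the name "Exampleville" are made up for illustration, and `read_admin_boundaries` is a hypothetical helper, not part of imposm.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML fragment (illustrative data, not real OSM IDs).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="1.0"/>
  <node id="3" lat="1.0" lon="1.0"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="8"/>
    <tag k="name" v="Exampleville"/>
  </relation>
</osm>"""


def read_admin_boundaries(xml_text):
    """Return administrative boundary relations with their tags and way members."""
    root = ET.fromstring(xml_text)
    # Map each way id to its ordered list of node ids.
    ways = {
        w.get("id"): [nd.get("ref") for nd in w.findall("nd")]
        for w in root.findall("way")
    }
    boundaries = []
    for rel in root.findall("relation"):
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        # Assumption: only relations tagged by convention are of interest.
        if tags.get("boundary") != "administrative":
            continue
        member_ways = [
            m.get("ref") for m in rel.findall("member") if m.get("type") == "way"
        ]
        boundaries.append({
            "id": rel.get("id"),
            "name": tags.get("name"),
            "admin_level": tags.get("admin_level"),
            "ways": {ref: ways.get(ref, []) for ref in member_ways},
        })
    return boundaries


boundaries = read_admin_boundaries(SAMPLE_OSM)
print(boundaries)
```

Loading the whole document with `ET.fromstring` only works for small extracts. A planet file would need a streaming parser, but the relation -> way -> node lookups stay the same.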

If you're looking for an "is_in" relationship, I think you will need to establish it using geometric methods. You cannot really rely just on OSM tagging for this, unfortunately.
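As a rough illustration of the geometric approach, the sketch below tests which boundary polygons contain a place's coordinates and orders the matches from broadest to narrowest. It assumes you have already assembled each boundary into a simple closed ring of (lon, lat) points and that each carries a numeric `admin_level` where lower numbers mean broader areas (the usual OSM convention). The rectangles, names and the functions `point_in_polygon` and `hierarchy_for` are hypothetical. Real boundaries have holes and multiple outer rings, which this does not handle.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def hierarchy_for(lon, lat, areas):
    """Names of all areas containing the point, broadest first."""
    containing = [a for a in areas if point_in_polygon(lon, lat, a["ring"])]
    containing.sort(key=lambda a: a["admin_level"])
    return [a["name"] for a in containing]


# Made-up rectangular "boundaries" for illustration only.
AREAS = [
    {"name": "Region", "admin_level": 4,
     "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"name": "Country", "admin_level": 2,
     "ring": [(-20, -20), (20, -20), (20, 20), (-20, 20)]},
    {"name": "Town", "admin_level": 8,
     "ring": [(1, 1), (3, 1), (3, 3), (1, 3)]},
    {"name": "Other Town", "admin_level": 8,
     "ring": [(6, 6), (8, 6), (8, 8), (6, 8)]},
]

chain = hierarchy_for(2, 2, AREAS)
print(" -> ".join(chain))
```

For real data a geometry library with a spatial index would likely be a better fit than a hand-rolled test against every polygon, but the idea is the same: containment, not tags, decides the parent.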
