我应该使用什么 DBMS 将 openstreetmap 存储为图形?
背景:
我需要将以下数据存储在数据库中:
带有标签的osm节点;
带有权重的 osm 边(即从 .osm 文件中的“way”中提取的两个节点之间的边)。
位于相同“道路”集合中的形成边的节点应该具有与这些道路相同的标签,即作为高速公路的“道路”节点集合中的每个节点应该具有“高速公路”标签。
我需要这种结构来轻松生成基于各种过滤器的图形,例如仅由高速公路的节点和边缘组成的图形,或“步行路径”图形等。
问题:
我没有听说过关于之前的空间索引,所以我只是将 .osm 文件解析到 MySQL 数据库中:
- 所有节点都到“节点”表(带有各自的坐标列) - 好吧,在我的情况下大约有 9,000,000 行:
(INSERT INTO Nodes VALUES [pseudocode]node_id,lat,lon[/pseudocode]
;
- 到“边”表的所有方式(通常一种方式创建一些边) - 好的,大约 9,000,000 行好:
<代码>(插入边缘值[pseudocode]edge_id,from_node_id,to_node_id[/pseudocode];
- 向节点添加标签,计算边的权重 - 问题:
这是有问题的 php 脚本:
$query = mysql_query('SELECT * FROM edges');
$i=0;
while ($res = mysql_fetch_object($query)) {
$i++;
echo "$i\n";
$node1 = mysql_query('SELECT * FROM nodes WHERE id='.$res->from);
$node1 = mysql_fetch_object($node1);
$tag1 = $node1->tags;
$node2 = mysql_query('SELECT * FROM nodes WHERE id='.$res->to);
$node2 = mysql_fetch_object($node2);
$tag2 = $node2->tags;
mysql_query('UPDATE nodes SET tags="'.$tag1.$res->tags.'" WHERE nodes.id='.$res->from);
mysql_query('UPDATE nodes SET tags="'.$tag2.$res->tags.'" WHERE nodes.id='.$res->to);`
Nohup 显示 'echo 的输出“$i\n”每 55-60 秒一次(如果“edges”表的大小超过,可能需要 17 年以上才能完成 一样)。
9,000,000 行,就像我的例子
对于尝试计算边的权重(距离)的脚本也存在同样的问题(选择所有边,获取一条边,然后从“节点”表中选择该边的两个节点,然后计算距离,然后更新边表)。
问题:
如何才能使此 SQL 更新更快?我应该调整任何 MySQL 配置设置吗?或者我应该使用带有 PostGIS 扩展的 PostgreSQL?我应该为我的数据使用其他结构吗?或者我应该以某种方式利用空间索引?
Background:
I need to store the following data in a database:
osm nodes with tags;
osm edges with weights (that is an edge between two nodes extracted from 'way' from an .osm file).
Nodes that form edges, which are in the same 'way' sets should have the same tags as those ways, i.e. every node in a 'way' set of nodes which is a highway should have a 'highway' tag.
I need this structure to easily generate a graph based on various filters, e.g. a graph consisting only of nodes and edges which are highways, or a 'foot paths' graph, etc.
Problem:
I have not heard about the spatial index before, so I just parsed an .osm file into a MySQL database:
- all nodes to a 'nodes' table (with respective coordinates columns) - OK, about 9,000,000 of rows in my case:
(INSERT INTO nodes VALUES [pseudocode]node_id,lat,lon[/pseudocode]
;
- all ways to an 'edges' table (usually one way creates a few edges) - OK, about 9,000,000 of rows as well:
(INSERT INTO edges VALUES [pseudocode]edge_id,from_node_id,to_node_id[/pseudocode]
;
- add tags to nodes, calculate weights for edges - Problem:
Here is the problematic php script:
$query = mysql_query('SELECT * FROM edges');
$i=0;
while ($res = mysql_fetch_object($query)) {
$i++;
echo "$i\n";
$node1 = mysql_query('SELECT * FROM nodes WHERE id='.$res->from);
$node1 = mysql_fetch_object($node1);
$tag1 = $node1->tags;
$node2 = mysql_query('SELECT * FROM nodes WHERE id='.$res->to);
$node2 = mysql_fetch_object($node2);
$tag2 = $node2->tags;
mysql_query('UPDATE nodes SET tags="'.$tag1.$res->tags.'" WHERE nodes.id='.$res->from);
mysql_query('UPDATE nodes SET tags="'.$tag2.$res->tags.'" WHERE nodes.id='.$res->to);`
Nohup shows the output for 'echo "$i\n"' each 55-60 seconds (which can take more than 17 years to finish if the size of the 'edges' table is more than 9,000,000 rows, like in my case).
Htop shows a /usr/bin/mysqld process which takes 40-60% of CPU.
The same problem exists for the script which tries to calculate the weight (the distance) of an edge (select all edges, take an edge, then select two nodes of this edge from 'nodes' table, then calculate the distance, then update the edges table).
Question:
How can I make this SQL updates faster? Should I tweak any of MySQL config settings? Or should I use PostgreSQL with PostGIS extension? Should I use another structure for my data? Or should I somehow utilize the spatial index?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果我理解正确的话,有两件事需要讨论。
首先,您将高速公路标签放在开始和停止节点上的想法。一个节点可以有多个连接的边,那么将第二条边的标签放在哪里?如果是十字路口,还是第三或第四?高速公路标签首先放入边缘表的原因是,从关系的角度来看,这就是它所属的位置。
其次,获取整个表并在数据库外部对其进行处理并不是正确的方法。关系数据库真正擅长的是处理整个过程。
我没有使用过 mysql,并且我完全同意,如果迁移到 PostGIS,您可能会获得更多乐趣,因为 PostGIS 具有更好的空间功能(即使您不需要任何空间功能来完成此特定任务)我听说过。
因此,如果我们忽略第一个问题,只是为了展示这个概念,那么只有两条边连接到一个节点,并且每个节点都有两个标记字段。标签1和标签2。那么在 PostGIS 中它可能看起来像这样:
如果禁用索引,速度应该会非常快。
再次,
如果我没理解错的话。
If I understand you right there is two things to discuss.
First, your idea of putting the highway-tag on the start and stop nodes. A node can have more than one edge connected, where to put the tag from the second edge? Or third or fourth if it is a crossing? The reason the highway tag is putted in the edges table in the first place is that from a relational point of view that is where it belongs.
Second, to get the whole table and process it outside the database is not the right way. What a relational database is really good at is taking care of this whole process.
I have not worked with mysql, and I fully agree that you will probably get a lot more fun if migrating to PostGIS since PostGIS has a lot better spatial capabilities (even if you don't need any spatial capabilities for this particular task) from what I have heard.
So if we ignore the first problem and just for showing the concept say that there is only two edges connected to one node and that each node has two tag-fields. tag1 and tag2. Then it could look something like this in PostGIS:
If you disable the indexes that should be very fast.
Again,
if I have understood you right.
PostgreSQL
Openstreetmap 本身使用 PostgreSQL,所以我想这是推荐的。
参见:http://wiki.openstreetmap.org/wiki/PostgreSQL
可以看到OSM的数据库模式位于: http://wiki.openstreetmap.org/wiki/Database_Schema
所以你可以使用OSM 使用相同的字段、字段类型和索引来实现最大兼容性。
MySQL
如果您想将 .osm 文件导入 MySQL 数据库,请查看:
http://wiki.openstreetmap.org/wiki/OsmDB.pm
在这里,您将找到 Perl 代码,它将创建 MySQL 表、解析 OSM 文件并将其导入 MySQL 数据库。
加快速度
如果您要批量更新,则无需在每次更新后都更新索引。
您只需禁用索引,执行所有更新并重新启用索引即可。
我猜这应该要快得多。
祝你好运
PostgreSQL
Openstreetmap itself uses PostgreSQL, so I guess that's recommended.
See: http://wiki.openstreetmap.org/wiki/PostgreSQL
You can see OSM's database schema at: http://wiki.openstreetmap.org/wiki/Database_Schema
So you can use the same fields, fieldtypes and indexes that OSM uses for maximum compatibility.
MySQL
If you want to import .osm files into a MySQL database, have a look at:
http://wiki.openstreetmap.org/wiki/OsmDB.pm
Here you will find perl code that will create MySQL tables, parse a OSM file and import it into your MySQL database.
Making it faster
If you are updating in bulk, you don't need to update the indexes after every update.
You can just disable the indexes, do all your updates and re-enable the index.
I'm guessing that should be a whole lot faster.
Good luck