图数据库更适合最短路径算法吗?

发布于 2024-11-27 02:01:24 字数 771 浏览 7 评论 0原文

我的目标是为道路网络编写最短路径算法。

目前我的架构是这样的:我将所有数据存储在支持 PostGIS 的 PostgreSQL 数据库中。我执行一个 SELECT * FROMways,在一个有 100,000 条边(路)的表上花费不到 3 秒,之后我将应用一种(Java、Ruby 或基于任何东西的)最短路径算法已经驻留在内存中的图。对于具有 100,000 条边的图,第二个操作可能需要大约 1.5 秒。

因此,需要:

  • 2-3 秒将数据库中的所有路径加载到内存中并创建一个图(节点与路径(边)存储在一张表中);
  • 1-1.5 秒即可计算内存中已存在的图表上的最短路径。

这与 pgRouting 所做的非常相似(据我所知,它使用 C Boost 将图存储在内存中),除了 pgRouting 总共需要大约 2 秒来计算同一数据集上的最短路径(是的,它很快,但是它对我来说是一个黑匣子,所以我需要自己的)。

但最近我发现了有关图数据库和 Neo4j 的信息。他们在他们的网站上声称,“仍然能够在数百万条道路和航路点的图表上以亚秒的速度进行这些计算,使得在许多情况下可以放弃使用 K/V 存储预先计算索引的常规方法,并能够将路由纳入关键路径,使其能够适应生活条件并构建高度个性化和动态的空间服务。”

所以问题是:图形数据库对于我的特定问题会更快吗?

该问题具有以下性质:

  • 数据库由一张表(路)组成;
  • 对数据库的唯一查询是将所有路径放入内存(构建图表);
  • 我不需要可扩展性,即图很可能不会增长。

My objective is to write a shortest path algorithm for a road network.

Currently my architecture is something like that: I store all the data in the PostGIS enabled PostgreSQL database. I do one SELECT * FROM ways, which takes less than 3 seconds on a table with 100,000 edges (ways) and after that I will apply a (Java, Ruby or anything-based) shortest path algorithm to the graph that already resides in memory. The second operation can take about 1.5 seconds on a graph with 100,000 edges.

So, it takes:

  • 2-3 seconds to load all the ways from the database into memory and create a graph (nodes are stored in one table with ways(edges));
  • 1-1.5 seconds to calculate a shortest path on a graph which is already in memory.

This is very similar to what pgRouting does (to my knowledge it uses C Boost to store the graph in memory), except pgRouting takes about 2 seconds in total to compute a shortest path on the same data set (yes, it is fast, but it is a black box for me, so I need my own).

But recently I found about Graph databases and about Neo4j. On their site they claim that "Still being able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services.".

So the question is: Will a graph database be faster with my particular problem?

The problem has the following properties:

  • the database consists of one table (ways);
  • the only query to the database is to get all the ways into the memory (to build a graph);
  • I do not need scalability, i.e. it is likely that the graph will not grow.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

箹锭⒈辈孓 2024-12-04 02:01:24

如果您使用任何图形数据库(例如 Neo4j),您当然不必重新发明轮子。许多最短路径算法都内置于此,它旨在处理复杂性,以防您必须考虑任何特定道路、单向道路、道路分数等的速度限制。当数据增长 10 时,如何保持性能次,或者100次。考虑到 100,000 种方式的总计算时间为 3 秒,1M 种方式可能需要几分钟,而在 Neo4j 中,响应将以毫秒为单位。

You certainly dont have to reinvent the wheel if you are using any graph database, like Neo4j. Many shortest path algorithms are built into this and it's designed to handle complexity in case you have to consider speed limitation in any specific road, one-way road, score of a road etc. How do you keep up with performance when your data grows 10 times, or, 100 times. Considering your total computation time 3sec for 100,000 ways, it can be in minutes for 1M ways and in Neo4j, the response will be in milli sec.

忘东忘西忘不掉你 2024-12-04 02:01:24

图数据库的突破不仅在于性能,更在于概念:您的路由算法处理单个关系图(即链接都属于同一类型的图),而对于图数据库,您有一个<多关系图。

这使您能够计算仅采用特定类型边的节点之间的最短路径或避免使用其他类型。

有关更多信息,您应该阅读图数据库背后的代数和管道的概念。

我强烈推荐 thinkerpop 项目从图数据库开始。

The breakthrough with graph databases is not only performances, it more about concept: your routing algorithms deal with single relational graphs (that is graph were links are all of the same type) whereas with graphdatabases you have a multi-relational graph.

This enables you to compute the shortest path between nodes taking only a specific kind of edge or avoid another type.

For more information you should read about the algebra behind graph db and the concept of pipes.

I strongly recommend thinkerpop project to start with graph database.

恰似旧人归 2024-12-04 02:01:24

我没有“图形”数据库的经验,但从你的问题来看,我有一些想法。

首先,简单的答案是“创建这样一个图形数据库并与您的解决方案进行性能比较”。您可以测量内存使用情况、执行时间(速度)、CPU 利用率和/或可能的其他指标。这将为您提供足够的数据来做出决定。

我的另一个建议是修改你的方法。您描述的三个问题属性(一张表,加载所有路径并且不需要可扩展性)适用于您当前的域,但不适用于图形数据库的域。这是一种完全不同的编程范例,您可能必须调整和调整您的方法以适应这些特殊类型数据库的领域。如果您在非标准环境(例如图形数据库)中应用标准方法,则进行性能或任何其他类型的比较是不合理的。

回顾:将您的问题转换为图形数据库的术语并相应地对其进行建模。完成此操作后,对两种解决方案进行性能比较。

我敢打赌,假设你翻译了&针对图形数据库对您的问题进行适当的建模,它将为您提供更好的性能。 “存储-读取-排序”的经典方法很简单,但除非积极优化,否则不会那么有效。

I don't have experience with "graph" databases but judging by your question i have a few things in mind.

First of all, the straightforward answer will be "Create such a graph database and do a performance comparison with your solution". You could measure memory usage, execution time (speed), cpu utilization and/or possibly other metrics. That would provide you with enough data to make your decision.

My other advice is to revise your method. The three problem properties that you described (one table, loading all paths & no need for scalability) apply in your current domain but not in the graph databases' one. It's a whole different programming paradigm and you might have to adjust and adapt your method to suit the domain of those special kind of databases. It is unreasonable to do performance or any other kind of comparisons if you're applying your standard approach in a non-standard environment (like that graph database).

Recap: Translate your problem to the terms of the graph database and model it accordingly. After doing that, do a performance comparison between the two solutions.

My bet is, assuming that you translated & modeled your problem suitably for the graph database, it will grant you better performance. Your classical approach of "store-read-sort" is simple but not that effective unless optimized aggressively.

梦太阳 2024-12-04 02:01:24

图形数据库最初可能不会将所有数据加载到内存中,但随着时间的推移,好的数据库旨在处理极大的数据集。然而,一旦数据存在,图形数据库在遍历链接方面所要做的工作就比关系数据库要少。这是因为它可以使用相关对象的身份直接访问相关对象,而不必使用 B 树索引和(可能)连接表,因此一旦节点和边被缓存,它应该会更快。

A graph database probably won't load all of your data into memory initially, but over time, as good ones are designed to deal with extremely large datasets. However, once the data is there, the graph database has to do less work that the relational database to traverse the links. This is because it can directly access related objects using their identities, rather than having to use B-tree indices and (possibly) a join table, so it should be faster once the nodes and edges are cached.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文