Using MongoDB as our primary database, should I use a separate graph database to implement relationships between entities?
We're currently implementing a CRM-like solution internally for a professional firm. Because of the nature of the information stored, and the varying keys and values it carries, we decided to use a document database, as it suited the purpose perfectly (in this case we chose MongoDB).
As part of this CRM solution we wish to store relationships and associations between entities; examples include conflict-of-interest information, shareholders, trustees, etc. To link all these entities together in the most effective way, we determined that a central "relationship" model was necessary. Every relationship should have history information attached to it (commencement and termination dates), as well as varying metadata; for example, a shareholder relationship would also contain the number of shares held.
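To make the "relationship" model concrete, here is a minimal sketch of what one such record could look like. The class name, field names and the `active_on` helper are all assumptions made for illustration; nothing here is tied to MongoDB or Neo4j.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class Relationship:
    """One directed, typed link between two entities (hypothetical model)."""
    source_id: str                       # e.g. the shareholder
    target_id: str                       # e.g. the company
    kind: str                            # "shareholder", "trustee", "client", ...
    commenced: date                      # commencement date
    terminated: Optional[date] = None    # None means still in force
    metadata: Dict[str, Any] = field(default_factory=dict)

    def active_on(self, day: date) -> bool:
        """True if the relationship was in force on the given day."""
        if day < self.commenced:
            return False
        return self.terminated is None or day <= self.terminated


# A shareholder relationship carrying its own metadata (made-up values).
holding = Relationship(
    source_id="john",
    target_id="xyz limited",
    kind="shareholder",
    commenced=date(2010, 1, 1),
    terminated=date(2012, 6, 30),
    metadata={"shares_held": 500},
)
```

The same shape maps naturally onto either a document (one record per relationship) or a graph edge with properties, which is why the storage choice is the open question here.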
As traditional RDBMS solutions didn't suit our earlier needs, using them in our current situation is not viable. What I'm trying to determine is whether a graph database is more pertinent in our case, or whether simply storing the relationship information in MongoDB itself is appropriate.
The relationship information is going to be used quite heavily throughout the system. Some examples of the queries we wish to perform are:
- Get all 'key contact' people of companies who are 'clients' of 'xyz limited'
- Get all other 'shareholders' of companies where 'john' is a shareholder
- Get all 'key contact' people of entities who are 'clients' of both 'abc limited' and 'trust us bank limited'
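For illustration, the three queries above can be sketched as short traversals over an in-memory edge list. The entity names other than those in the question, the edge kinds (`client_of`, `key_contact_of`, `shareholder_of`) and the function names are assumptions; a real system would run the equivalent against whichever store is chosen.

```python
from collections import defaultdict

# (source, kind, target) triples; sample data invented for this sketch.
EDGES = [
    ("acme", "client_of", "xyz limited"),
    ("globex", "client_of", "xyz limited"),
    ("acme", "client_of", "abc limited"),
    ("acme", "client_of", "trust us bank limited"),
    ("globex", "client_of", "abc limited"),
    ("alice", "key_contact_of", "acme"),
    ("bob", "key_contact_of", "globex"),
    ("john", "shareholder_of", "acme"),
    ("mary", "shareholder_of", "acme"),
    ("john", "shareholder_of", "globex"),
    ("steve", "shareholder_of", "globex"),
]

incoming = defaultdict(set)   # (target, kind) -> set of sources
outgoing = defaultdict(set)   # (source, kind) -> set of targets
for source, kind, target in EDGES:
    incoming[(target, kind)].add(source)
    outgoing[(source, kind)].add(target)


def key_contacts_of_clients(firm):
    """Key contacts of every company that is a client of `firm`."""
    clients = incoming[(firm, "client_of")]
    return {p for c in clients for p in incoming[(c, "key_contact_of")]}


def co_shareholders(person):
    """Other shareholders of the companies in which `person` holds shares."""
    companies = outgoing[(person, "shareholder_of")]
    holders = {p for c in companies for p in incoming[(c, "shareholder_of")]}
    return holders - {person}


def key_contacts_of_shared_clients(firm_a, firm_b):
    """Key contacts of entities that are clients of both firms."""
    shared = incoming[(firm_a, "client_of")] & incoming[(firm_b, "client_of")]
    return {p for c in shared for p in incoming[(c, "key_contact_of")]}
```

Each query is two hops from a known starting entity, which is the pattern the question is really asking about.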
Given this "tree" structure of relationships, is using a graph database (such as Neo4j) more appropriate?
Comments (4)
Mike,
you should be able to store your relationship data in the graph database. Its high performance when traversing big graphs comes from locality: you don't run queries globally, but rather start at a set of nodes (which equal documents in your case) that are looked up through an index. You might even store the start node IDs in your MongoDB documents for quick access. From there you can traverse arbitrarily long paths, and the cost of each hop is constant with respect to the total data set size.
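As a rough illustration of the locality argument, the sketch below does a breadth-first walk from an indexed start node and counts the edges it follows. The adjacency dictionary stands in for the graph store and the `neighbourhood` function is hypothetical; it only shows that the work depends on the neighbourhood visited, not on how many unrelated nodes exist.

```python
from collections import deque


def neighbourhood(adjacency, start, max_hops):
    """Breadth-first walk; returns (reached nodes, number of edges followed)."""
    seen = {start}
    frontier = deque([(start, 0)])
    edges_followed = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            edges_followed += 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen, edges_followed


# A tiny graph, and the same graph padded with 10,000 unrelated nodes.
small = {"john": ["acme"], "acme": ["mary"]}
large = dict(small)
large.update({f"n{i}": [f"n{i + 1}"] for i in range(10_000)})

reached_small, work_small = neighbourhood(small, "john", 2)
reached_large, work_large = neighbourhood(large, "john", 2)
```

Both calls follow the same number of edges, because the walk never touches the padding nodes. The dictionary lookup for the start node plays the role of the index lookup mentioned above.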
What are your other requirements (e.g. data set size, number of concurrent accesses, relationship/graph complexity)?
Your queries are a really good fit for a graph database and are easily expressible in its terms.
I'd suggest that you just grab a graph database like Neo4j and do a quick spike with your domain to verify the general feasibility, and to find out what additional questions you would like answered before investing in a second technology.
P.S. If you hadn't started yet, you could also have gone with a pure graphdb approach as graph databases are a superset of document databases. And you'd rather talk domain in your case anyway than just generic documents. (E.g. structr is a CMS built on top of Neo4j).
The documents in MongoDB very much resemble nodes in Neo4j, minus the relationships. They both hold key-value properties. If you've already made the choice to go with MongoDB, then you can use Neo4j to store the relationships and then bridge the stores in your application. If you're choosing new technology, you can go with Neo4j for everything, as the nodes can hold property data just as well as documents can.
As for the relationship part, Neo4j is a great fit. You have a graph, not unrelated documents. Using a graph database makes perfect sense here, and the sample queries have graph written all over them.
Honestly though, the best way to find out what works for you is to do a PoC - low cost, high value.
Disclaimer: I work for Neo Technology.
Stay with MongoDB. Two reasons: 1. it's better to stay in the same domain if you can, to reduce complexity, and 2. MongoDB is excellent for querying and requires less work than Redis, for example.
We ended up using both; we are implementing a search engine for a transportation network.
Trying to implement relationships in MongoDB can become unwieldy once you go beyond 1 or 2 "links". Essentially you would be storing ObjectIds in an array, and if you want bi-directional relationships you have to implement two separate links. In MongoDB, a "pointer" to an entity (or "link") is just another text property (that can be interpreted differently); it is not a first-class object like a relationship in Neo4j.
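To show what "two separate links" means in practice, here is a minimal sketch using plain dictionaries as stand-ins for documents. The field names (`shareholder_of`, `shareholders`) and the helper functions are assumptions for illustration, and string ids are used in place of ObjectIds.

```python
# Plain dicts standing in for two documents in a collection.
documents = {
    "john": {"_id": "john", "type": "person", "shareholder_of": []},
    "acme": {"_id": "acme", "type": "company", "shareholders": []},
}


def link_shareholder(docs, person_id, company_id):
    """Record the relationship on BOTH documents; each side is a separate write."""
    person, company = docs[person_id], docs[company_id]
    if company_id not in person["shareholder_of"]:
        person["shareholder_of"].append(company_id)
    if person_id not in company["shareholders"]:
        company["shareholders"].append(person_id)


def unlink_shareholder(docs, person_id, company_id):
    """Remove the relationship from BOTH documents to keep them consistent."""
    person, company = docs[person_id], docs[company_id]
    if company_id in person["shareholder_of"]:
        person["shareholder_of"].remove(company_id)
    if person_id in company["shareholders"]:
        company["shareholders"].remove(person_id)


link_shareholder(documents, "john", "acme")
```

Every relationship type needs this kind of paired bookkeeping, and forgetting one side leaves the two documents disagreeing, which is where the unwieldiness comes from.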
So we decided to use Neo4j to store the relationships and MongoDB to store everything else. The challenge then became keeping the two stores in sync.
We are using a 10gen Labs project called "MongoConnector", which is a mechanism for keeping MongoDB in sync with another store. The project is currently unsupported, but the code is available:
http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
MongoConnector uses the replication mechanism to implement the syncing. Essentially you are monitoring the MongoDB OpLog and implementing callbacks for any upserts (updates or inserts) and deletes. This implementation is called a "DocumentManager" in MongoConnector speak. We ended up implementing a Neo4jDocumentManager.
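The callback idea can be sketched roughly as follows. This is not MongoConnector's actual interface or the real OpLog format: the `InMemoryGraphManager` class, the `replay` function and the simplified entry dictionaries are all assumptions, with an in-memory dict standing in for the Neo4j side.

```python
class InMemoryGraphManager:
    """Hypothetical stand-in for a document manager that targets a graph store."""

    def __init__(self):
        self.nodes = {}  # node id -> properties

    def upsert(self, doc):
        # Insert the node if it is new, otherwise overwrite its properties.
        self.nodes[doc["_id"]] = dict(doc)

    def remove(self, doc_id):
        # Deleting an unknown id is ignored so that replays stay idempotent.
        self.nodes.pop(doc_id, None)


def replay(entries, manager):
    """Dispatch simplified change entries to the manager's callbacks."""
    for entry in entries:
        if entry["op"] in ("i", "u"):      # insert or update
            manager.upsert(entry["doc"])
        elif entry["op"] == "d":           # delete
            manager.remove(entry["doc"]["_id"])


# Simplified change log (made-up data, not real OpLog documents).
entries = [
    {"op": "i", "doc": {"_id": 1, "name": "acme"}},
    {"op": "i", "doc": {"_id": 2, "name": "globex"}},
    {"op": "u", "doc": {"_id": 1, "name": "acme limited"}},
    {"op": "d", "doc": {"_id": 2}},
]

manager = InMemoryGraphManager()
replay(entries, manager)
```

Because the second store is only updated when entries are replayed, it lags behind the first by however long that replay takes.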
On the query side, we found that Neo4j is better suited to "friend of a friend" kinds of queries, whereas MongoDB was better for general-purpose queries, i.e. per-field queries or range queries dealing with dates.
I've been planning to give a talk and write a blog post, but I haven't got to it yet:
http://www.meetup.com/graphdb-boston/events/91703472/
There are drawbacks to this solution, like things getting out of sync if a process goes down or syncing being slow (not in realtime).