Storing multiple graphs in Neo4J
I have an application that stores relationship information in a MySQL table (contact_id, other_contact_id, strength, recorded_at). This is fine if all I need to do is show a contact's relationships, or even generate a list of mutual contacts for two contacts.
But now I need to generate stats like: 'what was the total number of 2-way connections of strength 3 or better in January 2011' or (assuming that each contact is part of a group) 'which group has the most connections to other groups' etc.
I quickly found that the SQL for generating these stats became unwieldy very fast.
So I wrote a script that, for any given date, generates a graph in memory. I could then run whatever stat I wanted against that graph. That was much easier to understand and, in general, much more performant too, except for the part that generates the graph.
My next thought was to cache those graphs so I could call on them whenever I needed to run a new stat (or to generate a later graph: e.g. for today's graph I take yesterday's graph and apply any changes that happened since yesterday). I tried memcached, which worked great until the graphs grew beyond 1 MB.
So now I'm thinking about using a graph database like Neo4J.
Only problem is, I don't have just one graph. Or I do, but it is one that changes over time and I need to be able to query it with different reference times.
So, can I:
- store multiple graphs in Neo4J and retrieve/interact with them separately? I would then create and store a separate social graph for each date.
or
- add valid-to and valid-from timestamps to each edge and filter the graph appropriately: so if I wanted a graph for "May 1st" I would only follow the newest edge between two nodes that was created before "May 1st" (and if all the edges were created after May 1st then those nodes wouldn't be connected).
I'm pretty new to graph databases so any help/pointers/hints will be appreciated.
Comments (3)
Right now you can store just one graph database in a single Neo4j instance, but this one graphdb can contain as many different sub-graphs as you like. You only have to keep that in mind when doing global operations (like index queries), but there you can use compound queries that also include timestamped properties to limit the results.
One way of doing that is, as you said, to add temporal information to the edges to represent the structure of the graph on a given date; you can then traverse the graph as it was at that time.
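To make the edge-timestamp idea concrete, here is a minimal in-memory sketch in plain Python, not Neo4j code. It assumes the columns from the question (contact_id, other_contact_id, strength, recorded_at), that each row is a directed edge, and that "as of" means recorded on or before the reference date. The names `Edge`, `snapshot` and `two_way_connections` are made up for illustration.

```python
from datetime import date
from typing import NamedTuple


class Edge(NamedTuple):
    contact_id: int
    other_contact_id: int
    strength: int
    recorded_at: date


def snapshot(edges, as_of):
    """Return {(contact_id, other_contact_id): strength} as of the given date.

    For each directed pair, only the newest edge recorded on or before
    `as_of` is kept; pairs whose edges were all recorded later are absent.
    """
    latest = {}
    for edge in edges:
        if edge.recorded_at > as_of:
            continue  # not yet recorded at the reference time
        key = (edge.contact_id, edge.other_contact_id)
        if key not in latest or edge.recorded_at > latest[key].recorded_at:
            latest[key] = edge
    return {key: edge.strength for key, edge in latest.items()}


def two_way_connections(graph, min_strength):
    """Count unordered pairs connected in both directions at min_strength or better."""
    count = 0
    for (a, b), strength in graph.items():
        # a < b makes sure each pair is counted once
        if a < b and strength >= min_strength and graph.get((b, a), 0) >= min_strength:
            count += 1
    return count


# Made-up sample data, only to exercise the functions.
edges = [
    Edge(1, 2, 2, date(2011, 1, 5)),
    Edge(1, 2, 4, date(2011, 1, 20)),
    Edge(2, 1, 3, date(2011, 1, 10)),
    Edge(1, 3, 5, date(2011, 6, 1)),
]
may_first = snapshot(edges, date(2011, 5, 1))
```

In a graph database the same check would sit inside the traversal or query as a filter on the relationship's timestamp property, so the nodes themselves never need to be copied.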
Reference node has a different meaning in Neo4j.
Using category nodes per day (and linking them, and also aggregating them for higher-level timespans) is a more graphy way of categorizing nodes than indexed properties. (Effectively these are in-graph indices that you can easily include in your traversals and graph queries.)
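As a rough illustration of per-day category nodes that roll up into larger timespans, the following toy Python structure stands in for year, month and day nodes. It is an assumption-laden sketch rather than anything Neo4j provides: `TimeTree` and its methods are hypothetical names, and the nested dictionaries play the role that category nodes and their relationships would play inside the graph.

```python
from collections import defaultdict
from datetime import date


class TimeTree:
    """Toy in-memory stand-in for year -> month -> day category nodes."""

    def __init__(self):
        # year -> month -> day -> items attached to that day
        self._years = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def attach(self, day, item):
        """Link an item (e.g. a relationship id) to its day node."""
        self._years[day.year][day.month][day.day].append(item)

    def items_on(self, day):
        """Items attached to one specific day."""
        days = self._years.get(day.year, {}).get(day.month, {})
        return list(days.get(day.day, []))

    def items_in_month(self, year, month):
        """Aggregate over all day nodes below one month node, in day order."""
        days = self._years.get(year, {}).get(month, {})
        return [item for d in sorted(days) for item in days[d]]


# Made-up sample data.
tree = TimeTree()
tree.attach(date(2011, 1, 5), "edge-1")
tree.attach(date(2011, 1, 20), "edge-2")
tree.attach(date(2011, 2, 1), "edge-3")
january = tree.items_in_month(2011, 1)
```

A question such as "all connections recorded in January 2011" then starts at the month node and walks down, instead of scanning every edge and comparing timestamps.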
You don't have to duplicate the nodes as long as you are only interested in different temporal structures. If your nodes also differ over time (e.g. changing properties), you could either duplicate them, effectively creating different subgraphs, or create a connected list of history nodes on each node that contain just the changes (or the full snapshot, depending on your requirements).
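The connected list of history nodes could look roughly like this in plain Python. This is a sketch under stated assumptions: each history entry stores only the properties that changed, entries are added in chronological order, and `Version`, `add_version` and `properties_as_of` are hypothetical names.

```python
from datetime import date


class Version:
    """One entry in a node's history chain, holding only the changed properties."""

    def __init__(self, valid_from, changes, previous=None):
        self.valid_from = valid_from
        self.changes = changes
        self.previous = previous  # link to the older version, or None


def add_version(head, valid_from, changes):
    """Prepend a newer version and return the new head of the chain."""
    return Version(valid_from, changes, previous=head)


def properties_as_of(head, as_of):
    """Rebuild the node's properties at `as_of` by replaying changes oldest first."""
    applicable = []
    version = head
    while version is not None:
        if version.valid_from <= as_of:
            applicable.append(version)
        version = version.previous
    props = {}
    for version in reversed(applicable):
        props.update(version.changes)
    return props


# Made-up sample data: a contact that moves from group A to group B.
head = add_version(None, date(2011, 1, 1), {"name": "Ann", "group": "A"})
head = add_version(head, date(2011, 3, 1), {"group": "B"})
```

Storing full snapshots instead of deltas would make `properties_as_of` a single lookup, at the cost of more storage per change.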
Your domain sounds like a very good fit for a graph database. If you have more detailed questions, feel free to join the Neo4j mailing list.
Not the easiest solution (I'm assuming you only work with one machine), but if you really want to separate your graphs, you only need to remember that a graph is a directory.
You can then create a dynamic loader class which takes the path of the database you want, loads it into memory for the query, and closes it once you have your answer. You could also configure a proxy server and send 2 parameters to your loader: your query (which I presume is a Cypher query in this case) and the path of the database you want to query.
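The loader described above can be sketched as a small context manager. This is only an outline of the open, query, close pattern: `open_db` is a hypothetical callable you would supply for whatever embedded or server API you use, and the `execute` and `close` methods on the object it returns are likewise assumptions, not Neo4j's actual API.

```python
from contextlib import contextmanager


@contextmanager
def loaded_graph(path, open_db):
    """Open the database stored at `path`, yield it, and always close it."""
    db = open_db(path)  # open_db is a caller-supplied, hypothetical factory
    try:
        yield db
    finally:
        db.close()  # runs even if the query raises


def run_on(path, query, open_db):
    """Load one stored graph, run a single query against it, then shut it down."""
    with loaded_graph(path, open_db) as db:
        return db.execute(query)  # execute() is an assumed method name
```

A proxy in front of this would only need to forward the two parameters mentioned above, the query and the path, to `run_on`.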
This is not adequate if you have tons of real-time queries to answer. But if it is simply for storing data sets and running some analytics over them, it can definitely meet your needs.
This is an old question, but starting with Neo4j 4.x, multi-tenancy is supported and you can have different databases within the same Neo4j server (with distinct RBAC permissions).
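With one database per snapshot date, picking the graph to query becomes a matter of choosing the database when opening a session. The sketch below assumes a driver object from the official `neo4j` Python driver (4.x or later), whose `session()` accepts a `database` argument. The "graph-YYYY-MM-DD" naming scheme and the function names are hypothetical, and the connection details in the comments are placeholders.

```python
from datetime import date


def database_for(day):
    """Map a date to a database name (hypothetical naming scheme)."""
    return "graph-" + day.isoformat()


def query_snapshot(driver, day, query, **params):
    """Run `query` against the database that holds the graph for `day`."""
    with driver.session(database=database_for(day)) as session:
        result = session.run(query, **params)
        return [record.data() for record in result]


# Usage, assuming a running server and one database per snapshot date:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   rows = query_snapshot(driver, date(2011, 5, 1),
#                         "MATCH ()-[r]->() WHERE r.strength >= $s RETURN count(r) AS n",
#                         s=3)
```

Whether a separate database per date is practical depends on how many dates you keep; the timestamped-edge approach from the earlier answer avoids that multiplication.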