What are the use cases for graph-based databases (http://neo4j.org/)?
I have used relational databases a lot and decided to venture out to the other types available.
This particular product looks good and promising: http://neo4j.org/
Has anyone used graph-based databases? What are the pros and cons from a usability perspective?
Have you used these in a production environment? What was the requirement that prompted you to use them?
Comments (7)
I used a graph database in a previous job. We weren't using Neo4j; it was an in-house thing built on top of Berkeley DB, but it was similar. It was used in production (it still is).
The reason we used a graph database was that the data being stored by the system and the operations the system was doing with the data were exactly the weak spot of relational databases and were exactly the strong spot of graph databases. The system needed to store collections of objects that lack a fixed schema and are linked together by relationships. To reason about the data, the system needed to do a lot of operations that would be a couple of traversals in a graph database, but that would be quite complex queries in SQL.
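To make "a couple of traversals" concrete, here is a minimal in-memory sketch in Python. It is not the in-house engine described above and it does not use Neo4j's API; the node names, properties, and relationship types are all hypothetical, chosen only to show schema-less objects linked by relationships and a breadth-first traversal over them.

```python
from collections import deque

# Hypothetical schema-less objects: each node is just a dict of properties,
# and no two nodes need to share the same set of keys.
nodes = {
    "alice": {"kind": "person", "name": "Alice"},
    "report7": {"kind": "document", "title": "Q3 report", "pages": 12},
    "acme": {"kind": "company"},
    "bob": {"kind": "person", "name": "Bob", "phone": "555-0100"},
}

# Typed relationships stored as (source, type, target) triples.
edges = [
    ("alice", "WROTE", "report7"),
    ("report7", "MENTIONS", "acme"),
    ("bob", "WORKS_FOR", "acme"),
]


def neighbours(node_id):
    """Yield every node linked to node_id, ignoring edge direction."""
    for src, _rel, dst in edges:
        if src == node_id:
            yield dst
        elif dst == node_id:
            yield src


def reachable(start, max_hops):
    """Breadth-first traversal: map each node within max_hops of start to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbours(current):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return seen
```

Calling `reachable("alice", 3)` walks from Alice to the report, the company it mentions, and the person who works there. In a relational schema, each extra hop would typically be another self-join or a recursive query.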
The main advantages of the graph model were rapid development time and flexibility. We could quickly add new functionality without impacting existing deployments. If a potential customer wanted to import some of their own data and graft it on top of our model, it could usually be done on site by the sales rep. Flexibility also helped when we were designing a new feature, saving us from trying to squeeze new data into a rigid data model.
Having a weird database let us build a lot of our other weird technologies, giving us lots of secret sauce to distinguish our product from those of our competitors.
The main disadvantage was that we weren't using the standard relational database technology, which can be a problem when your customers are enterprisey. Our customers would ask why we couldn't just host our data on their giant Oracle clusters (our customers usually had large datacenters). One of the team actually rewrote the database layer to use Oracle (or PostgreSQL, or MySQL), but it was slightly slower than the original. At least one large enterprise even had an Oracle-only policy, but luckily Oracle bought Berkeley DB. We also had to write a lot of extra tools; we couldn't just use Crystal Reports, for example.
The other disadvantage of our graph database was that we built it ourselves, which meant when we hit a problem (usually with scalability) we had to solve it ourselves. If we'd used a relational database, the vendor would have already solved the problem ten years ago.
If you're building a product for enterprisey customers and your data fits into the relational model, use a relational database if you can. If your application doesn't fit the relational model but it does fit the graph model, use a graph database. If it only fits something else, use that.
If your application doesn't need to fit into the current blub architecture, use a graph database, or CouchDB, or BigTable, or whatever fits your app and you think is cool. It might give you an advantage, and it's fun to try new things.
Whatever you choose, try not to build the database engine yourself unless you really like building database engines.
We've been working with the Neo team for over a year now and have been very happy. We model scholarly artifacts and their relationships, which is spot on for a graph db, and run recommendation algorithms over the network.
If you are already working in Java, I think that modeling using Neo4j is very straightforward, and it has the flattest / fastest R/W performance of any solution we tried.
To be honest, I have a hard time not thinking in terms of a Graph/Network because it's so much easier than designing convoluted table structures to hold object properties and relationships.
That being said, we do store some information in MySQL simply because it's easier for the Business side to run quick SQL queries against. To perform the same functions with Neo we would need to write code that we simply don't have the bandwidth for right now. As soon as we do though, I'm moving all that data to Neo!
Good luck.
Two points:
First, on the data I've been working with for the past 5 years in SQL Server, I've recently hit the scalability wall with SQL for the type of queries we need to run (nested relationships...you know...graphs). I've been playing around with Neo4j, and my lookup times are several orders of magnitude faster when I need this kind of lookup.
Second, to the point that graph databases are outdated. Um...no. Early on, as people were trying to figure out how to store and look up data efficiently, they created and played with graph and network style database models. These were designed so the physical model reflected the logical model, so their efficiency wasn't that great. This type of data structure was good for semi-structured data, but not as good for structured dense data. So, this IBM dude named Codd was researching efficient ways to arrange and store structured data and came up with the idea for the relational database model. And it was good, and people were happy.
What do we have here? Two tools for two different purposes. Graph database models are very good for representing semi-structured data and the relationships between entities (that may or may not exist). Relational databases are good for structured data that has a very static schema, and where join depths do not go very deep. One is good for one kind of data, the other is good for other kinds of data.
To borrow a phrase, there is no silver bullet. It's very short-sighted to say that graph database models are out of date and that using one gives up 40 years of progress. That's like saying using C is giving up all the technological progress we've gone through to get things like Java and C#. That's not true though. C is a tool that is needed for certain tasks. And Java is a tool for other tasks.
I've been using MySQL for years to manage engineering data, and it worked well, but one of the problems we had (but didn't realise we had) was that we always had to plan the schema up-front. Another problem we knew we had was mapping the data up to domain objects and back.
Now we've just started trying out neo4j and it looks like it is solving both problems for us. The ability to add different properties to each node (and relation) has allowed us to re-think our entire approach to data. It is like dynamic versus static languages (Ruby versus Java), but for databases. Building the data model in the database can be done in a much more agile and dynamic way, and that is dramatically simplifying our code.
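The "different properties on each node (and relation)" idea can be sketched in a few lines of plain Python. This is only an illustration of the property-graph shape, not Neo4j's API; the `Node` class, the part names, and every property shown are hypothetical.

```python
class Node:
    """A node that carries an arbitrary bag of properties and typed relations."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.relations = []  # list of (relation_type, relation_properties, target)

    def relate(self, rel_type, target, **properties):
        """Link this node to target; the relation carries its own properties."""
        self.relations.append((rel_type, dict(properties), target))


# Two nodes of different kinds, each with its own set of properties.
pump = Node(kind="pump", flow_rate_lpm=120)
valve = Node(kind="valve", diameter_mm=50, material="brass")
pump.relate("FEEDS", valve, pipe_length_m=3.5)

# A new attribute turns up later: set it on the one node that needs it.
# No other node is touched and there is no table to alter.
valve.properties["last_inspected"] = "2009-03-01"
```

The relational equivalent of that last line is usually an `ALTER TABLE` or an extra attribute table, which is the up-front schema planning described above.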
And since the object model in code is generally a graph structure, mapping from the database is also simpler, with less code and consequently fewer bugs.
And as an additional bonus, our initial prototype code for loading our data into Neo4j is actually performing faster than the previous MySQL version. I have no solid numbers on this (yet), but that was a nice additional feature.
But at the end of the day, the choice probably should be based mostly on the nature of your domain model. Does it map better to tables or graphs? Decide by building some prototypes, loading the data and playing with it. Use neoclipse to look at different views of the data. Once you've done that, hopefully you know if you're on to a good thing or not.
Here is a good article that talks about the needs that non-relational databases fill: http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
It does a good job of pointing out (aside from its title) that relational databases aren't flawed or wrong; it's just that these days people are starting to process more and more data in mainstream software and web sites, and relational databases just won't scale for these needs.
I am building an intranet at my company.
I am interested in understanding how to take data that was stored in tables (Oracle, MySQL, SQL Server, Excel, Access, various random lists) and load it into Neo4j, or some other graph database. Specifically, what happens when common data overlaps existing data already in the system?
Yes, I know some data is best modeled in RDBMS, but I have this idea itching me, that when you need to superimpose several distinct tables, the graph model is better than the table structure.
For instance, I work in a manufacturing environment. There is a major project we are working on, and because of its complexity, each department has created a separate Excel spreadsheet that has a BOM (Bill Of Materials) hierarchy in a column on the left and then several columns of notes and checks made by the individuals who made these sheets.
So one of the problems is merging all these notes together into one "view" so that someone can see all the issues that need to be addressed in any particular part.
The second problem is that an Excel spreadsheet sucks at representing a hierarchical BOM when a common component is used in more than one subassembly. Meaning that, if someone writes a note about the P34 relay in the ignition subassembly, the same comment should be associated with the P34 relays used in the motor driver subassembly. This won't occur in the Excel spreadsheet.
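A small Python sketch of how a graph handles the shared-component case: the P34 relay is a single node referenced by both subassemblies, so a note attached to it shows up from either side. Everything here is hypothetical (the part names other than the P34 relay, the departments, the note text, and the function names), and it uses plain dictionaries rather than any particular graph database.

```python
from collections import defaultdict

# Hypothetical BOM: parent part -> list of child parts.
# "P34_relay" appears under two subassemblies but is one node.
bom = {
    "vehicle": ["ignition", "motor_driver"],
    "ignition": ["P34_relay", "coil"],
    "motor_driver": ["P34_relay", "heat_sink"],
}

notes = defaultdict(list)  # part id -> list of (department, text)


def add_note(part, department, text):
    """Attach a note to the part itself, not to one row of one spreadsheet."""
    notes[part].append((department, text))


def issues_under(part, seen=None):
    """Collect notes for part and everything below it, visiting shared parts once."""
    if seen is None:
        seen = set()
    if part in seen:
        return []
    seen.add(part)
    found = [(part, dept, text) for dept, text in notes.get(part, [])]
    for child in bom.get(part, []):
        found.extend(issues_under(child, seen))
    return found


add_note("P34_relay", "quality", "contacts pitting after repeated cycling")
add_note("coil", "purchasing", "long supplier lead time")
```

Both `issues_under("ignition")` and `issues_under("motor_driver")` return the relay note, while `issues_under("vehicle")` returns it only once. Merging each department's spreadsheet then reduces to calling `add_note` against the shared part ids.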
For the company intranet, I want to be able to search for anything easily. Such as data related to a part number, a BOM structure, a phone number, an email address, a company policy, or procedure. I want to even extend this to manage computer hardware assets, and installed software.
I envision that once the information network starts to get populated, you can start doing cool traversals such as "I want to write an email to everyone working on the XYZ project". People will have been associated with the project because they will be tagged as creating and modifying the data within the XYZ project. So by using the XYZ project as a search key, a huge set with everything related to the XYZ project will be created, including links to the people who built the XYZ project. The people links will connect to their email addresses. So by their involvement in the XYZ project, they will be included in my email. This is in stark contrast to some secretary trying to maintain a list of the people working on the project. We generate a lot of lists. We spend a lot of time maintaining lists and making sure they are up to date. And most of it doesn't add any value to our products.
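The "email everyone on the XYZ project" traversal could look roughly like this in Python. It is a sketch over a hand-written list of relationship triples, not a query against Neo4j; the item names, people, relationship types, and addresses are all made up for illustration.

```python
# Hypothetical relationship triples: (source, relationship, target).
edges = [
    ("drawing_17", "PART_OF", "XYZ"),
    ("test_plan", "PART_OF", "XYZ"),
    ("budget", "PART_OF", "ABC"),
    ("ann", "CREATED", "drawing_17"),
    ("raj", "MODIFIED", "drawing_17"),
    ("raj", "CREATED", "test_plan"),
    ("lee", "CREATED", "budget"),
]

# Hypothetical person -> email address links.
emails = {
    "ann": "ann@example.com",
    "raj": "raj@example.com",
    "lee": "lee@example.com",
}


def project_emails(project):
    """Hop 1: items in the project. Hop 2: people who touched them. Hop 3: their emails."""
    items = {src for src, rel, dst in edges if rel == "PART_OF" and dst == project}
    people = {
        src
        for src, rel, dst in edges
        if rel in ("CREATED", "MODIFIED") and dst in items
    }
    return sorted(emails[person] for person in people)
```

Here `project_emails("XYZ")` yields Ann's and Raj's addresses without anyone maintaining a membership list: the list falls out of who created or modified the project's data.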
Another cool traversal could report all the computers that have a certain piece of software installed, by version. That report could be used to generate tasks to remove extra copies of old software and to update people who need to have the latest copy. It would also be useful for license tracking.
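The software report is a one-hop version of the same idea. As a rough Python sketch, with made-up computer names, software titles, and versions standing in for "computer has software installed" relationships:

```python
from collections import defaultdict

# Hypothetical INSTALLED relationships: (computer, software, version).
installs = [
    ("pc-01", "CADTool", "4.2"),
    ("pc-02", "CADTool", "5.0"),
    ("pc-03", "CADTool", "4.2"),
    ("pc-03", "Office", "2007"),
]


def computers_by_version(software):
    """Group the computers that have the given software installed by version."""
    report = defaultdict(list)
    for computer, name, version in installs:
        if name == software:
            report[version].append(computer)
    return dict(report)
```

The result for `"CADTool"` groups `pc-01` and `pc-03` under the older version, which is the list you would turn into upgrade tasks; counting the entries per title gives the license tally.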
This might be a bit late, but there is a growing number of projects using Neo4j; the better-known ones are listed at Neo4j. Also, NeoTechnology, the company behind Neo4j, has some references on their customers page.
Note: I am part of the Neo4j team