Enterprise-grade databases that can handle large RDF datasets?
Are there any enterprise-grade database engines (Oracle, MS SQL, etc.) that can handle large RDF datasets (320 million) and SPARQL queries? I guess my question is also: is SPARQL/RDF/OWL ready to serve large real-world data warehouses for an enterprise? If not, are there efficient mechanisms for adapting SPARQL/RDF to a typical data warehouse star schema?
Thanks!
Comments (6)
Virtuoso is the datastore used by Bio2RDF and DBpedia.
Following up on Kaarel's suggestion: one of the entries presented at ISWC this year used 4store, which does scale that far. The competitor set it up in a rather odd configuration, which the CTO of Garlik (the company that develops 4store) described to me and my colleagues as 'crazy', but 4store is capable of that scale - http://4store.org
Virtuoso also supports stores at this scale. They have a live application that lets you run SPARQL queries over most of the major LOD (Linked Open Data) data sources, which total around 9 billion triples.
Virtuoso - http://virtuoso.openlinksw.com
LOD Application - http://lod.openlinksw.com/sparql
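For anyone who wants to try a query against an endpoint like the one above, here is a minimal sketch using only the Python standard library. It assumes the endpoint follows the standard SPARQL Protocol (query passed in the `query` parameter of a GET request) and can return the SPARQL JSON results format. The helper names `build_request`, `extract_rows` and `run_query` are made up for illustration, and I have not checked whether the URL from this post is still live.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the post above; it may no longer be reachable.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request (query goes in the `query` parameter)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results media type.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_rows(results):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query over HTTP and return the flattened result rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_rows(json.load(response))


# Example call (needs network access, so it is left commented out):
# rows = run_query(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
```

Keeping a small LIMIT on exploratory queries is a sensible precaution on a store with billions of triples, since an unbounded pattern match could return far more than you want.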
I maintain this list of large triplestores on the W3C wiki:
http://esw.w3.org/topic/LargeTripleStores
There are seven triplestores that are known to be able to hold over a billion triples. Four of them are open source. Please update the above-mentioned wiki page if you have more information.
Obviously, performance depends on what you use it for. I used Virtuoso in a large-scale industrial project, and it is quite fast.
Neo4j handles more than a billion triples out of the box via its SAIL API, while still giving you the whole graph for advanced work with tools such as Gremlin or SPARQL.
Disclaimer: I am part of the Neo4j team.
Intellidimension provides a solution called Semantic Server that is built on top of Microsoft's SQL Server 2005 or 2008. It easily scales to hundreds of millions of triples, and I know they have at least one customer happily running an enterprise deployment with over a billion statements.
I am one of their customers, working with datasets of more than 100 million. We plan to move toward tens of billions of statements.
4store looks to be a good solution; however, the documentation is pretty sparse at this time, and when I last looked at it there was no way to delete an individual triple from the graph.
I would also take a look at BigData.
Here is a quote from their main page summarizing their offering.