Enterprise-grade databases that can handle large RDF datasets?

Posted 2024-08-09 19:41:09

Are there any enterprise-grade database engines (Oracle, MS SQL, etc.) that can handle large RDF datasets (320 million triples) and SPARQL queries? I guess my question is also: is SPARQL/RDF/OWL ready to serve large real-world data warehouses for an enterprise? If not, are there efficient mechanisms for adapting SPARQL/RDF to a typical data warehouse star schema?

Thanks!

Comments (6)

酒解孤独 2024-08-16 19:41:09

Virtuoso is the datastore used by Bio2RDF and DBpedia.

嘿哥们儿 2024-08-16 19:41:09

Following on from Kaarel's suggestion: one of the entries presented at ISWC this year used 4store, which does scale that far. The competitor set it up in a rather odd configuration, which the CTO of Garlik (who develop 4store) described to me and colleagues as 'crazy', but 4store is capable of that scale - http://4store.org

Virtuoso also supports stores at this scale. They have a live application that lets you run SPARQL queries over most of the major LOD (Linked Open Data) data sources, which total around 9 billion triples.

Virtuoso - http://virtuoso.openlinksw.com
LOD Application - http://lod.openlinksw.com/sparql
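
For anyone who wants to try an endpoint like the one above from code, here is a minimal Python sketch using only the standard library. It assumes the endpoint follows the standard SPARQL Protocol (a GET request with a `query` parameter) and can return the SPARQL JSON results format. The helper names `build_request`, `parse_bindings` and `run_query` are made up for illustration, and the endpoint URL is the one quoted in this answer, which may no longer be live.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint quoted in the answer above; assumed (not verified) to still be reachable.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into {variable: value} rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")` would return up to five rows as dictionaries. Keeping a small `LIMIT` is a sensible habit on a store holding billions of triples.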

隐诗 2024-08-16 19:41:09

I maintain this list of large triplestores on the W3C wiki:
http://esw.w3.org/topic/LargeTripleStores

There are seven triplestores that are known to be able to hold over a billion triples. Four of them are open source. Please update the above-mentioned wiki page if you have more information.

Obviously, performance depends on what you use it for. I used Virtuoso in a large-scale industrial project, and it is quite fast.

南汐寒笙箫 2024-08-16 19:41:09

Neo4j handles around 1+ billion triples out of the box through its SAIL API, while still giving you the whole graph for advanced work with things like Gremlin or SPARQL.

Disclaimer: I am part of the Neo4j team.

走过海棠暮 2024-08-16 19:41:09

Intellidimension provides a solution called Semantic Server that is built on top of Microsoft's SQL Server 2005 or 2008. It easily scales to hundreds of millions of triples, and I know they have at least one customer happily running an enterprise deployment with over a billion statements.

I am one of their customers, working with datasets of more than 100 million statements. Our plan is to move towards tens of billions of statements.

爱的故事 2024-08-16 19:41:09

4store looks to be a good solution. However, the documentation is pretty sparse at this time, and when I last looked at it there was no ability to delete an individual triple from the graph.
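
For context, removing a single triple is what the W3C SPARQL 1.1 Update standard expresses as a `DELETE DATA` request. Below is a minimal Python sketch that only builds such a request string. The function names and the example.org IRIs are illustrative assumptions, it handles IRI terms only (no literals or blank nodes), and it says nothing about whether a particular 4store release accepts the request.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters SPARQL forbids inside <...>."""
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ord(ch) <= 0x20 for ch in value):
        raise ValueError(f"not a valid IRI: {value!r}")
    return f"<{value}>"


def delete_triple_update(graph: str, subject: str, predicate: str, obj: str) -> str:
    """Build a SPARQL 1.1 Update request that removes exactly one triple from a named graph."""
    return "DELETE DATA { GRAPH %s { %s %s %s . } }" % (
        iri(graph), iri(subject), iri(predicate), iri(obj)
    )


# Hypothetical IRIs, used purely for illustration.
update = delete_triple_update(
    "http://example.org/graph",
    "http://example.org/alice",
    "http://example.org/knows",
    "http://example.org/bob",
)
```

The resulting string would be sent to the store's update endpoint, if it has one.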

I would also take a look at Bigdata.

Here is a quote from their main page summarizing their offering.

Bigdata(R) is an open-source scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates. Bigdata was designed from the ground up as a distributed database architecture optimized for very high aggregate IO rates running over clusters of 100s to 1000s of machines, but can also run in a single-server mode. Bigdata offers a distributed file system, similar to the Google File System but also useful for workflow queues, a data extensible sparse row store, similar to Google's widely recognized bigtable project, and map/reduce processing for parallelizing data intensive workflows over a cluster.

Bigdata(R) comes packaged with a very high-performance RDF store supporting RDF(S) and OWL Lite inference. The Bigdata RDF Store is currently the only RDF database capable of operating distributed on a cluster with dynamic key-range partitioning of indices. The Bigdata RDF Store was designed specifically to meet requirements for very large scale semantic alignment and federation. RDF is a Semantic Web technology particularly well-suited to modeling graph-shaped data and metadata, such as an associative entity-link model, whereby actors are linked to one another in an ad-hoc fashion within the context of an evolving ontology of concepts for entity types and link types related to a particular problem domain. The Bigdata RDF Store is used operationally in data harvesting systems to create mash-ups of structured, semi-structured, and unstructured data from myriad sources in a schema-flexible manner.
