Are triple stores scalable?

Posted 2024-09-10 19:15:26, word count 243, views 5, comments 0

Most triple stores I have read about are said to scale to around 0.5 billion triples.

I am interested to know whether people think there is a theoretical reason why they must have an upper limit, and whether you know of any particular ways to make them more scalable.

I am curious whether existing triple stores do things like this:

  • Represent URIs with integers
  • Keep the integers in sorted order
  • Search the integers instead of the URIs, which I imagine must be faster (because you can do things like a binary search)

Thoughts...
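To make the three ideas in the list concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular triple store is implemented: the class name `TinyTripleStore`, its methods, and the `ex:` URIs are all hypothetical. It dictionary-encodes URIs as integers, keeps one sorted subject-predicate-object index, and answers a lookup with binary search.

```python
from bisect import bisect_left


class TinyTripleStore:
    """Toy in-memory store (hypothetical): URIs are dictionary-encoded as
    integers and triples live in one sorted (subject, predicate, object) index."""

    def __init__(self):
        self._id_of = {}    # URI string -> integer ID
        self._uri_of = []   # integer ID -> URI string
        self._spo = []      # sorted list of (s, p, o) integer tuples

    def encode(self, uri):
        """Return the integer ID for a URI, assigning a new one if needed."""
        if uri not in self._id_of:
            self._id_of[uri] = len(self._uri_of)
            self._uri_of.append(uri)
        return self._id_of[uri]

    def add(self, s, p, o):
        """Encode a triple and insert it at its sorted position."""
        triple = (self.encode(s), self.encode(p), self.encode(o))
        pos = bisect_left(self._spo, triple)
        # Insert only if the triple is not already present.
        if pos == len(self._spo) or self._spo[pos] != triple:
            self._spo.insert(pos, triple)

    def objects(self, s, p):
        """Binary-search the (s, p, *) range and decode the matching objects."""
        if s not in self._id_of or p not in self._id_of:
            return []
        sid, pid = self._id_of[s], self._id_of[p]
        # A 2-tuple sorts before every 3-tuple that starts with it, so these
        # two probes bracket exactly the triples with this subject and predicate.
        lo = bisect_left(self._spo, (sid, pid))
        hi = bisect_left(self._spo, (sid, pid + 1))
        return [self._uri_of[o] for _, _, o in self._spo[lo:hi]]


store = TinyTripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:alice", "ex:knows", "ex:carol")
store.add("ex:bob", "ex:knows", "ex:carol")
print(store.objects("ex:alice", "ex:knows"))
```

A single sorted index only serves lookups that fix a prefix of (subject, predicate, object). Answering other patterns, such as "all subjects with this object", would need further orderings of the same integer triples, which is one reason the technique alone does not settle the scalability question.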

    Comments (1)

    白云悠悠 2024-09-17 19:15:26

    Just to get to 500 million triples, a triple store has to do all of that and more. I have spent several years working on a triple store implementation, and I can tell you that breaking 1 billion triples is not as simple as it may seem.

    The problem is that many RDF queries are second or third order (and higher orders are far from unheard of). This means that you are querying not only a set of entities but, simultaneously, the data about that set of entities, the data about the entities' schemas, and the data describing the schema language used to describe those schemas.

    All of this happens without any of the constraints that allow a relational database to make assumptions about the shape of this data, metadata, meta-metadata, and so on.
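As a rough illustration of what querying several levels at once can look like, the sketch below keeps instance data, schema statements, and statements about the schema vocabulary in one flat set of triples, then answers "which things are instances of a class or of any of its subclasses?". This is one reading of the point above, not the answerer's own example; the `ex:` names and both helper functions are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Instance data, schema, and statements about the schema vocabulary all
# sit in the same set of triples (the ex: names are made up for this sketch).
TRIPLES = {
    ("ex:alice", RDF_TYPE, "ex:Student"),       # instance data
    ("ex:bob", RDF_TYPE, "ex:Person"),          # instance data
    ("ex:Student", SUBCLASS_OF, "ex:Person"),   # schema
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),     # schema
    ("ex:Student", RDF_TYPE, "rdfs:Class"),     # data about the schema
    (SUBCLASS_OF, RDF_TYPE, "rdf:Property"),    # data about the schema language
}


def subclasses_of(cls, triples):
    """Return cls plus every class that reaches it through rdfs:subClassOf."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls, triples):
    """Subjects typed with cls or with any of its subclasses, sorted by name."""
    classes = subclasses_of(cls, triples)
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o in classes)


print(instances_of("ex:Agent", TRIPLES))
```

Nothing in the data marks which triples are schema and which are instances, so the same scan has to consider both. The full scans here are only for brevity; the point is that the store cannot assume a fixed table shape the way a relational engine can.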

    There are ways to get beyond 500 million, but they are far from trivial, and the low-hanging fruit (i.e. the approaches you have mentioned) was required just to get to where we are now.

    That being said, the flexibility provided by an RDF store, combined with the denotational semantics available via its interpretation in Description Logics, makes it all worthwhile.
