Searching for partial matches in an RDF graph

Posted 2024-10-17 15:23:50

How would I search an RDF database to find the segments of the graph that overlap the most with a sample graph?

For example, say my database stores the following arbitrary graphs:

entity1 [
    type "TOP" ;
    attr1 [
        attr11 [
            attr111 "apple" ;
        ] ;
        attr12 [
            attr121 "orange" ;
        ] ;
        attr13 [
            attr131 "banana" ;
        ] ;
    ] ;
    attr2 [
        attr21 [
            attr211 "falcon" ;
        ] ;
        attr22 [
            attr221 "pigeon" ;
        ] ;
        attr23 [
            attr231 "parrot" ;
        ] ;
    ] ;
] .
entity2 [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
] .
entity3 [
    type "TOP" ;
    attr2 [
        attr_middle [
            attr21 [
                attr211 "falcon" ;
            ] ;
            attr22 [
                attr221 "pigeon" ;
            ] ;
            attr23 [
                attr231 "parrot" ;
            ] ;
        ] ;
    ] ;
] .

And now say I have the sample graph:

sample [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
    attr13 [
        attr131 "banana" ;
    ] ;
    attr21 [
        attr211 "falcon" ;
    ] ;
    attr22 [
        attr221 "pigeon" ;
    ] ;
    attr23 [
        attr231 "parrot" ;
    ] ;
] .

Clearly, nothing in the database matches the sample perfectly, but each entity matches it partially, even if the common triples exist at different levels in each graph.

How would I find the closest matches to the sample? In this case, I'd expect a query to return, sorted best match first, [entity1, entity3, entity2].
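The expected ranking can be made concrete with a minimal, library-free Python sketch of one possible scoring rule, assuming each graph has been loaded as (subject, predicate, object) tuples. Because blank nodes are local to their graph and cannot be matched by name across graphs, the sketch compares only leaf triples (a predicate pointing at a literal) reachable at any depth, and ignores intermediate levels such as attr1 or attr_middle. The nested-dict encoding, the `_:bN` blank-node names and the scoring rule itself are illustrative assumptions, not an established similarity measure.

```python
from collections import deque
from itertools import count

# Hypothetical stand-in for the RDF store: each nested dict mirrors one
# [ ... ] block in the question and becomes a fresh blank node "_:bN".
_blank_ids = count(1)


def to_triples(subject, tree, out):
    """Flatten a nested dict into (subject, predicate, object) triples."""
    for pred, obj in tree.items():
        if isinstance(obj, dict):
            node = f"_:b{next(_blank_ids)}"
            out.append((subject, pred, node))
            to_triples(node, obj, out)
        else:
            out.append((subject, pred, obj))  # obj is a literal
    return out


def leaf_pairs(triples, root):
    """Return the (predicate, literal) pairs reachable from root at any depth."""
    children = {}
    for s, p, o in triples:
        children.setdefault(s, []).append((p, o))
    found, seen, queue = set(), {root}, deque([root])
    while queue:
        for p, o in children.get(queue.popleft(), []):
            if o.startswith("_:"):  # blank node: keep walking down
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
            else:  # literal: a leaf triple that can be compared across graphs
                found.add((p, o))
    return found


FRUIT = {"attr11": {"attr111": "apple"}, "attr12": {"attr121": "orange"},
         "attr13": {"attr131": "banana"}}
BIRDS = {"attr21": {"attr211": "falcon"}, "attr22": {"attr221": "pigeon"},
         "attr23": {"attr231": "parrot"}}
ENTITIES = {
    "entity1": {"type": "TOP", "attr1": FRUIT, "attr2": BIRDS},
    "entity2": {"type": "TOP", "attr11": {"attr111": "apple"},
                "attr12": {"attr121": "orange"}},
    "entity3": {"type": "TOP", "attr2": {"attr_middle": BIRDS}},
}
SAMPLE = {"type": "TOP", **FRUIT, **BIRDS}

store = []
for name, tree in ENTITIES.items():
    to_triples(name, tree, store)
sample_pairs = leaf_pairs(to_triples("sample", SAMPLE, []), "sample")

# Score = number of the sample's leaf pairs found anywhere under each entity.
scores = {name: len(leaf_pairs(store, name) & sample_pairs) for name in ENTITIES}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['entity1', 'entity3', 'entity2']
print(scores)   # {'entity1': 7, 'entity2': 3, 'entity3': 4}
```

With this rule entity1 scores 7, entity3 scores 4 and entity2 scores 3 (the shared type "TOP" counts once for each), which reproduces the order asked for. Other weightings, for instance discounting matches found deeper in the graph, are equally plausible and would need to be chosen to suit the data.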

I'm still a little new to RDF, so forgive me if my terminology is off. As I currently understand RDF databases, what I'm trying to do isn't how they are typically used. If I want to find the entities "containing" the relation attr111 = "apple" using a SPARQL query, I'd generally have to assume that the relation sits at a fixed location relative to each entity, whereas searching for triples at arbitrary locations relative to a "root" is much more difficult. Is that correct?
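Searching at an arbitrary depth amounts to a graph walk from each candidate root. A minimal Python sketch, assuming the data is a plain list of (subject, predicate, object) tuples with blank nodes written as `_:x`: a breadth-first walk from each root checks every node it reaches for the pair attr111 = "apple". The triples are a hypothetical, trimmed-down version of the example data. On a processor that supports SPARQL 1.1 property paths, a comparable walk can be written with a zero-or-more path over any predicate, such as `?e ex:type "TOP" . ?e (ex:p|!ex:p)* ?n . ?n ex:attr111 "apple"`, where `ex:p` is an arbitrary placeholder IRI.

```python
from collections import deque

# Hypothetical triples: the pair attr111 = "apple" sits at a different depth
# under entity1 and entity2, and not at all under entity3.
TRIPLES = [
    ("entity1", "attr1", "_:a"), ("_:a", "attr11", "_:b"), ("_:b", "attr111", "apple"),
    ("entity2", "attr11", "_:c"), ("_:c", "attr111", "apple"),
    ("entity3", "attr2", "_:d"), ("_:d", "attr_middle", "_:e"),
    ("_:e", "attr21", "_:f"), ("_:f", "attr211", "falcon"),
]


def contains(triples, root, pred, value):
    """True if any node reachable from root (at any depth) has pred = value."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s != node:
                continue
            if p == pred and o == value:
                return True
            if o not in seen:  # literals are enqueued too but have no out-edges
                seen.add(o)
                queue.append(o)
    return False


matches = [r for r in ("entity1", "entity2", "entity3")
           if contains(TRIPLES, r, "attr111", "apple")]
print(matches)  # ['entity1', 'entity2']
```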

Comments (1)

梦屿孤独相伴 2024-10-24 15:23:50

No, it is not that difficult, but your SPARQL queries may become rather long. There is no need to assume a fixed root, since you can use a variable for the root, as shown in my examples. In the case where the root is fixed, substitute a value for the variable.

Note: if the resulting query has no variables in it, it would be better phrased as an ASK query. If you use a SELECT query with no variables, you have no easy way to distinguish between a query result that matches and one that doesn't, whereas an ASK query returns either true or false depending on whether the WHERE clause matches.

If your SPARQL processor supports SPARQL 1.1, you can use property paths, e.g.:

PREFIX ex: <http://example.org/>  # hypothetical namespace for the ex: prefix
SELECT * WHERE { ?s ex:predicate/ex:predicate/ex:predicate "value" }

If you only have SPARQL 1.0, you have to state the match explicitly, like so:

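PREFIX ex: <http://example.org/>  # hypothetical namespace for the ex: prefix
SELECT * WHERE
{
  # blank nodes in a query pattern act like variables that are not returned
  ?s ex:predicate _:b1 .
  _:b1 ex:predicate _:b2 .
  _:b2 ex:predicate "value" .
}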

Note that these two forms are semantically equivalent: the SPARQL 1.1 form is a convenient syntactic shortcut for the SPARQL 1.0 form.
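To see why the two forms agree, here is a small Python sketch, assuming triples are plain tuples and the string "p" stands in for ex:predicate. The sequence path is evaluated one step at a time, working back from "value", while the explicit pattern is evaluated as a three-way join in which the blank nodes _:b1 and _:b2 play the role of unreturned variables. The data is hypothetical, and results are treated as sets of subjects (SPARQL itself returns multisets of solutions).

```python
# Hypothetical triples; "p" stands in for ex:predicate.
TRIPLES = {
    ("s1", "p", "_:x"), ("_:x", "p", "_:y"), ("_:y", "p", "value"),  # three hops
    ("s2", "p", "_:z"), ("_:z", "p", "value"),                       # only two hops
    ("s3", "p", "_:u"), ("_:u", "p", "_:v"), ("_:v", "p", "other"),  # wrong literal
}


def sequence_path(triples, preds, value):
    """Evaluate ?s p1/p2/.../pn value by stepping backwards from value."""
    frontier = {value}
    for pred in reversed(preds):
        frontier = {s for (s, p, o) in triples if p == pred and o in frontier}
    return frontier


def explicit_join(triples, value):
    """Evaluate ?s p _:b1 . _:b1 p _:b2 . _:b2 p value as nested joins."""
    return {
        s
        for (s, p1, b1) in triples if p1 == "p"
        for (s2, p2, b2) in triples if s2 == b1 and p2 == "p"
        for (s3, p3, o) in triples if s3 == b2 and p3 == "p" and o == value
    }


path_result = sequence_path(TRIPLES, ["p", "p", "p"], "value")
join_result = explicit_join(TRIPLES, "value")
print(path_result, join_result)  # {'s1'} {'s1'}
```

Only s1 has exactly three hops ending in "value"; s2 is one hop short and s3 ends in a different literal, so both evaluations reject them.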

Obviously, the larger the part of your graph you want to match, the larger your SPARQL query will get.
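The growth is easy to see if the query is generated from the sample. A small Python sketch, assuming the sample is written as nested dicts and every predicate lives under a hypothetical ex: namespace: it emits one SPARQL 1.0 triple pattern per edge of the sample, with a fresh blank node for each nested block. The function name and the nested-dict input format are illustrative assumptions.

```python
from itertools import count


def sample_to_sparql(sample, prefix="http://example.org/"):
    """Build a SPARQL 1.0 query with one triple pattern per edge of the sample.

    Assumes literal values contain no quote characters (no escaping is done).
    """
    blank_ids = count(1)
    patterns = []

    def walk(subject, tree):
        for pred, obj in tree.items():
            if isinstance(obj, dict):  # nested [ ... ] block -> fresh blank node
                node = f"_:b{next(blank_ids)}"
                patterns.append(f"  {subject} ex:{pred} {node} .")
                walk(node, obj)
            else:  # leaf -> literal object
                patterns.append(f'  {subject} ex:{pred} "{obj}" .')

    walk("?root", sample)
    lines = [f"PREFIX ex: <{prefix}>", "SELECT ?root WHERE", "{", *patterns, "}"]
    return "\n".join(lines), len(patterns)


small = {"type": "TOP", "attr11": {"attr111": "apple"}}
query, size = sample_to_sparql(small)
print(query)
print(size)  # 3
```

For the six-branch sample in the question, the same function produces 13 triple patterns. Such a query only finds entities with exactly that shape; on its own it does not rank partial matches.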
