Extracting information from a SPARQL query using regular expressions

Posted on 2024-09-15 10:03:21

I am having a hard time creating a regular expression that extracts the namespaces from this SPARQL query:

SELECT * 
WHERE {
    ?Vehicle rdf:type umbel-sc:CompactCar ;
             skos:subject <http://dbpedia.org/resource/Category:Vehicles_with_CVT_transmission>;
             dbp-prop:assembly ?Place.
    ?Place geo-ont:parentFeature dbpedia:United_States .
}

I need to get:

"rdf", "umbel-sc", "skos", "dbp-prop", "geo-ont", "dbpedia"

I need an expression like this:

\\s+([^\\:]*):[^\\s]+

But the one above does not work, because it also consumes the spaces before reaching the colon. What am I doing wrong?

Comments (2)

雪若未夕 2024-09-22 10:03:21

So I need an expression like this:

\\s+([^\\:]*):[^\\s]+

But the above one does not work, because it also eats spaces before reaching ":".

The regular expression will eat those spaces, yes, but the group captured by your parentheses won’t contain the spaces matched by \s+. Is that a problem? You can access this group by reading Groups[1].Value from the Match object returned by Regex.Match.

If you really need the regex to not match these spaces, you can use a so-called look-behind assertion:

(?<=\s)([^:]*):[^\s]+

As an aside, you don’t need to double all your backslashes. Use a verbatim string instead, like this:

Regex.Match(input, @"(?<=\s)([^:]*):[^\s]+")
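
One caveat is worth checking before relying on either pattern: [^:]* excludes only the colon, so it also accepts whitespace, and group 1 can therefore span several tokens. The following is a minimal sketch of that effect, not part of the answer itself. The helper name Group1 and the narrower character class [\w-]+ are illustrative assumptions, chosen because the prefixes in the question consist of letters and hyphens.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Hypothetical helper: returns the text of capture group 1
    // for every match of the given pattern in the input.
    public static List<string> Group1(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Applied to the line "    ?Place geo-ont:parentFeature dbpedia:United_States ." with the pattern @"(?<=\s)([^:]*):[^\s]+", the first captured group includes leading spaces and the text "?Place geo-ont", because nothing stops [^:]* at a space. With @"(?<=\s)([\w-]+):\S+" the groups are geo-ont and dbpedia. The narrower class is an assumption about what a prefix looks like in this query, not a rule taken from the SPARQL grammar.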

苦笑流年记忆 2024-09-22 10:03:21

I don't know the details of SPARQL syntax, but I would imagine that it is not a regular language, so regular expressions won't be able to do this perfectly. However, you can get pretty close if you search for something that looks like a word with whitespace on its left and a colon on its right.

This method might be good enough for a quick solution, or if your input format is known and sufficiently restricted. For a more general solution, I suggest you look for or create a proper parser for the SPARQL language.

With that said, try this:

string s = @"SELECT * 
WHERE {
    ?Vehicle rdf:type umbel-sc:CompactCar ;
    skos:subject <http://dbpedia.org/resource/Category:Vehicles_with_CVT_transmission>;
    dbp-prop:assembly ?Place.
    ?Place geo-ont:parentFeature dbpedia:United_States .
}";

foreach (Match match in Regex.Matches(s, @"\s([\w-]+):"))
{
    Console.WriteLine(match.Groups[1].Value);
}

Result:

rdf
umbel-sc
skos
dbp-prop
geo-ont
dbpedia
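
In a longer query the same prefix usually appears more than once, so the loop above would print it repeatedly. A minimal sketch of collecting each prefix once, in first-seen order, could look like the following. The class and method names (SparqlPrefixes, Extract) are hypothetical, and the pattern is the same heuristic as above, so it can still be fooled by text such as a string literal that contains a word followed by a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SparqlPrefixes
{
    // Hypothetical helper: returns each distinct prefix once,
    // in the order it first appears in the query text.
    public static List<string> Extract(string query)
    {
        var seen = new HashSet<string>();
        var ordered = new List<string>();
        // Same heuristic as above: whitespace, a word-like token, then a colon.
        foreach (Match match in Regex.Matches(query, @"\s([\w-]+):"))
        {
            string prefix = match.Groups[1].Value;
            // HashSet.Add returns false when the prefix was already recorded.
            if (seen.Add(prefix))
            {
                ordered.Add(prefix);
            }
        }
        return ordered;
    }
}
```

For example, SparqlPrefixes.Extract("?a rdf:type ex:Car ;\n   rdf:value ?b .") returns rdf and ex, with the second rdf skipped.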