Regular expression in PHP to extract the components of an nquad

Posted on 2024-12-13 17:39:34

I'm looking for a regex that can help me parse an nquad file. An nquad file is a plain text file where each line represents a quad (subject, predicate, object, context):

<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .

The objects can also be literals (instead of URIs), in which case they are enclosed in double quotes:

<http://mysubject> <http://mypredicate> "My object" <http://mycontext> .

I'm looking for a regex that, given one line of this file, gives me back a PHP array in the following format:

[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "http://myobject"
[3] => "http://mycontext"

...or in the case where the double quotes are used for the object:

[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "My object"
[3] => "http://mycontext"

One final thing: in an ideal world, the regex will cater for the scenario where there may be one or more spaces between the various components, e.g.

<http://mysubject>     <http://mypredicate>  "My object"       <http://mycontext> .

Comments (3)

你是年少的欢喜 2024-12-20 17:39:34

Here is another answer, as an additional solution that uses only a regex and explode():

$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';

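$delimiter = '---'; // Must not occur in the data; a space would split "My object"
// Capture s, p, o, c (and swallow an optional trailing " ."), then join them with the delimiter
$result = preg_replace('/<([^>]*)>\s+<([^>]*)>\s+["<]([^">]*)[">]\s+<([^>]*)>\s*\.?\s*/', '$1' . $delimiter . '$2' . $delimiter . '$3' . $delimiter . '$4', $line);
$array = explode($delimiter, $result);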
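
Since the pattern already captures the four components, the delimiter round trip is not strictly needed. As a minimal sketch, assuming PHP 7.1 or later, the same pattern can be handed to preg_match, which returns the captures directly. The helper name nquadParts is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: returns [s, p, o, c] for one line, or null if the line does not match.
function nquadParts(string $line): ?array
{
    // Same pattern as in the answer above, without the redundant {1} and /i.
    $pattern = '/<([^>]*)>\s+<([^>]*)>\s+["<]([^">]*)[">]\s+<([^>]*)>/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // $m[0] is the whole match; drop it so the result is indexed 0..3.
    return array_slice($m, 1);
}

print_r(nquadParts('<http://mysubject>  <http://mypredicate> "My object"   <http://mycontext> .'));
```

This also sidesteps the case where the chosen delimiter happens to appear inside a literal.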

蛮可爱 2024-12-20 17:39:34

It seems this can be accomplished as follows (I don't know your character restrictions, so it may not fit your needs exactly, but it works for your test cases):

$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';

// Replace the whitespace between entries with a delimiter (change $line to $line2 for testing)
$delimiter = '---';
$result = preg_replace('/([">])\s+(["<])/', '$1' . $delimiter . '$2', $line);

// Explode on our delimiter
$array = explode($delimiter, $result);
foreach ($array as &$a)
{
    // Drop a trailing " ." terminator, then strip the surrounding <> or quotes.
    // Removing every '.' instead would also delete the dots inside URIs.
    $a = substr(rtrim($a, ' .'), 1, -1);
}
unset($a); // Break the reference left over from the foreach

var_dump($array);
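
The same idea can be expressed without a placeholder delimiter by splitting directly on the whitespace between terms. The following is only a sketch, assuming PHP 7.4 or later (for the arrow function) and assuming literals never contain a `>` or `"` followed by whitespace and then `<` or `"`. The name splitNquad is hypothetical.

```php
<?php
// Hypothetical helper: splits one nquad line into its four bare components.
function splitNquad(string $line): array
{
    // Drop the trailing " ." terminator, if present.
    $line = preg_replace('/\s*\.\s*$/', '', trim($line));
    // Split on whitespace that sits between a closing > or " and an opening < or ".
    $terms = preg_split('/(?<=[">])\s+(?=["<])/', $line);
    // Strip the surrounding <...> or "..." from each term.
    return array_map(fn (string $t): string => substr($t, 1, -1), $terms);
}

var_dump(splitNquad('<http://example.org/s>  <http://example.org/p> "My object"   <http://example.org/g> .'));
```

Because only the first and last character of each term are removed, dots and other punctuation inside URIs and literals are left alone.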

烟火散人牵绊 2024-12-20 17:39:34

This regular expression would help:

/(\S+?)\s+(\S+?)\s+(\S+?)\s+(\S+?)\s+\./

The (s, p, o, c) values will be in capture groups 1 through 4; in PHP, preg_match stores them at indexes 1 through 4 of its matches array.
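
One caveat with the pattern above: \S+ cannot span the space inside a quoted literal such as "My object", and the captures still include the surrounding angle brackets. As a hedged sketch, and not a full N-Quads grammar, the variant below swaps in an alternation for the object term using a PCRE branch-reset group, so a URI or a literal lands in the same capture group. The helper name matchNquad is an assumption made for illustration.

```php
<?php
// Hypothetical helper: returns [s, p, o, c] for one line, or null if the line is malformed.
function matchNquad(string $line): ?array
{
    // Branch reset (?|...): both alternatives store their capture in the same group number.
    $object = '(?|<([^>]*)>|"([^"]*)")';
    $pattern = '/^\s*<([^>]*)>\s+<([^>]*)>\s+' . $object . '\s+<([^>]*)>\s*\.\s*$/';
    if (preg_match($pattern, $line, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the whole line; groups 1 to 4 are s, p, o, c.
    return [$matches[1], $matches[2], $matches[3], $matches[4]];
}

print_r(matchNquad('<http://mysubject>  <http://mypredicate> "My object"   <http://mycontext> .'));
```

Anchoring with ^ and $ makes a malformed line fail outright rather than match from somewhere in the middle.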
