PHP regex to split a string at the first occurrence of a character

Posted on 2024-09-13 22:29:15

This may be a lame question but I am a total novice with regular expressions. I have some text data in the format:

Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com

Now, I want to use a regex to split these into an array of name : value pairs. The regular expression I am trying is /(.*):(.*)/ with preg_match_all() and it does work well with the first two lines but on the third line it returns "Link: http:" in one part and "//www.somelink.com" in other.

So, is there any way to split the line only at the first occurrence of the character ':'?

Comments (2)

君勿笑 2024-09-20 22:29:15

Use a negated character class (see on rubular.com):

/^([^:]*):(.*)$/m

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
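
As a quick illustration of the two kinds of character class, here is a minimal sketch. The sample word and variable names are made up for the example; only preg_match() and preg_match_all() are relied on.

```php
<?php
// [aeiou] matches exactly one lowercase vowel; [^aeiou] matches exactly one
// character that is NOT a lowercase vowel.
$word = 'regex'; // arbitrary sample input

preg_match_all('/[aeiou]/', $word, $vowels);
preg_match_all('/[^aeiou]/', $word, $others);

echo implode(',', $vowels[0]), "\n"; // e,e
echo implode(',', $others[0]), "\n"; // r,g,x

// Applied to the question: [^:]* stops just before the first colon.
preg_match('/[^:]*/', 'Link: http://www.somelink.com', $m);
echo $m[0], "\n"; // Link
```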

The ^ and $ at the beginning and end of the pattern are the beginning-of-line and end-of-line anchors. The m modifier turns on multi-line mode.
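
To see what the m modifier changes, the following sketch runs the same pattern with and without it on a two-line sample (the sample text and variable names are assumptions for illustration):

```php
<?php
$text = "Company Name: Name of the company, place.\nLink: http://www.somelink.com";

// Without /m, ^ and $ anchor to the start and end of the whole subject,
// so the pattern cannot match a single line in the middle of the text.
$without = preg_match_all('/^([^:]*):(.*)$/', $text, $a);

// With /m, ^ and $ also match at every line boundary.
$with = preg_match_all('/^([^:]*):(.*)$/m', $text, $b);

echo $without, "\n"; // 0
echo $with, "\n";    // 2
```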

The problem with your original pattern is that you're (ab)using . when you could've been a lot more specific, and since * is greedy, the first group overmatched. It's tempting to try to "fix" that by making the repetition reluctant, but it's MUCH better to be more specific and say that the first group is matching anything but :.

Note however that this is a matching pattern, with captures. It's not actually a splitting pattern that matches only the delimiter. The delimiter pattern really is just :.
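
If an actual split is preferred, one possible approach is to split on : with a limit of 2 pieces, so only the first colon counts. This is a sketch, not part of the answer's original code; it assumes one line is processed at a time and PHP 7.1+ for the [ ] destructuring syntax.

```php
<?php
$line = 'Link: http://www.somelink.com';

// The third argument caps the result at 2 pieces, so only the first ':' splits.
[$name, $value] = explode(':', $line, 2);
echo $name, "\n";        // Link
echo trim($value), "\n"; // http://www.somelink.com

// Regex variant: preg_split() also takes a limit; \s* swallows the space
// after the colon so no trim() is needed on the value.
$parts = preg_split('/:\s*/', $line, 2);
print_r($parts);
```

explode() avoids the regex engine entirely, which is usually enough when the delimiter is a fixed string.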

PHP snippet

Given this:

$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

print_r($matches);

The output is (as seen on ideone.com):

Array
(
    [0] => Array
        (
            [0] => Company Name: Name of the company, place.
            [1] => Company Name
            [2] =>  Name of the company, place.
        )

    [1] => Array
        (
            [0] => Company Address: Some, address, here.
            [1] => Company Address
            [2] =>  Some, address, here.
        )

    [2] => Array
        (
            [0] => Link: http://www.somelink.com
            [1] => Link
            [2] =>  http://www.somelink.com
        )

)
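
To get from these match sets to the name : value array the question asks for, a loop like the following could be used. This is a sketch; $pairs is a hypothetical variable name, and trim() is added here to drop the leading space visible in the captured values.

```php
<?php
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    // $match[1] is the name, $match[2] is the value (with a leading space).
    $pairs[trim($match[1])] = trim($match[2]);
}

print_r($pairs);
```

One caveat: [^:] also matches newlines, so a line without any colon would be absorbed into the name of the following line. If such lines can occur, [^:\n]* is one way to keep each match on a single line.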
蓝咒 2024-09-20 22:29:15

You probably want something like /(.*?):(.*)/. The ? after the * will make it "non-greedy", so it will consume as little text as possible that way. I think that will work for your situation. By default, * is "greedy", and tries to match as many repetitions as it can.
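
A small comparison of the greedy and non-greedy patterns on the problematic line (a sketch; the variable names are made up for illustration):

```php
<?php
$line = 'Link: http://www.somelink.com';

preg_match('/(.*):(.*)/', $line, $greedy); // greedy: backs off to the LAST ':'
preg_match('/(.*?):(.*)/', $line, $lazy);  // lazy: stops at the FIRST ':'

echo $greedy[1], "\n"; // Link: http
echo $greedy[2], "\n"; // //www.somelink.com
echo $lazy[1], "\n";   // Link
echo $lazy[2], "\n";   //  http://www.somelink.com (leading space kept)
```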

Edit: See here for more about matching repetition using the * and + operators.
