Extract part of a string matching a pattern - regex, close but no cigar

Posted on 2024-12-12 18:01:40

I have a string that can be very long and contain various lines and characters.

I want to extract all lines that are wrapped in SB and EB:

SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB
SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB
3
another description of various length that I don't want to return
123.456.00
599.99
599.99
SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB
another bit of text that i don't want - can span multiple lines

This is the pattern I am using in PHP:

preg_match_all('/SB(\d+)EB\nSB(\w.*)EB\nSB(\d{3}\.\d{3}\.\d{2})EB\nSB(\d.*)EB\nSB(\d.*)EB\n/', $string, $matches)

So this should hopefully return:

[0] -> SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB

[1] -> SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB

[2] -> SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB

But I'm obviously doing something wrong because it isn't matching anything. Can somebody help please?

SOLUTION:

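Based on @Sajid's reply:

if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$))/', $string, $result)) {
    // Strip the SB/EB markers and trailing line breaks from every matched line,
    // so the neighbouring fields are as clean as the ID itself.
    $lines = array_map(function ($line) {
        return trim(str_replace(array("SB", "EB"), "", $line));
    }, $result[0]);
    // Start at 2 and stop 2 early so $i-2 and $i+2 always exist.
    for ($i = 2; $i < count($lines) - 2; $i++) {
        // An ID such as 123.456.78 is the middle line of a five-line record.
        if (preg_match('/^\d{3}\.\d{3}\.\d{2}$/', $lines[$i])) {
            $id    = $lines[$i];
            $qty   = $lines[$i - 2];
            $name  = $lines[$i - 1];
            $price = $lines[$i + 1];
            $total = $lines[$i + 2];
        }
    }
}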

It's a bit messy, but it works! :)

Thanks

Comments (3)

国际总奸 2024-12-19 18:01:40

A bit of a hack, but this will do the job:

$a = array();    
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$)){5}/', $x, $a)) {
    print_r($a);
}

Note that ?: makes the groups non-capturing, so the results are all in $a[0] (e.g. $a[0][0], $a[0][1], $a[0][2], ...), one entry per five-line block.
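
Each entry in $a[0] is still one multi-line string. As a minimal sketch of how a block could be split into its five fields, assuming the sample layout from the question: the field names (qty, name, id, price, total) are borrowed from the asker's own solution, and the shortened input is made up for illustration.

```php
<?php
// Shortened sample based on the question: two records with unwanted lines between them.
$x = "SB1EB\nSBa description of various lengthEB\nSB123.456.78EB\nSB99.99EB\nSB99.99EB\n"
   . "3\nunwanted text\n"
   . "SB60EB\nSBanother descriptionEB\nSB500.256.10EB\nSB0.99EB\nSB0.99EB\n";

$records = array();
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$)){5}/', $x, $a)) {
    foreach ($a[0] as $block) {
        // Within one block, capture the text between SB and EB on each line.
        // The lookahead requires EB to sit at the end of its line.
        preg_match_all('/SB(.+?)EB(?=[\r\n]|$)/', $block, $lines);
        // Assumed field order, taken from the sample data in the question.
        $records[] = array_combine(
            array('qty', 'name', 'id', 'price', 'total'),
            $lines[1]
        );
    }
}
print_r($records);
```

The outer pattern guarantees five SB...EB lines per block, which is what lets array_combine pair them with five keys.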

走过海棠暮 2024-12-19 18:01:40

Based on @Sajid's reply:

if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$))/', $string, $result))
{
    // Strip the SB/EB markers and trailing line breaks from every matched line.
    $lines = array_map(function ($line) {
        return trim(str_replace(array("SB", "EB"), "", $line));
    }, $result[0]);
    // Start at 2 and stop 2 early so $i-2 and $i+2 always exist.
    for ($i = 2; $i < count($lines) - 2; $i++)
    {
        // An ID such as 123.456.78 is the middle line of a five-line record.
        if (preg_match('/^\d{3}\.\d{3}\.\d{2}$/', $lines[$i]))
        {
            $id    = $lines[$i];
            $qty   = $lines[$i - 2];
            $name  = $lines[$i - 1];
            $price = $lines[$i + 1];
            $total = $lines[$i + 2];
        }
    }
}

It's a bit messy, but it works! :)
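
The thread never states why the original five-line pattern matched nothing. One possibility, and it is only an assumption because the raw input is not shown, is that the lines end in \r\n, in which case a literal \n directly after EB can never match. A sketch of the original idea with \R (any line break) and named groups, using a made-up CRLF sample:

```php
<?php
// Hypothetical sample with Windows (CRLF) line endings; the real input may differ.
$string = "SB1EB\r\nSBa description of various lengthEB\r\nSB123.456.78EB\r\nSB99.99EB\r\nSB99.99EB\r\n"
        . "3\r\nunwanted\r\n123.456.00\r\n"
        . "SB2EB\r\nSBanother descriptionEB\r\nSB123.456.00EB\r\nSB199.99EB\r\nSB199.99EB";

// \R matches \n, \r\n or \r; the x flag lets the pattern span several lines.
$pattern = '/
    SB(?<qty>\d+)EB\R
    SB(?<name>.+?)EB\R
    SB(?<id>\d{3}\.\d{3}\.\d{2})EB\R
    SB(?<price>[\d.]+)EB\R
    SB(?<total>[\d.]+)EB(?:\R|\z)
/x';

$items = array();
// PREG_SET_ORDER groups the captures per record instead of per group.
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $items[] = array(
            'qty' => $m['qty'], 'name' => $m['name'], 'id' => $m['id'],
            'price' => $m['price'], 'total' => $m['total'],
        );
    }
}
print_r($items);
```

With this shape each record arrives with its fields already separated, so no index arithmetic over neighbouring lines is needed.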

聚集的泪 2024-12-19 18:01:40
preg_match_all('/SB\d+EB.*?(?=(?:SB\d+EB)|$)/s', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    # Matched text = $result[0][$i];
}

Based on your input, I simply treat the "header" string SB\d+EB as the entry point and consume everything until the next "header" or the end of the input. Note the /s modifier, which makes . match newlines.

Explanation:

# SB\d+EB.*?(?=(?:SB\d+EB)|$)
# 
# Options: dot matches newline
# 
# Match the characters “SB” literally «SB»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the characters “EB” literally «EB»
# Match any single character «.*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=(?:SB\d+EB)|$)»
#    Match either the regular expression below (attempting the next alternative only if this one fails) «(?:SB\d+EB)»
#       Match the regular expression below «(?:SB\d+EB)»
#          Match the characters “SB” literally «SB»
#          Match a single digit 0..9 «\d+»
#             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
#          Match the characters “EB” literally «EB»
#    Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
#       Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
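
One thing to be aware of: because the lazy .*? runs up to the next header, each match also contains any unmarked lines that sit between two records or after the last one. A sketch of filtering those out afterwards, using a shortened, made-up version of the sample input:

```php
<?php
// Shortened sample: two records, each followed by lines that should be dropped.
$subject = "SB2EB\nSBanother descriptionEB\nSB123.456.00EB\nSB199.99EB\nSB199.99EB\n"
         . "3\nunwanted description\n599.99\n"
         . "SB60EB\nSBkeep this oneEB\nSB500.256.10EB\nSB0.99EB\nSB0.99EB\n"
         . "trailing text that is not wanted\n";

preg_match_all('/SB\d+EB.*?(?=(?:SB\d+EB)|$)/s', $subject, $result, PREG_PATTERN_ORDER);

$blocks = array();
foreach ($result[0] as $chunk) {
    // Split the chunk on any line break, then keep only the SB...EB lines.
    $lines = preg_grep('/^SB.+EB$/', preg_split('/\R/', $chunk));
    // preg_grep preserves the original keys, so reindex from 0.
    $blocks[] = array_values($lines);
}
print_r($blocks);
```

Here $result[0][0] still includes "unwanted description", while $blocks[0] holds only the five marked lines of the first record.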