Regex to match x words in brackets

Posted on 2024-10-02 14:16:18

I am trying to remove a bracketed section from a string if it contains 4 or more words. I have been scratching my head and cannot get anywhere with it.

preg_replace('#\([word]{4,}\)#', '', $str); # pseudo code

Sample string:

Robert Alner Fund Standard Open NH Flat Race (Supported by The Andrew Stewart Charitable Foundation)

To match (more than x words in brackets) and remove:

(Supported by The Andrew Stewart Charitable Foundation)

I have two sources of data and am comparing them with:

similar_text($str1, $str2, $percent)

The longish strings in brackets are unique to one source.

Comments (4)

删除会话 2024-10-09 14:16:18

Well, you're close...

preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str);

Basically, the inner sub-pattern (\b\w+\b[^\w)]*) matches a word boundary (a position that is not between two word characters), followed by at least one word character (\w: letters, digits, or underscore), followed by another word boundary, and finally zero or more characters that are neither word characters nor ). Repeating that group with {4,} requires at least four words between the literal ( and ).

Testing with:

$tests = array(
    'test1 (this is three)',
    'test2 (this is four words)',
    'test3 (this is four words) and (this is three)',
    'test4 (this is five words inside)',
);

foreach ($tests as $str) {
    echo $str . " - " . preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str) . "\n";
}

Gives:

test1 (this is three) - test1 (this is three)
test2 (this is four words) - test2
test3 (this is four words) and (this is three) - test3  and (this is three)
test4 (this is five words inside) - test4
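
As a minimal sketch, the same pattern can be wrapped in a helper so the word threshold is configurable. The function name `strip_long_brackets` and its `$minWords` parameter are hypothetical, not part of the answer above. This version also consumes the whitespace in front of the bracket and trims the result, which avoids the leftover spaces visible in the output above.

```php
<?php
// Hypothetical helper (name and parameters are illustrative only).
function strip_long_brackets(string $str, int $minWords = 4): string
{
    // Same sub-pattern as in the answer; the repetition count is injected,
    // and \s* swallows whitespace directly before the opening bracket.
    $pattern = '#\s*\((\b\w+\b[^\w)]*){' . $minWords . ',}\)#';
    return trim(preg_replace($pattern, '', $str));
}

echo strip_long_brackets('Robert Alner Fund Standard Open NH Flat Race (Supported by The Andrew Stewart Charitable Foundation)'), "\n";
echo strip_long_brackets('test3 (this is four words) and (this is three)'), "\n";
```

With the question's sample string, this should leave only the race name, while the three-word bracket in the second call stays in place.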
满栀 2024-10-09 14:16:18

You don't need preg_replace() for this. Just count the spaces inside the brackets with substr_count(), then remove the bracketed part with str_replace().
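
One possible reading of this suggestion, sketched under some assumptions: the string contains at most one bracketed section, words are separated by single spaces, and the helper name `strip_long_bracket_by_count` is made up for illustration. `strpos()` and `substr()` are used to locate the section, which the answer does not mention.

```php
<?php
// Hypothetical helper: removes the first "(...)" section when it holds
// at least $minWords space-separated words. Handles one bracket pair only.
function strip_long_bracket_by_count(string $str, int $minWords = 4): string
{
    $open = strpos($str, '(');
    if ($open === false) {
        return $str; // no opening bracket
    }
    $close = strpos($str, ')', $open);
    if ($close === false) {
        return $str; // no closing bracket after the opening one
    }

    // The bracketed section, including both parentheses.
    $segment = substr($str, $open, $close - $open + 1);

    // N single spaces separate N + 1 words.
    $words = substr_count(trim($segment, '() '), ' ') + 1;
    if ($words < $minWords) {
        return $str;
    }

    return trim(str_replace($segment, '', $str));
}

echo strip_long_bracket_by_count('test2 (this is four words)'), "\n";
```

This avoids regular expressions entirely, but it needs extra work (a loop over all bracket pairs, for example) if a string can contain several bracketed sections.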

神经暖 2024-10-09 14:16:18

The syntax […] has a special meaning. […] is a so-called character class and matches exactly one of the listed characters. So [word] matches a single character: w, o, r, or d.

Now if you want to match words, you should first define what a word is. If a word is a sequence of characters except whitespace characters (\S represents all non-whitespace characters), you could do this:

/\S+(\s+\S+){3,}/

This matches any sequence of four or more words (sequence of non-whitespace characters) that are separated by whitespace characters (\s).

And four or more words in brackets:

/\(\S+(\s+\S+){3,}\)/

Note that \S matches anything except whitespace characters, including the surrounding brackets. So you might want to change \S to [^\s)]:

/\([^\s)]+(\s+[^\s)]+){3,}\)/
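
A short sketch of how this last pattern might be used with preg_replace() on the question's sample string. The closing parenthesis is written as `\)` so that it is matched literally. The variable names and the rtrim() call (which drops the space left in front of the removed bracket) are additions for illustration.

```php
<?php
// Four or more whitespace-separated "words" inside one pair of parentheses.
// The final \) is escaped so it matches a literal closing parenthesis.
$pattern = '/\([^\s)]+(\s+[^\s)]+){3,}\)/';

$str = 'Robert Alner Fund Standard Open NH Flat Race (Supported by The Andrew Stewart Charitable Foundation)';

// Remove the bracketed section, then drop the trailing space it leaves behind.
$result = rtrim(preg_replace($pattern, '', $str));

echo $result, "\n";
```

Because a "word" here is any run of non-whitespace characters other than ), hyphenated or punctuated tokens count as single words.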
笑看君怀她人 2024-10-09 14:16:18

I'm no expert, but this might work.
Here's a pattern string:

/\(((\w*?\s){3,}[\w]+?.*?)\)/i

And here's a replacement string in PHP that keeps everything except the leading and trailing parentheses.

$1

Here's the preg_replace function.

preg_replace('/\(((\w*?\s){3,}[\w]+?.*?)\)/i', '$1', $string);
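
A small sketch of what this variant does in practice, with the replacement quoted as the string '$1'. Unlike the other answers, it keeps the bracketed text and drops only the parentheses themselves. The test strings are borrowed from the first answer, and the variable names are illustrative.

```php
<?php
$pattern = '/\(((\w*?\s){3,}[\w]+?.*?)\)/i';

// Four words: the parentheses are removed, the text inside is kept.
$four = preg_replace($pattern, '$1', 'test2 (this is four words)');

// Three words: the pattern does not match, so the string is unchanged.
$three = preg_replace($pattern, '$1', 'test1 (this is three)');

echo $four, "\n";
echo $three, "\n";
```

If the goal is to delete the whole bracketed section, as in the question's example, the replacement would be an empty string instead of '$1'.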