preg_replace array: masking regex characters problem
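I want to replace word groups with links.
The word groups are defined in a multi-dimensional array. There will be thousands of terms to replace, so an unindexed, lightweight multi-dimensional array is needed.
Nothing should be replaced when the term is followed by brackets or is inside square brackets.
Problem: The regex itself works fine, but the replacement breaks when the word groups include regex syntax characters like + ? / ( etc. So I need to mask (escape) them. I tried every variation I could think of, but none works for all cases. I can't mask them in $text or $s.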
<?php
$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.
Replace nothing here (text followed by brackets) or [inside square brackets]:
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";
$s = array(
    array("t" => "Foo", "u" => "http://www.foo.net"),
    array("t" => "'Foo' Bar", "u" => "http://www.foo.net"),
    array("t" => "Foobar (2)", "u" => "http://www.foo.net"),
    array("t" => "Bar & Baz", "u" => "http://www.foo.net"),
    array("t" => "Bar Baz?", "u" => "http://www.foo.net"),
    array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
    array("t" => "Bar-X", "u" => "http://www.foo.net")
);
$replaced = $text;
foreach ($s as $i => $row) {
    # $replaced = preg_replace('/(?='.preg_quote($row["t"]).'[^\]][^(]+$)\b'.preg_quote($row["t"]).'\b/mS',
    # $replaced = preg_replace('/(?='.preg_quote($row["t"], '/').'[^\]][^(]+$)\b'.preg_quote($row["t"], '/').'\b/mS',
    # $replaced = preg_replace('/(?=\Q'.$row["t"].'\E[^\]][^(]+$)\b\Q'.$row["t"].'\E\b/mS',
    $replaced = preg_replace('/(?='.$row["t"].'[^\]][^(])\b'.$row["t"].'\b/mS',
        '<a href="'.$row["u"].'">'.$row["t"].'</a>',
        $replaced);
}
echo $replaced;
?>
Comments (2)
I'm not entirely sure what you are trying to do, but I saw "breaks when the word groups include regex syntax characters", which makes me think that all you need to do is escape these characters, i.e. put a \ before them.
EDIT:
I'm getting pretty stuck on this as well, but if I show you what I've got, maybe it could help you out:
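The commenter's own code is not reproduced here. As a separate, minimal sketch of the escaping idea described above, the standard preg_quote() function can put the backslashes in place automatically. The helper name buildTermPattern() is hypothetical and only used for illustration; the "/" delimiter is an assumption taken from the question's pattern.

```php
<?php
// buildTermPattern() is a hypothetical helper name used for illustration only.
function buildTermPattern(string $term): string
{
    // preg_quote() escapes regex syntax characters such as + ? ( ) [ ].
    // The second argument additionally escapes the delimiter "/",
    // which is not escaped by default.
    return '/' . preg_quote($term, '/') . '/';
}

$pattern = buildTermPattern('Bar? & Baz?');
echo $pattern, "\n"; // the "?" characters are now matched literally
echo preg_replace($pattern, '[link]', 'Case 6: Text Bar? & Baz?'), "\n";
```

Escaping alone probably does not explain every failing case in the question: \b only matches between a word character and a non-word character, so a \b placed after a term that ends in "?" or ")" will not match when a space follows.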
This should work, at least for the provided test cases:
\b doesn't work as expected when your match itself is wrapped inside some boundaries (like in Foobar (2)), so you should explicitly provide a list of allowed characters. I quickly put [.,\s!^] and [.,\s!$] there; you will probably have to add some more allowed characters according to your specs (like -, _?).
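The code from this answer is not reproduced above, so what follows is only a sketch of the same idea under stated assumptions, not the answerer's code. It swaps in a different technique: instead of looping with preg_replace and a list of allowed boundary characters, it does a single pass with preg_replace_callback and uses the lookarounds (?<!\w) and (?!\w) in place of \b. The function name linkTerms() is hypothetical. "Followed by brackets" is assumed to mean spaces or tabs and then "(" directly after the term, and "inside square brackets" is assumed to mean that a "]" appears later on the same line with no "[" before it.

```php
<?php
// Sketch only: linkTerms() is a hypothetical helper, not code from the answer.
function linkTerms(string $text, array $terms): string
{
    // Map term => URL, then sort longest-first so that "'Foo' Bar"
    // is tried before the shorter "Foo".
    $urls = array();
    foreach ($terms as $row) {
        $urls[$row['t']] = $row['u'];
    }
    if (!$urls) {
        return $text;
    }
    uksort($urls, function ($a, $b) {
        return strlen((string) $b) - strlen((string) $a);
    });

    // Escape every term, including the "/" delimiter.
    $quoted = array();
    foreach (array_keys($urls) as $term) {
        $quoted[] = preg_quote((string) $term, '/');
    }

    $pattern = '/(?<!\w)(' . implode('|', $quoted) . ')(?!\w)'
        . '(?![^\[\]\n]*\])'   // assumed rule: not inside [ ... ] on the same line
        . '([ \t]*\()?/';      // group 2: an opening "(" directly after the term

    $result = preg_replace_callback($pattern, function ($m) use ($urls) {
        if (isset($m[2]) && $m[2] !== '') {
            return $m[0]; // followed by "(": leave the text untouched
        }
        return '<a href="' . $urls[$m[1]] . '">' . $m[1] . '</a>';
    }, $text);

    return $result === null ? $text : $result;
}

$terms = array(
    array('t' => 'Foo', 'u' => 'http://www.foo.net'),
    array('t' => "'Foo' Bar", 'u' => 'http://www.foo.net'),
    array('t' => 'Foobar (2)', 'u' => 'http://www.foo.net'),
    array('t' => 'Bar? & Baz?', 'u' => 'http://www.foo.net'),
);
echo linkTerms("Text 'Foo' Bar text Foo.\nText Foobar (2) (text).\n[Text Foo]\n", $terms);
```

Matching all terms in one pass avoids a side effect of replacing term by term: once Foo has been turned into a link, a later pattern for 'Foo' Bar can no longer find its text. Consuming the "(" in group 2 and returning the match unchanged also keeps the shorter Foo from being linked inside 'Foo' Bar (text). With thousands of terms the alternation may need to be split into several patterns; that is not handled here.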