preg_replace with an array: problem escaping regex characters

Posted on 2024-09-06 06:03:55

I want to replace word groups with links.

The word groups are defined in a multi-dimensional array. There will be thousands of terms to replace, so the array needs to stay lightweight and unindexed.

Nothing should be replaced when the term is followed by parentheses or sits inside square brackets.

Problem:
The regex itself works fine, but the replacement breaks when a word group contains regex syntax characters such as + ? / ( etc.
So I need to escape them. I have tried every variation I can think of, but none works for all cases. I can't escape them in $text or $s themselves.

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = $text;
foreach ($s as $i => $row) {
# $replaced = preg_replace('/(?='.preg_quote($row["t"]).'[^\]][^(]+$)\b'.preg_quote($row["t"]).'\b/mS',
# $replaced = preg_replace('/(?='.preg_quote($row["t"], '/').'[^\]][^(]+$)\b'.preg_quote($row["t"], '/').'\b/mS',
# $replaced = preg_replace('/(?=\Q'.$row["t"].'\E[^\]][^(]+$)\b\Q'.$row["t"].'\E\b/mS',
    $replaced = preg_replace('/(?='.$row["t"].'[^\]][^(])\b'.$row["t"].'\b/mS',
                           '<a href="'.$row["u"].'">'.$row["t"].'</a>',
                           $replaced);
 }
echo $replaced;

?>

Comments (2)

我ぃ本無心為│何有愛 2024-09-13 06:03:56

I'm not entirely sure what you are trying to do, but I saw "breaks when the word groups include regex syntax characters", which makes me think that all you need to do is escape these characters, i.e. put a \ before them.

EDIT:

I'm getting pretty stuck with this as well, but if I show you what I've got, maybe it could help you out:

<?php

$text = "<html><body><pre>
Replace all foo / bar / baz cases here:
Case 1: Text Foo text.
Case 2: Text 'Foo' Bar text Foo.
Case 3: Text Foobar (2) text.
Case 4: Text Bar & Baz.
Case 5: Text Bar Baz?
Case 6: Text Bar? & Baz?
Case 7: Text Bar-X.

Replace nothing here (text followed by brackets) or [inside square brackets]: 
Case 1: Text Foo (text).
Case 2: Text 'Foo' Bar (text) Foo (text).
Case 3: Text Foobar (2) (text).
Case 4: Text Bar & Baz (text).
Case 5: Text Bar Baz (text).
Case 6: Text Bar? & Baz (text).
Case 7: Text Bar-X (text).
Case 8: [Text Foo]
</pre></body></html>";

function convertRegexChars($string)
{
    // Prefix each regex metacharacter with a backslash (only ? . * + are handled here).
    $converted = str_replace("?", "\\?", $string);
    $converted = str_replace(".", "\\.", $converted);
    $converted = str_replace("*", "\\*", $converted);
    $converted = str_replace("+", "\\+", $converted);
    return $converted;
}

$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "'Foo' Bar",   "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar & Baz",   "u" => "http://www.foo.net"),
  array("t" => "Bar Baz?",    "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net"),
  array("t" => "Bar-X",       "u" => "http://www.foo.net")
 );

$replaced = convertRegexChars($text);
foreach ($s as $i => $row) {
    $txt = convertRegexChars($row['t']);
    $replaced = preg_replace('/(?='.$txt.'[^\]][^(])\b'.$txt.'\b/mS',
                           '<a href="'.$row["u"].'">'.$txt.'</a>',
                           $replaced);
 }
echo $replaced;

?>
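
The hand-rolled escaping above covers only four characters. PHP's built-in preg_quote() escapes the full set of regex metacharacters, and its optional second argument also escapes the pattern delimiter. Only the pattern needs escaping; the subject text should be left as it is. A minimal sketch, using one term and the URL from the question (the variable names are just for illustration):

```php
<?php
// Sketch: escape a search term before embedding it in a pattern.
// The second argument to preg_quote() also escapes the "/" delimiter.
$term    = "Bar? & Baz?";
$quoted  = preg_quote($term, '/');   // Bar\? & Baz\?
$pattern = '/' . $quoted . '/';

// Only the pattern is escaped; the subject text stays untouched.
$subject = "Case 6: Text Bar? & Baz?";
$result  = preg_replace($pattern, '<a href="http://www.foo.net">$0</a>', $subject);

echo $result, "\n";
```

This only shows the escaping step; it does not yet handle the "followed by parentheses" or "inside square brackets" rules from the question.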
铜锣湾横着走 2024-09-13 06:03:55

This should work, at least for the provided test cases:

$replaced = preg_replace('/([.,\s!^]+)('.preg_quote($row["t"],'/').')([.,\s!$]+)(?!\()/mS',
                           '$1<a href="'.$row["u"].'">$2</a>$3',
                           $replaced);

\b doesn't work as expected when the term itself begins or ends with a non-word character (as in Foobar (2)), so you should explicitly list the characters allowed around a match. I quickly put [.,\s!^] and [.,\s!$] there; you will probably have to add more allowed characters according to your specs (such as -, _ or ?).
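
As a sketch of how preg_quote() and explicit boundaries could be combined, the following variant replaces all terms in a single pass rather than in a loop, so text inside an already inserted link is not matched again. It is a different technique from the snippet above, not the answer's own code: the helper name linkTerms and the chosen boundary characters are assumptions, and it needs PHP 7.4+ for arrow functions. Since ^ and $ are literal characters inside a character class rather than anchors, the sketch uses lookarounds for the boundaries instead.

```php
<?php
// Hypothetical helper (not from the thread): link every term in one pass.
function linkTerms(string $text, array $terms): string
{
    // Map term => URL and sort longest first so "Foobar (2)" is tried before "Foo".
    $urls = array_column($terms, 'u', 't');
    $keys = array_keys($urls);
    usort($keys, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape every term, including the "/" delimiter, and join into one alternation.
    $alt = implode('|', array_map(fn($t) => preg_quote($t, '/'), $keys));

    $pattern = '/(?<!\S)(?:' . $alt . ')'  // must start at whitespace or start of text
             . '(?=[\s.,!?;:]|$)'          // must end at whitespace, punctuation or end of text
             . '(?!\s*\()'                 // skip when an opening parenthesis follows
             . '(?![^\[\]\n]*\])/';        // skip when a "]" closes a bracket later on the same line

    return preg_replace_callback(
        $pattern,
        fn($m) => '<a href="' . $urls[$m[0]] . '">' . $m[0] . '</a>',
        $text
    );
}

// Usage with a few of the terms from the question.
$s = array(
  array("t" => "Foo",         "u" => "http://www.foo.net"),
  array("t" => "Foobar (2)",  "u" => "http://www.foo.net"),
  array("t" => "Bar? & Baz?", "u" => "http://www.foo.net")
);

echo linkTerms("Case 3: Text Foobar (2) text.\nCase 8: [Text Foo]\n", $s);
```

Building one alternation keeps the number of passes over the text constant, but with thousands of terms the pattern may become too large to compile, so splitting the terms into several chunks is one option to try.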
