Regular expression with preg_quote does not detect symbols

Posted 2024-11-08 19:03:11

I have a dictionary of swear words in the database, and the following works well:

preg_match_all("/\b".$f."(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

$t is the input text and $f = preg_quote("punk"), where "punk" comes from the database dictionary, so at this point in the loop the expression is:

preg_match_all("/\bpunk(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

preg_quote escapes symbols (e.g. # becomes \#) so that they are treated literally in the expression. However, when the dictionary word is e.g. "F@CK" or "A$$", it is not detected in the input string with the above expression. I have both a$$ and f@ck in the dictionary, but they do not work. If I remove preg_quote() from the word, the regular expression is invalid because these symbols are not escaped.

Any suggestions on how I can detect "a$$"?

Edit:

So I guess the expression that is not working as intended would be, e.g.:

preg_match_all("/\bf\@ck(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

which should find f@ck in $t.

UPDATE:

This is my usage, simply put: if there are matches in $m, replace them with "*****". This whole block is inside a loop over each word in the dictionary; $f is the dictionary word and $t is the input.

$f = preg_quote($f);
preg_match_all("/\b$f(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
if (count($m) > 0) {
     $t = preg_replace("/(\b$f(?:ing|er|es|s)?\b)/si","*****",$t);
}

UPDATE:
Behold, the var_dump:

preg_quote($f) = string(5) "a\$\$"
$t = string(18) "You're such an a$$"
expression = string(29) "/\ba\$\$(?:ing|er|es|s)?\b/si"

UPDATE:
This is only happening when words end with a symbol. I tested "a$$hole" and it’s fine, but "a$$" doesn't work.

ANOTHER UPDATE:
Try this simplified version, with $words being a makeshift dictionary:

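// Single quotes, so that PHP does not interpolate "$hole" as a variable.
$words = array('a$$','asshole','a$$hole','f@ck','f#ck','f*ck');
$text = 'Input whatever you feel like here eg. a$$';

foreach ($words as $f) {
   $mask = str_repeat("*",strlen($f));   // mask length taken before escaping
   $f = preg_quote($f,"/");
   $text = preg_replace("/\b".$f."(?:ing|er|es|s)?\b/si",
                        $mask,
                        $text);          // was $t, which is not defined in this snippet
}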

I should expect to see "Input whatever you feel like here eg. ***" as a result.


Comments (3)

风铃鹿 2024-11-15 19:03:11

Cannot Be Done

I'm sorry, but this “problem” is truly impossible to solve. Consider these:

  • ꜰᴜᴄᴋ   is U+A730.1D1C.1D04.1D0B, "\N{LATIN LETTER SMALL CAPITAL F}\N{LATIN LETTER SMALL CAPITAL U}\N{LATIN LETTER SMALL CAPITAL C}\N{LATIN LETTER SMALL CAPITAL K}"
  • ᶠᵘᶜᵏ   is U+1DA0.1D58.1D9C.1D4F, "\N{MODIFIER LETTER SMALL F}\N{MODIFIER LETTER SMALL U}\N{MODIFIER LETTER SMALL C}\N{MODIFIER LETTER SMALL K}"
  • 𝒻𝓊𝒸𝓀   is U+1D4BB.1D4CA.1D4B8.1D4C0, "\N{MATHEMATICAL SCRIPT SMALL F}\N{MATHEMATICAL SCRIPT SMALL U}\N{MATHEMATICAL SCRIPT SMALL C}\N{MATHEMATICAL SCRIPT SMALL K}"
  • 𝖋𝖚𝖈𝖐   is U+1D58B.1D59A.1D588.1D590, "\N{MATHEMATICAL BOLD FRAKTUR SMALL F}\N{MATHEMATICAL BOLD FRAKTUR SMALL U}\N{MATHEMATICAL BOLD FRAKTUR SMALL C}\N{MATHEMATICAL BOLD FRAKTUR SMALL K}"
  • 𝓕 𝒰 𝒞 𝒦   is U+1D4D5.1D4B0.1D49E.1D4A6, "\N{MATHEMATICAL BOLD SCRIPT CAPITAL F}\N{MATHEMATICAL SCRIPT CAPITAL U}\N{MATHEMATICAL SCRIPT CAPITAL C}\N{MATHEMATICAL SCRIPT CAPITAL K}"
  • ⓕ ⓤ ⓒ ⓚ   is U+24D5.24E4.24D2.24DA, "\N{CIRCLED LATIN SMALL LETTER F}\N{CIRCLED LATIN SMALL LETTER U}\N{CIRCLED LATIN SMALL LETTER C}\N{CIRCLED LATIN SMALL LETTER K}"
  • Γ̵𐌵ᏟᏦ   is U+393.335.10335.13DF.13E6, "\N{GREEK CAPITAL LETTER GAMMA}\N{COMBINING SHORT STROKE OVERLAY}\N{GOTHIC LETTER QAIRTHRA}\N{CHEROKEE LETTER TLI}\N{CHEROKEE LETTER TSO}"
  • ƒμɕѤ   is U+192.3BC.255.464, "\N{LATIN SMALL LETTER F WITH HOOK}\N{GREEK SMALL LETTER MU}\N{LATIN SMALL LETTER C WITH CURL}\N{CYRILLIC CAPITAL LETTER IOTIFIED E}"
  • Г̵ЦСК   is U+413.335.426.421.41A, "\N{CYRILLIC CAPITAL LETTER GHE}\N{COMBINING SHORT STROKE OVERLAY}\N{CYRILLIC CAPITAL LETTER TSE}\N{CYRILLIC CAPITAL LETTER ES}\N{CYRILLIC CAPITAL LETTER KA}"
  • ғᵾȼƙ   is U+493.1D7E.23C.199, "\N{CYRILLIC SMALL LETTER GHE WITH STROKE}\N{LATIN SMALL CAPITAL LETTER U WITH STROKE}\N{LATIN SMALL LETTER C WITH STROKE}\N{LATIN SMALL LETTER K WITH HOOK}"
  • ϜυϚΚ   is U+3DC.3C5.3DA.39A, "\N{GREEK LETTER DIGAMMA}\N{GREEK SMALL LETTER UPSILON}\N{GREEK LETTER STIGMA}\N{GREEK CAPITAL LETTER KAPPA}"
  • ЖↃUᆿ   is U+416.2183.55.11BF, "\N{CYRILLIC CAPITAL LETTER ZHE}\N{ROMAN NUMERAL REVERSED ONE HUNDRED}\N{LATIN CAPITAL LETTER U}\N{HANGUL JONGSEONG KHIEUKH}"
  • ʞɔnɟ   is U+29E.254.6E.25F, "\N{LATIN SMALL LETTER TURNED K}\N{LATIN SMALL LETTER OPEN O}\N{LATIN SMALL LETTER N}\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}"

It Gets Worse

And if you think those are easy, just try coping with all of these:

???? 00 Ↄ ʞ, F ᵾ ⒞ K, K ⓒ Ц ⒡ , ???? ???? K ????, ғ ∞ Ϛ k, f ???? Ꮯ K, ⓕ oo ɔ ⓚ , ɟ ⒰ ¢ K, ???? ???? ȼ ????, ???? Ù ȼ ⒦ ,
f ???? ⒞ ƙ, F ???? ᶜ ????, F ∞ ???? Ж , ???? @ Ꮯ ????, ɟ ᵘ ???? ????, F Ц ¢ ????, f oo Ꮯ ʞ, ???? oo ¢ Ж , ???? υ ᶜ Κ , Ϝ ú * ʞ,
ꜰ ???? c K, ƒ ᵘ ȼ k, ???? U ȼ ????, Ж ɔ μ ƒ, F ⓤ ⒞ k, ƒ ???? C ƙ, ғ 00 ɔ Ѥ, ƒ U c ᴋ, ???? ∞ Ꮶ ⓒ , ꜰ ???? ᴄ ⒦ ,
???? ⒰ Ꮯ Ѥ, ꜰ ᴜ ???? ⒦ , F ???? ???? ʞ, f 00 ???? ????, ғ u С K, f ???? ɔ Κ , f μ Ↄ K, ɟ ???? c ʞ, f ???? Ↄ ????, F μ ¢ ????,
ᆿ ???? ᴄ ⒦ , Κ ¢ oo ɟ, ᶠ μ ᶜ Ѥ, ᶠ ⓤ Ꮯ Ж , ???? ⒞ ᵘ F, F @ C ⓚ , Ѥ ᴄ u F, ⒡ ᵾ C k, ƒ μ ᶜ ᴋ, F ???? C ????,
f ᵘ ¢ ᵏ, ᆿ 00 ???? ????, ꜰ υ ȼ K, Ϝ ???? ȼ К , ???? oo ɕ ᴋ, ғ ???? Ꮯ ᴋ, ꜰ n ???? K, ꜰ μ Ϛ К , F ∞ ȼ ????, ⒡ ???? Ↄ Κ ,
ƒ ???? ⒞ ????, ᶠ U C Ꮶ, ᶠ υ Ↄ ƙ, ???? ???? C ????, Ϝ U ???? Ѥ, Ϝ U Ↄ ????, ???? U ⒞ ᵏ, F @ C К , ғ ᴜ ???? ᴋ, ⒡ U ???? К ,
ɟ U * ᵏ, ???? Ц c Κ , ғ U Ↄ ????, ƒ ⒰ ???? ᵏ, ғ ???? * K, ???? n ???? ⓚ , ᶠ 00 С К , ???? Ц ???? k, ƙ c Ц ᶠ, ???? ⒰ Ѥ ????,
ꜰ ǔ ᴄ ⒦ , F ???? Ↄ ????, ???? ???? υ ꜰ, ???? ???? * ᵏ, ???? 00 ???? Ж , Κ C ???? ????, ᶠ U С K, ꜰ ???? ???? Κ , ɟ U ᶜ ⓚ , ???? ∞ ȼ ᴋ,
ƒ U К ć, ƒ υ ȼ ᴋ, ⒡ ∞ Ж ɕ, ???? ᵘ ???? ᵏ, F U Ϛ ʞ, ⓕ ???? ???? Ж , ???? ???? ???? Ↄ, Ϝ n * K, ???? oo c ⓚ , ƒ U ¢ ʞ,
ƒ u C ʞ, K ¢ μ ⒡ , ɟ ⒰ K ɔ, F U c k, F Ц ???? ⓚ , ???? U ᴋ ɔ, ???? ???? Ꮯ ????, ???? ???? ???? ⓚ , ⓕ ???? C К , ɟ ᵾ * ⒦ ,
ᶠ ᵘ ⒞ ⒦ , ƒ ⒰ ᴄ ᵏ, ⒡ ⒰ С K, ???? ⒰ * ᴋ, ᆿ ∞ ʞ ɕ, ???? n * Ѥ, Ϝ μ ᴄ ????, k ć ᵘ ƒ, ???? ᵘ ɕ ????, ɟ Ц Ꮶ ᴄ, ???? ᵾ ⒞ ᵏ,
ғ ᵘ ???? ᵏ, ???? ᵾ * Ѥ, F ???? Ꮯ K, ғ ⓤ ???? ᴋ, ƒ u ɕ ????, ƙ c ⒰ F, ???? ???? ⓒ Κ , K ᶜ Ц ????, ɟ ???? c ⒦ , ƒ @ c Κ ,
Ϝ Ц ȼ Ḱ, ⒡ ᵘ ???? ⒦ , ɟ ᵾ Ѥ ¢, F ???? Ↄ ????, Ϝ ᴜ ???? ????, Ϝ ???? ⒞ ????, ???? U Ꮯ ʞ, ƒ υ Ꮯ ᵏ, F ᵾ Ꮯ Κ , Ϝ ᵘ ⓒ ʞ,
???? ⓤ ᶜ ƙ, ᆿ ???? ⒞ ????, f ???? Ↄ Ѥ, ???? U ???? K, Ϝ ᴜ * ????, ꜰ @ ⓒ ʞ, ƒ u ⓒ ????, f U ⒞ k, ???? 00 ᴄ Ѥ, ???? υ С K,
F ᴜ ᴄ ????, ⓕ oo Ↄ ⓚ , ⒡ ᵘ ɕ ????, ⓕ υ ᴄ Κ , ᆿ U Ꮯ ????, ???? ???? Ꮯ Ꮶ, ???? ???? Ć ????, ???? Ц ɕ К , f @ Ↄ ⓚ , ᴋ ᶜ U ꜰ,
???? ᴜ c ⒦ , F ᵘ C ????, ???? 00 ???? Ꮶ, ꜰ 00 ???? К , Ϝ ???? Ϛ ᵏ, F ???? c Ѥ, ⓕ oo Ↄ K, f ᵾ С ᵏ, ⓕ Ц c ????, ???? ???? c Ж ,
ⓕ ???? ???? ƙ, ⓚ C n ғ, ɟ U ȼ ????, ???? 00 K ȼ, ???? ???? ᴄ ????, ???? Ц C ????, ???? Ц ¢ ????, Ϝ ᵘ c k, ⒡ ???? ¢ k, ƒ ⓤ ⓚ Ↄ, ???? ???? ???? k,
ƒ U Ↄ K, ???? ???? ᴄ Ꮶ, ᆿ ⓤ ???? ⒦ , Ж ɔ U ????, ƒ υ * ᴋ, ƒ ???? ???? k, ???? U С ⒦ , ???? ???? C Ж , ƒ μ Ꮯ ƙ, ⓕ n ᴄ ⒦ ,
ⓕ μ ⓒ Ж , ⒡ 00 ɕ ????, ???? ᴜ ᶜ ????, ᆿ Ù Ж ????, ⒦ ȼ U ????, k C ⓤ ᆿ, Ϝ n ȼ ᵏ, ᴋ ȼ ᵾ ɟ, F ???? ȼ Ѥ, ғ ⒰ ȼ ????,
f U Ж ⒞ , F ῠ ???? ᵏ, F u ???? Κ , F 00 ȼ ????, ꜰ μ Ϛ Ꮶ, ᆿ ???? ???? K, ⒡ n Ↄ Ж , F @ ???? ƙ, ᶠ ὺ ???? К ,
???? U C ᵏ, F U ???? ⒦ , ???? 00 Ↄ ????, ᶠ ???? c К , ғ ⓤ ???? ????, ???? ⓤ ???? Κ , ???? U ???? Ж , ⒡ ???? ɔ Ꮶ, ⓚ ɔ ???? f, ???? U C K,
F @ C Ѥ, ғ ᴜ С k, ɟ u * ƙ, ⓕ ᵾ ɕ ????, ???? 00 ȼ K, ???? υ ???? ????, ƒ ⒰ * ʞ, ⓕ U Ↄ Ж , ꜰ U ȼ ƙ, ⒡ u С ⒦ ,
ꜰ ᴜ ???? Ќ, ᆿ μ ???? ⒦ , ⓕ @ ᴄ К , ᶠ υ ɔ ᵏ, ƙ Ↄ oo ꜰ, F ᴜ ???? ????, ???? ⒰ C ᵏ, ???? U ???? ƙ, ƒ ∞ C Ꮶ, ???? ⒰ * K,
???? u Ↄ ᴋ, ᆿ U ⓒ ????, ᆿ U Ꮶ ????, ???? n ???? ????, ƒ Ц C ƙ, ⒦ ???? ???? ꜰ, K ¢ ᵘ f, ???? ⒰ ???? Ꮶ, ???? ᴄ 00 ????, Ϝ U ???? k,
???? u ¢ ⒦ , ???? ???? * Ѥ, ƒ ???? С ᴋ, ???? ???? C Ꮶ, ???? @ ???? Κ , ʞ С ???? ᶠ, ???? ᵾ Ϛ Ꮶ, ᶠ ⒰ ɔ ????, F Ц ⒞ ʞ, ⒡ ⒰ К ɔ,
ɟ υ ¢ ????, Ѥ ȼ U ᆿ, ???? ᴜ Ↄ ʞ, ғ ???? * K, ???? ???? ᴄ ʞ, F ???? ???? ʞ, ???? @ ȼ ????, ???? ⒰ * ????, ???? ᵾ ȼ ????, F ???? ¢ Ѥ,
ꜰ ⓤ ƙ Ϛ, ⓕ 00 c ʞ, ???? 00 Ϛ K, ???? υ Ↄ Κ , ꜰ μ ⓒ Ж , ???? ᵘ Ϛ ʞ, Ϝ ᵘ Ↄ ᵏ, ⒡ ᵾ Ꮯ ????, Ϝ ⒰ ȼ Ѥ, ƒ n ???? Ѥ,
ᆿ μ ⓒ k, ???? Ц ɕ Κ , ғ μ ???? Ѥ, f ⓤ Ꮯ ????, ᵏ ???? μ ƒ, ᵏ С ???? ????, ᆿ ∞ ???? ????, ғ ᵘ Ꮯ ????, ƒ μ Ↄ k, f oo K ȼ,
ɟ ???? ???? С , ꜰ n ???? K, ???? 00 ???? ᵏ, ᶠ μ ⓒ ????, ???? c ∞ Ϝ, ᆿ Ц Ć ⒦ , ???? ᵘ ᴄ ????, F 00 ???? ⓚ , ᶠ @ ȼ К , ...

And that’s not all: there are at least a bazingatillion more where those came from. Do you see now why this fundamentally cannot be done?

Full Disclosure

Because I don't believe in security through obscurity, here's the program that generates all those:

#!/usr/bin/env perl
#
# unifuck - print infinite permutations of fuck in unicode aliases
#
# Tom Christiansen <[email protected]>
# Mon May 23 09:37:27 MDT 2011

use strict;
use warnings;
use charnames ":full";

use Unicode::Normalize;

binmode(STDOUT, ":utf8");

our(@diddle, @fuck, %fuck); # initted down below
while (my($f,$u,$c,$k) = splice(@fuck, 0, 4)) {
    $fuck{F}{$f}++;
    $fuck{U}{$u}++;
    $fuck{C}{$c}++;
    $fuck{K}{$k}++;
} 

my @F = keys %{ $fuck{F} };
my @U = keys %{ $fuck{U} };
my @C = keys %{ $fuck{C} };
my @K = keys %{ $fuck{K} };

while (1) { 
    my $f = $F[rand @F];
    my $u = $U[rand @U];
    my $c = $C[rand @C];
    my $k = $K[rand @K];

    for ($f,$u,$c,$k) {  
        next if length > 1;
        next if /\p{EA=W}/;
        next if /\pM/;
        next if /\p{InEnclosedAlphanumerics}/;
        s/$/$diddle[rand @diddle]/          if rand(100) < 15;
        s/$/\N{COMBINING ENCLOSING KEYCAP}/ if rand(100) <  1;
    }

    if    (             0) {                                       }
    elsif (rand(100) <  5) {     $u        = q(@)                  } 
    elsif (rand(100) <  5) {        $c     = q(*)                  } 
    elsif (rand(100) < 10) {       ($c,$k) = ($k,$c)               } 
    elsif (rand(100) < 15) { ($f,$u,$c,$k) = reverse ($f,$u,$c,$k) }

    print NFC("$f $u $c $k\n");
}

BEGIN {

    # ok to have repeats in each position, since they'll be counted only once
    # per unique strings
    @fuck = (

        "\N{LATIN CAPITAL LETTER F}",
        "\N{LATIN CAPITAL LETTER U}",
        "\N{LATIN CAPITAL LETTER C}",
        "\N{LATIN CAPITAL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{LATIN SMALL LETTER U}",
        "\N{LATIN SMALL LETTER C}",
        "\N{LATIN SMALL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{INFINITY}",
        "\N{LATIN SMALL LETTER C}",
        "\N{LATIN SMALL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{LATIN SMALL LETTER O}\N{LATIN SMALL LETTER O}",
        "\N{LATIN SMALL LETTER C}",
        "\N{KELVIN SIGN}",

        "\N{LATIN SMALL LETTER F}",
        "\N{DIGIT ZERO}\N{DIGIT ZERO}",
        "\N{CENT SIGN}",
        "\N{LATIN CAPITAL LETTER K}",

        "\N{LATIN LETTER SMALL CAPITAL F}",
        "\N{LATIN LETTER SMALL CAPITAL U}",
        "\N{LATIN LETTER SMALL CAPITAL C}",
        "\N{LATIN LETTER SMALL CAPITAL K}",

        "\N{MODIFIER LETTER SMALL F}",
        "\N{MODIFIER LETTER SMALL U}",
        "\N{MODIFIER LETTER SMALL C}",
        "\N{MODIFIER LETTER SMALL K}",

        "\N{MATHEMATICAL SCRIPT SMALL F}",
        "\N{MATHEMATICAL SCRIPT SMALL U}",
        "\N{MATHEMATICAL SCRIPT SMALL C}",
        "\N{MATHEMATICAL SCRIPT SMALL K}",

        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL F}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL U}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL C}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL K}",

        "\N{MATHEMATICAL BOLD FRAKTUR SMALL F}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL U}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL C}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL K}",

        "\N{MATHEMATICAL BOLD SCRIPT CAPITAL F}",
        "\N{MATHEMATICAL SCRIPT CAPITAL U}",
        "\N{MATHEMATICAL SCRIPT CAPITAL C}",
        "\N{MATHEMATICAL SCRIPT CAPITAL K}",

        "\N{CIRCLED LATIN SMALL LETTER F}",
        "\N{CIRCLED LATIN SMALL LETTER U}",
        "\N{CIRCLED LATIN SMALL LETTER C}",
        "\N{CIRCLED LATIN SMALL LETTER K}",

        "\N{PARENTHESIZED LATIN SMALL LETTER F}",
        "\N{PARENTHESIZED LATIN SMALL LETTER U}",
        "\N{PARENTHESIZED LATIN SMALL LETTER C}",
        "\N{PARENTHESIZED LATIN SMALL LETTER K}",

        "\N{GREEK CAPITAL LETTER GAMMA}\N{COMBINING SHORT STROKE OVERLAY}",
        "\N{GOTHIC LETTER QAIRTHRA}",
        "\N{CHEROKEE LETTER TLI}",
        "\N{CHEROKEE LETTER TSO}",

        "\N{LATIN SMALL LETTER F WITH HOOK}",
        "\N{GREEK SMALL LETTER MU}",
        "\N{LATIN SMALL LETTER C WITH CURL}",
        "\N{CYRILLIC CAPITAL LETTER IOTIFIED E}",

        "\N{CYRILLIC CAPITAL LETTER GHE}\N{COMBINING SHORT STROKE OVERLAY}",
        "\N{CYRILLIC CAPITAL LETTER TSE}",
        "\N{CYRILLIC CAPITAL LETTER ES}",
        "\N{CYRILLIC CAPITAL LETTER KA}",

        "\N{CYRILLIC SMALL LETTER GHE WITH STROKE}",
        "\N{LATIN SMALL CAPITAL LETTER U WITH STROKE}",
        "\N{LATIN SMALL LETTER C WITH STROKE}",
        "\N{LATIN SMALL LETTER K WITH HOOK}",

        "\N{GREEK LETTER DIGAMMA}",
        "\N{GREEK SMALL LETTER UPSILON}",
        "\N{GREEK LETTER STIGMA}",
        "\N{GREEK CAPITAL LETTER KAPPA}",

        "\N{HANGUL JONGSEONG KHIEUKH}",
        "\N{LATIN CAPITAL LETTER U}",
        "\N{ROMAN NUMERAL REVERSED ONE HUNDRED}",
        "\N{CYRILLIC CAPITAL LETTER ZHE}",

        "\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}",
        "\N{LATIN SMALL LETTER N}",
        "\N{LATIN SMALL LETTER OPEN O}",
        "\N{LATIN SMALL LETTER TURNED K}",

        "\N{FULLWIDTH LATIN CAPITAL LETTER F}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER U}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER C}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER K}",

    );

    @diddle = (
        "\N{COMBINING GRAVE ACCENT}",
        "\N{COMBINING ACUTE ACCENT}",
        "\N{COMBINING CIRCUMFLEX ACCENT}",
        "\N{COMBINING TILDE}",
        "\N{COMBINING BREVE}",
        "\N{COMBINING DOT ABOVE}",
        "\N{COMBINING DIAERESIS}",
        "\N{COMBINING CARON}",
        "\N{COMBINING CANDRABINDU}",
        "\N{COMBINING INVERTED BREVE}",
        "\N{COMBINING GRAVE TONE MARK}",
        "\N{COMBINING ACUTE TONE MARK}",
        "\N{COMBINING GREEK PERISPOMENI}",
        "\N{COMBINING FERMATA}",
        "\N{COMBINING SUSPENSION MARK}",
    );

}
谁人与我共长歌 2024-11-15 19:03:11

\b checks for a word boundary. According to http://www.regular-expressions.info/wordboundaries.html:

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

"Word characters" are letters, digits, and underscores, so in the string "a$$", the word boundary occurs after the "a", not after the second "$".

You will probably need to explicitly specify the characters you consider to be "word boundaries" by using a class (e.g., [- '"]).
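
The boundary rule above can be checked directly, and the "explicit boundary" idea can also be written with lookarounds instead of a character class that consumes a character. What follows is a minimal sketch, not the original poster's code: the helper name censor() and the choice of \w as the definition of a word character are assumptions made for illustration.

```php
<?php
// Hypothetical helper: mask every dictionary word found in $text.
// Assumption: a word is delimited by anything that is not a \w character,
// expressed with lookarounds so the neighbouring character is not consumed.
function censor(string $text, array $words): string
{
    foreach ($words as $word) {
        $quoted  = preg_quote($word, '/');           // escape regex symbols and the "/" delimiter
        $pattern = '/(?<!\w)' . $quoted . '(?:ing|er|es|s)?(?!\w)/i';
        $mask    = str_repeat('*', strlen($word));   // length of the unescaped word
        $text    = preg_replace($pattern, $mask, $text);
    }
    return $text;
}

// \b needs a word character on one side, so it cannot match after the final "$".
var_dump(preg_match('/\ba\$\$\b/i', 'such an a$$'));          // int(0)
var_dump(preg_match('/(?<!\w)a\$\$(?!\w)/i', 'such an a$$')); // int(1)

// Single quotes keep PHP from interpolating anything after a "$".
echo censor('Input whatever you feel like here eg. a$$', ['a$$', 'f@ck']), "\n";
```

With these assumptions, "a$$" is masked at the end of the input while "a$$hole" is left alone unless it has its own dictionary entry. Whether \w is the right notion of a word character depends on the input you expect.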

风情万种。 2024-11-15 19:03:11

Now, when you said that it doesn't work at the end of the word I see the problem. $@ or any other such special characters aren't part of the word (so \b breaks the word after 'a' in case of 'a$$' if it isn't followed by any other letters in the input string). I suggest using [^a-z] to mark the end of the word to fix it.

preg_match_all("/\b".$f."(?:ing|er|es|s)?[^a-z]/si",$t,$m,PREG_SET_ORDER);
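
One caveat with a consuming class such as [^a-z]: it needs an actual character to match, so a word at the very end of the input is still missed, and the matched character becomes part of the match (and of any replacement). A possible alternative is the negative lookahead (?![a-z]). The sketch below compares the two; the variable names are chosen for illustration only.

```php
<?php
$quoted = preg_quote('a$$', '/');   // yields a\$\$

$consuming = '/\b' . $quoted . '(?:ing|er|es|s)?[^a-z]/i';
$lookahead = '/\b' . $quoted . '(?:ing|er|es|s)?(?![a-z])/i';

// At the end of the input there is no character left for [^a-z] to match.
var_dump(preg_match($consuming, 'such an a$$'));  // int(0)
var_dump(preg_match($lookahead, 'such an a$$'));  // int(1)

// In mid-sentence both match, but the class also captures the following character.
preg_match($consuming, 'an a$$, really', $m1);
preg_match($lookahead, 'an a$$, really', $m2);
var_dump($m1[0]);  // string(4) "a$$,"
var_dump($m2[0]);  // string(3) "a$$"
```

Both patterns still begin with \b, so a dictionary word that starts with a symbol would need the same treatment at the front.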