Fixing a PHP tokenization pattern

Posted on 2024-11-28 07:04:12

I have a Perl script that tokenizes a string:

@s=split /([^a-zA-Z \t\-\'\,\.]+)/, $_[0];  # tokenized with separators

So if I have the string $s="The large [[bear]] is dangerous."

it returns array("The large", " [[", "bear", "]] ", "is dangerous", ".")

But the regex pattern doesn't seem to work when I use it in a PHP script:

$tokens = preg_split("/[^a-z \t\-\'\,\.]+/i", $s);

Does anyone have an idea what the problem is?

Comments (2)

一身仙ぐ女味 2024-12-05 07:04:12

Got it to run (Demo):

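$s = "The large [[bear]] is dangerous.";
// In a single-quoted PHP string, \\\' yields the two characters \' for the regex.
$pattern = '/([^a-zA-Z \t\-\\\'\,\.]+)/';
// A limit of -1 means "no limit"; the flag keeps the captured separators.
$tokens = preg_split($pattern, $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);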

Output:

Array
(
    [0] => The large
    [1] => [[
    [2] => bear
    [3] => ]]
    [4] =>  is dangerous.
)

Details:

  • The pattern must be expressed correctly as a PHP string, so mind the string escape sequences. In a single-quoted string, \' has to be written as \\\'.
  • To keep the delimiters in the result, use the PREG_SPLIT_DELIM_CAPTURE flag together with a capturing group around the pattern.
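
As a rough illustration of the second point, the sketch below runs the same pattern with and without the flag. The variable names $plain and $withDelims are made up for this example, and -1 is passed as the limit to mean "no limit".

```php
<?php
$s = "The large [[bear]] is dangerous.";
$pattern = '/([^a-zA-Z \t\-\\\'\,\.]+)/';

// Without the flag, the captured separators are dropped from the result.
$plain = preg_split($pattern, $s);

// With the flag, each captured separator comes back as its own element.
$withDelims = preg_split($pattern, $s, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($plain);
print_r($withDelims);
```

Here $plain should hold three elements and $withDelims five, the two extra ones being "[[" and "]]".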

See the PHP documentation on single-quoted strings and on preg_split.

Edit: To split at the punctuation as well, just remove those characters from the pattern (Demo):

$pattern = '/([^a-zA-Z \t]+)/';

请帮我爱他 2024-12-05 07:04:12

This:

\'

is not correct when your string is delimited by double quotes. PHP leaves it as an actual backslash followed by a quote character.

And these as well:

\-\'\,\.

Just write them directly, without the backslashes (keep the hyphen at the end of the character class so it is not read as a range).
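
A minimal sketch of that suggestion, comparing the escaped character class with an unescaped one. Putting the hyphen last is an assumption of this example rather than something the answer spells out, and the names $escaped, $unescaped, $a and $b are only illustrative.

```php
<?php
$s = "The large [[bear]] is dangerous.";

// Escaped form from the question, written for a single-quoted PHP string.
$escaped = '/([^a-z \t\-\\\'\,\.]+)/i';

// Unescaped form: inside [...] the quote, comma and period are literal.
// The hyphen goes last so it cannot be read as a range. In a double-quoted
// string PHP turns \t into a real tab before the regex engine sees it.
$unescaped = "/([^a-z \t',.-]+)/i";

$a = preg_split($escaped, $s, -1, PREG_SPLIT_DELIM_CAPTURE);
$b = preg_split($unescaped, $s, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump($a === $b);
```

Both patterns should split the sample string identically, so the final line is expected to report true.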
