Fixing a PHP tokenization pattern
I have a Perl script which tokenizes a string:
@s=split /([^a-zA-Z \t\-\'\,\.]+)/, $_[0]; # tokenized with separators
So if I have a string $s="The large [[bear]] is dangerous."
it will return array("The large", " [[", "bear", "]] ", "is dangerous", ".")
But the regex pattern doesn't seem to work when I use it in a PHP script:
$tokens = preg_split("/[^a-z \t\-\'\,\.]+/i", $s);
Does anyone have any idea what the problem is?
Comments (2)
Got it to run. Details:
In a single-quoted PHP string, \' is written as \\\'.
Use the PREG_SPLIT_DELIM_CAPTURE flag.
See the Single quoted strings docs and the preg_split docs.
Edit: To split at the punctuation as well, just remove those characters from the pattern.
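The demo code from this answer did not survive on this page, so the following is only a minimal sketch of what the details above describe, not the answerer's exact code. It assumes the input string from the question; the variable names ($pattern, $punctPattern, $punctTokens) and the extra PREG_SPLIT_NO_EMPTY flag in the second call are illustrative choices.

```php
<?php
// Input string taken from the question.
$s = "The large [[bear]] is dangerous.";

// Single-quoted PHP string: \\ yields one backslash and \' yields a quote,
// so PCRE receives the character class [^a-zA-Z \t\-\'\,\.]
$pattern = '/([^a-zA-Z \t\\-\\\'\\,\\.]+)/';

// PREG_SPLIT_DELIM_CAPTURE keeps the text matched by the capture group,
// which is what Perl's split does when the pattern contains parentheses.
$tokens = preg_split($pattern, $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);

// Variant for the edit: with the punctuation removed from the class,
// "." becomes a delimiter too. PREG_SPLIT_NO_EMPTY (an addition here)
// drops the empty string left after the final ".".
$punctPattern = '/([^a-zA-Z \t]+)/';
$punctTokens = preg_split(
    $punctPattern,
    $s,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
print_r($punctTokens);
```

The first call should give "The large ", "[[", "bear", "]]", " is dangerous." and the second "The large ", "[[", "bear", "]]", " is dangerous", ".". The spaces stay attached to the words because the space is part of the allowed character class.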
This:
Is not correct when your string is delimited by double quotes. It will be an actual backslash followed by a quote character.
And these as well:
Just write them directly without the backslash.
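The snippets this answer pointed at are missing here, so the sketch below rests on an assumption: that the backslashes in question are the ones before the quote, comma, period and hyphen in the double-quoted pattern from the question. It shows how PHP treats \' inside double quotes and one way to write the class without those backslashes. The capture group and the flag are carried over from the other answer, and placing the hyphen last is an illustrative choice so that it is taken literally.

```php
<?php
// Input string taken from the question.
$s = "The large [[bear]] is dangerous.";

// In a double-quoted PHP string, \' is not an escape sequence,
// so both characters (backslash and quote) remain in the string.
$quoteLen = strlen("\'");
var_dump($quoteLen); // int(2)

// \t, by contrast, is converted by PHP itself into a single tab character.
$tabLen = strlen("\t");
var_dump($tabLen); // int(1)

// Character class written without the unneeded backslashes.
// The hyphen is placed last so it cannot be read as a range.
$pattern = "/([^a-z \t',.-]+)/i";
$tokens = preg_split($pattern, $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);
```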