Splitting a string by words and punctuation
To split up a string, I came up with...
<?php
preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
print_r($matches[0]);
I thought this would separate each word (\w) and the specified punctuation (,.!?;).
For example: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]
Instead I get:
Array
(
[0] => I
[1] => m
[2] => a
[3] => l
[4] => i
[5] => t
[6] => t
[7] => l
[8] => e
[9] => t
[10] => e
[11] => a
[12] => p
[13] => o
etc...
What am I doing wrong here?
Thanks in advance.
Comments (3)
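You have two faults:
\w matches only a single character. To match a whole word, use \w+. Furthermore, \w matches only alphanumeric characters and the underscore. If you want to match other characters such as ', you need to include them in a character class: [\w'].
(,.!?;) is a group, not a set of alternatives: it matches a sequence (a comma, any one character, an optional !, then a semicolon). To match any single one of these characters, use the character class [,.!?;] instead.
Combining both fixes, the correct regex is: /[\w']+|[,.!?;]/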
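To check the combined pattern against the sentence from the question, here is a minimal sketch. The helper name `tokenize` is an illustrative assumption and does not come from the original answer.

```php
<?php
// Hypothetical helper: split a sentence into words and punctuation marks.
function tokenize(string $text): array
{
    // [\w']+   one or more word characters or apostrophes (a whole word)
    // [,.!?;]  any single one of the listed punctuation characters
    preg_match_all('/[\w\']+|[,.!?;]/', $text, $matches);

    // Index 0 holds the full matches, in order of appearance.
    return $matches[0];
}

print_r(tokenize("I'm a little teapot, short and stout."));
```

Whitespace is not matched by either alternative, so it is simply skipped between tokens.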
If you want to be more permissive, use Unicode character classes instead (this allows letters, numbers, combining marks, dash characters and the apostrophe in words, and any punctuation character as punctuation):
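The snippet that followed this sentence is not preserved here, so the pattern below is only a sketch of the description above, built from PCRE's Unicode property escapes. The helper name `tokenizeUnicode` and the exact pattern are assumptions. The `u` modifier is required so that the subject is treated as UTF-8.

```php
<?php
// Hypothetical helper: Unicode-aware variant of the word/punctuation split.
function tokenizeUnicode(string $text): array
{
    // \p{L} letters, \p{N} numbers, \p{M} combining marks,
    // \p{Pd} dash punctuation, plus the apostrophe: together they form a word.
    // \p{P} matches any single punctuation character.
    $pattern = '/[\p{L}\p{N}\p{M}\p{Pd}\']+|\p{P}/u';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

print_r(tokenizeUnicode("L'été, c'est fini!"));
```

Because dash characters are part of the word class in this sketch, hyphenated words stay together as one token, and a dash standing on its own is returned as a one-character "word".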
Try this; it should work the way you want:
I'd also like to share a very useful service: an online regex tester.
You may want to try something like: