Splitting a string into words and punctuation marks

Posted on 2024-10-31 10:55:03

To split up a string, I came up with this:

<?php
    preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
    print_r($matches[0]);

I expected this to split the string into words (\w) and the specified punctuation marks (,.!?;).
For example: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]

Instead I get:

Array
(
    [0] => I
    [1] => m
    [2] => a
    [3] => l
    [4] => i
    [5] => t
    [6] => t
    [7] => l
    [8] => e
    [9] => t
    [10] => e
    [11] => a
    [12] => p
    [13] => o

etc...

What am I doing wrong here?

Thanks in advance.


Comments (3)

じ违心 2024-11-07 10:55:03

You have two mistakes:

  1. \w matches only a single character; to match a whole word, use \w+. Furthermore, \w matches only letters, digits, and the underscore. If you want to match other characters such as ', you need to include them in a character class: [\w'].
  2. (,.!?;) is a group, not a list of alternatives: it matches a comma, then any one character, then an optional !, then a semicolon. To match any single one of these characters, use a character class instead: [,.!?;].
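The two points above can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names and the test strings are chosen here purely for illustration, and the patterns are anchored with ^ and $ only to make the behavior easy to see.

```php
<?php
// Point 1: \w consumes one character per match, \w+ consumes a whole run.
$singleCharMatches = preg_match_all('/\w/', 'teapot');   // one match per letter
$wholeWordMatches  = preg_match_all('/\w+/', 'teapot');  // one match for the word

// Point 2: inside a group, "." means "any character" and "?" makes "!" optional.
$groupPattern = '/^(,.!?;)$/';
$groupMatchesLiteralList = preg_match($groupPattern, ',.!?;'); // the five characters as typed
$groupMatchesCommaXSemi  = preg_match($groupPattern, ',x;');   // comma, any character, semicolon
$groupMatchesLoneComma   = preg_match($groupPattern, ',');     // a single comma

// A character class matches exactly one of the listed characters.
$classMatchesLoneComma = preg_match('/^[,.!?;]$/', ',');

var_dump(
    $singleCharMatches,
    $wholeWordMatches,
    $groupMatchesLiteralList,
    $groupMatchesCommaXSemi,
    $groupMatchesLoneComma,
    $classMatchesLoneComma
);
```

Note that the group does not even match the five punctuation characters written one after another, because "." and "?" are metacharacters outside a character class.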

The correct regex is:

'/[\w\']+|[,.!?;]/'
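As a quick check, here is a small sketch that runs this pattern against the sentence from the question. The variable names $sentence and $tokens are illustrative only.

```php
<?php
$sentence = "I'm a little teapot, short and stout.";

// Words (including apostrophes) or a single punctuation mark.
// In a single-quoted PHP string, \' is an escaped apostrophe and \w is passed through unchanged.
preg_match_all('/[\w\']+|[,.!?;]/', $sentence, $matches);

// $matches[0] holds the full text of every match, in order.
$tokens = $matches[0];
print_r($tokens);
```

With this input, $tokens should contain the nine elements the question asked for, with "I'm" kept as one word and the comma and period as separate entries.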

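If you want to be more permissive, use Unicode character classes instead. The first alternative allows letters, numbers, combining marks, dash characters, and the apostrophe within words; the second matches any single punctuation character. Note that a two-letter property such as Pd needs braces, because the short form \pX only covers one-letter properties:

'/[\pL\pN\pM\p{Pd}\']+|\pP/u'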
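To illustrate the Unicode variant, here is a hedged sketch using a French sentence picked for this example. It assumes the source file and the input string are UTF-8 encoded (required by the /u modifier) and writes the dash property with braces, \p{Pd}, since PCRE's brace-less short form applies only to one-letter properties.

```php
<?php
// Illustrative input with an accented letter, apostrophes, and a hyphen.
$sentence = "C'est l'été, n'est-ce pas?";

// Letters, numbers, combining marks, dashes, and apostrophes form a word;
// any other single punctuation character is its own token.
preg_match_all('/[\pL\pN\pM\p{Pd}\']+|\pP/u', $sentence, $matches);

$tokens = $matches[0];
print_r($tokens);
```

Because the hyphen belongs to the dash category, "n'est-ce" stays together as one token, while the comma and question mark are returned separately.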
反目相谮 2024-11-07 10:55:03

Try this; it should work the way you want:

([\w]+)|[,.!?;]+

I'd also recommend using an online regex tester, which is very useful for checking patterns like this.
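For comparison, a short sketch of how this pattern behaves; the input sentence is made up for this example. Two differences from the output requested in the question are worth noting: the apostrophe is not part of the word class, and the trailing + groups consecutive punctuation marks into one token.

```php
<?php
// Delimiters added around the pattern so it can be passed to preg_match_all.
$pattern = '/([\w]+)|[,.!?;]+/';

preg_match_all($pattern, "I'm a teapot... short and stout!", $matches);

$tokens = $matches[0];
print_r($tokens);
```

Here "I'm" is split into "I" and "m" because the apostrophe matches neither alternative, and "..." comes back as a single token.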

酒浓于脸红 2024-11-07 10:55:03

You may want to try something like:

/([^,.!?; ]+)|([,.!?;])/
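A brief sketch of this approach follows, assuming the punctuation alternative is written as a character class, [,.!?;], so that it matches one punctuation mark at a time. The variable names and the second test string are illustrative.

```php
<?php
// A word is any run of characters that are not listed punctuation or a space.
$pattern = '/([^,.!?; ]+)|([,.!?;])/';

preg_match_all($pattern, "I'm a little teapot, short and stout.", $matches);
$tokens = $matches[0];
print_r($tokens);

// Only the space is excluded, so other whitespace such as a tab stays inside a token.
preg_match_all($pattern, "short\tand stout", $tabMatches);
$tabTokens = $tabMatches[0];
print_r($tabTokens);
```

Defining words by exclusion keeps "I'm" intact without listing the apostrophe, at the cost of treating tabs, newlines, and any unlisted symbol as word characters.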