Java string split + pattern
I'm using this method to split some text:
String[] parts = sentence.split("[,\\s\\-:\\?\\!\\«\\»\\'\\´\\`\\\"\\.\\\\\\/]");
This splits the text on the listed symbols. One of the symbols is "-", because my text contains odd runs like this: "-------------- words --- words2 --words3--words4". Splitting on "-" suits my needs, because without it a chunk such as "---words3---words4" would be treated as a single word.
But there is a tricky part. I want to allow words like "aaa-bbb", which are matched by this pattern:
Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
allow: aaa-bb, aaa-bbbbbbb
not allow: aaa--bb, aa--bbb-cc
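To make the allow / not allow list above concrete, here is a minimal sketch that runs the same pattern against the four sample strings. The class name `HyphenWordCheck` and the helper `containsHyphenWord` are made up for illustration; only the regular expression comes from the question.

```java
import java.util.regex.Pattern;

class HyphenWordCheck {
    // Same pattern as in the question: letters, exactly one hyphen, letters,
    // with no letter or hyphen directly before or after the match.
    static final Pattern HYPHEN_WORD =
        Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");

    // Hypothetical helper: true if the text contains at least one match.
    static boolean containsHyphenWord(String text) {
        return HYPHEN_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"aaa-bb", "aaa-bbbbbbb", "aaa--bb", "aa--bbb-cc"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsHyphenWord(s));
        }
    }
}
```

The first two samples should report true and the last two false: in "aaa--bb" and "aa--bbb-cc" every candidate word either hits a second hyphen or is preceded by one, so the lookarounds reject it.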
So my question is: is it possible to split my text with the split above while also taking this pattern into account, so that words like aaa-bbb are kept as single words?
Thanks in advance,
Richard
Comments (1)
From what I gather, you are after the following:
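One possible way to get that behaviour, offered as a sketch under assumptions rather than a confirmed solution: first hide the hyphen of every word the pattern accepts behind a placeholder character, then split on the original delimiters, then restore the hyphen. The class name `HyphenAwareSplitter`, the placeholder `\uE000` (assumed never to occur in the input), and the decision to drop empty strings are all choices made for this example. The pattern is the one from the question with capture groups added, and the delimiter class is the question's set with redundant escapes removed and the non-ASCII characters written as Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class HyphenAwareSplitter {
    // Pattern from the question, with groups around the two letter runs.
    private static final Pattern HYPHEN_WORD =
        Pattern.compile("(?<![A-Za-z-])([A-Za-z]+)-([A-Za-z]+)(?![A-Za-z-])");

    // Delimiters from the question; \u00AB, \u00BB and \u00B4 are the two
    // guillemets and the acute accent.
    private static final String DELIMITERS =
        "[,\\s\\-:?!\u00AB\u00BB'\u00B4`\".\\\\/]";

    // Assumption: this private-use character never appears in the input.
    private static final char PLACEHOLDER = '\uE000';

    static List<String> split(String sentence) {
        // Step 1: replace the hyphen inside each accepted word.
        String shielded = HYPHEN_WORD.matcher(sentence)
            .replaceAll("$1" + PLACEHOLDER + "$2");

        // Step 2: split on the delimiters, skip empty pieces,
        // and put the protected hyphens back.
        List<String> parts = new ArrayList<>();
        for (String piece : shielded.split(DELIMITERS)) {
            if (!piece.isEmpty()) {
                parts.add(piece.replace(PLACEHOLDER, '-'));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text =
            "-------------- words --- words2 --words3--words4 aaa-bbb aaa--bb";
        System.out.println(split(text));
    }
}
```

For the sample text this should print [words, words2, words3, words4, aaa-bbb, aaa, bb]: "aaa-bbb" survives as one word, while "aaa--bb" is still broken apart. If empty strings between consecutive delimiters are wanted, as `String.split` returns them, remove the `isEmpty()` check.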