How can I use a regular expression to match the English words in an English article? Thanks ^_^
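Problem description
Requirement: write a Java program that counts how many times each English word occurs in an article. When deciding what counts as a word, take into account the surrounding spaces, line-break characters, and the hyphen "-"; a hyphen joins its parts into a single word. This should be implemented with regular expressions. The specific rules are:
- Each of the following counts as one word: don't, doesn't, didn't, can't, couldn't, wouldn't, isn't, aren't, wasn't, weren't
- Each of the following is also treated as one word: he's, she's, I'm, you're, we're, they're
- The following are excluded from the count and removed: Shawn's, apple's, Jonas’, what's, 'twas
- ice-cream counts as one word when it is not split by a line break at the end of a line, and the hyphen in the middle must not be removed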
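One possible way to turn the rules above into code is sketched below. It is not a definitive solution: the class name `WordCounter`, the method `count`, and the token pattern are made up for illustration. It also assumes that words are compared case-insensitively, that the typographic apostrophe ’ is equivalent to ', and that a hyphen directly before a line break marks a wrapped word whose halves are joined without the hyphen.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {
    // Contractions the rules say to keep as single words (stored in lower case).
    private static final Set<String> KEEP = new HashSet<>(Arrays.asList(
        "don't", "doesn't", "didn't", "can't", "couldn't", "wouldn't",
        "isn't", "aren't", "wasn't", "weren't",
        "he's", "she's", "i'm", "you're", "we're", "they're"));

    // A token: optional leading apostrophe, runs of letters joined by a single
    // apostrophe or hyphen, optional trailing apostrophe (covers 'twas and Jonas').
    private static final Pattern TOKEN = Pattern.compile(
        "['\u2019]?[A-Za-z]+(?:['\u2019-][A-Za-z]+)*['\u2019]?");

    static Map<String, Integer> count(String text) {
        // Assumption: "ice-<newline>cream" is a wrapped word, so drop the hyphen
        // and the line break. A hyphen inside a line is left untouched.
        String joined = text.replaceAll("-[ \\t]*\\R[ \\t]*", "");

        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(joined);
        while (m.find()) {
            String word = m.group().replace('\u2019', '\'').toLowerCase(Locale.ROOT);
            // Any token with an apostrophe that is not on the keep list
            // (Shawn's, what's, 'twas, ...) is skipped, i.e. not counted.
            if (word.indexOf('\'') >= 0 && !KEEP.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling `WordCounter.count(text)` returns the words in order of first appearance with their counts. One limitation of this sketch: a word wrapped in single quotes, such as 'hello', looks the same to the pattern as 'twas or Jonas' and is dropped as well.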
Source of the problem and my own approach
I read through some material and wrote a first draft:
(?:she's|he's|they're|we're|you're|I'm|It's)|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't|can't|couldn't|wasn't|weren't|wouldn't )
The test string is:
She's"1.tom:'what's your name.' Jame's Janes', didn't, character,wasn't,
ice-cream,
Relevant code
(?:she's|he's|they're|we're|you're|I'm|It's)|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't|can't|couldn't|wasn't|weren't|wouldn't )
What result do you expect? What error do you actually see?
However, it cannot correctly identify words, hyphens, and line breaks.
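To make the problem concrete, the small check below runs the draft expression over the test string with `Matcher.find()`. The class name `DraftRegexDemo` and the helper `matches` are made up for illustration; the pattern itself is copied from the draft unchanged, including the trailing space after `wouldn't`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DraftRegexDemo {
    // The draft expression from the question, copied verbatim.
    private static final Pattern DRAFT = Pattern.compile(
        "(?:she's|he's|they're|we're|you're|I'm|It's)"
        + "|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't"
        + "|can't|couldn't|wasn't|weren't|wouldn't )");

    // Collects every substring the draft pattern finds in the text.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DRAFT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String test = "She's\"1.tom:'what's your name.' Jame's Janes', "
            + "didn't, character,wasn't,\nice-cream,";
        System.out.println(matches(test)); // prints [he's, didn't, wasn't]
    }
}
```

The pattern is only a list of contractions, so ordinary words such as character and hyphenated words such as ice-cream are never matched. It also has no word boundaries and is case-sensitive, which is why it picks `he's` out of the middle of `She's`.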
Thanks in advance to the experts for pointing the way! Please help me design this regular expression ^_^
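Comments (2)
This basically meets your requirements.
Would writing it this way work? It is a bit more cumbersome, but the accuracy should be fairly high.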
while (tokenizer1.hasMoreTokens()) {