How to use a delimiter to isolate words (Java)
I am writing a program that scans text files and then writes each word into a HashMap.
The Scanner class uses whitespace as its default delimiter, so my words end up stored with punctuation attached to them. I want the scanner to treat periods, commas and other common punctuation marks as the end of a token. Here's what I have attempted:
Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\" \t]+~\\s");
The scanner basically ignored all the spaces even though I have '\\s' as part of the expression. Sorry, but I have hardly any understanding of regex.
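You might treat everything that is not a Unicode letter as the delimiter, using a negated character class of Unicode categories.
([^...] means "not"; \p{...} selects a Unicode category (a capital \P negates it); L is the letters, M the combining diacritical marks (accents).)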
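The answer's own snippet is not preserved here, so the following is only a sketch of what the description above suggests, assuming the delimiter is the negated class [^\p{L}\p{M}]+. The class name WordSplitter and the method words are hypothetical names chosen for illustration. It also shows a corrected version of the punctuation-list approach from the question: the original pattern ends in a literal ~ followed by \s, so it only matches when punctuation is followed by a tilde and a whitespace character, which is probably why no spaces were treated as delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordSplitter {
    // Assumed delimiter: one or more characters that are neither
    // Unicode letters (\p{L}) nor combining marks (\p{M}).
    static final String NON_LETTERS = "[^\\p{L}\\p{M}]+";

    // Alternative closer to the question's attempt: \s sits inside the
    // character class, and the stray "~" is gone.
    static final String PUNCT_OR_SPACE = "[.,:;()?!\"\\s]+";

    // Collects the tokens of one line using the given delimiter regex.
    static List<String> words(String line, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Hello, world! (This is a test.)";
        // Both calls print [Hello, world, This, is, a, test]
        System.out.println(words(line, NON_LETTERS));
        System.out.println(words(line, PUNCT_OR_SPACE));
    }
}
```

Note that the letters-only delimiter also splits on digits and apostrophes, so "don't" would come back as "don" and "t"; the punctuation-list variant keeps such tokens whole but only strips the characters it lists.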