Counting the number of words in a file
I'm having a problem counting the number of words in a file. My approach is to count a word whenever I see a space or a newline.
The problem is that if there are multiple blank lines between paragraphs, I end up counting them as words too. If you look at the readFile() method you can see what I am doing.
Could you help me out and point me in the right direction on how to fix this?
Example input file (including a blank line):
word word word
word word
word word word
You can use a Scanner with a FileInputStream instead of a BufferedReader with a FileReader.
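A minimal sketch of the Scanner idea, since the answer's own example did not survive on this page. The class name ScannerWordCount, the method countWords, and the fallback path "input.txt" are assumptions made for illustration. It relies on Scanner's default delimiter, which treats any run of whitespace (including consecutive newlines) as a single separator.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

class ScannerWordCount {
    // Counts whitespace-delimited tokens. Blank lines add nothing because
    // Scanner skips any run of whitespace between tokens.
    static int countWords(InputStream in) {
        int count = 0;
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder path, not taken from the question.
        String path = args.length > 0 ? args[0] : "input.txt";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println(countWords(in));
        }
    }
}
```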
I would change your approach a bit. First, I would use a BufferedReader to read the file in line by line using readLine(). Then split each line on whitespace using String.split("\\s") and use the size of the resulting array to see how many words are on that line. To get the number of characters, you could look at either the size of each line or the size of each split word (depending on whether you want to count whitespace as characters).
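The line-by-line approach could be sketched roughly as follows. The names LineSplitWordCount and countWords and the path "input.txt" are illustrative assumptions. Note one deliberate deviation from the text: the sketch trims each line and splits on "\\s+" rather than "\\s", because splitting on a single whitespace character yields empty strings when words are separated by more than one space, and a blank line would otherwise still produce a one-element array.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class LineSplitWordCount {
    // Reads the source line by line and sums the words found on each line.
    static int countWords(Reader source) throws IOException {
        int words = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // blank line between paragraphs: no words
                }
                words += trimmed.split("\\s+").length;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder path, not taken from the question.
        String path = args.length > 0 ? args[0] : "input.txt";
        System.out.println(countWords(new FileReader(path)));
    }
}
```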
This is just a thought. There is one very easy way to do it: if you just need the number of words and not the actual words, then just use Apache WordUtils.
Just keep a boolean flag around that lets you know whether the previous character was whitespace or not.
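The answer's pseudocode is not present on this page, so here is one possible Java rendering of the flag idea. The names FlagWordCount, countWords, and previousWasWhitespace are assumptions. A word is counted only on the transition from whitespace to non-whitespace, so any number of consecutive spaces or newlines counts as one separator.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FlagWordCount {
    static int countWords(Reader reader) throws IOException {
        int words = 0;
        // Treat the start of input like whitespace so the first word counts.
        boolean previousWasWhitespace = true;
        int c;
        while ((c = reader.read()) != -1) {
            boolean isWhitespace = Character.isWhitespace(c);
            if (!isWhitespace && previousWasWhitespace) {
                words++; // a new word starts here
            }
            previousWasWhitespace = isWhitespace;
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Sample text with blank lines between the "paragraphs".
        String sample = "word word word\n\n\nword word";
        System.out.println(countWords(new StringReader(sample))); // prints 5
    }
}
```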
I think a correct approach would be to use a regular expression.
Hope it helps. The meaning of "\s+" is explained in the Pattern javadoc.
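The regex code from this answer is missing here. The sketch below is a guess at the kind of thing it meant, based only on the mention of "\s+": split the text on runs of whitespace. RegexWordCount and countWords are hypothetical names.

```java
import java.util.regex.Pattern;

class RegexWordCount {
    // "\\s+" matches one or more whitespace characters, so several spaces
    // or several newlines in a row act as a single separator.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // splitting "" would yield one empty token
        }
        return WHITESPACE.split(trimmed).length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("word word word\n\n\nword word\n")); // prints 5
    }
}
```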
Hack solution
You can read the text file into a String variable, then split the String into an array using a single space as the delimiter: stringVar.split(" ").
The array length would equal the number of "words" in the file.
Of course, this wouldn't give you a count of lines.
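A small sketch comparing the single-space split with a split on whitespace runs, using a sample string in place of the file contents. The class and method names are assumptions. It suggests why the hack is fragile for the input in the question: words separated only by newlines are merged into one token, and doubled spaces produce empty tokens.

```java
class SingleSpaceSplitDemo {
    // The hack as described: split on exactly one space character.
    static int countBySingleSpace(String text) {
        return text.split(" ").length;
    }

    // A more forgiving variant: split on runs of any whitespace.
    static int countByWhitespaceRuns(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String sample = "word word word\n\nword word";
        // "word\n\nword" stays glued together as a single token.
        System.out.println(countBySingleSpace(sample));    // prints 4
        System.out.println(countByWhitespaceRuns(sample)); // prints 5
    }
}
```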
3 steps: consume all the whitespace, check if it is a line, consume all the non-whitespace.
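One way to read these three steps, sketched in Java. The interpretation of "check if it is a line" as "count each newline seen while skipping whitespace" is an assumption, as are the names ThreeStepCounter and count.

```java
class ThreeStepCounter {
    // Returns {wordCount, newlineCount} for the given text.
    static int[] count(String text) {
        int words = 0;
        int newlines = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Step 1: consume all whitespace.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                // Step 2: note when the whitespace is a line break.
                if (text.charAt(i) == '\n') {
                    newlines++;
                }
                i++;
            }
            // Step 3: consume all non-whitespace; that run is one word.
            if (i < n) {
                words++;
                while (i < n && !Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
            }
        }
        return new int[] {words, newlines};
    }

    public static void main(String[] args) {
        int[] result = count("word word word\n\nword word\n");
        System.out.println(result[0] + " words, " + result[1] + " newlines");
    }
}
```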
File Word-Count
If there are symbols between the words, you can split on them and count the number of words.
Take a look at my solution here; it should work. The idea is to remove all the unwanted symbols from the words, then separate the words and store them in another variable (I was using an ArrayList). By adjusting the "excludedSymbols" variable you can add more symbols that you would like to exclude from the words.
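The linked solution is not reproduced on this page, so the following is only a sketch of the idea as described, not the answerer's code. The excludedSymbols name comes from the answer; the class name, the method extractWords, and the choice to replace each excluded symbol with a space (so that "a,b" becomes two words rather than "ab") are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class SymbolStrippingWordCount {
    // Replaces every character listed in excludedSymbols with a space,
    // then splits on whitespace and collects the non-empty tokens.
    static List<String> extractWords(String text, String excludedSymbols) {
        StringBuilder cleaned = new StringBuilder();
        for (char c : text.toCharArray()) {
            cleaned.append(excludedSymbols.indexOf(c) >= 0 ? ' ' : c);
        }
        List<String> words = new ArrayList<>();
        for (String token : cleaned.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String excludedSymbols = ",.!?()"; // extend this to exclude more symbols
        List<String> words = extractWords("Hello, world! (one-two)", excludedSymbols);
        System.out.println(words.size() + " words: " + words);
    }
}
```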
This can be done in a very simple way using Java 8.
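The original Java 8 snippet is missing here. A stream-based version might look something like this, assuming the intent was Files.lines plus a flatMap over the tokens of each line. The names StreamWordCount and countWords and the path "input.txt" are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

class StreamWordCount {
    // Flattens every line into its whitespace-separated tokens and counts
    // the non-empty ones, so blank lines contribute nothing.
    static long countWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(token -> !token.isEmpty())
                .count();
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder path, not taken from the question.
        String path = args.length > 0 ? args[0] : "input.txt";
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            System.out.println(countWords(lines));
        }
    }
}
```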
The following approach is supported in Java 8: read the file into a String, then split it on a delimiter and keep the pieces in a list of strings.
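Only the comments of this answer's code survive, so the sketch below fills in the two steps they describe under stated assumptions: the file is read with Files.readAllBytes, and the delimiter is taken to be a run of whitespace ("\\s+"). SplitToListWordCount, splitWords, and "input.txt" are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitToListWordCount {
    // Splits the content on the delimiter regex and keeps non-empty pieces.
    static List<String> splitWords(String content, String delimiterRegex) {
        return Pattern.compile(delimiterRegex)
                .splitAsStream(content)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder path, not taken from the question.
        String path = args.length > 0 ? args[0] : "input.txt";
        // Read file into String
        String content = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        // Keep the words in a list of strings by splitting with a delimiter
        List<String> words = splitWords(content, "\\s+");
        System.out.println(words.size());
    }
}
```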
It is this easy: we can get the String from a file with the method getText();