计算文件中的单词数

发布于 2024-09-30 05:42:23 字数 243 浏览 0 评论 0原文

我在计算文件中的字数时遇到问题。我采取的方法是,当我看到空格或换行符时,我就知道要计算单词数。

问题是,如果我在段落之间有多行,那么我最终也会将它们算作单词。如果您查看 readFile() 方法,您就可以看到我在做什么。

您能帮助我并指导我如何解决这个问题吗?

输入文件示例(包括空行):

word word word
word word

word word word

I'm having a problem counting the number of words in a file. The approach that I am taking is when I see a space or a newLine then I know to count a word.

The problem is that if I have multiple lines between paragraphs then I ended up counting them as words also. If you look at the readFile() method you can see what I am doing.

Could you help me out and guide me in the right direction on how to fix this?

Example input file (including a blank line):

word word word
word word

word word word

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

蓝眼泪 2024-10-07 05:42:23

您可以将 Scanner 与 FileInputStream 一起使用,而不是将 BufferedReader 与 FileReader 一起使用。例如:-

File file = new File("sample.txt");
try(Scanner sc = new Scanner(new FileInputStream(file))){
    int count=0;
    while(sc.hasNext()){
        sc.next();
        count++;
    }
System.out.println("Number of words: " + count);
}

You can use a Scanner with a FileInputStream instead of BufferedReader with a FileReader. For example:-

File file = new File("sample.txt");
try(Scanner sc = new Scanner(new FileInputStream(file))){
    int count=0;
    while(sc.hasNext()){
        sc.next();
        count++;
    }
System.out.println("Number of words: " + count);
}
栀梦 2024-10-07 05:42:23

我会稍微改变一下你的方法。首先,我将使用 BufferedReader 使用 readLine() 逐行读取文件。然后使用 String.split("\\s") 在空白处分割每一行,并使用结果数组的大小来查看该行有多少个单词。要获取字符数,您可以查看每行或每个拆分单词的大小(取决于您是否要将空格计为字符)。

I would change your approach a bit. First, I would use a BufferedReader to read the file file in line-by-line using readLine(). Then split each line on whitespace using String.split("\\s") and use the size of the resulting array to see how many words are on that line. To get the number of characters you could either look at the size of each line or of each split word (depending of if you want to count whitespace as characters).

半枫 2024-10-07 05:42:23

这只是一个想法。有一种非常简单的方法可以做到这一点。如果您只需要单词数而不是实际单词,那么只需使用 Apache WordUtils

import org.apache.commons.lang.WordUtils;

public class CountWord {

public static void main(String[] args) {    
String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";

    String initials = WordUtils.initials(str);

    System.out.println(initials);
    //so number of words in your file will be
    System.out.println(initials.length());    
  }
}

This is just a thought. There is one very easy way to do it. If you just need number of words and not actual words then just use Apache WordUtils

import org.apache.commons.lang.WordUtils;

public class CountWord {

public static void main(String[] args) {    
String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";

    String initials = WordUtils.initials(str);

    System.out.println(initials);
    //so number of words in your file will be
    System.out.println(initials.length());    
  }
}
美胚控场 2024-10-07 05:42:23

只需保留一个布尔标志,让您知道前一个字符是否为空格(伪代码如下):

boolean prevWhitespace = false;
int wordCount = 0;
while (char ch = getNextChar(input)) {
  if (isWhitespace(ch)) {
    if (!prevWhitespace) {
      prevWhitespace = true;
      wordCount++;
    }
  } else {
    prevWhitespace = false;
  }
}

Just keep a boolean flag around that lets you know if the previous character was whitespace or not (pseudocode follows):

boolean prevWhitespace = false;
int wordCount = 0;
while (char ch = getNextChar(input)) {
  if (isWhitespace(ch)) {
    if (!prevWhitespace) {
      prevWhitespace = true;
      wordCount++;
    }
  } else {
    prevWhitespace = false;
  }
}
苏辞 2024-10-07 05:42:23

我认为正确的方法是使用正则表达式:

String fileContent = <text from file>;    
String[] words = Pattern.compile("\\s+").split(fileContent);
System.out.println("File has " + words.length + " words");

希望它有帮助。 “\s+”的含义在 Pattern javadoc< /a>

I think a correct approach would be by means of Regex:

String fileContent = <text from file>;    
String[] words = Pattern.compile("\\s+").split(fileContent);
System.out.println("File has " + words.length + " words");

Hope it helps. The "\s+" meaning is in Pattern javadoc

金兰素衣 2024-10-07 05:42:23
import java.io.BufferedReader;
import java.io.FileReader;

public class CountWords {

    public static void main (String args[]) throws Exception {

       System.out.println ("Counting Words");       
       FileReader fr = new FileReader ("c:\\Customer1.txt");        
       BufferedReader br = new BufferedReader (fr);     
       String line = br.readLin ();
       int count = 0;
       while (line != null) {
          String []parts = line.split(" ");
          for( String w : parts)
          {
            count++;        
          }
          line = br.readLine();
       }         
       System.out.println(count);
    }
}
import java.io.BufferedReader;
import java.io.FileReader;

public class CountWords {

    public static void main (String args[]) throws Exception {

       System.out.println ("Counting Words");       
       FileReader fr = new FileReader ("c:\\Customer1.txt");        
       BufferedReader br = new BufferedReader (fr);     
       String line = br.readLin ();
       int count = 0;
       while (line != null) {
          String []parts = line.split(" ");
          for( String w : parts)
          {
            count++;        
          }
          line = br.readLine();
       }         
       System.out.println(count);
    }
}
靖瑶 2024-10-07 05:42:23

黑客解决方案

您可以将文本文件读入字符串变量。然后使用单个空格作为分隔符将字符串拆分为数组 StringVar.Split(" ")。

数组计数将等于文件中“单词”的数量。
当然,这不会给你一个行号的计数。

Hack solution

You can read the text file into a String var. Then split the String into an array using a single whitespace as the delimiter StringVar.Split(" ").

The Array count would equal the number of "Words" in the file.
Of course this wouldnt give you a count of line numbers.

若水微香 2024-10-07 05:42:23

3步:消耗所有空白,检查是否是一行,消耗所有非空白。3

while(true){
    c = inFile.read();                
    // consume whitespaces
    while(isspace(c)){ inFile.read() }
    if (c == '\n'){ numberLines++; continue; }
    while (!isspace(c)){
         numberChars++;
         c = inFile.read();
    }
    numberWords++;
}

3 steps: Consume all the white spaces, check if is a line, consume all the nonwhitespace.3

while(true){
    c = inFile.read();                
    // consume whitespaces
    while(isspace(c)){ inFile.read() }
    if (c == '\n'){ numberLines++; continue; }
    while (!isspace(c)){
         numberChars++;
         c = inFile.read();
    }
    numberWords++;
}
电影里的梦 2024-10-07 05:42:23

文件字数统计

如果单词之间有一些符号,那么您可以拆分并计算单词数。

Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")));
        int count = 0;
        while (sc.hasNext()) {

            String[] s = sc.next().split("d*[.@:=#-]"); 

            for (int i = 0; i < s.length; i++) {
                if (!s[i].isEmpty()){
                    System.out.println(s[i]);
                    count++;
                }   
            }           
        }
        System.out.println("Word-Count : "+count);

File Word-Count

If in between words having some symbols then you can split and count the number of Words.

Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")));
        int count = 0;
        while (sc.hasNext()) {

            String[] s = sc.next().split("d*[.@:=#-]"); 

            for (int i = 0; i < s.length; i++) {
                if (!s[i].isEmpty()){
                    System.out.println(s[i]);
                    count++;
                }   
            }           
        }
        System.out.println("Word-Count : "+count);
时光清浅 2024-10-07 05:42:23

看看我这里的解决方案,它应该有效。这个想法是从单词中删除所有不需要的符号,然后将这些单词分开并将它们存储在其他变量中,我使用的是 ArrayList。通过调整“excludedSymbols”变量,您可以添加更多您希望从单词中排除的符号。

public static void countWords () {
    String textFileLocation ="c:\\yourFileLocation";
    String readWords ="";
    ArrayList<String> extractOnlyWordsFromTextFile = new ArrayList<>();
    // excludedSymbols can be extended to whatever you want to exclude from the file 
    String[] excludedSymbols = {" ", "," , "." , "/" , ":" , ";" , "<" , ">", "\n"};
    String readByteCharByChar = "";
    boolean testIfWord = false;


    try {
        InputStream inputStream = new FileInputStream(textFileLocation);
        byte byte1 = (byte) inputStream.read();
        while (byte1 != -1) {

            readByteCharByChar +=String.valueOf((char)byte1);
            for(int i=0;i<excludedSymbols.length;i++) {
            if(readByteCharByChar.equals(excludedSymbols[i])) {
                if(!readWords.equals("")) {
                extractOnlyWordsFromTextFile.add(readWords);
                }
                readWords ="";
                testIfWord = true;
                break;
            }
            }
            if(!testIfWord) {
                readWords+=(char)byte1;
            }
            readByteCharByChar = "";
            testIfWord = false;
            byte1 = (byte)inputStream.read();
            if(byte1 == -1 && !readWords.equals("")) {
                extractOnlyWordsFromTextFile.add(readWords);
            }
        }
        inputStream.close();
        System.out.println(extractOnlyWordsFromTextFile);
        System.out.println("The number of words in the choosen text file are: " + extractOnlyWordsFromTextFile.size());
    } catch (IOException ioException) {

        ioException.printStackTrace();
    }
}

Take a look at my solution here, it should work. The idea is to remove all the unwanted symbols from the words, then separate those words and store them in some other variable, i was using ArrayList. By adjusting the "excludedSymbols" variable you can add more symbols which you would like to be excluded from the words.

public static void countWords () {
    String textFileLocation ="c:\\yourFileLocation";
    String readWords ="";
    ArrayList<String> extractOnlyWordsFromTextFile = new ArrayList<>();
    // excludedSymbols can be extended to whatever you want to exclude from the file 
    String[] excludedSymbols = {" ", "," , "." , "/" , ":" , ";" , "<" , ">", "\n"};
    String readByteCharByChar = "";
    boolean testIfWord = false;


    try {
        InputStream inputStream = new FileInputStream(textFileLocation);
        byte byte1 = (byte) inputStream.read();
        while (byte1 != -1) {

            readByteCharByChar +=String.valueOf((char)byte1);
            for(int i=0;i<excludedSymbols.length;i++) {
            if(readByteCharByChar.equals(excludedSymbols[i])) {
                if(!readWords.equals("")) {
                extractOnlyWordsFromTextFile.add(readWords);
                }
                readWords ="";
                testIfWord = true;
                break;
            }
            }
            if(!testIfWord) {
                readWords+=(char)byte1;
            }
            readByteCharByChar = "";
            testIfWord = false;
            byte1 = (byte)inputStream.read();
            if(byte1 == -1 && !readWords.equals("")) {
                extractOnlyWordsFromTextFile.add(readWords);
            }
        }
        inputStream.close();
        System.out.println(extractOnlyWordsFromTextFile);
        System.out.println("The number of words in the choosen text file are: " + extractOnlyWordsFromTextFile.size());
    } catch (IOException ioException) {

        ioException.printStackTrace();
    }
}
十级心震 2024-10-07 05:42:23

使用 Java 8 可以通过一种非常简单的方式来完成此操作:

Files.lines(Paths.get(file))
    .flatMap(str->Stream.of(str.split("[ ,.!?\r\n]")))
    .filter(s->s.length()>0).count();

This can be done in a very way using Java 8:

Files.lines(Paths.get(file))
    .flatMap(str->Stream.of(str.split("[ ,.!?\r\n]")))
    .filter(s->s.length()>0).count();
彩虹直至黑白 2024-10-07 05:42:23
BufferedReader bf= new BufferedReader(new FileReader("G://Sample.txt"));
        String line=bf.readLine();
        while(line!=null)
        {
            String[] words=line.split(" ");
            System.out.println("this line contains " +words.length+ " words");
            line=bf.readLine();
        }
BufferedReader bf= new BufferedReader(new FileReader("G://Sample.txt"));
        String line=bf.readLine();
        while(line!=null)
        {
            String[] words=line.split(" ");
            System.out.println("this line contains " +words.length+ " words");
            line=bf.readLine();
        }
终难遇 2024-10-07 05:42:23

下面的代码在 Java 8 中支持

//将文件读入字符串

String fileContent=new String(Files.readAlBytes(Paths.get("MyFile.txt")),StandardCharacters.UFT_8);

//通过用分隔符分割将它们保存到字符串列表中

List<String> words = Arrays.asList(contents.split("\\PL+"));

int count=0;
for(String x: words){
 if(x.length()>1) count++;
}

sop(x);

The below code supports in Java 8

//Read file into String

String fileContent=new String(Files.readAlBytes(Paths.get("MyFile.txt")),StandardCharacters.UFT_8);

//Keeping these into list of strings by splitting with a delimiter

List<String> words = Arrays.asList(contents.split("\\PL+"));

int count=0;
for(String x: words){
 if(x.length()>1) count++;
}

sop(x);
西瓜 2024-10-07 05:42:23

如此简单,我们可以通过方法从文件中获取字符串: getText();

public class Main {

    static int countOfWords(String str) {
        if (str.equals("") || str == null) {
            return 0;
        }else{
            int numberWords = 0;
            for (char c : str.toCharArray()) {
                if (c == ' ') {
                    numberWords++;
                }
            }

            return ++numberWordss;
        }
    }
}

So easy we can get the String from files by method: getText();

public class Main {

    static int countOfWords(String str) {
        if (str.equals("") || str == null) {
            return 0;
        }else{
            int numberWords = 0;
            for (char c : str.toCharArray()) {
                if (c == ' ') {
                    numberWords++;
                }
            }

            return ++numberWordss;
        }
    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文