Error when using StringTokenizer on a multi-line text file

Posted on 2024-11-04 08:21:49

I'm trying to read a text file and split it into individual words using the StringTokenizer utility in Java.

The text file looks like this:

Now, what I'm trying to do is get each individual character from the text file and store it in an ArrayList. I then try to print every element of the ArrayList at the end.

Here is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

public static void main(String[] args) {

    String fileSpecified = args[0];

    fileSpecified = fileSpecified.concat(".txt");
    String line;
    System.out.println ("file Specified = " + fileSpecified);

    ArrayList <String> words = new ArrayList<String> ();


    try {
        FileReader fr = new FileReader (fileSpecified);
        BufferedReader br = new BufferedReader (fr);
        line = br.readLine();

        StringTokenizer token;
        while ((line  = br.readLine()) != null) {
            token = new StringTokenizer (line);
            words.add(token.nextToken());
        }
    } catch (IOException e) {
        System.out.println (e.getMessage());
    }

    for (int i = 0; i < words.size(); i++) {
        System.out.println ("words = " + words.get(i));
    }



}

The error message I get is this:

Exception in thread "main" java.util.NoSuchElementException   
                at java.util.StringTokenizer.nextToken<Unknown Source>  
                at getWords.main<getWords.java:32>

Here 'getWords' is the name of my Java file.

Thank you.

风吹雪碎 2024-11-11 08:21:49

a) You always have to check StringTokenizer.hasMoreTokens() first. Throwing NoSuchElementException is the documented behaviour if no more tokens are available:

token = new StringTokenizer (line);
while(token.hasMoreTokens())
    words.add(token.nextToken());

b) Don't create a new tokenizer for every line, unless your file is too large to fit into memory. Read the entire file into a String and let the tokenizer work on that.
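
Point b) can be sketched roughly as follows. This is only an illustration, not the asker's program: the class name WholeFileTokenizer and the helper tokenize are made up for the example, and the file is assumed to be small enough to load in one go and to be encoded in the platform default charset. Because the default delimiters of StringTokenizer include line breaks, a single tokenizer can walk the whole text, blank lines included.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WholeFileTokenizer {

    // Hypothetical helper: tokenizes the whole text in one pass.
    // The default delimiters (space, tab, newline, carriage return, form feed)
    // mean that line breaks and blank lines are simply skipped.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WholeFileTokenizer <file name without .txt>");
            return;
        }
        // As in the question, ".txt" is appended to the first argument.
        // Assumption: the file fits in memory and uses the default charset.
        String text = new String(Files.readAllBytes(Paths.get(args[0] + ".txt")));
        for (String word : tokenize(text)) {
            System.out.println("words = " + word);
        }
    }
}
```

For very large files the line-by-line loop remains the safer choice, since this variant holds the complete file contents in memory.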

箜明 2024-11-11 08:21:49

Your general approach seems sound, but you have a basic problem in your code.

Your parser is most likely failing on the second line of your input file. This line is a blank line, so when you call words.add(token.nextToken()); you get an error, because there are no tokens. Note also that, as written, you only ever get the first token on each line.

You should iterate over the tokens like this:

while(token.hasMoreTokens())
{
    words.add(token.nextToken());
}

You can find a more general example in the javadocs here:

http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
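
To see the blank-line failure in isolation, a small sketch along these lines may help. The class BlankLineDemo and its two helper methods are invented for this illustration; only StringTokenizer, hasMoreTokens(), nextToken() and NoSuchElementException come from the JDK.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class BlankLineDemo {

    // Guarded access: returns the first token, or null when the line has none.
    static String firstTokenOrNull(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        return tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null;
    }

    // Reports whether an unguarded nextToken() call throws for this line.
    static boolean unguardedCallThrows(String line) {
        try {
            new StringTokenizer(line).nextToken();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unguardedCallThrows(""));            // true: blank line
        System.out.println(unguardedCallThrows("  \t"));        // true: whitespace only
        System.out.println(unguardedCallThrows("hello world")); // false
        System.out.println(firstTokenOrNull(""));               // null
    }
}
```

A line made up only of spaces or tabs behaves like an empty one, so the hasMoreTokens() check is needed even if the file has no completely empty lines.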

裂开嘴轻声笑有多痛 2024-11-11 08:21:49

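The problem is that you don't test whether there is a next token before trying to get it. You should always check that hasMoreTokens() returns true before calling nextToken().

But you have other bugs as well:

The first line is read, but never tokenized
You only add the first word of each line to your list of words
Bad practice: the token variable should be declared inside the loop, not outside it
You don't close your reader in a finally block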
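
The points listed above could be combined in a sketch like the one below. It is not the asker's original program: the class GetWordsSketch and the method readWords are hypothetical names, the method takes any Reader so that it can be tried without a file, and try-with-resources (Java 7 and later) is used here in place of an explicit finally block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class GetWordsSketch {

    // Hypothetical helper: collects every whitespace-separated word from the source.
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // No extra readLine() before the loop, so the first line is tokenized too.
            while ((line = br.readLine()) != null) {
                // Declared inside the loop: one tokenizer per line.
                StringTokenizer tokenizer = new StringTokenizer(line);
                // Guarded loop: blank lines add nothing, other lines add all their words.
                while (tokenizer.hasMoreTokens()) {
                    words.add(tokenizer.nextToken());
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the text file, including a blank second line.
        Reader sample = new StringReader("first line\n\nthird line here");
        System.out.println(readWords(sample)); // [first, line, third, line, here]
    }
}
```

To read an actual file, a java.io.FileReader for the path can be passed to readWords instead of the StringReader.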

妳是的陽光 2024-11-11 08:21:49

You need to use the hasMoreTokens() method. The version below also addresses the various coding-standard issues in your code that JB Nizet pointed out:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

public class TestStringTokenizer {

    /**
     * @param args
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException {
        String fileSpecified = args[0];

        fileSpecified = fileSpecified.concat(".txt");
        String line;
        System.out.println ("file Specified = " + fileSpecified);

        ArrayList <String> words = new ArrayList<String> ();

        BufferedReader br =  new BufferedReader (new FileReader (fileSpecified));
        try{
            while ((line  = br.readLine()) != null) {
                StringTokenizer token = new StringTokenizer (line);
                while(token.hasMoreTokens())
                    words.add(token.nextToken());
            }
        } catch (IOException e) {
            System.out.println (e.getMessage());
            e.printStackTrace();
        } finally {
            br.close();
        }

        for (int i = 0; i < words.size(); i++) {
            System.out.println ("words = " + words.get(i));
        }
    }
}
