Creating a Java program to search a file for a specific word

Posted on 2024-10-05 15:49:21

I am just learning the language and was wondering what a more experienced Java programmer would do in the following situation.

I would like to create a Java program that will search a specified file for all instances of a specific word.

How would you go about this? Does the Java API come with a class that provides file-scanning capabilities, or would I have to write my own class to do this?

Thanks for any input,
Dom.

Comments (3)

万人眼中万个我 2024-10-12 15:49:21

The Java API does offer the java.util.Scanner class, which lets you scan through an input file.

Depending on how you intend to use it, however, this might not be the best approach. Is the file very large? Are you searching only one file, or are you trying to keep a database of many files and search within them? In the latter case, you might want to use a more full-featured search engine such as Apache Lucene.
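For the single-file case, here is a minimal sketch of what scanning with Scanner can look like, assuming you want every line that contains the word (matched case-sensitively, as a whole word) together with its line number. Reading line by line keeps only one line in memory at a time. The class name LineSearch, the method findLines and the two command-line arguments are hypothetical, chosen for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class LineSearch {

    // Returns "line N: text" for every line that contains the word as a whole word
    static List<String> findLines(Scanner scanner, String word) {
        // \b = word boundary; Pattern.quote keeps special characters in the word literal
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        List<String> hits = new ArrayList<>();
        int lineNumber = 0;
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            lineNumber++;
            if (pattern.matcher(line).find()) {
                hits.add("line " + lineNumber + ": " + line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Hypothetical usage: args[0] = file to search, args[1] = word to look for
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            for (String hit : findLines(scanner, args[1])) {
                System.out.println(hit);
            }
        }
    }
}
```

Run it as, for example, `java LineSearch data.txt Java`. Taking a Scanner parameter rather than a file name means the same method also works on a String source, which is handy for quick experiments.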

彼岸花似海 2024-10-12 15:49:21

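Unless the file is very large, I would read the whole file into a String (IOUtils comes from Apache Commons IO) and search it with a regular expression:

String text = IOUtils.toString(new FileReader(filename));
// find() succeeds on any occurrence; matches() would require the whole text to be the word
boolean foundWord = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();

To get the text between occurrences of your word, you can split() the text on the same pattern; the lengths of the resulting pieces tell you where each occurrence is.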
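The split() idea, as a minimal sketch: it assumes the text is already in a String and that the search is case-sensitive, so every match is exactly word.length() characters long. The class SplitPositions and both method names are hypothetical. A Matcher.find() loop is shown alongside as a more direct way to get the same start indices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPositions {

    // Start index of each whole-word occurrence of word, derived from split()
    static List<Integer> positions(String text, String word) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // limit -1 keeps trailing empty strings, so pieces.length - 1 == number of matches
        String[] pieces = pattern.split(text, -1);
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < pieces.length - 1; i++) {
            offset += pieces[i].length(); // skip the text before this occurrence
            result.add(offset);
            offset += word.length();      // skip the occurrence itself
        }
        return result;
    }

    // More direct alternative: let the Matcher report each start index
    static List<Integer> positionsWithMatcher(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "java or Java? JavaScript is not java";
        System.out.println(positions(text, "java"));            // [0, 32]
        System.out.println(positionsWithMatcher(text, "java")); // [0, 32]
    }
}
```

The Matcher version avoids the length bookkeeping and still works if the pattern is later made case-insensitive or otherwise allowed to match text of varying length.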

看透却不说透 2024-10-12 15:49:21

As others have pointed out, you could use the Scanner class.

I put your question in a file, data.txt, and ran the following program:

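import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner (and the file) when the loop ends
        try (Scanner s = new Scanner(new File("data.txt"))) {
            // A horizon of 0 means the search is not limited to a window of the input
            while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
                // match() returns the result of the most recent successful search
                MatchResult mr = s.match();
                System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
                        mr.start(), mr.end());
            }
        }
    }
}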

The output is:

Word found: Java at index 74 to 78.
Word found: java at index 153 to 157.
Word found: Java at index 279 to 283.

The pattern searched for, (?i)\bjava\b, means the following:

  • (?i) turns on case-insensitive matching
  • \b means a word boundary
  • java is the string searched for
  • \b is a word boundary again.
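To see what each part of the pattern contributes, here is a small sketch that runs it on a made-up sentence, with and without the \b boundaries. The class PatternDemo and its matches helper are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {

    // Collects every substring of text matched by regex, in order
    static List<String> matches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Java, java and JAVA match; JavaScript and javac do not.";
        // (?i) makes the match case-insensitive; \b stops it matching inside longer words
        System.out.println(matches("(?i)\\bjava\\b", text)); // [Java, java, JAVA]
        // Without the boundaries, the prefixes of JavaScript and javac match too
        System.out.println(matches("(?i)java", text));       // [Java, java, JAVA, Java, java]
    }
}
```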

If the search term comes from the user, or if for some other reason it may contain special characters, I suggest you put \Q and \E around it, since everything in between is then treated literally (and if you're really picky, make sure the input doesn't contain \E itself).
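The standard library can do this quoting for you: java.util.regex.Pattern.quote(String) returns a literal pattern in the \Q...\E form and also copes with a \E inside the input, so no manual check is needed. Below is a minimal sketch; the class QuotedSearch and its count method are hypothetical, and the surrounding \b boundaries only behave as intended when the term begins and ends with a letter, digit or underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSearch {

    // Counts case-insensitive whole-word occurrences of a user-supplied term, taken literally
    static int count(String text, String term) {
        // Pattern.quote wraps the term in \Q...\E and handles an embedded \E safely
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\b");
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Use java.util.Scanner, not javaXutilXScanner.";
        // Prints 1: the dots are literal, so javaXutilXScanner is not counted
        // (unquoted, each '.' would match any character and the count would be 2)
        System.out.println(count(text, "java.util.Scanner"));
    }
}
```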

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文