How to search a file with Lucene

Posted 2024-12-28 02:11:33

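I want to search for a query in the file "fdictionary.txt", which contains a list of words (230,000 words) written one per line. Any suggestions why this code is not working? The spell-checking part works and gives me a list of suggestions (I limited the length of the list to 1). What I want to do is search that dictionary and, if the word is already in it, not call the spell checker. My search function is not working, and it doesn't give me an error! Here is what I have implemented: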

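import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SpellCorrection {

    public static File indexDir = new File("/../idxDir");

    public static void main(String[] args) throws IOException, FileNotFoundException, CorruptIndexException, ParseException {

        // Build the spell-checker index from the word list (one word per line)
        Directory directory = FSDirectory.open(indexDir);
        SpellChecker spell = new SpellChecker(directory);

        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_20, null);
        File dictionary = new File("/../fdictionary00.txt");
        spell.indexDictionary(new PlainTextDictionary(dictionary), config, true);

        String query = "red"; //kne, console
        String correctedQuery = query; //kne, console

        // Ask for suggestions only when search() reports that the word is missing
        if (!search(directory, query)) {
            String[] suggestions = spell.suggestSimilar(query, 1);
            if (suggestions != null) {correctedQuery=suggestions[0];}
        }

        System.out.println("The Query was: "+query);
        System.out.println("The Corrected Query is: "+correctedQuery);
    }

    public static boolean search(Directory directory, String queryTerm) throws FileNotFoundException, CorruptIndexException, IOException, ParseException {
        boolean isIn = false;

        IndexReader indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_20);

        Term term = new Term(queryTerm);
        Query termQuery = new TermQuery(term);
        TopDocs hits = indexSearcher.search(termQuery, 100);
        System.out.println(hits.totalHits);

        if (hits.totalHits > 0) {
            isIn = true;
        }
        return isIn;
    }
}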

千と千尋 2025-01-04 02:11:33

Where are you indexing the contents of fdictionary00.txt?

You can only search with IndexSearcher if you have an index. If you are new to Lucene, you may want to look at a quick tutorial (such as http://lucenetutorial.com/lucene-in-5-minutes.html).
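Note that in the question's code an index does exist: spell.indexDictionary(...) writes the dictionary into directory, storing each word (of three or more characters) in the field SpellChecker.F_WORD ("word"). The lookup finds nothing because new Term(queryTerm) treats queryTerm as the field name and searches for empty text. Below is a minimal sketch of search() querying the "word" field instead, assuming Lucene 3.6 with the spellchecker module on the classpath; the class name SpellIndexSearch and the helper containsWord (which builds the spell index in a RAMDirectory from a string) are hypothetical, added only to keep the example self-contained:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class; assumes Lucene 3.6 core + spellchecker jars.
public class SpellIndexSearch {

    // Exact lookup in the field SpellChecker uses for dictionary words.
    // new Term(queryTerm) would use queryTerm as the *field name* with empty text.
    public static boolean search(Directory directory, String queryTerm) throws IOException {
        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            TermQuery query = new TermQuery(new Term(SpellChecker.F_WORD, queryTerm));
            TopDocs hits = searcher.search(query, 1);
            return hits.totalHits > 0;
        } finally {
            searcher.close();
            reader.close();
        }
    }

    // Hypothetical helper: builds a spell index in memory from newline-separated words, then searches it.
    public static boolean containsWord(String wordList, String queryTerm) throws IOException {
        Directory directory = new RAMDirectory();
        SpellChecker spell = new SpellChecker(directory);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, null);
        spell.indexDictionary(new PlainTextDictionary(new StringReader(wordList)), config, true);
        spell.close(); // closes the spell checker's searcher, not the directory
        return search(directory, queryTerm);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(containsWord("red\ngreen\nblue\n", "red"));
    }
}
```

The lookup is case-sensitive and exact, because SpellChecker stores each dictionary line as a single un-analyzed term.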

倾城花音 2025-01-04 02:11:33

You never built the index.

You need to set up the index...

Directory directory = FSDirectory.open(indexDir);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_20);
IndexWriter writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

You then need to create a document and add each term to the document as an analyzed field:

Document doc = new Document();
doc.add(new Field("name", word, Field.Store.YES, Field.Index.ANALYZED));

Then add the document to the index:

writer.addDocument(doc);

writer.optimize();

Now build the index and close the index writer:

writer.commit();
writer.close();
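The steps above can be combined into one runnable sketch that reads a newline-separated word list and answers exact-match lookups. It assumes Lucene 3.6, so it uses IndexWriterConfig rather than the older MaxFieldLength constructor, and it deliberately swaps Field.Index.ANALYZED for Field.Index.NOT_ANALYZED: with StandardAnalyzer an analyzed field is lowercased and English stop words such as "the" are dropped, so an exact dictionary check would miss them. The class name WordListIndex and its methods are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class; assumes Lucene 3.6.
public class WordListIndex {

    // One document per non-empty line, with the word stored in the "name" field.
    public static void build(Directory directory, Reader wordList) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36,
                new StandardAnalyzer(Version.LUCENE_36)); // not applied to NOT_ANALYZED fields
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // overwrite any existing index
        IndexWriter writer = new IndexWriter(directory, config);
        BufferedReader in = new BufferedReader(wordList);
        try {
            String word;
            while ((word = in.readLine()) != null) {
                word = word.trim();
                if (word.length() == 0) {
                    continue;
                }
                Document doc = new Document();
                // NOT_ANALYZED keeps the whole word as one term: no lowercasing, no stop-word removal
                doc.add(new Field("name", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            writer.commit();
        } finally {
            in.close();
            writer.close();
        }
    }

    // True if some document has exactly this word in "name".
    public static boolean contains(Directory directory, String word) throws IOException {
        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            return searcher.search(new TermQuery(new Term("name", word)), 1).totalHits > 0;
        } finally {
            searcher.close();
            reader.close();
        }
    }

    // Hypothetical convenience wrapper: in-memory index built from a string.
    public static boolean containsWord(String wordList, String word) throws IOException {
        Directory directory = new RAMDirectory();
        build(directory, new StringReader(wordList));
        return contains(directory, word);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(containsWord("the\nred\nconsole\n", "the"));
    }
}
```

For the real dictionary, pass a java.io.FileReader over the word file and an FSDirectory instead of the RAMDirectory so the index is kept on disk and built only once.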

贪了杯 2025-01-04 02:11:33

You can make your SpellChecker instance available in a service and use spellChecker.exist(word).
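A minimal sketch of that suggestion applied to the question's flow: check spell.exist(query) and call suggestSimilar only when the word is missing. Besides replacing the custom search(), it tests suggestions.length rather than null, since suggestSimilar returns an empty array when no word is close enough. It assumes Lucene 3.6; the class ExistBeforeSuggest and its string-based overload are hypothetical, used only to keep the example self-contained:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class; assumes Lucene 3.6 core + spellchecker jars.
public class ExistBeforeSuggest {

    // Returns the query unchanged if it is a dictionary word, else the best suggestion (if any).
    public static String correct(SpellChecker spell, String query) throws IOException {
        if (spell.exist(query)) {
            return query; // already in the dictionary: skip spell checking
        }
        String[] suggestions = spell.suggestSimilar(query, 1);
        // suggestSimilar returns an empty array (not null) when nothing qualifies
        return suggestions.length > 0 ? suggestions[0] : query;
    }

    // Hypothetical overload: builds an in-memory spell index from newline-separated words.
    public static String correct(String wordList, String query) throws IOException {
        SpellChecker spell = new SpellChecker(new RAMDirectory());
        try {
            spell.indexDictionary(new PlainTextDictionary(new StringReader(wordList)),
                    new IndexWriterConfig(Version.LUCENE_36, null), true);
            return correct(spell, query);
        } finally {
            spell.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String words = "red\nconsole\nknee\n";
        System.out.println(correct(words, "red"));
        System.out.println(correct(words, "consle"));
    }
}
```

In a long-running service you would create and index the SpellChecker once and reuse it for every call to correct(spell, query), rather than rebuilding it per query.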

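Note that SpellChecker does not index words of 2 characters or fewer. To work around this, you can add them to the index after it has been created (by adding them to the SpellChecker.F_WORD field).

If you want to add words to a live index and make them available to exist(word), you need to add them to the SpellChecker.F_WORD field. Of course, because you are not adding them to all the other fields (the grams, start, end, etc.), your words will not show up as suggestions for other misspelled words.

In that case you have to add the word to the file so that it is available as a suggestion when you recreate the index. It would be nice if the project made SpellChecker.createDocument(...) public or protected instead of private, since that method does all the work of adding a word.

After all that, you need to call spellChecker.setSpellIndex(directory) so the SpellChecker picks up the changes.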
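A sketch of the workaround described above, assuming Lucene 3.6: short words are appended to the spell index with only the SpellChecker.F_WORD field, and setSpellIndex(...) is then called so the SpellChecker reopens its searcher on the updated index. Because no n-gram fields are written, these words satisfy exist() but are never suggested. The class ShortWords and its methods are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class; assumes Lucene 3.6 core + spellchecker jars.
public class ShortWords {

    // Appends words to the existing spell index using only the "word" field (no gram fields),
    // so exist() can find them but they never come back from suggestSimilar().
    public static void addExactWords(Directory spellDir, String... words) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, null)
                .setOpenMode(IndexWriterConfig.OpenMode.APPEND);
        IndexWriter writer = new IndexWriter(spellDir, config);
        try {
            for (String word : words) {
                Document doc = new Document();
                doc.add(new Field(SpellChecker.F_WORD, word, Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
        } finally {
            writer.close(); // commits the new documents
        }
    }

    // Hypothetical demo: returns {exist() before adding, exist() after adding + setSpellIndex}.
    public static boolean[] demo(String wordList, String shortWord) throws IOException {
        Directory spellDir = new RAMDirectory();
        SpellChecker spell = new SpellChecker(spellDir);
        try {
            spell.indexDictionary(new PlainTextDictionary(new StringReader(wordList)),
                    new IndexWriterConfig(Version.LUCENE_36, null), true);
            boolean before = spell.exist(shortWord); // indexDictionary skips words under 3 chars
            addExactWords(spellDir, shortWord);
            spell.setSpellIndex(spellDir);           // reopen the searcher so exist() sees the new docs
            boolean after = spell.exist(shortWord);
            return new boolean[] { before, after };
        } finally {
            spell.close();
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo("ok\nred\n", "ok");
        System.out.println(result[0] + " " + result[1]);
    }
}
```

As the answer points out, a later indexDictionary(...) run that recreates the index will not include these extra words unless they are also in the dictionary file.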

About the author: 烧了回忆取暖
