How can I persist a Lucene document index so that the documents don't need to be loaded every time the program starts?

Posted on 2025-01-02 17:51:21

I am trying to set up Lucene to process some documents stored in the database. I started with this HelloWorld sample. However, the index that is created is not persisted anywhere and needs to be re-created each time the program is run. Is there a way to save the index that Lucene creates so that the documents do not need to be loaded into it each time the program starts up?

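import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {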
  public static void main(String[] args) throws IOException, ParseException {
    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

    // 1. create the index
    Directory index = new RAMDirectory();

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action");
    addDoc(w, "Lucene for Dummies");
    addDoc(w, "Managing Gigabytes");
    addDoc(w, "The Art of Computer Science");
    w.close();

    // 2. query
    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(querystr);

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for(int i=0;i<hits.length;++i) {
      int docId = hits[i].doc;
      Document d = searcher.doc(docId);
      System.out.println((i + 1) + ". " + d.get("title"));
    }

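    // searcher can only be closed when there
    // is no need to access the documents any more.
    searcher.close();
    // closing the searcher does not close a reader that was passed in, so close it too
    reader.close();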
  }

  private static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
  }
}

Comments (2)

你穿错了嫁妆 2025-01-09 17:51:21

You're creating the index in RAM:

Directory index = new RAMDirectory();

http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/store/RAMDirectory.html

IIRC, you just need to switch that to one of the filesystem-based Directory implementations, such as FSDirectory.
http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/store/Directory.html
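
To make that suggestion concrete, here is a minimal sketch against the Lucene 3.5 API used in the question. It assumes the index lives in a directory of your choosing; the class name PersistentIndex and the method buildIfMissing are made up for illustration. FSDirectory.open picks a filesystem implementation suited to the platform, and IndexReader.indexExists lets the program skip indexing when an index is already on disk.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper, not part of Lucene or of the original sample.
public class PersistentIndex {

  /**
   * Opens the on-disk index in indexDir, adds the given titles only if no
   * index exists there yet, and returns the number of documents in the index.
   */
  public static int buildIfMissing(File indexDir, String... titles) throws IOException {
    // File-backed directory instead of RAMDirectory: survives program restarts.
    Directory index = FSDirectory.open(indexDir);
    try {
      if (!IndexReader.indexExists(index)) {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        try {
          for (String title : titles) {
            Document doc = new Document();
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
          }
        } finally {
          w.close(); // commits the documents to disk
        }
      }
      IndexReader reader = IndexReader.open(index);
      try {
        return reader.numDocs();
      } finally {
        reader.close();
      }
    } finally {
      index.close();
    }
  }
}
```

On the first run the documents are indexed and written to disk; on later runs the existing index is detected and the indexing step is skipped, so the titles passed in are ignored.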

天气好吗我好吗 2025-01-09 17:51:21

If you want to keep using RAMDirectory for searching (for its performance benefits) but don't want to rebuild the index from scratch every time, you can first create the index using a filesystem-based directory such as NIOFSDirectory (don't use it if you're on Windows). Then, at search time, open an in-memory copy of that on-disk directory using the constructor RAMDirectory(Directory dir).
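
A rough sketch of that second approach, again assuming Lucene 3.5 and an index that was already written to disk. The class name RamCopySearch and the method countHits are hypothetical, and FSDirectory.open is used here in place of naming NIOFSDirectory directly so that Lucene chooses the implementation for the platform.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper, not part of Lucene or of the original sample.
public class RamCopySearch {

  /** Copies the on-disk index into RAM and returns the total hit count for the query. */
  public static int countHits(File indexDir, String queryText)
      throws IOException, ParseException {
    Directory onDisk = FSDirectory.open(indexDir);
    Directory inMemory;
    try {
      // Copies every index file into memory; the on-disk index is left untouched.
      inMemory = new RAMDirectory(onDisk);
    } finally {
      onDisk.close();
    }

    // Same analyzer as at indexing time, as the original sample recommends.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    IndexReader reader = IndexReader.open(inMemory);
    try {
      IndexSearcher searcher = new IndexSearcher(reader);
      try {
        Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(queryText);
        return searcher.search(q, 10).totalHits;
      } finally {
        searcher.close();
      }
    } finally {
      reader.close();
      inMemory.close();
    }
  }
}
```

The copy happens once per call in this sketch; a long-running program would more likely load the RAMDirectory once at startup and reuse the reader for many searches.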
