
Identifying which keywords from a list are present in a document

Posted on 2025-01-04 17:40:26

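I want to build a list of tags for a Lucene document based on a predetermined list.

So if we have a document with the text

Looking for a Java programmer with Lucene experience,

and we have a list of keywords (about 1000 items)

java, php, lucene, c# [...]

I want to identify that the keywords Java and Lucene are present in the document. Simply running java OR php OR lucene won't work, because then I won't know which keyword produced the hit.

Any suggestions on how to do this in Lucene?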


梦里人 2025-01-11 17:40:26

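I assume you have one or more indexed fields and that you want to build a tag cloud from the intersection of your keywords with the document's indexed terms.

Your problem is very similar to highlighting, so the same ideas apply. You can either:

re-analyze the stored fields of your Lucene documents, or
use term vectors, for fast access to the terms of your documents.

Note that if you want to use term vectors, you need to enable them at indexing time (see the Field.TermVector.YES documentation and the Field constructors).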
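A minimal sketch of the first option (re-analyzing the stored text) using only the JDK. It assumes a deliberately simple analyzer (lowercase, then split on whitespace and common punctuation) standing in for whatever Lucene Analyzer the field was actually indexed with; the class and method names are hypothetical. In a real index you would run the same Analyzer over the stored field so the tokens match the indexed terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: tags a document by intersecting its tokens with a keyword list.
public class KeywordTagger {

    // Simplified stand-in for a Lucene Analyzer (an assumption, not Lucene's behavior):
    // lowercase, then split on whitespace and common punctuation.
    // '#' and '+' are not delimiters, so "c#" survives as one token.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[\\s,.;:!?()\"']+"));
    }

    // Returns the keywords that occur in the text, in keyword-list order.
    static Set<String> tags(String text, List<String> keywords) {
        Set<String> tokens = new HashSet<>(analyze(text));
        Set<String> result = new LinkedHashSet<>();
        for (String keyword : keywords) {
            if (tokens.contains(keyword.toLowerCase(Locale.ROOT))) {
                result.add(keyword);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("java", "php", "lucene", "c#");
        String doc = "Looking for a Java programmer with Lucene experience";
        System.out.println(tags(doc, keywords)); // prints [java, lucene]
    }
}
```

Because the tokens go into a hash set, checking 1000 keywords costs one lookup each. Multi-word keywords would need phrase matching, which this sketch does not handle.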


从来不烧饼 2025-01-11 17:40:26

Yes, this works:

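import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.Query;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.ProjectionConstants;
import org.hibernate.search.Search;

FullTextSession fts = Search.getFullTextSession(getSessionFactory().getCurrentSession());

// Find the Lucene document id of the Offer whose "id" field matches myId
Query q = fts.getSearchFactory().buildQueryBuilder()
    .forEntity(Offer.class).get()
    .keyword()
    .onField("id")
    .matching(myId)
    .createQuery();
Object[] dId = (Object[]) fts.createFullTextQuery(q, Offer.class)
    .setProjection(ProjectionConstants.DOCUMENT_ID)
    .uniqueResult();

if (dId != null) {
    IndexReader indexReader = fts.getSearchFactory().getIndexReaderAccessor().open(Offer.class);
    try {
        // Terms and frequencies of the "description" field;
        // null if the field was not indexed with term vectors
        TermFreqVector freq = indexReader.getTermFreqVector((Integer) dId[0], "description");
        // intersect freq.getTerms() with your keyword list here
    } finally {
        // Readers opened through the accessor must be closed through it
        fts.getSearchFactory().getIndexReaderAccessor().close(indexReader);
    }
}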

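You have to remember to index the field with TermVector.YES in the Hibernate Search annotation for that field (the termVector attribute of @Field).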
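Once you have the TermFreqVector, turning it into tags is a set intersection. TermFreqVector.getTerms() and getTermFrequencies() return parallel arrays; the sketch below takes those two arrays as plain parameters so it runs without Lucene on the classpath. The class and method names are hypothetical, and it assumes the indexed terms were lowercased by the analyzer, so the keywords are lowercased the same way before matching.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: intersects one field's term vector with a keyword list.
public class TermVectorTagger {

    // terms and freqs are parallel arrays, in the shape returned by
    // TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies().
    static Map<String, Integer> matchKeywords(String[] terms, int[] freqs, List<String> keywords) {
        // Assumption: the analyzer lowercased terms, so normalize keywords to match.
        Set<String> wanted = new HashSet<>();
        for (String k : keywords) {
            wanted.add(k.toLowerCase(Locale.ROOT));
        }
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (wanted.contains(terms[i])) {
                hits.put(terms[i], freqs[i]); // keyword -> occurrences in the field
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the term vector of one "description" field.
        String[] terms = {"experience", "java", "looking", "lucene", "programmer"};
        int[] freqs = {1, 2, 1, 1, 1};
        System.out.println(matchKeywords(terms, freqs, List.of("java", "php", "lucene", "c#")));
        // prints {java=2, lucene=1}
    }
}
```

Keeping the frequencies is optional for plain tagging, but they come for free from the term vector and are useful for weighting a tag cloud.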
