Lucene：更改默认的分面分隔符？

发布于 2024-12-26 11:03:46 字数 2401 浏览 7 评论 0原文

在这个精彩网站上的第一篇文章！

我的目标是使用分层方面来使用 Lucene 搜索索引。但是，我的方面需要用“/”以外的字符分隔（在本例中为“~”）。示例：

类别类别~类别1 Categories~Category2

我创建了一个实现 FacetIndexingParams 接口的类（DefaultFacetIndexingParams 的副本，其中 DEFAULT_FACET_DELIM_CHAR 参数设置为“~”）。

释义索引代码：（使用 FSDirectory 进行索引和分类）

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)

Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
    def cp = new CategoryPath()
    tag.split('~').each{
        cp.add(it)
    }
    categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)

// Commit and close both writer and taxo.

释义搜索代码：

// Create index and taxonomoy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)

QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)

def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))

结果始终返回“Category/Category1”形式的构面列表。

我已经使用 Luke 工具查看了索引，看来各个方面是由索引中的“~”字符分隔的。

做到这一点的最佳途径是什么？非常感谢任何帮助！

原文

First post on this wonderful site!

My goal is to use hierarchical facets for searching an index using Lucene. However, my facets need to be delimited by a character other than '/', (in this case, '~'). Example:

Categories
Categories~Category1
Categories~Category2

I have created a class that implements FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with the DEFAULT_FACET_DELIM_CHAR param set to '~').

Paraphrased indexing code : (using FSDirectory for both index and taxonomy)

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)

Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
    def cp = new CategoryPath()
    tag.split('~').each{
        cp.add(it)
    }
    categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)

// Commit and close both writer and taxo.

Search code paraphrased:

// Create index and taxonomoy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)

QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)

def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))

The results always return a list of facets in the form of "Category/Category1".

I have used the Luke tool to look at the index and it appears the facets are being delimited by the '~' character in the index.

What is the best route to do this? Any help is greatly appreciated!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

平定天下 2025-01-02 11:03:46

我已经弄清楚了这个问题。搜索和索引正在按预期工作。问题在于我如何获得方面结果。我正在使用：

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString()
}

我需要使用的是：

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString((char)'~')
}

区别在于发送到 toString 函数的参数！

很容易被忽视，很难找到。

希望这对其他人有帮助。

I have figured out the issue. The search and indexing are working as they are supposed to. It is how I have been getting the facet results that is the issue. I was using :

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString()
}

What I needed to use was :

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString((char)'~')
}

The difference being the paramter sent to the toString function!

Easy to overlook, tough to find.

Hope this helps others.

回复收藏 0 原文

~没有更多了~

关于作者

烂柯人

暂无简介

文章

27 人气

关注发私信

友情链接

文江博客

Lucene：更改默认的分面分隔符？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

Lucene：更改默认的分面分隔符？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。