Lucene: Changing the default facet delimiter?

Posted 2024-12-26 11:03:46

First post on this wonderful site!

My goal is to use hierarchical facets when searching a Lucene index. However, my facets need to be delimited by a character other than '/' (in this case, '~'). Example:

Categories
Categories~Category1
Categories~Category2
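In a hierarchical facet, each path also implies its ancestors: Categories~Category1 sits under Categories. As a rough, library-independent illustration (FacetPaths and withAncestors are hypothetical names, not Lucene API), the sketch below splits a '~'-delimited path and lists every prefix, re-joined with the same delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: expands a delimited category path
// into its ancestors plus itself, shortest first.
final class FacetPaths {
    private FacetPaths() {}

    static List<String> withAncestors(String path, char delim) {
        // String.split takes a regex, so quote the delimiter to keep it literal.
        String[] parts = path.split(Pattern.quote(String.valueOf(delim)));
        List<String> prefixes = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : parts) {
            if (current.length() > 0) {
                current.append(delim);
            }
            current.append(part);
            prefixes.add(current.toString());
        }
        return prefixes;
    }
}
```

For example, withAncestors("Categories~Category1", '~') yields Categories followed by Categories~Category1.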

I have created a class that implements the FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with DEFAULT_FACET_DELIM_CHAR set to '~'); it appears as NewFacetIndexingParams in the code below.

Paraphrased indexing code (using FSDirectory for both the index and the taxonomy):

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)

Document doc = new Document()
// Add a bunch of fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
    def cp = new CategoryPath()
    tag.split('~').each{
        cp.add(it)
    }
    categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)

// Commit and close both writer and taxo.
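The Groovy loop above turns a tags string such as Categories~Category1|Categories~Category2 into one CategoryPath per tag, adding each '~'-separated piece as its own component via cp.add. A plain-Java sketch of that parsing step, with List<String> standing in for CategoryPath (an assumption so it runs without Lucene; TagParser is a hypothetical name), shows that the '~' never ends up inside a component. The path holds components, which is why the choice of delimiter matters again whenever the path is rendered back to a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the tag-parsing loop; each inner List<String>
// plays the role of one CategoryPath built with cp.add(component).
final class TagParser {
    private TagParser() {}

    // Splits "A~B|A~C" into [[A, B], [A, C]].
    // String.split takes a regex: '|' must be escaped, '~' can be used as-is.
    static List<List<String>> parseTags(String tags) {
        List<List<String>> categories = new ArrayList<>();
        for (String tag : tags.split("\\|")) {
            categories.add(new ArrayList<>(Arrays.asList(tag.split("~"))));
        }
        return categories;
    }
}
```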

Paraphrased search code:

// Create index and taxonomy readers to get info from the index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)

QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)

def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))
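The drill-down path is built with new CategoryPath("Category~Category1", (char)'~'), so the string is split on '~' into two components. The sketch below mimics only that parsing step with a hypothetical helper (PathParsing.components is not Lucene's implementation) to show why the delimiter argument matters: split on '/', the same string stays a single component and would not correspond to the two-level path that was indexed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for parsing a path string with an explicit delimiter,
// in the spirit of the CategoryPath(String, char) call above.
final class PathParsing {
    private PathParsing() {}

    static List<String> components(String path, char delim) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        return Arrays.asList(path.split(Pattern.quote(String.valueOf(delim))));
    }
}
```

Parsed with '~', "Category~Category1" gives two components; parsed with '/', it gives one.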

The results always come back as facets of the form "Category/Category1".

Looking at the index with the Luke tool, the facets do appear to be delimited by the '~' character in the index itself.

What is the best route to do this? Any help is greatly appreciated!

Comments (1)

平定天下 2025-01-02 11:03:46

I have figured out the issue. Searching and indexing were working as intended; the problem was how I was reading the facet results. I was using:

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString()
}

What I needed to use was:

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString((char)'~')
}

The difference is the parameter passed to toString(): the no-argument version renders the label with '/', while toString((char)'~') renders it with '~'.
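A minimal model of that behaviour, assuming (as the results in the question suggest) that the no-argument toString() joins the label's components with '/' while toString(char) uses the supplied delimiter. LabelPath is a hypothetical class for illustration, not Lucene's CategoryPath.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a facet label: it stores path components and only
// applies a delimiter when rendered, mirroring the toString()/toString(char)
// pair used above. Not Lucene's CategoryPath implementation.
final class LabelPath {
    private final List<String> components;

    LabelPath(String... components) {
        this.components = Arrays.asList(components);
    }

    // Default rendering uses '/', matching what the question's results showed.
    @Override
    public String toString() {
        return toString('/');
    }

    // Renders the same components with a caller-chosen delimiter.
    public String toString(char delimiter) {
        return String.join(String.valueOf(delimiter), components);
    }
}
```

For example, new LabelPath("Category", "Category1").toString() gives Category/Category1, while toString('~') on the same object gives Category~Category1. The stored path is identical in both cases; only the rendering differs, which is why the index looked correct in Luke.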

Easy to overlook, tough to find.

Hope this helps others.
