Boosting Lucene terms when building the index
Is it possible to mark specific terms as more important than others when creating the index (not when querying it)?
Consider for example a synonym filter:
doc 1: "this is a nice car"
doc 2: "this is a nice vehicle"
I want to add the term vehicle to the first doc and the term car to the second doc, but if the index is later queried with the word car, the first document should score higher than the second one, and if it is queried for vehicle, it should be the other way around.
Will calling setBoost on the fields before adding them to their respective documents do the trick?
Or maybe I should add the synonyms to a different field name?
Or am I looking at this from the wrong point of view?
Thanks
Comments (1)
Setting a boost on a field affects all terms in that field, so this wouldn't work in your case.
But it should be possible using Lucene payloads (a byte array that can be set for every term). You would use them to set term-specific boosts (vehicle to 0.5 for doc 1, for example). Then you would implement your own Similarity and override its scorePayload() method to decode that boost, and query with PayloadTermQuery, which lets the boost stored in the payload for that term contribute to the score.
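To make the payload idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only illustrates the encode/decode round trip and the resulting ranking. The class PayloadBoostSketch and its methods indexDoc and score are hypothetical stand-ins, not Lucene API. In a real index the payload would be attached to each token by a TokenFilter and decoded inside Similarity.scorePayload(), whose exact signature depends on the Lucene version, and the final score would also include the usual tf/idf factors.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Lucene.
class PayloadBoostSketch {

    // Encode a per-term boost as a 4-byte payload (big-endian IEEE 754 float).
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Decode the boost from a payload; a missing or short payload means a neutral boost of 1.0.
    static float decodeBoost(byte[] payload) {
        if (payload == null || payload.length < 4) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Stand-in for indexing one document: the original term gets a boost of 1.0,
    // the injected synonym gets a lower boost stored in its payload.
    static Map<String, byte[]> indexDoc(String original, String synonym, float synonymBoost) {
        Map<String, byte[]> terms = new LinkedHashMap<>();
        terms.put(original, encodeBoost(1.0f));
        terms.put(synonym, encodeBoost(synonymBoost));
        return terms;
    }

    // Toy score: the decoded payload boost of the matching term, or 0 if the term is absent.
    static float score(Map<String, byte[]> doc, String queryTerm) {
        if (!doc.containsKey(queryTerm)) {
            return 0.0f;
        }
        return decodeBoost(doc.get(queryTerm));
    }

    public static void main(String[] args) {
        Map<String, byte[]> doc1 = indexDoc("car", "vehicle", 0.5f);   // "this is a nice car"
        Map<String, byte[]> doc2 = indexDoc("vehicle", "car", 0.5f);   // "this is a nice vehicle"

        System.out.println("query car:     doc1=" + score(doc1, "car") + ", doc2=" + score(doc2, "car"));
        System.out.println("query vehicle: doc1=" + score(doc1, "vehicle") + ", doc2=" + score(doc2, "vehicle"));
    }
}
```

With these assumed boosts, a query for car ranks doc 1 above doc 2, and a query for vehicle reverses the order, which is the behavior asked for in the question.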