Lucene field from a TokenStream with a stored value
I have a field which needs to come from a token stream; it cannot be instantiated with a string and then analyzed into tokens. For example, I might want to combine the data from multiple columns (in my RDBMS) into a single Lucene field, but I want to analyze each column in its own way. So I cannot simply concatenate them all into a single string and then analyze the resulting string.
The problem I am running into now is that fields created from token streams cannot be stored, which makes sense in the general case since the stream may not have an obvious string representation. However, I know the string representation, and I would like to store that.
I tried adding the same field twice, once stored with string data and once coming from a token stream, but it seems that this can't be done. Apart from some hack like adding a field with a name of "myfield__stored", is there a way to do this?
I am using 2.9.2.
Comments (1)
I found a way. You can sneak it in by instantiating it as a normal field but calling SetTokenStream later. Because the reader/string value is only indexed if the token stream value is null, the token stream is what gets indexed. The store methods look at the string/reader value regardless of the token stream, so that is the value which gets stored.
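To make the reasoning above concrete, here is a small self-contained Java sketch of the decision as I understand it. It is only a model and does not use Lucene: the names SketchField, tokensToIndex and valueToStore are hypothetical, a plain list of strings stands in for a token stream, and the whitespace split is a stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the behaviour described in the answer.
// None of these names are Lucene classes or methods.
class StoredTokenStreamSketch {

    /** Stand-in for a field that has a string value and an optional token stream. */
    static class SketchField {
        final String name;
        final String stringValue;   // the value that gets stored
        List<String> tokenStream;   // the tokens that get indexed, if set (null by default)

        SketchField(String name, String stringValue) {
            this.name = name;
            this.stringValue = stringValue;
        }

        /** Mirrors the idea of attaching a token stream after construction. */
        void setTokenStream(List<String> tokens) {
            this.tokenStream = tokens;
        }
    }

    /** Indexing side: use the token stream if present, otherwise analyze the string. */
    static List<String> tokensToIndex(SketchField field) {
        if (field.tokenStream != null) {
            return field.tokenStream;
        }
        // Naive whitespace split, standing in for a real analyzer.
        return Arrays.asList(field.stringValue.split("\\s+"));
    }

    /** Storing side: always the string value, whatever the token stream is. */
    static String valueToStore(SketchField field) {
        return field.stringValue;
    }

    public static void main(String[] args) {
        // The string representation we want to get back from the index.
        SketchField field = new SketchField("contact", "Alice Smith | alice@example.com");

        // Tokens built column by column, each analyzed in its own way (hard-coded here).
        List<String> tokens = new ArrayList<>();
        tokens.addAll(Arrays.asList("alice", "smith")); // name column, lowercased and split
        tokens.add("alice@example.com");                // email column, kept as one token
        field.setTokenStream(tokens);

        System.out.println("indexed: " + tokensToIndex(field));
        System.out.println("stored:  " + valueToStore(field));
    }
}
```

With real Lucene the field would be created as stored (and indexed) with the string value first, and the token stream attached afterwards, as the answer describes. The exact constructor flags to use are not spelled out in the answer, so check them against the API of your version.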