Lucene TermPositionVector and retrieving the terms at indexed positions
I've been searching like mad for an answer to this, but I'm still in the dark.
I am using
int[] getTermPositions(int index)
on a TermPositionVector I have for a field (which has been set to store both offsets and positions) to get the positions of the terms I want to highlight as keywords in context.
The question: What do these positions correspond to? Obviously not the
String[] getTerms()
that is returned by the TermFreqVector interface, as that contains just raw counts of my terms.
What I'm looking for is a way to get the "tokenized" array of my field, so I can pull out the terms surrounding the index values returned by getTermPositions(int index).
Help? Thanks a bunch.
Comments (2)
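int[] getTermPositions(int index)
returns an array of the term positions of term i. You can get the index i using the indexOf(String term) method of TermFreqVector. The term positions are the positions (with the term as the unit) at which the given term occurs. For example, in a field tokenized as "a b a" with default position increments, the term "a" occurs at positions 0 and 2.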
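Since the positions are token offsets into the analyzed field, one way to get the "tokenized" array asked about in the question is to invert the term-to-positions mapping. The following is only a minimal sketch in plain Java, not Lucene code: the arrays terms and positions are hand-written stand-ins for what getTerms() and getTermPositions(i) would return for each i, and the class and method names (PositionContext, rebuildTokens, context) are hypothetical. It assumes at most one term per position; gaps left by removed stop words show up as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionContext {

    // Rebuild the token sequence from per-term position lists.
    // terms[i] occurs at every position listed in positions[i].
    // If two terms share a position (e.g. synonyms), the later one wins.
    public static String[] rebuildTokens(String[] terms, int[][] positions) {
        int max = -1;
        for (int[] ps : positions) {
            for (int p : ps) {
                max = Math.max(max, p);
            }
        }
        // Slots that no term claims stay null (e.g. a removed stop word).
        String[] tokens = new String[max + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                tokens[p] = terms[i];
            }
        }
        return tokens;
    }

    // Return the tokens within `window` positions on either side of `position`,
    // including the token at `position` itself and skipping null gaps.
    public static List<String> context(String[] tokens, int position, int window) {
        int from = Math.max(0, position - window);
        int to = Math.min(tokens.length - 1, position + window);
        List<String> out = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            if (tokens[p] != null) {
                out.add(tokens[p]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in data for the text "the quick fox jumps over the lazy dog".
        // Terms are listed in sorted order, each with its token positions.
        String[] terms = {"dog", "fox", "jumps", "lazy", "over", "quick", "the"};
        int[][] positions = {{7}, {2}, {3}, {6}, {4}, {1}, {0, 5}};

        String[] tokens = rebuildTokens(terms, positions);
        System.out.println(Arrays.toString(tokens));
        // One token of context on each side of "fox" (position 2).
        System.out.println(context(tokens, 2, 1));
    }
}
```

With real index data, the outer loop would run over every index i of the term vector, so the cost is proportional to the number of tokens in the field.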
Well, this will accomplish what I wanted:
http://lucene.apache.org/java/3_0_2/lucene-contrib/index.html#highlighter