Lucene TermPositionVector and retrieving terms at index positions

Posted 2024-09-15 09:23:30

I've been searching like mad for an answer to this, but I'm still in the dark:

I am using

int[] getTermPositions(int index)

of a TermPositionVector I have for a field (which has been set to store both offsets and positions) to get the positions of the terms I'm interested in highlighting as keywords in context.

The question: What do these positions correspond to? Obviously not the

String[] getTerms()

that is returned by the TermFreqVector interface, as that contains just raw counts of my terms.

What I'm looking for is a way to get the "tokenized" array of my field, so that I can pull out the terms surrounding the index values returned by getTermPositions(int index).

Help? Thanks a bunch.

另类 2024-09-22 09:23:34

int[] getTermPositions(int index)

returns an array of the positions of the term at the given index. You can get that index using the

int indexOf(String term)

method of TermFreqVector. The term positions are the positions (counted in terms, not characters) at which the given term occurs in the field. For example,

// source text:
// term position 0   1     2     3   4     5    6   7    8
//               the quick brown fox jumps over the lazy dog

// terms (sorted):
// term index 0     1   2   3     4    5    6     7
//            brown dog fox jumps lazy over quick the

// Suppose we want to find the positions where "the" occurs

int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}
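
To get the "tokenized" array the question asks for, one possible approach is to invert the term vector: walk every term, and write it into an array at each of its positions. The sketch below is plain Java with no Lucene dependency. The class name TermVectorContext and the methods rebuildTokens and contextAround are hypothetical names made up for illustration, and the two input arrays are stand-ins for what getTerms() and getTermPositions(i) would return. It also assumes that any position the analyzer skipped (a removed stop word, for instance) simply has no term, so it is left as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermVectorContext {

    // Rebuild the position-ordered token array from a term vector.
    // terms[i] stands in for getTerms()[i], positions[i] for getTermPositions(i).
    static String[] rebuildTokens(String[] terms, int[][] positions) {
        // Find the highest position so the array can be sized.
        int maxPos = -1;
        for (int[] termPositions : positions) {
            for (int p : termPositions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        // Positions that no term occupies stay null.
        String[] tokens = new String[maxPos + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                tokens[p] = terms[i];
            }
        }
        return tokens;
    }

    // Collect the tokens within `window` positions of `position`, skipping gaps.
    static List<String> contextAround(String[] tokens, int position, int window) {
        int from = Math.max(0, position - window);
        int to = Math.min(tokens.length - 1, position + window);
        List<String> context = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            if (tokens[p] != null) {
                context.add(tokens[p]);
            }
        }
        return context;
    }

    public static void main(String[] args) {
        // Same data as the example above, written out by hand.
        String[] terms = {"brown", "dog", "fox", "jumps", "lazy", "over", "quick", "the"};
        int[][] positions = {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}};

        String[] tokens = rebuildTokens(terms, positions);
        System.out.println(Arrays.toString(tokens));
        // [the, quick, brown, fox, jumps, over, the, lazy, dog]

        System.out.println(contextAround(tokens, 6, 2));
        // [jumps, over, the, lazy, dog]
    }
}
```

Note that the rebuilt array contains the indexed terms, that is, whatever the analyzer produced (lowercased, possibly stemmed), not the original text. For highlighting the original text, the stored offsets are probably the better starting point.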
