Lucene TermPositionVector and retrieving the terms at index positions

Posted on 2024-09-15 09:23:30

I've been looking like mad for an answer to this, but I'm still in the dark:

I am using

int[] getTermPositions(int index)

of a TermPositionVector that I have for a field (which has been set to store both offsets and positions) to get the positions of the terms I'm interested in highlighting as keywords in context.

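The question: what do these positions correspond to? Obviously not indexes into the array from

String[] getTerms()

that is returned by the TermFreqVector interface, as that contains just the distinct terms of the field (with their raw counts kept alongside), not the original token order.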

What I'm looking for is a way to get the "tokenized" array of my field, so that I can pull out the terms surrounding the positions returned by getTermPositions(int index).

Help? Thanks a bunch.


另类 2024-09-22 09:23:34
int[] getTermPositions(int index)

returns an array of the positions of the term at the given index. You can get that index using the

int indexOf(String term)

method of TermFreqVector. The term positions are the positions (counted in terms, not characters) at which the given term occurs in the field. For example,

// source text:
// term position 0   1     2     3   4     5    6   7    8
//               the quick brown fox jumps over the lazy dog

// terms:
// term index 0     1   2   3    4    5    6     7
//            brown dog fox jump lazy over quick the

// Suppose we want to find the positions where "the" occurs

int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}
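
To get the "tokenized" array the question asks for, one option is to invert the vector yourself: walk every term index, and write each term into a slot for each of its positions. The following is only a minimal sketch under stated assumptions. It uses plain arrays as stand-ins for what getTerms() and getTermPositions(i) would return, so it runs without Lucene; the class name TermVectorContext and the helpers rebuildTokens and window are hypothetical names made up for this illustration, not part of the Lucene API.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: plain arrays stand in for the
// values a TermPositionVector would hand back.
final class TermVectorContext {

    // terms[i] plays the role of getTerms()[i];
    // positions[i] plays the role of getTermPositions(i).
    static String[] rebuildTokens(String[] terms, int[][] positions) {
        // Find the highest position so the token array can be sized.
        int max = -1;
        for (int[] termPositions : positions) {
            for (int pos : termPositions) {
                max = Math.max(max, pos);
            }
        }
        // Slots with no term (for example, removed stop words) stay null.
        String[] tokens = new String[max + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int pos : positions[i]) {
                // If two terms share a position, the later one overwrites.
                tokens[pos] = terms[i];
            }
        }
        return tokens;
    }

    // Returns the tokens within `radius` positions of `position`, clipped
    // to the bounds of the array.
    static String[] window(String[] tokens, int position, int radius) {
        int from = Math.max(0, position - radius);
        int to = Math.min(tokens.length, position + radius + 1);
        return Arrays.copyOfRange(tokens, from, to);
    }

    public static void main(String[] args) {
        // Same data as the example above.
        String[] terms = {"brown", "dog", "fox", "jump", "lazy", "over", "quick", "the"};
        int[][] positions = {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}};

        String[] tokens = rebuildTokens(terms, positions);
        System.out.println(String.join(" ", tokens));
        // prints: the quick brown fox jump over the lazy dog

        System.out.println(String.join(" ", window(tokens, 6, 2)));
        // prints: jump over the lazy dog
    }
}
```

Note that the rebuilt array contains the indexed terms (here "jump"), not the original surface text, so it is suited to finding neighbouring terms rather than to reproducing the source string exactly.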