Getting the start and end index of a highlighted fragment in a searched field
"My search returns a highlighted fragment from a field. I want to know where that fragment starts and ends within that field of the matching document."
For instance, suppose I search for "highlighted fragment" in the lines above (treat the paragraph above as a single document).
I set up my fragmenter as:
    SimpleFragmenter fragmenter = new SimpleFragmenter(30);
The output of GetBestFragment is then something like: "returns a highlighted fragment from"
Is it possible to get the start and end index of this fragment in the text above (say, start 10 and end 45)?
Comments (2)
The Highlighter does not return that information when you use the getBestFragment methods. Behind the scenes, the Highlighter uses the TokenGroup class to get the start and end index of each fragment. You could probably use that class.
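If working with TokenGroup is more than you need, a simpler workaround (not what the Highlighter does internally) is to strip the highlight tags from the returned fragment and search for the plain text in the original field value. The sketch below assumes the default `<B>`/`</B>` tags of SimpleHTMLFormatter and an encoder that leaves the text unchanged; `FragmentLocator` and `locate` are hypothetical names, not part of Lucene.

```java
import java.util.regex.Pattern;

// Sketch: recover a fragment's offsets by searching the original field text.
// FragmentLocator is a hypothetical helper, not a Lucene class.
final class FragmentLocator {
    // Assumption: the fragment was marked up with <B>...</B> tags.
    private static final Pattern TAGS = Pattern.compile("</?[Bb]>");

    /**
     * Returns {start, end} (end exclusive) of the fragment inside fieldText,
     * or null when the de-tagged fragment cannot be found.
     */
    static int[] locate(String fieldText, String highlightedFragment) {
        // Remove the markup so the fragment matches the stored text again.
        String plain = TAGS.matcher(highlightedFragment).replaceAll("");
        // indexOf reports the FIRST occurrence only.
        int start = fieldText.indexOf(plain);
        if (start < 0) {
            return null;
        }
        return new int[] { start, start + plain.length() };
    }
}
```

For the text `My search returns a highlighted fragment from a field.` and the fragment `returns a <B>highlighted</B> <B>fragment</B> from`, this yields start 10 and end 45. The obvious limitation is that the same fragment text occurring twice in the field cannot be told apart this way.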
I did just that a few months ago. You have to build a custom Formatter and Encoder.
Basically, inside the highlighter, the formatter processes the tokens chosen for highlighting, while the encoder processes the rest of the text. In your case, you need the encoder to emit an empty string each time it is called, and the formatter to emit the start index and the end index. Both are stored in the TokenGroup of the highlighted parts. Your highlighter should be constructed with this custom formatter and encoder.
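A rough sketch of the formatter half of that idea follows. So that it compiles without Lucene on the classpath, `TokenGroup` and `Formatter` here are local stand-ins reduced to the members used; they are meant to mirror `highlightTerm(String, TokenGroup)`, `getStartOffset()`, `getEndOffset()` and `getTotalScore()` from `org.apache.lucene.search.highlight` (Lucene.NET exposes similarly named members). This variant also records the offsets in a list instead of writing them into the output string, which is my own assumption about what is convenient, not the approach described above. `OffsetRecordingSketch` and `group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetRecordingSketch {
    // Local stand-ins for the Lucene highlight types, NOT the real classes.
    interface TokenGroup {
        int getStartOffset();
        int getEndOffset();
        float getTotalScore();
    }

    interface Formatter {
        String highlightTerm(String originalText, TokenGroup tokenGroup);
    }

    /** Leaves the text untouched and remembers the offsets of scored token groups. */
    static final class OffsetRecordingFormatter implements Formatter {
        final List<int[]> offsets = new ArrayList<>();

        @Override
        public String highlightTerm(String originalText, TokenGroup tokenGroup) {
            // A score above zero means the group matched a query term.
            if (tokenGroup.getTotalScore() > 0) {
                offsets.add(new int[] {
                    tokenGroup.getStartOffset(), tokenGroup.getEndOffset()
                });
            }
            return originalText;
        }
    }

    // Hypothetical helper that fakes a token group for demonstration.
    static TokenGroup group(final int start, final int end, final float score) {
        return new TokenGroup() {
            public int getStartOffset() { return start; }
            public int getEndOffset() { return end; }
            public float getTotalScore() { return score; }
        };
    }
}
```

With the real library, a formatter like this would be passed to the Highlighter constructor in place of the default one. The recorded offsets describe the matched terms rather than the whole fragment, so you would probably still need to relate them to the fragment text you get back.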