List of "Tokens" in Lucene 3
I'm new to Lucene. I started learning on the version 3 branch, and there's one thing I don't understand (obviously because I'm not experienced in the subject).
In Lucene 2.9, if I wanted a list of tokens, I would create an ArrayList of the Token class, ArrayList<Token> for example. That's pretty intuitive to me, and the concept of a token is very clear.
Now that use of the Token class is discouraged in favour of the attribute-based API, do I have to create my own class to encapsulate the attributes I want? If so, isn't that almost recreating Lucene's Token class?
I'm writing a class to test analyzers, and I think having a list of the resulting tokens would make testing easier.
Any help would be appreciated ;)
Thank you!
Comments (3)
According to the Token Javadoc,
"Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all Attributes, which is especially useful to easily switch from the old to the new TokenStream API."
I suggest you keep using Token. Your use case matches the description above.
Use the TermAttribute class.
I think you can do something like this:
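// Needs java.io.StringReader, java.util.*, and org.apache.lucene.analysis.tokenattributes.TermAttribute.
TokenStream tkst = analyzer.tokenStream("field", new StringReader("text"));
// Token is a class, not an attribute interface, so ask for TermAttribute instead.
TermAttribute term = tkst.addAttribute(TermAttribute.class);
List<String> tokens = new ArrayList<String>();
while (tkst.incrementToken()) {
    // The attribute instance is reused on every step, so copy the term text out.
    tokens.add(term.term());
}
tkst.close();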
The proper documentation is in the analysis package: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/package-summary.html
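On the testing side of the question: if you would rather not depend on Token, a small value class of your own is not much code. What follows is only a minimal sketch and not Lucene code. TokenSnapshot and whitespaceTokens are hypothetical names made up for illustration, and the whitespace splitter merely stands in for a real analyzer so the example runs without Lucene on the classpath. With a real TokenStream, you would fill the list inside the incrementToken() loop from the term and offset attributes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenSnapshotDemo {

    /** Hypothetical immutable copy of the token data a test cares about. */
    static final class TokenSnapshot {
        final String term;
        final int startOffset;
        final int endOffset;

        TokenSnapshot(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        // Value equality, so whole lists of snapshots can be compared directly.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TokenSnapshot)) {
                return false;
            }
            TokenSnapshot other = (TokenSnapshot) o;
            return term.equals(other.term)
                    && startOffset == other.startOffset
                    && endOffset == other.endOffset;
        }

        @Override
        public int hashCode() {
            return 31 * (31 * term.hashCode() + startOffset) + endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Stand-in for an analyzer (an assumption for this sketch): splits on
     * whitespace and records character offsets for each piece.
     */
    static List<TokenSnapshot> whitespaceTokens(String text) {
        List<TokenSnapshot> out = new ArrayList<TokenSnapshot>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= text.length(); i++) {
            boolean atBreak = i == text.length()
                    || Character.isWhitespace(text.charAt(i));
            if (!atBreak && start < 0) {
                start = i; // first character of a new token
            } else if (atBreak && start >= 0) {
                out.add(new TokenSnapshot(text.substring(start, i), start, i));
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TokenSnapshot> actual = whitespaceTokens("quick brown fox");
        List<TokenSnapshot> expected = Arrays.asList(
                new TokenSnapshot("quick", 0, 5),
                new TokenSnapshot("brown", 6, 11),
                new TokenSnapshot("fox", 12, 15));
        System.out.println(actual.equals(expected)); // prints "true"
    }
}
```

The point is the value semantics: because equals() compares the term and both offsets, a test can check an analyzer's whole output against an expected list in a single assertion. Whether that is worth it over simply reusing Token is a judgment call.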