Modifying tokens generated by the standard tokenizer
I am trying to understand how the standard tokenizer works. Below is the code in my tokenizer factory file:
package pl.allegro.tech.elasticsearch.index.analysis.pl;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractTokenizerFactory;

public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    public UrlTokenizerFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name, settings);
    }

    @Override
    public Tokenizer create() {
        StandardTokenizer t = new StandardTokenizer();
        return t;
    }
}
I want to modify every token generated by the standard tokenizer. For example, just to test that I can modify the tokens, I want to append an "a" (or any other character) to the end of every token. I tried concatenating "a" with the "+" operator in the return statement of the create() method above, but it didn't work. Does anyone know how to implement this?
Comments (1)
You can define a pattern replace char filter in a custom analyzer. It rewrites the text before the tokenizer runs, so every generated token ends with "_a", and you do not need to change the Java code.
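As a rough illustration of what such a char filter does to the text, here is a small plain-Java sketch that needs no Elasticsearch or Lucene on the classpath. The pattern (\S+) and the replacement $1_a are assumptions chosen for this example (in an index they would be the "pattern" and "replacement" parameters of a pattern_replace char filter), the class and method names are hypothetical, and the whitespace split is only a crude stand-in for the standard tokenizer, which treats punctuation differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PatternReplaceSketch {

    // Assumed values for illustration: match each run of non-whitespace
    // characters and append "_a" to it.
    private static final Pattern WORD = Pattern.compile("(\\S+)");
    private static final String REPLACEMENT = "$1_a";

    /** Rewrites the raw text, as a char filter would, before any tokenizing happens. */
    public static String applyCharFilter(String text) {
        return WORD.matcher(text).replaceAll(REPLACEMENT);
    }

    /** Crude stand-in for a tokenizer: splits on whitespace only. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick_a, brown_a, fox_a]
        System.out.println(tokenize(applyCharFilter("quick brown fox")));
    }
}
```

The sketch also hints at why the "+" attempt in create() cannot work: create() returns a Tokenizer object, and the tokens only exist later, when the stream is consumed. If the change has to live in the plugin's Java code instead, the usual Lucene extension point for rewriting tokens is a TokenFilter applied after the tokenizer, rather than the Tokenizer itself.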