What is the appropriate Lucene analyzer to use?
I have problems indexing item names that contain numbers and symbols. A sample of my data is shown below:
ANGLE BARS ORANGE - 4.0MM 2 - 1/2"
B.I SQUARE TUBING 2" X 3"
B.I. PIPE S-40 10MM 3/8"
B.I SQUARE TUBING 1" X 2"
PLYWOOD MARINE 3/4X4X8
PLYWOOD STA. CLARA 1/8X4X8
PLYWOOD STA. CLARA 3/16X4X8
I want to tokenize my data on whitespace without dropping the symbols, because these symbols are essential. Searching for "plywood sta. clara", "b.i square 2" X 3"", or "angle orange 2 - 1/2" should then return a result. I tried WhitespaceAnalyzer, but the symbols are dropped. I also tried StandardAnalyzer, but stop words and symbols are dropped as well. What is the best analyzer to use instead?
Comments (2)
You can use PatternAnalyzer with a regular expression, or create a custom Analyzer.
Try org.apache.lucene.analysis.miscellaneous.PatternAnalyzer. You can supply a regular expression that defines the token delimiters.
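To make the delimiter idea concrete, here is a minimal plain-Java sketch of what splitting on a whitespace regex and lowercasing would do to the sample data. It does not use Lucene at all: the class `DelimiterTokenizerSketch`, its methods, and the `\s+` delimiter are hypothetical names and choices made for illustration, and the real PatternAnalyzer constructor differs between Lucene versions, so check the Javadoc for the version you run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is NOT Lucene code.
class DelimiterTokenizerSketch {
    // Assumed delimiter: one or more whitespace characters.
    static final Pattern DELIMITER = Pattern.compile("\\s+");

    // Split on the delimiter and lowercase each token; symbols are left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : DELIMITER.split(text.trim())) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True when every query token appears among the item's tokens (AND semantics).
    static boolean matches(String item, String query) {
        return tokenize(item).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("B.I SQUARE TUBING 2\" X 3\""));
        System.out.println(matches("PLYWOOD STA. CLARA 1/8X4X8", "plywood sta. clara"));
    }
}
```

With this kind of tokenization, `B.I SQUARE TUBING 2" X 3"` becomes the tokens `b.i`, `square`, `tubing`, `2"`, `x`, `3"`, so the symbols survive. The flip side is that matching is exact per token: a query token `1/2` will not match an indexed `1/2"`. Whichever analyzer you choose, use the same one at search time, so that the query text is tokenized the same way as the indexed text.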