Solr tokenizer not doing anything
I want to tokenize one Solr string field, "content", into another field, "tokenized".
For example:
{
"content":"Hello World this is a Test",
"tokenized":["hello", "world", "this", ...]
}
For that I use:
<field name="content" type="string" indexed="true" stored="true"/>
<field name="tokenized" type="customType" indexed="true" stored="true"/>
<copyField source="content" dest="tokenized"/>
and the custom field type:
<fieldType name="customType" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
My understanding was that, upon commit, the content would be tokenized with the specified tokenizer and the resulting list of tokens would be placed in the tokenized field. However, the tokenized field contains only the original content wrapped in a list, e.g.:
{
"content":"Hello World this is a Test",
"tokenized":["Hello World this is a Test"]
}
Is there some global configuration I need to set to get tokenizers to work?
Comments (1)
Tokens are only stored internally in Lucene and Solr. They do not change the stored text that gets returned to you in any way. The text is stored verbatim, i.e. the text you sent in is what gets returned to you.
The tokens generated in the background and stored in the index affect how you can search against the content you've stored and how it's processed; they do not affect the display value of the field.
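To make the stored/indexed split concrete, here is a toy model in Python. It is only a sketch and not how Solr or Lucene is implemented: `analyze` is a rough regex stand-in for the StandardTokenizer plus LowerCaseFilter chain from the question, and `ToyIndex` is a hypothetical class invented for this illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter (not Solr's real analyzer)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class ToyIndex:
    """Hypothetical toy index: keeps the stored value and the tokens separately."""

    def __init__(self):
        self.stored = {}                  # doc id -> verbatim field value
        self.postings = defaultdict(set)  # token -> ids of docs containing it

    def add(self, doc_id, text):
        self.stored[doc_id] = text        # the stored value is kept exactly as sent
        for token in analyze(text):       # only the index side sees the tokens
            self.postings[token].add(doc_id)

    def search(self, query):
        # The query is analyzed the same way, so "HELLO" matches "Hello".
        tokens = analyze(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # Results come back as the stored text, never as the token list.
        return [self.stored[i] for i in sorted(ids)]


index = ToyIndex()
index.add(1, "Hello World this is a Test")
results = index.search("hello")
print(results)  # ['Hello World this is a Test']
```

The lowercase query `hello` matches because of the tokens, yet what comes back is the original string, which is the same behaviour the `tokenized` field shows in the question.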
You can use the Analysis page under Solr's admin page to see exactly how text for a field gets processed into tokens before being stored in the index.
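If you'd rather check this from a script than from the admin UI, Solr also exposes a field analysis request handler, normally at `/analysis/field`. The sketch below only builds the request URL. The host and port are the defaults of a local install and `mycore` is a hypothetical core name, so adjust both; it also assumes the handler has not been disabled or remapped in your configuration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed default local install
CORE = "mycore"                           # hypothetical core name


def analysis_url(field_type, value):
    """Build a field analysis request URL for the given field type and sample text."""
    params = urlencode({
        "analysis.fieldtype": field_type,   # analyze using this fieldType's index analyzer
        "analysis.fieldvalue": value,       # the sample text to run through it
        "wt": "json",
    })
    return f"{SOLR_BASE}/{CORE}/analysis/field?{params}"


url = analysis_url("customType", "Hello World this is a Test")
print(url)
```

Fetching that URL (with `curl` or a browser) should show the output of each analysis stage for `customType`, which is where the lowercased tokens become visible.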
The reason for this is that you usually want to return the actual text to the user; exposing the tokenized and processed values doesn't really make sense for a document that gets returned to a human.