SOLR DIH 导入 MySQL“文本”作为 BLOB 的列
我在这里看到了一些关于为相应的 MySQL 列数据类型选择正确的字段类型的问题,但我的问题有点奇怪。我在 MySQL 中有一列 text
类型的帖子,我在 Solr schema.xml
中尝试了相应的 field-type
,例如字符串、文本、text-ws
。但每当我使用 DIH 导入它时,它都会作为 BLOB 对象导入。我检查过,这种情况仅发生在 text
类型的列上,而不是 varchar
上(它们被索引为字符串)。因此,帖子字段不可搜索。
在多次搜索失败后,当我在 Solr 上进行 *:*
查询搜索时,我发现了这个问题。响应示例:
<result name="response" numFound="223" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
<str name="solr_post_bio">[B@10a33ce2</str>
<date name="solr_post_created_at">2011-02-21T07:02:55Z</date>
<str name="solr_post_email">[email protected]</str>
<str name="solr_post_first_name">Test</str>
<str name="solr_post_last_name">Account</str>
<str name="solr_post_message">[B@2c93c4f1</str>
<str name="solr_post_status_message_id">1</str>
</doc>
编辑:
对于未提供以下详细信息表示歉意。
data-config.xml
:
<document>
<entity name="posts" dataSource="jdbc" query="select
p.person_id as solr_post_person_id,
pr.first_name as solr_post_first_name,
pr.last_name as solr_post_last_name,
u.email as solr_post_email,
p.message as solr_post_message,
p.id as solr_post_status_message_id,
p.created_at as solr_post_created_at,
pr.bio as solr_post_bio
from posts p,users u,profiles pr where p.person_id = u.id and p.person_id = pr.person_id and p.type='StatusMessage'">
<field column="solr_post_person_id" />
<field column="solr_post_first_name"/>
<field column="solr_post_last_name" />
<field column="solr_post_email" />
<field column="solr_post_message" />
<field column="solr_post_status_message_id" />
<field column="solr_post_created_at" />
<field column="solr_post_bio"/>
</entity>
</document>
schema.xml
:
<fields>
<field name="solr_post_status_message_id" type="string" indexed="true" stored="true" required="true" />
<field name="solr_post_message" type="text_ws" indexed="true" stored="true" required="true" />
<field name="solr_post_bio" type="text" indexed="false" stored="true" />
<field name="solr_post_first_name" type="string" indexed="false" stored="true" />
<field name="solr_post_last_name" type="string" indexed="false" stored="true" />
<field name="solr_post_email" type="string" indexed="false" stored="true" />
<field name="solr_post_created_at" type="date" indexed="false" stored="true" />
</fields>
<uniqueKey>solr_post_status_message_id</uniqueKey>
<defaultSearchField>solr_post_message</defaultSearchField>
I've seen few questions here on choosing correct field-type for the corresponding MySQL column data type but my problem is a bit weird. I've a column for posts in MySQL of type text
, I've tried corresponding field-type
for it in Solr schema.xml
e.g. string, text, text-ws
. But whenever I'm importing it using the DIH, it's getting imported as a BLOB object. I checked, this thing is happening only for columns of type text
and not for varchar
(they are getting indexed as string). Hence, the posts field is not becoming searchable.
I found about this issue, after repeated search failures, when I did a *:*
query search on Solr. A sample response:
<result name="response" numFound="223" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
<str name="solr_post_bio">[B@10a33ce2</str>
<date name="solr_post_created_at">2011-02-21T07:02:55Z</date>
<str name="solr_post_email">[email protected]</str>
<str name="solr_post_first_name">Test</str>
<str name="solr_post_last_name">Account</str>
<str name="solr_post_message">[B@2c93c4f1</str>
<str name="solr_post_status_message_id">1</str>
</doc>
EDIT :
Apologizing for not providing following details.
The data-config.xml
:
<document>
<entity name="posts" dataSource="jdbc" query="select
p.person_id as solr_post_person_id,
pr.first_name as solr_post_first_name,
pr.last_name as solr_post_last_name,
u.email as solr_post_email,
p.message as solr_post_message,
p.id as solr_post_status_message_id,
p.created_at as solr_post_created_at,
pr.bio as solr_post_bio
from posts p,users u,profiles pr where p.person_id = u.id and p.person_id = pr.person_id and p.type='StatusMessage'">
<field column="solr_post_person_id" />
<field column="solr_post_first_name"/>
<field column="solr_post_last_name" />
<field column="solr_post_email" />
<field column="solr_post_message" />
<field column="solr_post_status_message_id" />
<field column="solr_post_created_at" />
<field column="solr_post_bio"/>
</entity>
</document>
The schema.xml
:
<fields>
<field name="solr_post_status_message_id" type="string" indexed="true" stored="true" required="true" />
<field name="solr_post_message" type="text_ws" indexed="true" stored="true" required="true" />
<field name="solr_post_bio" type="text" indexed="false" stored="true" />
<field name="solr_post_first_name" type="string" indexed="false" stored="true" />
<field name="solr_post_last_name" type="string" indexed="false" stored="true" />
<field name="solr_post_email" type="string" indexed="false" stored="true" />
<field name="solr_post_created_at" type="date" indexed="false" stored="true" />
</fields>
<uniqueKey>solr_post_status_message_id</uniqueKey>
<defaultSearchField>solr_post_message</defaultSearchField>
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我也有同样的问题。我的所有配置和模式都是正确的,但我仍然在短文本字段中收到斑点。
经过一番绞尽脑汁后,我终于偶然发现了这个交换: http://qnalist.com/questions/624892/solr-dih-importing-mysql-text-column-as-a-blob
事实证明,MySQL 或 JDBC 中存在错误,导致在极少数情况下,CHAR 或 VARCHAR 字段会显示为 BLOB。我怀疑该错误与 MySQL 相关,因为我正在使用相当旧的版本。
就我而言,解决方法是将值包装在 CONCAT() 中,并将该值包装在 CAST() 中。这最终让 MySQL 确信,是的,我的文本列确实是文本。
我不知道你是否找到了问题的解决方案,但是当我遇到这个问题时,这个页面经常出现在我的谷歌搜索中,所以我希望下一个可怜的灵魂会发现这篇文章有帮助。
I had this same problem. All my configs and schemas were correct, but I was still getting blobs in a short text field.
After much head-scratching, I finally stumbled upon this exchange: http://qnalist.com/questions/624892/solr-dih-importing-mysql-text-column-as-a-blob
It turns out there was a bug either in MySQL or JDBC causing CHAR or VARCHAR fields in rare situations to show up as BLOBs instead. I suspect the bug was with MySQL, as I'm working with a rather old version.
In my case, the workaround was to wrap the value in a CONCAT(), and wrap that in a CAST(). This finally convinced MySQL that yes, my text column really really is text.
I don't know if you ever found a solution to your problem, but when I ran into it, this page came up often in my Google searches, so I hope the next poor soul finds this post helpful.