Solr: multi-language indexing and multiValued fields with DIH?

Posted on 2024-10-01 19:01:40

I have a MySQL table:

CREATE TABLE documents (
    id INT NOT NULL AUTO_INCREMENT,
    language_code CHAR(2),
    tags CHAR(30),
    text TEXT,
    PRIMARY KEY (id)
);

I have two questions about the Solr DataImportHandler (DIH):

1) The language_code field indicates which language the text field is written in. Depending on the language, I want to index text into a different Solr field.

# pseudo code

if language_code == "en":
    index "text" to Solr field "text_en"
elif language_code == "fr":
    index "text" to Solr field "text_fr"
elif language_code == "zh":
    index "text" to Solr field "text_zh"
...

Can DIH handle a use case like this? If so, how do I configure it?

2) The tags field needs to be indexed into a Solr multiValued field. The values are stored in a single string, separated by commas. For example, if tags contains the string "blue, green, yellow", I want to index the three values "blue", "green", and "yellow" into a Solr multiValued field.

How do I do that with DIH?

Thanks.

愁杀 2024-10-08 19:01:40

First, your schema needs to accept the per-language fields, for example through a dynamic field:

<dynamicField name="text_*" type="string" indexed="true" stored="true" />
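Note that a string field is indexed verbatim, without tokenizing or stemming. If the goal is language-aware full-text search, one possible variation is to declare the per-language fields explicitly with analyzed field types. This is only a sketch: the type names text_en, text_fr and text_cjk are assumptions taken from Solr's example schemas and may not exist in yours.

```xml
<!-- Field type names are assumptions; use whichever analyzed types your schema defines. -->
<field name="text_en" type="text_en"  indexed="true" stored="true" />
<field name="text_fr" type="text_fr"  indexed="true" stored="true" />
<field name="text_zh" type="text_cjk" indexed="true" stored="true" />
```

Explicitly declared fields take precedence over a matching dynamicField pattern, so a text_* dynamic field can remain as a catch-all for any other language codes.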

Then, in your DIH config, attach a script transformer to the entity:

<entity name="document" dataSource="ds1" transformer="script:ftextLang" query="SELECT * FROM documents" />

Define the script just below the data source:

<script><![CDATA[
  // Copy the "text" column into a per-language field such as text_en.
  function ftextLang(row){
     var name = row.get('language_code');
     var value = row.get('text');
     row.put('text_'+name, value);
     return row;
  }
]]></script>
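For the second question (splitting tags), DIH's RegexTransformer has a splitBy attribute that turns one column value into several. Below is a rough sketch of how the whole data-config might fit together, not a tested configuration. The JDBC driver, URL, user and password are placeholders I made up, and the null check in the script is my own addition so that rows without a language code do not end up in a text_null field.

```xml
<dataConfig>
  <!-- Placeholder connection details: driver, url, user and password are assumptions. -->
  <dataSource name="ds1" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret" />

  <script><![CDATA[
    // Copy "text" into text_<language_code>; skip rows missing either column.
    function ftextLang(row) {
      var lang = row.get('language_code');
      var value = row.get('text');
      if (lang != null && value != null) {
        row.put('text_' + lang, value);
      }
      return row;
    }
  ]]></script>

  <document>
    <entity name="document" dataSource="ds1"
            transformer="RegexTransformer,script:ftextLang"
            query="SELECT id, language_code, tags, text FROM documents">
      <!-- Split "blue, green, yellow" into three values; the pattern also consumes spaces after each comma. -->
      <field column="tags" splitBy=",\s*" />
    </entity>
  </document>
</dataConfig>
```

For the split values to be stored, the tags field in the schema has to be declared with multiValued="true". Transformers listed in the transformer attribute run in order, so here the tags are split before the script sees the row.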
