EdgeNGramFilterFactory not working (not indexing?)

Posted on 2024-10-30 23:10:08

I am having trouble getting ngrams to work. Here's my schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>

My database has a bunch of entries with

"Elizabeth"

and

"Elizabeths"

When I try to query on "Elizabeth", I get only "Elizabeth" and not "Elizabeths".
The odd thing is that when I check the Solr admin, the Analysis page shows that the EdgeNGramFilterFactory is indeed available, and that it expands "Elizabeths" into

e el eli eliz eliza elizab elizabe elizabet elizabeth

It seems like the indexer isn't picking up on this. I have the same problem when I move the synonyms filter from the query block to the index block. That is to say, when I have the synonyms filter in the query block, it works, but when I put it in the index block, it has no effect.

I have restarted Sunspot and reindexed multiple times. No dice. Any ideas? How can I directly check the indexed words list?
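
On the last question, one possible way to look at the indexed terms is Solr's Luke request handler. The sketch below only builds the request URL. It assumes Sunspot's default development Solr at http://localhost:8982/solr, a single-core layout where the handler lives at admin/luke, and that the `text :name` field is stored under the Solr field name `name_text`. The helper `luke_terms_uri` is a hypothetical name used for illustration, not part of Sunspot or Solr.

```ruby
require "uri"

# Hypothetical helper: builds a URL for Solr's Luke request handler,
# which can report the top indexed terms of a field.
# Assumptions: base_url points at the Solr webapp root and the handler
# is reachable at admin/luke (single-core layout).
def luke_terms_uri(base_url, field, num_terms = 50)
  root = base_url.end_with?("/") ? base_url : "#{base_url}/"
  uri = URI.join(root, "admin/luke")
  # fl selects the field, numTerms limits how many top terms are listed.
  uri.query = URI.encode_www_form(fl: field, numTerms: num_terms)
  uri
end

# Assumed defaults for a Sunspot development setup; adjust to your install.
puts luke_terms_uri("http://localhost:8982/solr", "name_text")
```

Opening the printed URL in a browser (or fetching it with Net::HTTP) should list the top terms for that field. If the edge n-grams such as "eliz" are missing there, the documents were probably not re-analyzed with the current schema.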

我纯我任性 2024-11-06 23:10:08

I think I found the problem and it looks like a noob error.

In my model, I was using the following construct, as per one of the tutorials:

class Institution < ActiveRecord::Base
 .
 .
 .
end

Sunspot.setup(Institution) do
  text :name
end

This did not seem to throw any errors when I started, stopped, or reindexed. It struck me as strange that I was able to reindex immediately after stopping Solr.

I switched to

class Institution < ActiveRecord::Base
  .
  .
  .
  searchable do
    text :name
  end
end

When I did this, I found that I could not reindex after stopping Solr. However, when I started Solr and reindexed, the index appeared to be truly refreshed and my queries finally behaved as expected.
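
To illustrate why the query starts matching once the documents are really reindexed with the edge n-gram filter, here is a minimal sketch in plain Ruby. It is not Solr's implementation: `edge_ngrams` is a hypothetical helper that mimics front edge n-grams for a single lowercased token, using the minGramSize and maxGramSize values from the schema in the question.

```ruby
# Hypothetical helper: front edge n-grams of one token, roughly what an
# edge n-gram filter emits at index time (a simplification, not Solr code).
def edge_ngrams(token, min_gram = 1, max_gram = 25)
  upper = [max_gram, token.length].min
  # An empty range (token shorter than min_gram) yields no grams.
  (min_gram..upper).map { |n| token[0, n] }
end

# Index side: lowercase the token, then expand it into edge n-grams.
indexed_terms = edge_ngrams("Elizabeths".downcase)

# Query side: only lowercased, no n-gram expansion.
query_term = "Elizabeth".downcase

# The query matches when its term is among the indexed terms.
puts indexed_terms.include?(query_term)
```

Because the n-grams are produced only at index time, they exist in the index only for documents that were actually sent to Solr after the schema change, which is consistent with the behavior described above.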
