Solr 3.2 Carrot2 clustering returns nothing but "Other Topics"

Posted on 2024-11-17 07:11:34

It is said that the Carrot2 integration in Solr has improved as of Solr 3.2, but my experience has been different. I had a Solr 1.4.1 server running with an absolutely identical configuration, where Carrot2 worked great, whereas Solr 3.2 gives me nothing but "Other Topics". This is driving me crazy, because apart from that I get no exceptions or anything unusual. Even the result XML looks the same...

I made only a few changes to the standard configuration of the clustering component:

<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>

    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
    <!--custom-->
    <str name="LingoClusteringAlgorithm.phraseLabelBoost">8.00</str>
    <str name="TermDocumentMatrixBuilder.titleWordsBoost">6.00</str>

    <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>

    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
  <lst name="engine">
    <str name="name">stc</str>
    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
  </lst>
</searchComponent>
<requestHandler name="/clustering"
                startup="lazy"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="carrot.title">autocomplete</str>
    <str name="carrot.url">autocomplete</str>
    <str name="carrot.snippet">autocomplete</str>
    <bool name="carrot.outputSubClusters">true</bool>

    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 autocomplete^1.2 ata^1.0 raum^1.0 system^1.0 assy^1.0 unit^1.0
    </str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

My best guess was that Carrot2 is not working properly together with edismax (which wasn't implemented in Solr 1.4.1), but that might be misleading.

I have already reindexed my data, just to make sure that this is not the issue.

In the Carrot2 Workbench, clustering works well with Lingo as the algorithm. When I choose "by source", I get "Other Topics", as in the XML. Might Lingo be misconfigured? Do I have to configure anything besides solrconfig.xml to fix that?

I'm thankful for any help.

Comments (1)

嗼ふ静 2024-11-24 07:11:34

This happens if the 'snippet' you are trying to cluster on never differs, or differs very little, between documents. Try adding 'carrot.snippet=' to your request parameters. In your settings it defaults to a field called 'autocomplete'. Does this field have any meaningful text?

Example that makes this behaviour go away for me:

http://localhost:8983/solr/clustering?q=peter&rows=200&carrot.snippet=summary

Best regards,

/Peter W
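
The same override could also be made permanent in the handler defaults rather than passed on every request. The fragment below is only a sketch of that idea: the field name "summary" is borrowed from the example URL above and is an assumption about the schema, so it should be replaced with whichever stored field actually holds varied, descriptive text. The other values are copied unchanged from the question's configuration.

```xml
<!-- Sketch only: "summary" is an assumed field name taken from the example URL;
     substitute a stored field from your own schema. -->
<lst name="defaults">
  <bool name="clustering">true</bool>
  <str name="clustering.engine">default</str>
  <bool name="clustering.results">true</bool>
  <str name="carrot.title">autocomplete</str>
  <str name="carrot.url">autocomplete</str>
  <!-- Cluster on a longer text field instead of "autocomplete" -->
  <str name="carrot.snippet">summary</str>
</lst>
```

A request-time carrot.snippet parameter, as in the URL above, would still take precedence over this default, which makes it easy to try several candidate fields before settling on one.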
