Solr 3.2 Carrot2 clustering returns nothing but "Other Topics"
It is said that the Carrot2 integration in Solr has improved as of the Solr 3.2 release, but my experience has been the opposite. I had a Solr 1.4.1 server running with exactly the same configuration, and Carrot2 worked great there, whereas Solr 3.2 gives me nothing but "Other Topics". This is driving me crazy, because apart from that I get no exceptions or anything else unusual. Even the result XML looks the same...
I didn't make many changes to the standard configuration of the clustering component:
<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent" >
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
    <!--custom-->
    <str name="LingoClusteringAlgorithm.phraseLabelBoost">8.00</str>
    <str name="TermDocumentMatrixBuilder.titleWordsBoost">6.00</str>
    <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
  <lst name="engine">
    <str name="name">stc</str>
    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
  </lst>
</searchComponent>
<requestHandler name="/clustering"
                startup="lazy"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="carrot.title">autocomplete</str>
    <str name="carrot.url">autocomplete</str>
    <str name="carrot.snippet">autocomplete</str>
    <bool name="carrot.outputSubClusters">true</bool>
    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 autocomplete^1.2 ata^1.0 raum^1.0 system^1.0 assy^1.0 unit^1.0
    </str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
My best guess was that Carrot2 is not working properly together with edismax (which wasn't implemented in Solr 1.4.1), but that might be misleading.
I have already reindexed my data, just to make sure that this is not the issue.
In the Carrot2 Workbench, clustering works well with Lingo as the algorithm. When I choose "by source", I get "Other Topics", as in the XML. Might Lingo be misconfigured? Do I have to configure anything besides solrconfig.xml to fix this?
I'm thankful for any help.
Comments (1)
This happens if the 'snippet' you are trying to cluster on never differs, or differs very little, from one document to the next. Try adding 'carrot.snippet=<field>' to your request parameters. In your settings it defaults to a field called 'autocomplete'. Does this field contain any meaningful text?
Example that makes this behaviour go away for me:
http://localhost:8983/solr/clustering?q=peter&rows=200&carrot.snippet=summary
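To avoid passing the parameter on every request, the same idea could be moved into the handler defaults. This is only a minimal sketch: `summary` is taken from the example URL above, while `id` and `title` are hypothetical field names that are not part of the original configuration. Replace them with fields from your own schema; as far as I know they need to be stored, and they should contain text that actually varies between documents.

```xml
<requestHandler name="/clustering"
                startup="lazy"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Assumption: "id", "title" and "summary" are placeholder field names. -->
    <!-- A field that identifies each document. -->
    <str name="carrot.url">id</str>
    <!-- A short field with descriptive text. -->
    <str name="carrot.title">title</str>
    <!-- A longer field whose content differs between documents. -->
    <str name="carrot.snippet">summary</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

The point of the sketch is that title, URL and snippet each map to a different field, rather than all three pointing at `autocomplete` as in the question.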
Best regards,
/Peter W