How should I structure Solr cores for multiple locales and content types?
I'm looking to run a Solr server to unify search across several different aspects of a public website. The site has several locales (US, Ireland, Japan, etc.) and several types of content (forums, regular web pages, help pages, products, etc.).
I'd like each search to run against a single locale but return results for multiple content types, so that I can display them as a tabbed result set.
Possible options:
- Have one core for each locale, and differentiate between content types using fields in the same index.
- Have one core for each content type.
- Have one core for each content type / locale combination.
- Single core / single index for everything.
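To make the first option concrete, here is a minimal sketch of the request it implies: one query against a per-locale core, with Solr's result grouping splitting the hits by content type so that each group can feed one tab. The core naming scheme (`site_<locale>`), the `content_type` field, and the base URL are assumptions for illustration only; none of them come from an actual setup.

```python
from urllib.parse import urlencode


def build_tabbed_search_url(base_url, locale, query, per_tab=5):
    """Build a select URL for one locale core, grouped by content type."""
    # Hypothetical naming scheme: one core per locale, e.g. "site_en_US".
    core = f"site_{locale}"
    params = {
        "q": query,
        "group": "true",                # enable Solr result grouping
        "group.field": "content_type",  # hypothetical field holding forum/help/product/...
        "group.limit": per_tab,         # documents returned per content type
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_tabbed_search_url("http://localhost:8983/solr", "en_US", "wireless router"))
```

With one core per content type instead, the same tabbed page would need one request per core, with the results merged on the client side.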
Considerations:
The Solr wiki mentions that multi-core starts to give a performance gain at about 10 million documents, and I think we're probably well under that, even counting all locales and content types. However, smashing all the data into a single index seems a little messy and potentially difficult to shard or scale. On the other hand, a single core is great for getting a single result set, since I never have to run a multi-search across cores.
Has anyone used multi-core who can advise me?
Comments (1)
It looks like there's some interest in this question, so I thought I'd start an answer and update it with my findings.
First of all, there are some real advantages to separating cores by locale, since it makes it easy for each language to have its own stop words and settings. In my case I'm never going to search across locales, so the split is logical. It's also likely to give me some speed increase, because the index for each core is smaller.
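As a toy illustration of why per-locale stop words matter (this is plain Python, not Solr configuration; in Solr the stop-word list belongs to each core's analyzer settings), the sketch below filters the same kind of text with two different lists. The locale codes and word lists are made up for the example, and German is used only because it makes the contrast obvious.

```python
# Hypothetical per-locale stop-word lists; real lists would live in each core's config.
STOP_WORDS = {
    "en_US": {"the", "a", "of", "and"},
    "de_DE": {"der", "die", "das", "und"},
}


def index_terms(text, locale):
    """Lowercase, split on whitespace, and drop that locale's stop words."""
    stop = STOP_WORDS[locale]
    return [term for term in text.lower().split() if term not in stop]


if __name__ == "__main__":
    print(index_terms("Die Hard and the sequel", "en_US"))  # ['die', 'hard', 'sequel']
    print(index_terms("Die Hard und das Sequel", "de_DE"))  # ['hard', 'sequel']
```

"die" is a meaningful term in English but an article in German, so a single shared list would either drop it for everyone or keep it for everyone. Separate cores avoid that trade-off.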
As for splitting up content types by core, I'm still experimenting with a single content type, so I'll update this answer once I add more.