Faceted search without data duplication (no ETL)
All the solutions I've seen so far involve duplicating data, either in a NoSQL store or in a data warehouse.
Are there more efficient ways?
2011-06-07 EDIT: By "no duplication" I also mean no ETL. I would like to extract the data directly from the main database. It's relational, but there's still time to change that.
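For reference, the baseline that involves neither duplication nor ETL is to compute facet counts with GROUP BY queries against the live relational tables. The sketch below is a generic illustration, not something proposed in this thread: it assumes a hypothetical products table with brand and color as facet columns, and uses Python's built-in sqlite3 module only as a stand-in for the main database. Each facet costs one aggregate query per search, so on large tables it depends on suitable indexes.

```python
import sqlite3

# Hypothetical schema standing in for the main relational database;
# facets are read from this table directly, with no copy elsewhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, brand TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, brand, color) VALUES (?, ?, ?)",
    [
        ("red shoe", "Acme", "red"),
        ("blue shoe", "Acme", "blue"),
        ("red hat", "Zenith", "red"),
        ("red scarf", "Zenith", "red"),
    ],
)

# Column names cannot be bound as SQL parameters, so whitelist them.
FACET_COLUMNS = {"brand", "color"}


def facet_counts(conn, facet_column, where_sql="1=1", params=()):
    """Return [(value, count), ...] for facet_column over rows matching where_sql.

    where_sql must be a trusted SQL fragment; user input goes in params.
    """
    if facet_column not in FACET_COLUMNS:
        raise ValueError(f"unsupported facet column: {facet_column}")
    sql = (
        f"SELECT {facet_column}, COUNT(*) FROM products WHERE {where_sql} "
        f"GROUP BY {facet_column} ORDER BY COUNT(*) DESC, {facet_column}"
    )
    return conn.execute(sql, params).fetchall()


# Color facet over the whole table, then brand facet for a search on the name.
print(facet_counts(conn, "color"))  # → [('red', 3), ('blue', 1)]
print(facet_counts(conn, "brand", "name LIKE ?", ("%red%",)))  # → [('Zenith', 2), ('Acme', 1)]
```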
There is a patch for Solr that adds field collapsing. It works fairly well, except that problems have been reported when the returned result set is millions of documents long.
Also, it doesn't calculate facet counts very precisely: sometimes the total of all the facet counts doesn't tally with the number of documents in the set. However, the difference never seems to be that big; I've seen discrepancies of less than 100 for result sets of 10,000-50,000 documents.
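To make the idea of field collapsing and a facet-count mismatch concrete, here is a toy in-memory sketch. It does not reproduce the patch's actual logic, and it is only an assumption that this is one way such a mismatch can arise: if facets are counted over all matching documents while the results are collapsed to one document per group, the facet totals no longer add up to the number of results. The documents and the product_id and color fields are hypothetical.

```python
from collections import Counter

# Hypothetical documents: several variants per product_id, each with a "color" facet.
DOCS = [
    {"id": 1, "product_id": "p1", "color": "red"},
    {"id": 2, "product_id": "p1", "color": "blue"},
    {"id": 3, "product_id": "p2", "color": "red"},
    {"id": 4, "product_id": "p3", "color": "green"},
    {"id": 5, "product_id": "p3", "color": "green"},
]


def collapse(docs, field):
    """Keep only the first document seen for each value of `field` (a simple field collapse)."""
    seen = set()
    kept = []
    for doc in docs:
        if doc[field] not in seen:
            seen.add(doc[field])
            kept.append(doc)
    return kept


def facet(docs, field):
    """Count documents per value of `field`."""
    return Counter(doc[field] for doc in docs)


collapsed = collapse(DOCS, "product_id")
before = facet(DOCS, "color")      # counted over all 5 matching documents
after = facet(collapsed, "color")  # counted over the 3 collapsed results

print(len(collapsed), sum(before.values()), sum(after.values()))  # → 3 5 3
```

Counting over the uncollapsed set gives totals of 5 against only 3 collapsed results; counting over the collapsed set keeps the numbers in step, at the price of the blue variant of p1 disappearing from the facets.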
Obviously, to use this patch you'll have to build your own version of Solr. If you're not comfortable with that, you can try the already-built version I am using. I have uploaded both a patched .war file and my "lib" folder to my SkyDrive (I'm not sure whether the latter is necessary or whether the patch changes any libraries, but I included it just in case). I should also mention that you use this version at your own risk: it has served me without any serious complaints, but I can't guarantee the same for others. Here's the download link.
Alternatively, you can wait for Solr 4 to be released. It will include field collapsing, but it still had unresolved critical issues the last time I checked. Note also that its collapsing search parameters won't be compatible with the patch described above, so if you start with the patch and later switch to Solr 4, you'll need to amend your code as well.
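For the Solr 4 route, collapsing is exposed through the result-grouping request parameters (group, group.field, group.limit, group.ngroups, group.facet). The sketch below only builds such a request URL with Python's standard library; it does not call Solr. The host, the core name collection1 and the fields product_id, color and brand are hypothetical, and since grouping options changed between releases, check the result-grouping documentation for the version you actually deploy.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host, port and core with your own.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def grouped_facet_query(q, group_field, facet_fields, rows=10):
    """Build a Solr result-grouping (field collapsing) request URL with facets."""
    params = [
        ("q", q),
        ("rows", str(rows)),
        ("wt", "json"),
        ("group", "true"),             # enable result grouping
        ("group.field", group_field),  # field to collapse on
        ("group.limit", "1"),          # return one document per group
        ("group.ngroups", "true"),     # also report the number of groups
        ("group.facet", "true"),       # count facets per group, not per document
        ("facet", "true"),
    ]
    # One facet.field parameter per facet, as Solr expects repeated keys.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


print(grouped_facet_query("shoes", "product_id", ["color", "brand"]))
```

Because the grouping parameter names differ from the patch's, keeping them in one helper like this limits how much code has to change when switching from one to the other.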