Building a unique tag cloud with Ferret
I've been using Ferret as the full-text search engine in a small project I'm working on.
Using the documentation and a few examples online, I've put together a tag cloud generator that builds its tags from the full-text index via the IndexReader.terms method.
That has worked quite well until now, when I want term data based on a search result rather than the whole index.
For example, if the user searches for "cake", I want to show them a tag cloud of terms used in association with the term "cake".
Are there any examples of the terms method being used with a search result set or something similar?
Currently I'm using the following code to generate my list of tags:
reader = Ferret::Index::IndexReader.new(Scrape.find_last_index_version)
terms = []
# Collect every term in the :all_quotes field with its document frequency.
reader.terms(:all_quotes).each do |term, doc_freq|
  terms << [term, doc_freq]
end
Cheers.
Comments (1)
This sounds more like a term frequency chart (like a Wordle) than a tag cloud. Or are these terms in a tag field? Either way, the index doesn't keep track of term frequency within each possible document subset (such as the results of a search), so such a method wouldn't be fast even if it existed. For a single document, you can get the TermFreqVector and suggest documents that are good matches for the other frequent terms in that document. So you could take some of the top results, grab the term vector from each one, and add them up, but those aggregate functions don't exist natively (the developers generally try not to put slow operations in there).
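To make the "add them up" step concrete, here is a minimal sketch of the aggregation in plain Ruby. It deliberately leaves Ferret out: the per-document hashes are hypothetical stand-ins for whatever counts you read from each top hit's term vector, and the method name aggregate_term_counts and its exclude and limit parameters are my own inventions, not part of Ferret. How you fetch the term vectors (and whether your field stores them at all) is something to check against the Ferret documentation.

```ruby
# Sum per-document term counts into one frequency table for a tag cloud.
#
# doc_term_counts: array of hashes, one per top search hit, each mapping
#   term => count. These hashes are hypothetical stand-ins for the data
#   you would read from each hit's term vector.
# exclude: terms to drop from the cloud (for example, the search term itself).
# limit: maximum number of [term, count] pairs to return.
def aggregate_term_counts(doc_term_counts, exclude: [], limit: 50)
  totals = Hash.new(0)
  doc_term_counts.each do |counts|
    counts.each { |term, freq| totals[term] += freq }
  end
  exclude.each { |term| totals.delete(term) }
  # Highest count first; ties are broken alphabetically for stable output.
  totals.sort_by { |term, freq| [-freq, term] }.first(limit)
end

# Made-up counts for two documents that matched a search for "cake".
top_hits = [
  { "cake" => 3, "chocolate" => 2, "icing" => 1 },
  { "cake" => 1, "chocolate" => 1, "birthday" => 4 },
]

cloud = aggregate_term_counts(top_hits, exclude: ["cake"], limit: 3)
cloud.each { |term, count| puts "#{term}: #{count}" }
```

Because this only looks at the handful of top hits you pass in, the cost grows with the number of hits and their vocabulary rather than with the size of the index, which is why capping the number of results matters.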