Full-text personalized search products
What full-text search technology is out there to support full-text personalized search?
For instance, contact search in your webmail provider of choice: it's full text but only searches your personal contacts and not the entire universe of contacts.
There are countless full-text search packages out there, but I don't know how you could use most of them so that every user sees only a small subset of the universe of documents.
Comments (2)
In the case of email, it's simple: use any popular search toolkit and build an index per user. It's simple because the indexes shouldn't overlap, or you'd be violating users' privacy. Also, overlap might skew statistics like IDF (inverse document frequency), since term counts from other users' mail would leak into each user's ranking. (You might be tempted to index emails sent to multiple users only once, but the security and privacy implications of that aren't worth it. Disk is cheap.)
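As a rough illustration of the index-per-user idea, here is a minimal sketch in plain Python. It is not a real search toolkit: `UserIndex`, `tokenize`, and the sample user names and messages are all made up for the example, and the IDF formula is just one common smoothed variant. The point is only that each user gets an independent index, so both search results and IDF figures come from that user's documents alone.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UserIndex:
    """A tiny inverted index holding one user's documents only (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def idf(self, term):
        # Smoothed IDF computed from this user's documents alone.
        df = len(self.postings.get(term, ()))
        return math.log((1 + self.doc_count) / (1 + df)) + 1

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result


# One independent index per user; nothing is shared between them.
indexes = defaultdict(UserIndex)

indexes["alice"].add("a1", "Lunch with John Smith on Friday")
indexes["alice"].add("a2", "Quarterly budget review")
indexes["bob"].add("b1", "John Smith sent the contract")

print(indexes["alice"].search("john smith"))  # searches alice's mail only
print(indexes["bob"].search("budget"))        # bob's index has no such mail
```

With a real toolkit the same shape would apply: one index directory (or one index name) per user, chosen from the authenticated user's id before any query runs.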
If a common collection of documents should be indexed for personalized search, you're on your own, I'm afraid.
I would recommend building a single Lucene index of all contacts with special fields such as contact_list_id and usage_frequency. At search time, add each user's specific parameters to the query, e.g. text:"John smith" AND contact_list_id:"$current_user_id", sorted by usage_frequency. This way you have one optimized index with all the data compressed in one place, and results are still personalized through fields like usage_frequency or a more robust ranking signal. Think of the index as a database with highly efficient text search.
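To make the shared-index idea concrete, here is a minimal sketch in plain Python rather than Lucene itself. `SharedContactIndex` and the sample contacts are hypothetical; only the field names contact_list_id and usage_frequency come from the answer above. One simplification to note: the sketch matches documents containing all query terms, whereas the quoted `text:"John smith"` in Lucene syntax would be a phrase query.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SharedContactIndex:
    """One index for all users; every entry carries its owner's list id."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of contact ids
        self.contacts = {}                # contact id -> stored fields

    def add(self, contact_id, contact_list_id, text, usage_frequency=0):
        self.contacts[contact_id] = {
            "contact_list_id": contact_list_id,
            "text": text,
            "usage_frequency": usage_frequency,
        }
        for term in tokenize(text):
            self.postings[term].add(contact_id)

    def search(self, query, contact_list_id):
        """All query terms must match AND the contact must belong to the list."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(
            *(self.postings.get(term, set()) for term in terms)
        )
        owned = [
            cid for cid in matches
            if self.contacts[cid]["contact_list_id"] == contact_list_id
        ]
        # Most frequently used contacts first; contact id breaks ties.
        return sorted(
            owned, key=lambda cid: (-self.contacts[cid]["usage_frequency"], cid)
        )


index = SharedContactIndex()
index.add(1, "alice", "John Smith john@example.com", usage_frequency=12)
index.add(2, "alice", "John Smithers", usage_frequency=40)
index.add(3, "bob", "John Smith", usage_frequency=99)
index.add(4, "alice", "Smith, John (work)", usage_frequency=30)

print(index.search("John smith", contact_list_id="alice"))  # [4, 1]
```

Bob's contact 3 matches the text and has the highest usage_frequency, yet it never appears in Alice's results because the owner filter is part of every query. In a real deployment that filter value should come from the authenticated session, not from anything the user can type into the search box.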