How do I build an instant search engine (with ranking/relevancy)?
I was a heavy user of Sphinx and Lucene.
Sphinx just takes a database and indexes it; then you call Sphinx to get the IDs.
But what if I want to create a very tiny search engine, over just a few rows of data and a few paragraphs of text? The trick is that the rows of data are constantly changing, so I can't have an "index".
I want to be able to rank by relevancy, just like Sphinx. How can I do that?
Of course, I wouldn't go through the indexing...
Answers (2)
If you only have a few rows of data and a few paragraphs of words in each, keep it all in memory and use whatever text algorithm makes the most sense for your content.
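As an illustration of the keep-it-in-memory approach, here is a minimal sketch that holds the rows in a plain Python dict and scores every row on each query. TF-IDF is only an assumed choice of "text algorithm", and `tokenize` and `search` are hypothetical names. Because nothing is cached, any change to the rows is visible in the very next search.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def search(rows: dict[int, str], query: str, limit: int = 10) -> list[tuple[int, float]]:
    """Rank rows against the query with a brute-force TF-IDF scan.

    Nothing is precomputed: every call re-reads the current rows.
    """
    tokenized = {row_id: Counter(tokenize(text)) for row_id, text in rows.items()}
    n_docs = len(tokenized)
    terms = set(tokenize(query))

    # Document frequency: how many rows contain each query term.
    df = {t: sum(1 for counts in tokenized.values() if t in counts) for t in terms}

    scores = []
    for row_id, counts in tokenized.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for t in terms:
            if counts[t]:  # Counter returns 0 for missing terms
                tf = counts[t] / length
                idf = math.log(1 + n_docs / df[t])
                score += tf * idf
        if score > 0:
            scores.append((row_id, score))

    # Highest score first; ties broken by row id for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:limit]


rows = {
    1: "Sphinx indexes a database and returns IDs",
    2: "Lucene is a Java search library",
    3: "A tiny search engine over a few paragraphs",
}
print(search(rows, "search engine"))
```

With this sample data, row 3 (which contains both query terms) ranks above row 2 (which contains only "search"), and row 1 is not returned. The cost is a full scan per query, which is fine for a handful of rows.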
How are you going to determine the relevancy without looking at everything?
If there is only a tiny bit of data, and it's changing so much that maintaining an index is impractical, you could instead generate the index whenever you want to search the data, query it, and then drop the index the next time the data changes. With a small data set, frequent updates, and infrequent lookups, this could be more efficient than maintaining the index.
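A minimal sketch of the build-on-demand idea, assuming the rows live in process memory. `LazyIndex`, `put`, `delete` and `search` are hypothetical names, and TF-IDF again stands in for whatever ranking function you prefer. Any write simply discards the inverted index; the next query rebuilds it, so a burst of updates between two searches costs only one build.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


class LazyIndex:
    """Rows plus an inverted index that is built on demand and dropped on any change."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}
        # term -> {row_id: term count}; None means "rebuild before querying".
        self._postings: dict[str, dict[int, int]] | None = None
        self._lengths: dict[int, int] = {}
        self.builds = 0  # number of rebuilds, kept for illustration only

    def put(self, row_id: int, text: str) -> None:
        self._rows[row_id] = text
        self._postings = None  # data changed: drop the index

    def delete(self, row_id: int) -> None:
        self._rows.pop(row_id, None)
        self._postings = None  # data changed: drop the index

    def _build(self) -> None:
        postings: dict[str, dict[int, int]] = defaultdict(dict)
        lengths: dict[int, int] = {}
        for row_id, text in self._rows.items():
            counts = Counter(tokenize(text))
            lengths[row_id] = sum(counts.values())
            for term, count in counts.items():
                postings[term][row_id] = count
        self._postings = dict(postings)
        self._lengths = lengths
        self.builds += 1

    def search(self, query: str, limit: int = 10) -> list[tuple[int, float]]:
        if self._postings is None:
            self._build()
        n_docs = len(self._rows)
        scores: dict[int, float] = defaultdict(float)
        for term in set(tokenize(query)):
            matches = self._postings.get(term, {})
            if not matches:
                continue
            idf = math.log(1 + n_docs / len(matches))
            # Only rows containing the term are visited, thanks to the index.
            for row_id, count in matches.items():
                scores[row_id] += count / self._lengths[row_id] * idf
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        return ranked[:limit]


idx = LazyIndex()
idx.put(1, "Sphinx indexes a database and returns IDs")
idx.put(2, "Lucene is a Java search library")
idx.put(3, "A tiny search engine over a few paragraphs")
print(idx.search("search engine"))
```

Rebuilding costs time proportional to the total number of words, so this only pays off under the conditions the answer names: a small data set, frequent updates, and infrequent lookups. If searches outnumber writes, updating the postings incrementally in `put` would avoid the rebuilds.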