Indexing in Solr/Lucene
I have a site where users can post questions, so I have a table in MySQL like this:
question_id, user_id, tags, views, creation_date
What I want is to be able to
perform searches that return question_ids based on tags
and order them by
- views
- date (newest, or this week, this month)
- or search for a specified user and again return question_ids
ordered by views and date.
How should I bring everything into Solr, as far as indexing is concerned?
Do I have to index tags, views, and date? What should I index to get the best performance?
Comments (1)
Consider whether using Lucene/Solr is really a benefit for you. I don't want to be misunderstood, but if you only want to search a user_id column for a specific user ID, you don't need an additional full-text search engine.
Anyway, maybe you just want a little project to "play with" Solr.
So here are the answers to your questions:
Put everything you need to search for into Solr/Lucene. Use the DIH (DataImportHandler) http://wiki.apache.org/solr/DataImportHandler to let Solr walk through your table and index the data.
Yes. You have to index every field you want to work with.
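To make that concrete, here is a rough sketch of the queries the question describes, which shows why tags, user_id, views and creation_date all have to be searchable or sortable in Solr. The host, the core name `questions`, and the helper names `build_params` and `build_url` are assumptions for illustration; the request parameters themselves (`q`, `fq`, `fl`, `sort`, `wt`) are standard Solr select parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/questions/select"

# Fields the question wants to sort on.
SORTABLE_FIELDS = {"views", "creation_date"}


def escape_phrase(value):
    """Escape backslashes and double quotes so the value can sit inside a quoted phrase."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_params(tag=None, user_id=None, sort_by="views", since=None):
    """Build Solr select parameters for a tag search or a user search.

    since: optional Solr date-math expression, e.g. "NOW-7DAYS" or "NOW-1MONTH".
    """
    if sort_by not in SORTABLE_FIELDS:
        raise ValueError("unsupported sort field: {}".format(sort_by))
    if tag is not None:
        q = 'tags:"{}"'.format(escape_phrase(tag))
    elif user_id is not None:
        q = "user_id:{}".format(int(user_id))
    else:
        q = "*:*"  # match all documents
    params = [
        ("q", q),
        ("fl", "question_id"),               # return only the key
        ("sort", "{} desc".format(sort_by)),  # most viewed or newest first
        ("wt", "json"),
    ]
    if since is not None:
        # Filter query restricting results to a recent date range.
        params.append(("fq", "creation_date:[{} TO NOW]".format(since)))
    return params


def build_url(params):
    """Turn the parameter list into a full request URL."""
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Questions tagged "java" from the last week, most viewed first.
    print(build_url(build_params(tag="java", since="NOW-7DAYS")))
    # Newest questions by user 42.
    print(build_url(build_params(user_id=42, sort_by="creation_date")))
```

Every field that appears in `q`, `fq` or `sort` here has to be available in the index, which is the practical meaning of "index everything you want to work with".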
By the way, there is a difference between indexing and storing data. You can index fields (like tags, user_id, views, ...) without additionally storing them inside your Lucene index. Storing a field is only necessary if Lucene/Solr has to return its value in the search results.
Otherwise, Solr only returns the uniqueKey (primary key) of the matching documents and you have to fetch the data from the database (... where pk = <lucene result>).
So you don't need to store fields that are only relevant for sorting, for example.
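As an illustration of the "fetch the data from the database" step, the sketch below takes the keys Solr returned and loads the full rows, keeping Solr's ordering. It uses an in-memory SQLite database only as a stand-in for MySQL so the example runs anywhere; the table name `questions`, the sample rows and the helper `fetch_questions` are assumptions. With a MySQL driver the placeholder would typically be `%s` instead of `?`.

```python
import sqlite3


def fetch_questions(conn, question_ids):
    """Load full rows for the given keys, preserving the order Solr returned them in."""
    if not question_ids:
        return []
    placeholders = ",".join("?" for _ in question_ids)
    sql = (
        "SELECT question_id, user_id, tags, views, creation_date "
        "FROM questions WHERE question_id IN ({})".format(placeholders)
    )
    # IN (...) gives no ordering guarantee, so index the rows by key first.
    rows_by_id = {row[0]: row for row in conn.execute(sql, list(question_ids))}
    # Re-apply Solr's ordering; skip keys that no longer exist in the database.
    return [rows_by_id[qid] for qid in question_ids if qid in rows_by_id]


# Stand-in database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions ("
    "question_id INTEGER PRIMARY KEY, user_id INTEGER, tags TEXT, "
    "views INTEGER, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "java solr", 150, "2011-03-01"),
        (2, 11, "php mysql", 20, "2011-03-02"),
        (3, 10, "java lucene", 900, "2011-03-03"),
    ],
)

# Pretend Solr returned these keys, already sorted by views descending.
solr_ids = [3, 1]
result = fetch_questions(conn, solr_ids)

if __name__ == "__main__":
    for row in result:
        print(row)
```

The sorting and filtering stay in Solr; the database is only asked for the handful of rows that will actually be displayed.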
Index only the fields (columns) you need to work with in Solr. Don't index fields you will never query or search on.
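A minimal sketch of what such field definitions could look like, assuming a Solr version that offers the Schema API (older versions declare the same `indexed` and `stored` attributes in schema.xml). The field type names `string`, `pint` and `pdate` are assumptions taken from recent default configsets and may differ in your schema, and the helper `field` is hypothetical. Only the key is stored; everything else is indexed for searching and sorting but not stored.

```python
import json


def field(name, field_type, stored=False, multi_valued=False):
    """Describe one Solr field: indexed for search/sort, stored only if requested."""
    return {
        "name": name,
        "type": field_type,  # assumed type names; check your own schema
        "indexed": True,
        "stored": stored,
        "multiValued": multi_valued,
    }


SCHEMA_FIELDS = [
    field("question_id", "string", stored=True),    # the key Solr returns
    field("user_id", "pint"),                       # search by user
    field("tags", "string", multi_valued=True),     # one value per tag
    field("views", "pint"),                         # sort only
    field("creation_date", "pdate"),                # sort and date-range filter
]

# JSON body for a Schema API "add-field" command; it would be POSTed to the
# collection's /schema endpoint (URL depends on your installation).
payload = json.dumps({"add-field": SCHEMA_FIELDS}, indent=2)

if __name__ == "__main__":
    print(payload)
```

Depending on the field type, sorting may additionally rely on docValues being enabled, so it is worth checking the type definitions in your own schema before relying on this layout.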