Lucene - Zend_Search_Lucene - How to build an index for "tagged" content
I have the following problem: I need to build a Lucene index for articles that are tagged.
Here is the simplified data structure and my proposed Lucene field types:
article_id -> UnIndexed
article_title -> UnStored
article_content -> UnStored
article_tags -> ????? (here is the problem)
An article can have multiple tags. Let's say article A has the tags T1, T2, and T3. The problem is that T1, T2, and T3 are represented by IDs (numbers). I can't store a tag in the index by its text, because the text can change; I would then have to find every article carrying the changed tag, remove it from the index, and add it again. I then need to search for articles that have both the T1 and T2 tags. The number of tags assigned to an article is unlimited (relation 1-n). Is it possible to search for articles with certain tags (tag IDs)?
I hope that is clear. Does anybody have an efficient solution to this problem?
Thanks in advance.
Comments (1)
You can do this with Lucene. One way is to create a document for each tag-article pair, and search for the tags using AND.
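To make the tag-article pair idea concrete, here is a minimal sketch in plain Python. It is not Zend_Search_Lucene code: `TagPairIndex` and its methods are hypothetical names, and the in-memory dictionary only models what an index over (article ID, tag ID) pairs would hold. It assumes the AND is applied by intersecting the article IDs found for each tag, since a single pair document carries only one tag.

```python
from collections import defaultdict


class TagPairIndex:
    """Toy model of an index holding one entry per (article_id, tag_id) pair."""

    def __init__(self):
        # tag_id -> set of article_ids that carry this tag
        self._postings = defaultdict(set)

    def add_pair(self, article_id, tag_id):
        """Index one tag-article pair."""
        self._postings[tag_id].add(article_id)

    def remove_pair(self, article_id, tag_id):
        """Drop one pair, e.g. when a tag is unassigned from an article."""
        self._postings[tag_id].discard(article_id)

    def search_all(self, tag_ids):
        """Return sorted article IDs that carry every tag in tag_ids (AND)."""
        tag_ids = list(tag_ids)
        if not tag_ids:
            return []
        # Start from the first tag's articles, then intersect with the rest.
        result = set(self._postings.get(tag_ids[0], set()))
        for tag_id in tag_ids[1:]:
            result &= self._postings.get(tag_id, set())
        return sorted(result)


# Illustrative data: article A has tags 1, 2, 3; B has 1, 3; C has 2.
index = TagPairIndex()
for article_id, tags in {"A": [1, 2, 3], "B": [1, 3], "C": [2]}.items():
    for tag_id in tags:
        index.add_pair(article_id, tag_id)

matches = index.search_all([1, 2])
print(matches)  # ['A']
```

Because only numeric tag IDs are indexed, renaming a tag changes nothing in this structure; only assigning or unassigning a tag adds or removes a pair.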
Should you use Lucene? I am unsure. In your description you do not use any full-text search capability. Why not use a database?
I suggest you read Search Engine versus DBMS and choose according to the criteria defined there.