Database table denormalization for Lucene indexing
I am just starting up with Lucene, and I'm trying to index a database so I can perform searches on the content. There are 3 tables that I am interested in indexing:
1. Image table - a table where each entry represents an image. Each image has a unique ID and some other info (title, description, etc).
2. People table - a table where each entry represents a person. Each person has a unique ID and other info (name, address, company, etc).
3. Credited table - this table has 3 fields (image, person, and credit type). Its purpose is to associate people with an image as the credits for that image. Each image can have multiple credited people (the director, photographer, props artist, etc). Also, a person can be credited in multiple images.
I'm trying to index these tables so I can perform some searching using Lucene but, as I've read, I need to flatten the structure.
The first solution that came to me would be to create a Lucene document for each combination of Image/Credited Person. I'm afraid this will create a lot of duplicate content in the index (all the details of an image/person would have to be duplicated in each Document for each person that worked on the image).
Is there anybody experienced with Lucene who can help me with this? I know there is no generic solution to denormalization, which is why I provided a more specific example.
Thank you, and I will gladly provide more info on the database if anybody needs it.
PS: Unfortunately, there is no way for me to change the structure of the database (it belongs to the client). I have to work with what I have.
Comments (1)
You could create a Document for each person, with the descriptions of all associated images concatenated (either appended to the person info or in a separate Field).
Or, you could create a minimal Document for each person, create a Document for each image, put the creators' names and credit info in a separate field of the image Document, and link them by putting the person ID (or person Document id) in a third, non-indexed field. (Lucene is geared toward flat document indexing, not relational data, but relations can be defined manually.)
This is really a matter of what you want to search for, images or persons, and whether each contains enough keywords for search to function. Try several options and see whether they work well enough and don't exceed the available space.
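As a rough sketch of the second option, the flattening step could look something like the following before anything is handed to Lucene. This is plain Python over made-up sample rows, not Lucene code; the field names (`credits`, `person_ids`, `type`) and the helper `build_documents` are assumptions for illustration and do not come from the original schema.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the three tables.
images = [
    {"id": 1, "title": "Sunset", "description": "Beach at dusk"},
    {"id": 2, "title": "Portrait", "description": "Studio portrait"},
]
people = [
    {"id": 10, "name": "Ann Lee", "company": "Acme"},
    {"id": 11, "name": "Bob Ray", "company": "Studio B"},
]
credited = [  # (image id, person id, credit type)
    (1, 10, "director"),
    (1, 11, "photographer"),
    (2, 10, "photographer"),
]


def build_documents(images, people, credited):
    """Return flat dicts: one per image and one minimal one per person."""
    names = {p["id"]: p["name"] for p in people}

    # Group the credit rows by image so each image is visited once.
    credits_by_image = defaultdict(list)
    for image_id, person_id, credit_type in credited:
        credits_by_image[image_id].append((person_id, credit_type))

    image_docs = []
    for img in images:
        credits = credits_by_image[img["id"]]
        image_docs.append({
            "type": "image",
            "id": img["id"],
            "title": img["title"],
            "description": img["description"],
            # Searchable text: credited names with their credit type.
            "credits": "; ".join(f"{names[pid]} ({ctype})" for pid, ctype in credits),
            # Link field: meant to be kept with the document but not searched.
            "person_ids": [pid for pid, _ in credits],
        })

    # Minimal person documents: only the person's own details.
    person_docs = [
        {"type": "person", "id": p["id"], "name": p["name"], "company": p["company"]}
        for p in people
    ]
    return image_docs, person_docs


image_docs, person_docs = build_documents(images, people, credited)
```

Each dict would then map to one Lucene Document. A hit on an image can be followed to the matching person documents through `person_ids`, and the image details are written only once per image.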
The credit table will probably not be a good candidate for Document construction, though.
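One way to see why is to count documents: a Document per credit row repeats the image details once for every credited person. A tiny sketch with made-up rows (the data and variable names are hypothetical) counts the extra copies:

```python
# Hypothetical credit rows: (image id, person id, credit type).
credited = [
    (1, 10, "director"),
    (1, 11, "photographer"),
    (1, 12, "props artist"),
    (2, 10, "photographer"),
]

# One document per credit row: each image's text is copied once per row.
docs_per_credit_row = len(credited)

# One document per image: each image's text is stored exactly once.
docs_per_image = len({image_id for image_id, _, _ in credited})

# Extra copies of image text created by the row-per-document layout.
duplicated_image_copies = docs_per_credit_row - docs_per_image
```

The gap grows with the average number of credits per image, so it is worth estimating on the real data before choosing a layout.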