What is the best way to index and search data using Lucene?
I’ve found multiple questions on SO and elsewhere that ask questions along the lines of “How can I index and then search relational data in Lucene”. Quite rightly these questions are met with the standard response that Lucene is not designed to model data like this. This quote I found sums it up…
A Lucene Index is a Document Store. In a Document Store, a single
document represents a single concept with all necessary data stored to
represent that concept (compared to that same concept being spread
across multiple tables in an RDBMS requiring several joins to
re-create).
So I will not ask that question. Instead I will provide my high-level requirements and see if any Lucene gurus out there can help me.
- We have data on People (Name, Gender, DOB, Nationality, etc)
- And data on Companies (Name, Country, City, etc).
- We also have data about how these two types of entity relate to each other where a person worked at the company (Person, Company, Role, Date Started, Date Ended, etc).
We have two entities – Person and Company – that have their own properties and then properties exist for the many-to-many link between them.
Some example searches could be as follows…
- Find all Companies in Australia
- Find all People born between two dates
- Find all People who have worked as a .Net Developer
- Find all males who have worked as a .Net Developer in London
- Find all People who have worked as a .Net Developer between 2008 and 2010
The criteria span all three sets of data. Our requirement is to provide a Faceted Search over the data that accepts any combination of the various properties, of which I have given some examples.
I would like to use Lucene.Net for this. We are a .Net software house and so feel slightly intimidated by Java. However, all suggestions are welcome.
I am aware of the idea that the Index should be constructed with the search in mind. But I can’t seem to come up with a sensible index that would meet all the combinations of search criteria.
- What classes native to Lucene, or what extension points, can we make use of?
- Are there any established techniques for doing this kind of thing?
- Are there any third-party open source contributions that I have missed that will help us here?
For now I won’t describe the scenarios we have considered because I don’t want to bloat out this question and make it too intimidating. Please ask me to elaborate where necessary.
Comments (1)
To store both companies and people in a single index, you could create documents with a `type` field that identifies the type of entity they describe. Birthdays can be stored as date fields.
You could give each person a simple text field containing the names of companies that they worked for. Note that you won't get an error if you enter a company that is not represented by a document in your index. Lucene is not a relational DB tool, but you knew that.
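As a rough illustration of the document shape described above, here is a minimal sketch using plain Python dicts rather than the Lucene or Lucene.NET API. The field names (`type`, `name`, `country`, `dob`, `companies`), the sample data, and the `search` helper are all assumptions made up for this example. It only shows the idea of one collection holding both entity types, told apart by a `type` field, with each person carrying the names of the companies they worked for.

```python
from datetime import date

# Hypothetical sample documents; field names are illustrative, not a Lucene schema.
index = [
    {"type": "company", "name": "Acme", "country": "Australia", "city": "Sydney"},
    {"type": "company", "name": "Globex", "country": "UK", "city": "London"},
    {"type": "person", "name": "Alice", "gender": "F", "dob": date(1980, 5, 17),
     "companies": ["Acme", "Globex"]},
    # "Initech" has no company document; nothing enforces referential integrity.
    {"type": "person", "name": "Bob", "gender": "M", "dob": date(1975, 1, 3),
     "companies": ["Initech"]},
]


def search(docs, **criteria):
    """Return the docs that match every criterion (AND semantics).

    A criterion is either a callable predicate applied to the field value,
    or a plain value compared by equality (membership for list fields).
    Documents that lack the field never match.
    """
    def matches(doc, field, wanted):
        if field not in doc:
            return False
        value = doc[field]
        if callable(wanted):
            return wanted(value)
        if isinstance(value, list):
            return wanted in value
        return value == wanted

    return [d for d in docs
            if all(matches(d, f, w) for f, w in criteria.items())]


# "Find all Companies in Australia"
companies_in_australia = search(index, type="company", country="Australia")

# "Find all People born between two dates"
born_in_range = search(
    index,
    type="person",
    dob=lambda d: date(1978, 1, 1) <= d <= date(1985, 12, 31),
)

# People who worked for a given company, matched by name only
worked_at_acme = search(index, type="person", companies="Acme")
```

Note that a flat `companies` list does not tie a role or a date range to a particular company, so searches like the .Net Developer examples in the question would need those details stored some other way.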
(Sorry that I've not posted any links to the API; I'm familiar with Lucene Core but not Lucene.NET.)