Searching a complex structure with Lucene

Posted on 2024-08-30 08:18:26

Basically, I have a fairly simple database that I'd like to index with Lucene.
The domain classes are:

// Person domain (the text below refers to this as a Profile)
class Person {
  Set<Pair> keys;
}

// Pair domain
class Pair {
  KeyItem keyItem;
  String value;
}

// KeyItem domain; name is a unique field within the DB (!!)
class KeyItem {
  String name;
}

I have tens of millions of profiles and hundreds of millions of Pairs. However, since most KeyItem "name" values are duplicates, there are only a few dozen KeyItem instances.
I came up with this structure to save on KeyItem instances.

Basically, any Profile with any set of fields can be saved into this structure.
Let's say we have a profile with these properties:

- name: Andrew Morton
- education: University of New South Wales
- country: Australia
- occupation: Linux programmer

To store it, we'll have a single Profile instance, 4 KeyItem instances (name, education, country and occupation), and 4 Pair instances with the values "Andrew Morton", "University of New South Wales", "Australia" and "Linux programmer".

All other profiles will reference (all or some of) the same KeyItem instances: name, education, country and occupation.
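
To make the storage layout concrete, here is a minimal Java sketch of it. The constructors, the `KeyItemRegistry` helper, the `profile` method and the second profile ("Jane Doe") are made up for illustration and are not part of the actual code base; the only point is that each distinct name maps to exactly one shared KeyItem instance.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Domain classes as described above, with constructors added for the example.
class KeyItem {
  final String name;
  KeyItem(String name) { this.name = name; }
}

class Pair {
  final KeyItem keyItem;
  final String value;
  Pair(KeyItem keyItem, String value) { this.keyItem = keyItem; this.value = value; }
}

class Person {
  final Set<Pair> keys = new LinkedHashSet<>();
}

// Hypothetical helper: hands out one shared KeyItem per distinct name.
class KeyItemRegistry {
  private final Map<String, KeyItem> byName = new HashMap<>();
  KeyItem get(String name) { return byName.computeIfAbsent(name, KeyItem::new); }
  int size() { return byName.size(); }
}

class ProfileStorageExample {
  // Builds a profile from alternating key-name / value arguments.
  static Person profile(KeyItemRegistry registry, String... nameValuePairs) {
    Person p = new Person();
    for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
      p.keys.add(new Pair(registry.get(nameValuePairs[i]), nameValuePairs[i + 1]));
    }
    return p;
  }

  public static void main(String[] args) {
    KeyItemRegistry registry = new KeyItemRegistry();
    Person andrew = profile(registry,
        "name", "Andrew Morton",
        "education", "University of New South Wales",
        "country", "Australia",
        "occupation", "Linux programmer");
    // Made-up second profile that reuses two of the existing KeyItem instances.
    Person other = profile(registry, "name", "Jane Doe", "country", "Australia");
    System.out.println(andrew.keys.size() + " pairs in the first profile, "
        + other.keys.size() + " in the second, "
        + registry.size() + " KeyItem instances in total");
  }
}
```

The second profile adds two more Pair instances but no new KeyItem instances, which is where the saving comes from.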

My question is how to index all of this so that I can search for Profiles by particular values of KeyItem::name and Pair::value. Ideally, I'd like this kind of query to work:

name:Andrew* AND occupation:Linux*

Should I create a custom Indexer and Searcher? Or could I use the standard ones and just map KeyItem and Pair to Lucene components somehow?
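
To spell out the behaviour I expect from the query above, here is a plain-Java sketch that makes no Lucene calls. It flattens a profile's Pairs into a map from KeyItem::name to Pair::value and checks that every requested field starts with the given prefix. The class and method names (`FlattenedProfileSketch`, `flatten`, `matchesAllPrefixes`) are hypothetical, and the matching is a deliberately simplified, case-sensitive prefix test on the whole value, not the way an analyzed Lucene field would behave. Conceptually this resembles one Lucene Document per profile with one field per KeyItem name, but that mapping is only an assumption here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal versions of the domain classes, with constructors added for the example.
class KeyItem {
  final String name;
  KeyItem(String name) { this.name = name; }
}

class Pair {
  final KeyItem keyItem;
  final String value;
  Pair(KeyItem keyItem, String value) { this.keyItem = keyItem; this.value = value; }
}

class FlattenedProfileSketch {
  // Flattens a profile's pairs into fieldName -> value (KeyItem::name -> Pair::value).
  static Map<String, String> flatten(Iterable<Pair> pairs) {
    Map<String, String> fields = new HashMap<>();
    for (Pair p : pairs) {
      fields.put(p.keyItem.name, p.value);
    }
    return fields;
  }

  // AND semantics: every requested field must exist and start with its prefix.
  static boolean matchesAllPrefixes(Map<String, String> fields, Map<String, String> prefixes) {
    for (Map.Entry<String, String> e : prefixes.entrySet()) {
      String value = fields.get(e.getKey());
      if (value == null || !value.startsWith(e.getValue())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> fields = flatten(List.of(
        new Pair(new KeyItem("name"), "Andrew Morton"),
        new Pair(new KeyItem("occupation"), "Linux programmer")));
    // Stand-in for: name:Andrew* AND occupation:Linux*
    System.out.println(matchesAllPrefixes(fields, Map.of("name", "Andrew", "occupation", "Linux")));
  }
}
```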
