Using Lucene to query an RDBMS database
I've skimmed the docs for the Java version of Lucene, but so far I can't really see the top-level "this is how it works" info (I'm aware I need to RTFM, I just can't see the wood for the trees).
I understand Lucene uses search indexes to return results. As far as I know, it only returns "hits" from those indexes. If I haven't added an item of data when building the index, then it won't be returned.
That's fine, so now I want to check the following assumption:
Q: Does that mean that any data I want displayed on a search page needs to be added to the Lucene index?
I.e. if I want to search for Products by things like sku, description, category name, etc., but I also want to display the Customer they belong to in search results, do I:
1. Make sure the Lucene index has the denormalised Customer's name in the index.
2. Use the hits returned by Lucene to somehow query the database for the actual product records and use a JOIN to get the Customer's name.
I assume it's option 1, since I'm assuming there's no way to "join" the results of a Lucene query to an RDBMS, but wanted to ask if my assumptions about the general usage are correct.
Comments (3)
Usually the index would only contain the fields you want to search on, not necessarily the ones you want to display. Indexes should be kept as small as possible to keep search performance good.
To be able to display more data, add a field to your index that allows you to retrieve your full document/data, i.e. a unique key for your Product (product id?).
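The idea of indexing only the searchable fields and keeping a unique key per entry can be sketched as follows. This is a toy inverted index written in plain Python, not the Lucene API; the class `TinyIndex`, its methods, and the sample products are all hypothetical and exist only to illustrate the shape of the approach.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index standing in for a search index (not the Lucene API)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of product ids

    def add(self, product_id, *searchable_texts):
        # Only the searchable text is tokenised. The sole value kept per entry
        # is the product id, which acts as the key back into the database.
        for text in searchable_texts:
            for term in text.lower().split():
                self._postings[term].add(product_id)

    def search(self, query):
        # AND semantics: a product matches only if it contains every query term.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


# Hypothetical sample data: (id, description, category name).
index = TinyIndex()
index.add(101, "SKU-1 red garden hose", "Garden")
index.add(102, "SKU-2 blue garden chair", "Furniture")
index.add(103, "SKU-3 red office chair", "Furniture")

print(index.search("red chair"))  # [103]
print(index.search("garden"))     # [101, 102]
```

The search returns nothing but ids, so anything needed for display (such as the customer name) would be fetched from the database using those ids.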
I have been trying to figure out the same problem, but I think that it's too much work. I'm thinking of this as an alternative. Please correct me if I'm wrong in my thinking!
Your situation is like this:
RDBMS Product (many) <------> (many) Customer
Instead of putting only the customer in the Lucene index to get product keys and then querying the RDBMS with an IN query, I'd suggest creating the Lucene index with the cartesian product of Product and Customer, like:
customer_1, product_1
customer_1, product_2
customer_2, product_2 ...
This way, when you search for a product in Lucene, it gives you both the customer and the product ids. Instead of joining them in the RDBMS, you can simply look up those customers and products in the RDBMS for more information, if there is a need. If you are using caching, the cost of the additional detail lookups will also go down.
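A minimal sketch of that pair-based index, again in plain Python and not using Lucene itself. It assumes, as the listing above suggests, that only linked customer/product pairs are indexed; the dictionaries, the `search_pairs` helper, and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: product descriptions and many-to-many links,
# as they might come from a join table.
products = {1: "red garden hose", 2: "blue garden chair"}
customer_products = [(1, 1), (1, 2), (2, 2)]  # (customer_id, product_id)

# term -> set of (customer_id, product_id) pairs
pair_index = defaultdict(set)
for customer_id, product_id in customer_products:
    for term in products[product_id].lower().split():
        pair_index[term].add((customer_id, product_id))


def search_pairs(term):
    """Return every (customer_id, product_id) pair whose product matches term."""
    return sorted(pair_index.get(term.lower(), set()))


print(search_pairs("chair"))  # [(1, 2), (2, 2)]
print(search_pairs("hose"))   # [(1, 1)]
```

One trade-off worth noting: a product linked to many customers appears once per customer, so the index grows with the number of links rather than the number of products.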
Based on BrokenGlass's answer, I've thought some more and am proposing the following to see if I'm on the right lines:
Basically, taking option 2 further, one could do the following:
[…] IN (value [, value]) predicate.
[…] JOINs to other tables).
Of course steps 5 and 6 could be better, but for the sake of explanation I put that verbose method in my description. If the Lucene hits include some sort of "relevance" value, then you could attribute that to the resultset and perform a standard sort, but that's an exercise for the reader. :)
Could this be it?
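Feeding search hits into an IN predicate with a JOIN, then restoring relevance order, might look roughly like this. It is a sketch using Python's built-in sqlite3 module rather than Lucene and a production RDBMS; the table layout, the `fetch_products` helper, and the shape of `hits` (a list of id/score pairs) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema and sample rows; each product belongs to one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (id INTEGER PRIMARY KEY, sku TEXT,
                           customer_id INTEGER REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product  VALUES (101, 'SKU-1', 1), (102, 'SKU-2', 2), (103, 'SKU-3', 1);
""")


def fetch_products(conn, hits):
    """hits: list of (product_id, score) pairs, as a search index might return."""
    if not hits:
        return []
    ids = [pid for pid, _ in hits]
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per id
    rows = conn.execute(
        f"""SELECT p.id, p.sku, c.name
            FROM product p JOIN customer c ON c.id = p.customer_id
            WHERE p.id IN ({placeholders})""",
        ids,
    ).fetchall()
    # IN (...) does not preserve order, so re-sort by the search score.
    score = dict(hits)
    return sorted(rows, key=lambda r: score[r[0]], reverse=True)


hits = [(103, 2.5), (101, 1.0)]  # made-up ids and relevance scores
print(fetch_products(conn, hits))
# [(103, 'SKU-3', 'Acme'), (101, 'SKU-1', 'Acme')]
```

Binding the ids as parameters avoids building SQL from raw values. Databases typically cap the number of bound parameters per statement, so a large hit list would need to be paged or batched.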