What is the best way to integrate Solr as the index and Oracle as the storage database?

Posted 2024-09-25 17:36:55

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:

select * from data_table where id in ([solr query results for 'search string']);

However, one key issue arises:
Oracle WILL NOT allow more than 1000 items in the "in" clause (a BIG DEAL, as the list of objects I find is very often > 1000 and is usually around 50-200k items)

I have tried to work around this with a "split" function that takes a string of comma-separated values and breaks them down into array items, but then I hit the 4000-char limit on the function parameter when called from SQL (PL/SQL allows 32k chars, but that's still WAY too limiting for 80,000+ results in some cases)

I am also hitting performance issues with WHERE IN (....); I am told this makes for a very slow query, even when the referenced field is indexed?

I've tried chaining "OR"s to get around the 1000-item limit (i.e. id IN (1...1000) OR id IN (1001....2000) OR id IN (2001....3000)) - and this works, but is very slow.

I am thinking that I should load the Solr client JARs into Oracle and write an Oracle function in Java that calls Solr and pipelines the results back as a list, so that I can do something like:

select * from data_table where id in (select * from table(runSolrQuery('my query text')));

This is proving quite hard, and I am not sure it's even possible.

Things that I can't do:

  • Store the full data in Solr (security + storage limits)
  • Use Solr as the controller of pagination and ordering (this is why I am fetching the data from the DB)

So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?

Comments (4)

内心旳酸楚 2024-10-02 17:36:55

Check this out:
http://demo.scotas.com/search-sqlconsole.php

This product seems to do exactly what you need.

cheers

毁梦 2024-10-02 17:36:55

I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.

Dominic Brooks has an example of using object collections with JDBC. You would do something like

Create a couple of types in Oracle

CREATE TYPE data_table_id_typ AS OBJECT (
  id NUMBER
);

CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;

In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement

SELECT *
  FROM data_table
 WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
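
A rough JDBC sketch of that binding step might look like the following. This is only an illustration: it assumes the two types above already exist, that the driver is recent enough to expose oracle.jdbc.OracleConnection.createOracleArray, and that the matching ids have already been fetched from Solr into a long[].

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Struct;

import oracle.jdbc.OracleConnection;

// Binds the whole id list from Solr as a single collection parameter,
// so the 1000-item IN-list limit never comes into play.
public static ResultSet fetchBySolrIds(Connection conn, long[] solrIds) throws Exception {
    OracleConnection oconn = conn.unwrap(OracleConnection.class);

    // One STRUCT per id, matching DATA_TABLE_ID_TYP(id NUMBER)
    Struct[] rows = new Struct[solrIds.length];
    for (int i = 0; i < solrIds.length; i++) {
        rows[i] = oconn.createStruct("DATA_TABLE_ID_TYP", new Object[] { solrIds[i] });
    }

    // Wrap the STRUCTs in the collection type and bind them in one go
    java.sql.Array idArray = oconn.createOracleArray("DATA_TABLE_ID_ARR", rows);

    PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM data_table"
      + " WHERE id IN (SELECT id FROM TABLE(CAST(? AS data_table_id_arr)))");
    ps.setArray(1, idArray);
    return ps.executeQuery();
}
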
闻呓 2024-10-02 17:36:55

Instead of using a long BooleanQuery, you can use TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).

Like this (first fill your TermsFilter with terms):

TermsFilter termsFilter = new TermsFilter();

// Loop through your matched values and add each one to the filter
for (String value : matchingValues) {   // matchingValues: placeholder for your collection of term values
    termsFilter.addTerm(new Term("<field-name>", value));
}

then search the index like this:

DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(),  searcher.convertFilter(termsFilter), null, 0, 1000);

Where searcher is a SolrIndexSearcher (see the javadoc for more info on the getDocList method):
http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
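
As a small follow-on sketch (assuming the same classic SolrIndexSearcher API used above, and that the Oracle key is kept in Solr as a stored "id" field), the resulting DocList can then be walked to collect the ids you hand back to Oracle:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

// Collects the stored "id" field of every document in the DocList
public static List<String> collectIds(SolrIndexSearcher searcher, DocList docs) throws IOException {
    List<String> ids = new ArrayList<String>();
    DocIterator it = docs.iterator();
    while (it.hasNext()) {
        int internalDocId = it.nextDoc();          // Lucene internal doc id
        Document d = searcher.doc(internalDocId);  // load the stored fields
        ids.add(d.get("id"));                      // "id" is assumed to be a stored field
    }
    return ids;
}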

如梦初醒的夏天 2024-10-02 17:36:55

Two solutions come to mind.

First, look into using the Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.

Second, if you are still running into a boundary like the 1000-object limit, consider using the "rows" setting when querying Solr and leveraging its inherent pagination feature (see the sketch at the end of this answer).

I've used this bulk fetching method with stored procs to fetch large quantities of data that needed to be put into Solr. Involve your DBA. If you have a good one, and use the Oracle-specific extensions, I think you should attain very reasonable performance.
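
A rough SolrJ sketch of that paging idea follows. It is an illustration only, assuming SolrJ 6+, a placeholder core URL, and that the Oracle key is kept in Solr as a string "id" field:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Pulls back the matching ids from Solr in pages of 1000, so each page can
// later be bound to Oracle without ever exceeding the 1000-item IN-list limit.
public static List<String> fetchIdsInPages(String solrUrl, String queryText) throws Exception {
    final int pageSize = 1000;
    List<String> ids = new ArrayList<>();
    try (SolrClient solr = new HttpSolrClient.Builder(solrUrl).build()) {
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery(queryText);
            q.setFields("id");      // only the key is needed; the data stays in Oracle
            q.setStart(start);
            q.setRows(pageSize);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                ids.add((String) doc.getFieldValue("id"));
            }
            if (rsp.getResults().size() < pageSize) {
                break;              // last page reached
            }
            start += pageSize;
        }
    }
    return ids;
}

Each page of ids can then be joined back to data_table (for example with the collection-binding approach from the earlier answer), so no single statement has to carry the whole 50-200k id list at once.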
