DataImportHandler reports no error but does not complete

Posted 2024-12-29 09:29:01


I am trying to convince solr to perform a bulk import of a sqlite database. I configured the DataImportHandler to open that database through jdbc successfully, and I can start the import with wget http://localhost:8080/solr/dataimport?command=full-import, but whatever I do, solr appears to index only the first 499 documents (as reported by wget http://localhost:8080/solr/dataimport?command=status).

The jetty log file does not report any error message. Instead, it reports the end of indexing:

27-Jan-2012 19:08:13 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
27-Jan-2012 19:08:13 org.apache.solr.handler.dataimport.SolrWriter persist
INFO: Wrote last indexed time to dataimport.properties
27-Jan-2012 19:08:13 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.145

What could I have done wrong?

1 answer

梦屿孤独相伴 2025-01-05 09:29:01


I know that it is not in very good taste to answer one's own question, but I eventually figured out the nasty problem that caused this behaviour.

The directive used to configure solr for a specific data source is this:

<dataSource type="JdbcDataSource" driver="org.sqlite.JDBC" url="jdbc:sqlite:/foo.db"/>

By default, the JdbcDataSource class reads the batchSize attribute of this XML node and assumes a value of 500 unless one is specified. So the above was in fact equivalent to:

<dataSource type="JdbcDataSource" ... batchSize="500"/>

Now, JdbcDataSource passes batchSize to the setFetchSize method of the underlying JDBC driver (in this case, the sqlite jdbc driver). That driver assumes the method is asking it to limit the number of rows returned, and thus never returns more than 500 rows here. I am not sufficiently familiar with the expected semantics of the JDBC API to tell whether it is the sqlite driver that is wrong in how it interprets this value, or the solr JdbcDataSource class that is wrong in how it expects drivers to react to this method call.

What I know, though, is that the fix is to specify batchSize="0", because the sqlite jdbc driver takes a value of zero to mean "no row limit specified". I added this tip to the corresponding solr FAQ page.
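For reference, here is a minimal data-config sketch with the fix applied. The entity name, query, and field mappings are placeholders for illustration; only the dataSource line with batchSize="0" reflects the actual fix described above:

```xml
<dataConfig>
  <!-- batchSize="0" makes the sqlite jdbc driver apply no row limit -->
  <dataSource type="JdbcDataSource"
              driver="org.sqlite.JDBC"
              url="jdbc:sqlite:/foo.db"
              batchSize="0"/>
  <document>
    <!-- hypothetical entity and query; adjust to your own schema -->
    <entity name="doc" query="SELECT id, title FROM docs">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```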
