Mocking and unit testing Solr and Lucene indexes

Posted 2024-11-27 04:14:58

We need control of the data in the production Solr index, and we need it to be compatible with new development. Ideally, we'd like to mock the index on local machines, query it with Solr, and write unit tests against it for quicker iterations.

RAMDirectory is used in another question to do something similar, but that question is two years old. This example appears to do just that (using FSDirectory instead of RAMDirectory). Are these the right approaches to this problem? Are there better ways to do this?

We'd like to write tests like:

setup mock index;
query mock index;
assert(stuff that should be true);
teardown mock index;
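
To make the shape of such a test concrete, here is a minimal sketch in Python. It does not use Solr or Lucene at all: InMemoryIndex is a hypothetical toy inverted index standing in for whatever real index (an in-memory Lucene directory, an embedded Solr core) would sit behind the fixture, and the sample titles are made up. Only the setup / query / assert / teardown lifecycle is the point.

```python
import unittest
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index; a hypothetical stand-in for a real Solr/Lucene index."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it
        self._docs = {}                    # doc id -> stored fields

    def add(self, doc_id, fields):
        """Store the document and index every whitespace-separated term."""
        self._docs[doc_id] = fields
        for value in fields.values():
            for term in str(value).lower().split():
                self._postings[term].add(doc_id)

    def search(self, term):
        """Return the sorted ids of documents containing the term."""
        return sorted(self._postings.get(term.lower(), set()))

    def close(self):
        self._postings.clear()
        self._docs.clear()


class MockIndexTest(unittest.TestCase):
    def setUp(self):
        # setup mock index
        self.index = InMemoryIndex()
        self.index.add(1, {"title": "Solr in Action"})
        self.index.add(2, {"title": "Lucene in Action"})

    def tearDown(self):
        # teardown mock index
        self.index.close()

    def test_query_returns_matching_documents(self):
        # query mock index; assert(stuff that should be true)
        self.assertEqual(self.index.search("solr"), [1])
        self.assertEqual(self.index.search("action"), [1, 2])
        self.assertEqual(self.index.search("missing"), [])
```

Saved in a test module, this runs with python -m unittest. With a real index, only the bodies of setUp and tearDown should need to change.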

EDIT: Additional details:

Our thought was that we would build an index and have a simple way of adding documents to it without needing the indexer and the rest of the system, except perhaps a local database that we could keep in version control. In the past we generated an index, and when incompatibilities arose, we regenerated it.

If we re-index, we add a lot of overhead, and mocking the indexer doesn't seem like a good option given that our indexer contains a lot of data processing logic (like adding data to searchable fields from a db). Our indexer connects to an external db, so we'd need to support that too. We could have a local test database, as stated above, which has little to no overhead.
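
As a rough illustration of the "local test database in version control" idea, the sketch below loads a SQL fixture into a throwaway in-memory SQLite database and reads the rows back as plain dicts. Everything here is an assumption for illustration: the products table, its columns, and the sample rows are invented, and the production database is probably not SQLite, so a real fixture would have to account for SQL dialect differences.

```python
import sqlite3

# Hypothetical schema and seed rows; in practice this text would live in a
# .sql file kept under version control next to the tests.
FIXTURE_SQL = """
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT NOT NULL
);
INSERT INTO products (id, name, category) VALUES
    (1, 'Red kettle', 'kitchen'),
    (2, 'Blue kettle', 'kitchen'),
    (3, 'Desk lamp', 'office');
"""


def open_test_db(fixture_sql=FIXTURE_SQL):
    """Create a throwaway in-memory database and load the fixture into it."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.executescript(fixture_sql)
    return conn


def rows_to_documents(conn):
    """Turn database rows into plain dicts, the shape an indexer could consume."""
    cursor = conn.execute("SELECT id, name, category FROM products ORDER BY id")
    return [dict(row) for row in cursor]
```

Because the database exists only in memory, each test can call open_test_db() for a fresh copy and simply close the connection afterwards, so nothing is shared with other developers.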

Once we have a test db, we need to build an index from it, and then we could follow the approach in the second link above. The question becomes: how do we build an index really quickly for testing, say one of about 1,000 documents?

The problem with this is that we then need to keep our local db schema in sync with the production schema. The production schema changes often enough that this is a real problem. We'd like a test infrastructure that's flexible enough to handle this. The approach as of now is to just rebuild the database each time, which is slow and pisses off other people!
