Mocking and unit testing Solr and Lucene indexes
We need control of the data in the production Solr index, and it needs to stay compatible with new development. Ideally, we'd like to mock the index on local machines, query it with Solr, and write unit tests against it for quicker iterations.
RAMDirectory is used in another question to do something similar, but that question is from 2 years back. This example appears to do just that (using FSDirectory instead of RAMDirectory). Are these the right approaches to this problem? Are there better ways to do this?
We'd like to write tests like:
setup mock index;
query mock index;
assert(stuff that should be true);
teardown mock index;
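For illustration only, the four steps above could be sketched in plain Java roughly as follows. Everything here is an assumption made for the sketch: `InMemoryIndex`, `MockIndexLifecycle`, `setUpMockIndex` and the two sample documents are hypothetical names, and the tiny inverted index is only a stand-in so the block runs with nothing but the JDK. It is not a Lucene or Solr API; a real test would put a Lucene Directory or a Solr core in its place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical test driver: setup, query, assert, teardown.
class MockIndexLifecycle {
    // setup mock index: build a small index from a couple of fixture documents
    static InMemoryIndex setUpMockIndex() {
        InMemoryIndex index = new InMemoryIndex();
        index.add("doc1", "Solr is built on Lucene");
        index.add("doc2", "Lucene stores an inverted index");
        return index;
    }

    public static void main(String[] args) {
        InMemoryIndex index = setUpMockIndex();
        try {
            // query mock index
            List<String> hits = index.search("lucene");
            // assert(stuff that should be true)
            if (!hits.equals(Arrays.asList("doc1", "doc2"))) {
                throw new AssertionError("unexpected hits: " + hits);
            }
        } finally {
            // teardown mock index, even if the assertion failed
            index.close();
        }
    }
}

// Hypothetical stand-in for a real index (NOT a Lucene or Solr class).
final class InMemoryIndex implements AutoCloseable {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    @Override
    public void close() {
        postings.clear();
    }
}
```

The try/finally is there so the teardown step always runs, which matters more once the stand-in is replaced by something that holds files or network connections.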
EDIT: Additional details:
Our thought was that we would build an index and have a simple way of adding documents without needing the indexer and the rest of the system, except perhaps a local database that we could keep in version control. In the past we generated an index and, when incompatibilities arose, we regenerated it.
If we re-index, we're adding a lot of overhead, and mocking the indexer doesn't seem like a good option given that our indexer contains a lot of data processing logic (like adding data to searchable fields from a db). Our indexer connects to an external db, so we'd need to support that too. We could have a local test database as stated above, which has little to no overhead.
Once we have a test db, we need to build an index, and then we could go off the second link above. The question becomes: how do we build an index really quickly for testing, say one of about 1000 documents?
The problem with this is that we then need to keep our local db schema in sync with the production schema. The production schema changes often enough that this is a problem. We'd like to have a test infrastructure that's flexible enough to handle this. The approach as of now is to just rebuild the database each time, which is slow and pisses off other people!
Comments (1)
If you are using Solr, I wouldn't even bother with mocking or emulating (i.e. don't change its config).
Instead, write an integration test that sets up your Solr index. The setup would be to just index the data like you normally would. You will probably want your developers to run their own Solr.
I wouldn't worry that much about speed because Solr indexes incredibly fast (100,000 documents in less than 30 seconds in our environment; in fact, the bottleneck is pulling the data from the database).
So really your mock index should just be a small subset of production data that you index into Solr (you can do this once for each TestCase class with @BeforeClass).
EDIT (based on your Edits):
I'll tell you how we do it (and how I have seen others do it):
We have a development schema/db and a production schema/db. When developers are working on stuff, they just make a copy of the "build machine's" development database and restore it locally. This database is much smaller than the production db and is ideal for testing. Your production db should not be that much different from your development db schema-wise (make smaller changes and release more often if it is).