Maintainability of database integration tests

Published 2024-10-14 17:07:18


I am developing an ETL process that extracts business data from a source database into a data warehouse. The application does NOT use NHibernate, LINQ to SQL, or Entity Framework. It has its own generated data access classes, which produce the SQL statements needed to perform CRUD operations.

As one can imagine, developers who write code that generates custom SQL can easily make mistakes.

I would like to write a program that generates test data (Arrange), then performs the ETL process (Act), and validates the data warehouse (Assert).
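The Arrange-Act-Assert shape described above can be sketched as follows. The question's code is C#, but for illustration here is a minimal Python sketch, with in-memory SQLite standing in for the real source and warehouse, and `run_etl` as a hypothetical stand-in for the generated data access layer:

```python
import sqlite3

def run_etl(source, warehouse):
    # Hypothetical ETL step: aggregate order totals per customer
    # from the source and load them into the warehouse.
    rows = source.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall()
    warehouse.executemany(
        "INSERT INTO customer_totals (customer, total) VALUES (?, ?)", rows)

def test_etl_loads_customer_totals():
    # Arrange: build known test data in the source schema.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    source.executemany("INSERT INTO orders VALUES (?, ?)",
                       [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)])
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")

    # Act: run the ETL process under test.
    run_etl(source, warehouse)

    # Assert: validate the warehouse contents against expected values.
    totals = dict(warehouse.execute(
        "SELECT customer, total FROM customer_totals").fetchall())
    assert totals == {"acme": 15.0, "globex": 7.5}

test_etl_loads_customer_totals()
```

The same three-phase structure carries over directly to a C# test against the real databases; only the schema setup and the call into the real ETL code change.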

I don't think such a program is hard to write. What worries me is that in the past my company attempted something similar and ended up with a bunch of unmaintainable unit tests that constantly failed because of the many schema changes that came with new features.

My plan is to write an integration test that runs on the build machine, rather than unit tests, to ensure the ETL process works. The test data cannot be generated completely at random, because business logic determines how data is loaded into the data warehouse. We have a custom development tool that generates new data access classes whenever the database definition changes.

I would love feedback from the community on how to write such an integration test so that it is easy to maintain. Some ideas I have:

  1. Save a backup test database in version control (TFS); developers will need to modify the backup database when there are data changes to the source or the data warehouse.

  2. Developers maintain the test data manually through the test program (C# in this case). The program would provide a basic framework for developers to generate their test data.

  3. When the test database is initialized, it generates random data. Developers then write code to override certain randomly generated values to ensure the test passes.
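Idea 3 can be sketched as a test-data builder that fills rows with random defaults and lets each test override only the fields its assertions depend on. A hedged Python sketch (the table and column names are hypothetical):

```python
import random
import string

def random_string(n=8):
    return "".join(random.choices(string.ascii_lowercase, k=n))

def make_order(**overrides):
    # Random defaults for every column of a hypothetical "orders" row.
    row = {
        "order_id": random.randint(1, 10**9),
        "customer": random_string(),
        "amount": round(random.uniform(1, 1000), 2),
        "status": random.choice(["open", "shipped", "cancelled"]),
    }
    # The test pins down only the values it asserts on; everything
    # else stays random, so schema-irrelevant churn doesn't break it.
    row.update(overrides)
    return row

# A test that only cares about cancelled orders pins `status` and `amount`.
order = make_order(status="cancelled", amount=42.0)
assert order["status"] == "cancelled" and order["amount"] == 42.0
```

The maintenance benefit is that when unrelated columns are added, only the builder changes, not every test.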

I welcome any suggestions. Thanks.


飘然心甜 2024-10-21 17:07:18


Hey dsum,
although I don't really know the whole architecture of your ETL, I would say that integration testing should only be another step in your testing process.

Even if your first encounter with unit testing ended in a mess, you should keep in mind that in many cases a single unit test is the best place to check. Or do you want to keep splitting the whole integration test into separate cases, further and further down, in order to guarantee the right flow in every one of the conditions?

Messy unit tests are only the result of messy production code. Don't feel offended; that's just my opinion. Unit tests force coders to keep a clean coding style and make the whole thing much more maintainable.

So my point is that you should not think only about performing integration testing on the whole thing, because unit tests (if used in the right way) can focus on problems in more detail.

Regards,
MacX
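MacX's point applies naturally to the hand-generated SQL in the question: the generator itself can be unit-tested without any database at all. A hedged Python sketch, where `build_insert` is a hypothetical stand-in for one of the generated data access methods:

```python
def build_insert(table, columns):
    # Build a parameterized INSERT statement for the given table/columns.
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

# A focused unit test: no database needed, and a typo in the generator
# fails fast here, long before the slow integration test runs.
assert build_insert("orders", ["customer", "amount"]) == \
    "INSERT INTO orders (customer, amount) VALUES (?, ?)"
```

Tests like this catch the "easy mistakes" in custom SQL generation mentioned in the question at the cheapest possible level, leaving the integration test to verify the end-to-end data flow.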

混吃等死 2024-10-21 17:07:18


First, let me say I think that's a good plan; I did something similar using Oracle & PL/SQL some years ago. IMHO your problem is mainly an organizational one, not a technical one:

  • You must have someone who is responsible for extending and maintaining the test code.
  • Responsibility for maintaining the test data must be clear (and provide mechanisms that make test data maintenance easy; the same applies to any verification data you might need).
  • The whole team should know that no code goes into the production environment as long as the test fails. If the test fails, the team's first priority should be to fix it (the code or the test, whichever is right). Train them not to work on any new feature as long as the test is broken!
  • After a bug fix, it should be easy for the person who fixed it to verify that the part of the integration which failed before no longer fails. That means it should be possible to run the whole test (or at least parts of it) quickly and easily from any developer machine. Speed can become a problem for an ETL process if your test is too big, so focus on testing a lot of things with as little data as possible. And perhaps you can break the whole test into smaller pieces that can be executed step by step.
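The last point, breaking the whole test into smaller pieces executed step by step, might look like this sketch: ordered stages share a context, and the runner stops at the first failure so the fixer sees exactly which part of the integration broke (the stage names and checks here are hypothetical placeholders):

```python
def check_extract(ctx):
    # Placeholder for: run the extract step, then verify the row count.
    ctx["extracted"] = [("acme", 10.0), ("globex", 7.5)]
    assert len(ctx["extracted"]) == 2, "expected 2 source rows"

def check_transform(ctx):
    # Placeholder for: apply the business-logic transform, then sanity-check it.
    ctx["transformed"] = {customer: amount for customer, amount in ctx["extracted"]}
    assert all(amount > 0 for amount in ctx["transformed"].values())

def check_load(ctx):
    # Placeholder for: load into the warehouse, then verify the final state.
    ctx["loaded"] = dict(ctx["transformed"])
    assert ctx["loaded"] == {"acme": 10.0, "globex": 7.5}

def run_stages(stages):
    ctx = {}
    for stage in stages:
        try:
            stage(ctx)
        except AssertionError as exc:
            # Report the first failing stage so the developer can re-run
            # just that piece instead of the whole ETL test.
            return f"{stage.__name__} failed: {exc}"
    return "all stages passed"

print(run_stages([check_extract, check_transform, check_load]))
```

A developer can then re-run only `check_transform`, say, after fixing a transform bug, instead of the entire end-to-end test.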
初雪 2024-10-21 17:07:18


If one wants to maintain data while performing data integration testing in ETL, these steps also work, since integration testing of the ETL process and the related applications involves them. For example:

1. Set up test data in the source system.
2. Execute the ETL process to load the test data into the target.
3. View or process the data in the target system.
4. Validate the data and the application functionality that uses the data.
