Maintainability of database integration tests
I am developing an ETL process that extracts business data from one database into a data warehouse. The application is NOT using NHibernate, LINQ to SQL, or Entity Framework. The application has its own generated data access classes, which generate the SQL statements needed to perform CRUD operations.
As one can imagine, developers who write code that generates custom SQL can easily make mistakes.
I would like to write a program that generates test data (Arrange), then performs the ETL process (Act), and validates the data warehouse (Assert).
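A minimal sketch of that Arrange/Act/Assert shape, using Python with in-memory SQLite databases in place of the real source, warehouse, and ETL code (the table names and the simple copy-based "ETL" step are illustrative assumptions, not the actual project):

```python
import sqlite3

def run_etl(source, warehouse):
    # Stand-in for the real ETL: load orders over 100 into a fact table.
    rows = source.execute(
        "SELECT id, amount FROM orders WHERE amount > 100").fetchall()
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, amount) VALUES (?, ?)", rows)

def test_etl_loads_large_orders():
    # Arrange: build known test data in a throwaway source database.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    source.executemany("INSERT INTO orders VALUES (?, ?)",
                       [(1, 50.0), (2, 150.0), (3, 200.0)])

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (order_id INTEGER, amount REAL)")

    # Act: run the ETL process against the test databases.
    run_etl(source, warehouse)

    # Assert: the warehouse contains exactly the expected rows.
    loaded = warehouse.execute(
        "SELECT order_id, amount FROM fact_orders ORDER BY order_id"
    ).fetchall()
    assert loaded == [(2, 150.0), (3, 200.0)]

test_etl_loads_large_orders()
```

Because each test creates its own databases, tests stay independent of one another; only the schema-creation code needs updating when the database definition changes.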
I don't think it is hard to write such a program. However, what I worry about is that in the past my company attempted something similar, and ended up with a bunch of unmaintainable unit tests that constantly failed because of the many database schema changes that came with new features.
My plan is to write an integration test that runs on the build machine, rather than unit tests, to ensure the ETL process works. The test data cannot be generated completely at random, because business logic determines how data is loaded into the data warehouse. We have a custom development tool that generates new data access classes whenever the database definition changes.
I would love any feedback from the community with advice on writing such an integration test so that it is easy to maintain. Some ideas I have:
Save a backup test database in version control (TFS); developers will need to modify the backup database when there are data changes to the source or the data warehouse.
Developers maintain test data manually through the testing program (C# in this case). The program would provide a basic framework for developers to generate their test data.
When the test database is initialized, it generates random data. Developers then write code to override certain randomly generated values to ensure the test passes.
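The third idea, random defaults that individual tests override, is essentially the test data builder pattern. A minimal sketch, again in Python for brevity rather than the project's C# (the field names and random defaults are illustrative assumptions):

```python
import random

class OrderBuilder:
    """Builds an order row with random defaults; a test overrides only
    the fields its assertions actually depend on."""

    def __init__(self):
        self._row = {
            "id": random.randint(1, 10_000),
            "amount": round(random.uniform(1.0, 500.0), 2),
            "region": random.choice(["north", "south", "east", "west"]),
        }

    def with_amount(self, amount):
        self._row["amount"] = amount
        return self

    def with_region(self, region):
        self._row["region"] = region
        return self

    def build(self):
        return dict(self._row)

# A test pins down only what it cares about; when the schema gains a
# new column, only the builder's defaults change, not every test.
order = OrderBuilder().with_amount(150.0).build()
assert order["amount"] == 150.0
```

The payoff for maintainability is that schema churn is absorbed in one place (the builder) instead of rippling through every test's setup code.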
I welcome any suggestions. Thanks.
Hey dsum,
although I don't really know the whole architecture of your ETL, I would say that integration testing should only be another step in your testing process.
Even if your first encounter with unit testing ended up in a mess, you should keep in mind that in many cases a single unit test is the best place to check. Or do you want to split the whole integration test into a three-way case or something similar further down, in order to guarantee the right flow in each of the three conditions?
Messy unit tests are only the result of messy production code. Don't feel offended; that's just my opinion. Unit tests force coders to keep a clean coding style and keep the whole thing much more maintainable.
So... my point is that you should not think only about performing integration testing on the whole thing, because unit tests (if they are used in the right way) can focus on problems in more detail.
Regards,
MacX
First, let me say I think that's a good plan, and I have done something similar using Oracle & PL/SQL some years ago. IMHO your problem is mainly an organizational one, not a technical one:
If one wants to maintain data while performing data integration testing in ETL, we could also follow these steps, since integration testing covers the ETL process and the related applications together. For example:
1. Set up test data in the source system.
2. Execute the ETL process to load the test data into the target.
3. View or process the data in the target system.
4. Validate the data and the application functionality that uses the data.
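The four steps above can be sketched end to end. This illustrative Python/SQLite version (all names are assumptions) validates with a reconciliation check, comparing source and target row counts and totals, which tends to survive schema changes better than row-by-row comparisons:

```python
import sqlite3

def setup_source():
    # Step 1: set up test data in the source system.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    src.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(1, 10.0), (2, 20.0), (3, 30.0)])
    return src

def execute_etl(src, tgt):
    # Step 2: execute the ETL process to load test data into the target.
    tgt.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
    tgt.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                    src.execute("SELECT id, amount FROM sales"))

def validate(src, tgt):
    # Steps 3-4: read the data back from the target and validate it
    # against the source (row counts and totals must reconcile).
    src_count, src_total = src.execute(
        "SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
    tgt_count, tgt_total = tgt.execute(
        "SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone()
    assert (src_count, src_total) == (tgt_count, tgt_total)

src = setup_source()
tgt = sqlite3.connect(":memory:")
execute_etl(src, tgt)
validate(src, tgt)
```

Keeping the three phases as separate functions mirrors the step list and lets each phase be maintained (or replaced) independently when the schema or business rules change.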