Unit testing large blocks of code (mappings, translations, etc.)

Posted 2024-08-17 16:31:58

We unit test most of our business logic, but are stuck on how best to test some of our large service tasks and import/export routines. For example, consider the export of payroll data from one system to a 3rd party system. To export the data in the format the company needs, we need to hit ~40 tables, which creates a nightmare situation for creating test data and mocking out dependencies.

For example, consider the following (a subset of ~3500 lines of export code):

public void ExportPaychecks()
{
   var pays = _pays.GetPaysForCurrentDate();
   foreach (PayObject pay in pays)
   {
      WriteHeaderRow(pay);
      if (pay.IsFirstCheck)
      {
         WriteDetailRowType1(pay);
      }
   }
}

private void WriteHeaderRow(PayObject pay)
{
   //do lots more stuff
}

private void WriteDetailRowType1(PayObject pay)
{
   //do lots more stuff
}

We only have the one public method in this particular export class - ExportPaychecks(). That's really the only action that makes any sense to someone calling this class ... everything else is private (~80 private functions). We could make them public for testing, but then we'd need to mock them to test each one separately (i.e. you can't test ExportPaychecks in a vacuum without mocking the WriteHeaderRow function). This is a huge pain too.

Since this is a single export, for a single vendor, moving logic into the Domain doesn't make sense. The logic has no domain significance outside of this particular class. As a test, we built out unit tests which had close to 100% code coverage ... but this required an insane amount of test data typed into stub/mock objects, plus over 7000 lines of code due to stubbing/mocking our many dependencies.

As a maker of HRIS software, we have hundreds of exports and imports. Do other companies REALLY unit test this type of thing? If so, are there any shortcuts to make it less painful? I'm half tempted to say "no unit testing the import/export routines" and just implement integration testing later.

Update - thanks for the answers all. One thing I'd love to see is an example, as I'm still not seeing how someone can turn something like a large file export into an easily testable block of code without turning the code into a mess.

Comments (10)

梦言归人 2024-08-24 16:31:58

This style of (attempted) unit testing, where you try to cover an entire huge code base through a single public method, always reminds me of surgeons, dentists or gynaecologists who have to perform complex operations through small openings. Possible, but not easy.

Encapsulation is an old concept in object-oriented design, but some people take it to such extremes that testability suffers. There's another OO principle called the Open/Closed Principle that fits much better with testability. Encapsulation is still valuable, but not at the expense of extensibility - in fact, testability is really just another word for the Open/Closed Principle.

I'm not saying that you should make your private methods public, but what I am saying is that you should consider refactoring your application into composable parts - many small classes that collaborate instead of one big Transaction Script. You may think it doesn't make much sense to do this for a solution to a single vendor, but right now you are suffering, and this is one way out.

What will often happen when you split up a single method in a complex API is that you also gain a lot of extra flexibility. What started out as a one-off project may turn into a reusable library.


Here are some thoughts on how to perform a refactoring for the problem at hand: Every ETL application must perform at least these three steps:

  1. Extract data from the source
  2. Transform the data
  3. Load the data into the destination

(hence the name ETL). As a start for refactoring, this gives us at least three classes with distinct responsibilities: Extractor, Transformer and Loader. Now, instead of one big class, you have three with more targeted responsibilities. Nothing messy about that, and already a bit more testable.
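As a rough illustration of that three-way split, here is a minimal C# sketch. All type names (PayRow, VendorLine, IExtractor, ITransformer, ILoader, PaycheckTransformer, PaycheckExport) and the "H|"/"D1|" line formats are hypothetical and not taken from the original code base; the point is only that the class with the single public ExportPaychecks() method shrinks to a coordinator, while the transformation becomes a pure function over in-memory objects.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types; real ones would mirror the source tables and the vendor format.
public record PayRow(int EmployeeId, bool IsFirstCheck);
public record VendorLine(string Text);

public interface IExtractor { IEnumerable<PayRow> Extract(); }
public interface ITransformer { IEnumerable<VendorLine> Transform(IEnumerable<PayRow> rows); }
public interface ILoader { void Load(IEnumerable<VendorLine> lines); }

// Pure transformation: no database and no file system, so plain objects are enough to test it.
public sealed class PaycheckTransformer : ITransformer
{
    public IEnumerable<VendorLine> Transform(IEnumerable<PayRow> rows) =>
        rows.SelectMany(row => row.IsFirstCheck
            ? new[] { new VendorLine($"H|{row.EmployeeId}"), new VendorLine($"D1|{row.EmployeeId}") }
            : new[] { new VendorLine($"H|{row.EmployeeId}") });
}

// Coordinates the three steps and holds no business rules of its own.
public sealed class PaycheckExport
{
    private readonly IExtractor _extractor;
    private readonly ITransformer _transformer;
    private readonly ILoader _loader;

    public PaycheckExport(IExtractor extractor, ITransformer transformer, ILoader loader)
    {
        _extractor = extractor;
        _transformer = transformer;
        _loader = loader;
    }

    public void ExportPaychecks() =>
        _loader.Load(_transformer.Transform(_extractor.Extract()));
}
```

With this shape, a unit test for the transformer needs only a handful of PayRow values, and only the extractor and loader need a database or file system.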

Now zoom in on each of these three areas and see where you can split up responsibilities even more.

  • At the very least, you will need a good in-memory representation of each 'row' of source data. If the source is a relational database, you may want to use an ORM, but if not, such classes need to be modeled so that they correctly protect the invariants of each row (e.g. if a field is non-nullable, the class should guarantee this by throwing an exception if a null value is attempted). Such classes have a well-defined purpose and can be tested in isolation.
  • The same holds true for the destination: You need a good object model for that.
  • If there's advanced application-side filtering going on at the source, you could consider implementing these using the Specification design pattern. Those tend to be very testable as well.
  • The Transform step is where a lot of the action happens, but now that you have good object models of both source and destination, transformation can be performed by Mappers - again testable classes.
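Two of the ideas in the list above, a row class that protects its own invariants and a Specification for application-side filtering, might look roughly like this in C#. EmployeeRow, ISpecification and PaidOnSpecification are illustrative names, and the fields are assumptions rather than the real payroll schema.

```csharp
using System;

// Hypothetical source row: the constructor enforces the "non-nullable" invariant.
public sealed class EmployeeRow
{
    public string EmployeeNumber { get; }
    public DateTime PayDate { get; }

    public EmployeeRow(string employeeNumber, DateTime payDate)
    {
        EmployeeNumber = employeeNumber ?? throw new ArgumentNullException(nameof(employeeNumber));
        PayDate = payDate;
    }
}

// Specification pattern: one filtering rule expressed as a small object.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// Matches rows paid on a given calendar day, ignoring the time of day.
public sealed class PaidOnSpecification : ISpecification<EmployeeRow>
{
    private readonly DateTime _date;

    public PaidOnSpecification(DateTime date) => _date = date.Date;

    public bool IsSatisfiedBy(EmployeeRow row) => row.PayDate.Date == _date;
}
```

Each class can be tested with one or two lines of setup, which is the property the answer is after.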

If you have many 'rows' of source and destination data, you can further split this up in Mappers for each logical 'row', etc.

It never needs to become messy, and the added benefit (besides automated testing) is that the object model is now way more flexible. If you ever need to write another ETL application involving one of the two sides, you already have at least one third of the code written.

雪若未夕 2024-08-24 16:31:58

Something general that came to my mind about refactoring:

Refactoring does not mean you take your 3.5k LOC and divide it into n parts. I would not recommend making some of your 80 methods public or anything like that. It's more like vertically slicing your code:

  • Try to factor out self-standing algorithms and data structures like parsers, renderers, search operations, converters, special-purpose data structures ...
  • Try to figure out if your data is processed in several steps and can be built as a kind of pipe-and-filter mechanism, or a tiered architecture. Try to find as many layers as possible.
  • Separate technical (files, database) parts from logical parts.
  • If you have many of these import/export monsters, see what they have in common, factor those parts out and reuse them.
  • Expect in general that your code is too dense, i.e. it contains too many different functionalities next to each other in too few LOC. Visit the different "inventions" in your code and think about whether they are in fact tricky facilities that are worth having their own class(es).
    • Both LOC and number of classes are likely to increase when you refactor.
    • Try to make your code really simple ('baby code') inside classes and complex in the relations between the classes.

As a result, you won't have to write unit tests that cover the whole 3.5k LOC at all. Only small fractions of it are covered in a single test, and you'll have many small tests that are independent from each other.


EDIT

Here's a nice list of refactoring patterns. Among those, one shows quite nicely my intention: Decompose Conditional.

In the example, certain expressions are factored out into methods. Not only does the code become easier to read, but you also gain the opportunity to unit test those methods.

Even better, you can lift this pattern to a higher level and factor out those expressions, algorithms, values etc. not only to methods but also to their own classes.
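To make Decompose Conditional concrete for the export case, here is a small C# sketch. The IsVoided and NetAmount properties and the rule itself are invented for illustration; the original post only shows the IsFirstCheck test.

```csharp
// Hypothetical shape of the pay object; only IsFirstCheck appears in the original code.
public sealed class PayObject
{
    public bool IsFirstCheck { get; init; }
    public bool IsVoided { get; init; }
    public decimal NetAmount { get; init; }
}

// Before (inside the export loop), assuming a compound condition like:
//   if (pay.IsFirstCheck && !pay.IsVoided && pay.NetAmount > 0) { WriteDetailRowType1(pay); }
//
// After: each part of the condition has a name and lives in its own small class.
public static class DetailRowRules
{
    public static bool NeedsDetailRowType1(PayObject pay) =>
        pay.IsFirstCheck && IsPayable(pay);

    public static bool IsPayable(PayObject pay) =>
        !pay.IsVoided && pay.NetAmount > 0m;
}
```

The export loop then reads `if (DetailRowRules.NeedsDetailRowType1(pay))`, and the rule can be tested without running the export at all.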

凉宸 2024-08-24 16:31:58

What you should have initially are integration tests. These will test that the functions perform as expected and you could hit the actual database for this.

Once you have that safety net you could start refactoring the code to be more maintainable and introducing unit tests.

As mentioned by serbrech, Working Effectively with Legacy Code will help you no end; I would strongly advise reading it even for greenfield projects.

http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052

The main question I would ask is: how often does the code change? If it changes infrequently, is it really worth the effort of trying to introduce unit tests? If it changes frequently, then I would definitely consider cleaning it up a bit.

倾`听者〃 2024-08-24 16:31:58

It sounds like integration tests may be sufficient, especially if these export routines don't change once they're done or are only used for a limited time. Just get some sample input data with some variations, and have a test that verifies the final result is as expected.

A concern with your tests was the amount of fake data you had to create. You may be able to reduce this by creating a shared fixture (http://xunitpatterns.com/Shared%20Fixture.html). For unit tests the fixture may be an in-memory representation of the business objects to export; for integration tests it may be the actual databases initialized with known data. The point is that, however you generate it, the shared fixture is the same in each test, so creating new tests is just a matter of making minor tweaks to the existing fixture to trigger the code you want to test.
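For the in-memory case, a shared fixture can be as small as one factory method that every test starts from. The sketch below assumes a hypothetical PayObject record with three fields; a real fixture would cover whatever the export actually reads.

```csharp
// Hypothetical business object; the real one would carry many more fields.
public record PayObject(int EmployeeId, decimal Gross, bool IsFirstCheck);

public static class PayFixture
{
    // One known-good baseline shared by all tests.
    public static PayObject StandardPay() =>
        new PayObject(EmployeeId: 1001, Gross: 2500.00m, IsFirstCheck: false);
}

public static class PayFixtureUsage
{
    // A test only states what differs from the baseline.
    public static PayObject FirstCheckPay() =>
        PayFixture.StandardPay() with { IsFirstCheck = true };
}
```

Because records support `with` expressions, each test spells out only the one or two fields that matter for the branch it wants to trigger.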

So should you use integration tests? One barrier is how to set up the shared fixture. If you can duplicate the databases somewhere, you could use something like DbUnit to prepare the shared fixture. It might be easier to break the code into pieces (import, transform, export). Then use the DbUnit-based tests to test import and export, and use regular unit tests to verify the transform step. If you do that, you don't need DbUnit to set up a shared fixture for the transform step. If you can break the code into 3 steps (extract, transform, export), at least you can focus your testing efforts on the part that's likely to have bugs or change later.

如果没有你 2024-08-24 16:31:58

I have nothing to do with C#, but I have an idea you could try here. If you split your code a bit, you'll notice that what you have is basically a chain of operations performed on sequences.

First one gets pays for current date:

    var pays = _pays.GetPaysForCurrentDate();

Second one unconditionally processes the result

    foreach (PayObject pay in pays)
    {
       WriteHeaderRow(pay);
    }

Third one performs conditional processing:

    foreach (PayObject pay in pays)
    {
       if (pay.IsFirstCheck)
       {
          WriteDetailRowType1(pay);
       }
    }

Now, you could make those stages more generic (sorry for pseudocode, I don't know C#):

    var all_pays = _pays.GetAll();

    var pwcdate = filter_pays(all_pays, current_date()); // filter_pays could also be made more generic, able to filter any sequence

    var pwcdate_ann = annotate_with_header_row(pwcdate);

    var pwcdate_ann_fc = filter_first_check_only(pwcdate_ann);

    var pwcdate_ann_fc_ann = annotate_with_detail_row(pwcdate_ann_fc); // this could be made more generic, able to annotate with arbitrary row passed as parameter

    (Etc.)

As you can see, you now have a set of unconnected stages that can be tested separately and then connected together in arbitrary order. Such a connection, or composition, can also be tested separately. And so on (i.e. you can choose what to test).
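In actual C#, the same chain of stages could be written with LINQ. This is only a sketch: the PayObject fields and the "H|"/"D1|" row formats are assumptions, and Rows uses SelectMany so that each pay's header is still followed directly by its detail row, as in the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pay object with just the fields the stages need.
public record PayObject(int Id, DateTime PayDate, bool IsFirstCheck);

public static class PayPipeline
{
    // Stage 1: filter by date (testable with a plain list).
    public static IEnumerable<PayObject> ForDate(IEnumerable<PayObject> pays, DateTime date) =>
        pays.Where(p => p.PayDate.Date == date.Date);

    // Stages 2 and 3: pure row builders.
    public static string Header(PayObject pay) => $"H|{pay.Id}";
    public static string Detail(PayObject pay) => $"D1|{pay.Id}";

    // Composition: header for every pay, detail row only for first checks.
    public static IEnumerable<string> Rows(IEnumerable<PayObject> pays) =>
        pays.SelectMany(p => p.IsFirstCheck
            ? new[] { Header(p), Detail(p) }
            : new[] { Header(p) });
}
```

Each stage takes and returns plain sequences, so stages and their composition can be tested independently without any database access.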

无妨# 2024-08-24 16:31:58

This is one of those areas where the concept of mocking everything falls over. Certainly testing each method in isolation would be a "better" way of doing things, but compare the effort of making test versions of all your methods to that of pointing the code at a test database (reset at the start of each test run if necessary).

That is the approach I'm using with code that has a lot of complex interactions between components, and it works well enough. As each test will run more code, you are more likely to need to step through with the debugger to find exactly where something went wrong, but you get the primary benefit of unit tests (knowing that something went wrong) without putting in significant additional effort.

半窗疏影 2024-08-24 16:31:58

I think Tomasz Zielinski has a piece of the answer. But if you say you have 3500 lines of procedural code, then the problem is bigger than that.
Cutting it into more functions will not help you test it. However, it's a first step towards identifying responsibilities that could be extracted further into another class (if you have good names for the methods, that can be obvious in some cases).

I guess with such a class you have an incredible list of dependencies to tackle just to be able to instantiate this class in a test. It then becomes really hard to create an instance of that class in a test...
Michael Feathers' book "Working Effectively with Legacy Code" answers such questions very well.
The first goal, in order to test that code well, should be to identify the roles of the class and to break it into smaller classes. Of course that's easy to say, and the irony is that it's risky to do without tests to secure your modifications...

You say you have only 1 public method in that class. That should ease the refactoring, as you don't need to worry about the users of all the private methods. Encapsulation is nice, but if you have so much private stuff in that class, that probably means it doesn't belong here, and you should extract different classes from that monster, which you will eventually be able to test. Piece by piece, the design should look cleaner, and you will be able to test more of that big piece of code.
Your best friend if you start this will be a refactoring tool; it should help you not to break logic while extracting classes and methods.

Again the book from Michael Feathers seems to be a must read for you :)
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052

ADDED EXAMPLE:

This example comes from Michael Feathers' book and, I think, illustrates your problem well:

RuleParser  
public evaluate(string)  
private branchingExpression  
private causalExpression  
private variableExpression  
private valueExpression  
private nextTerm()  
private hasMoreTerms()   
public addVariables()  

Obviously here, it doesn't make sense to make the methods nextTerm and hasMoreTerms public. Nobody should see these methods; the way we move to the next item is definitely internal to the class. So how do we test this logic?

Well, if you see that this is a separate responsibility, you can extract a class, Tokenizer for example. These methods will suddenly be public within this new class, because that's its purpose. It then becomes easy to test that behaviour...
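A sketch of what such an extracted class might look like in C# follows. The book's example is not reproduced here; splitting on spaces is purely an assumption made to keep the example short, and the method names are simply C#-cased versions of nextTerm and hasMoreTerms.

```csharp
using System;

// Hypothetical Tokenizer extracted from RuleParser.
// Splitting on spaces is an assumption for illustration only.
public sealed class Tokenizer
{
    private readonly string[] _terms;
    private int _position;

    public Tokenizer(string expression)
    {
        if (expression == null) throw new ArgumentNullException(nameof(expression));
        _terms = expression.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Formerly private in RuleParser; public here because iterating terms is this class's job.
    public bool HasMoreTerms() => _position < _terms.Length;

    public string NextTerm()
    {
        if (!HasMoreTerms()) throw new InvalidOperationException("No more terms.");
        return _terms[_position++];
    }
}
```

RuleParser would then hold a Tokenizer and keep its own public surface unchanged, while the term-walking logic gets direct tests.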

So if you apply that to your huge piece of code, and extract pieces of it into other classes with fewer responsibilities, where it feels more natural to make these methods public, you will also be able to test them easily.
You said you are accessing about 40 different tables to map them. Why not break that into classes for each part of the mapping?

It's a bit hard to reason about code I can't read. You may have other issues that prevent you from doing this, but that's my best try at it.

Hope this helps
Good luck :)

眸中客 2024-08-24 16:31:58

I really find it hard to accept that you've got multiple ~3.5 KLOC data-export functions with no common functionality at all between them. If that's in fact the case, then maybe unit testing is not what you need to be looking at here. If there really is only one thing that each export module does, and it's essentially indivisible, then maybe a snapshot-comparison, data-driven integration test suite is what's called for.

If there are common bits of functionality, then extract each of them out (as separate classes) and test them individually. Those little helper classes will naturally have different public interfaces, which should reduce the problem of private APIs that can't be tested.

You don't give any details about what the actual output formats look like, but if they're generally tabular, fixed-width or delimited text, then you ought at least to be able to split the exporters up into structural and formatting code. By which I mean, instead of your example code up above, you'd have something like:

public void ExportPaychecks(HeaderFormatter h, CheckRowFormatter f)
{
   var pays = _pays.GetPaysForCurrentDate();
   foreach (PayObject pay in pays)
   {
      h.formatHeader(pay);
      f.WriteDetailRow(pay);
   }
}

The HeaderFormatter and CheckRowFormatter abstract classes would define a common interface for those types of report elements, and the individual concrete subclasses (for the various reports) would contain logic for removing duplicate rows, for example (or whatever a particular vendor requires).
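One possible C# shape for those abstract classes is sketched below. The concrete subclasses (PipeHeaderFormatter, FirstCheckRowFormatter), the row formats, and the choice to pass the pays and a TextWriter into the method are assumptions made so the example is self-contained; they are not part of the original answer.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical pay object with only the fields used here.
public record PayObject(int EmployeeId, bool IsFirstCheck);

public abstract class HeaderFormatter
{
    public abstract string FormatHeader(PayObject pay);
}

public abstract class CheckRowFormatter
{
    public abstract IEnumerable<string> FormatDetailRows(PayObject pay);
}

// Vendor-specific formatting lives in small subclasses that can be tested alone.
public sealed class PipeHeaderFormatter : HeaderFormatter
{
    public override string FormatHeader(PayObject pay) => $"H|{pay.EmployeeId}";
}

public sealed class FirstCheckRowFormatter : CheckRowFormatter
{
    public override IEnumerable<string> FormatDetailRows(PayObject pay)
    {
        if (pay.IsFirstCheck) yield return $"D1|{pay.EmployeeId}";
    }
}

// Structural code: knows the order of rows, not their format.
public sealed class PaycheckExporter
{
    public void ExportPaychecks(IEnumerable<PayObject> pays, HeaderFormatter h,
                                CheckRowFormatter f, TextWriter output)
    {
        foreach (var pay in pays)
        {
            output.WriteLine(h.FormatHeader(pay));
            foreach (var row in f.FormatDetailRows(pay)) output.WriteLine(row);
        }
    }
}
```

Writing to a TextWriter means a test can pass a StringWriter and compare the text, while production code passes a file-backed writer.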

Another way to slice this is to separate data extraction and formatting from each other. Write code that extracts all the records from the various databases into an intermediate representation that's a super-set of the needed representations, then write relatively simple-minded filter routines that convert from the uber-format down to the required format for each vendor.


After thinking about this a little more, I realize you've identified this as an ETL application, but your example seems to combine all three steps together. That suggests that a first step would be to split things up such that all the data is extracted first, then translated, then stored. You can certainly test at least those steps separately.

余生共白头 2024-08-24 16:31:58

I maintain some reports similar to what you describe, but not as many of them and with fewer database tables. I use a 3-fold strategy that might scale well enough to be useful to you:

  1. At the method level, I unit test anything I subjectively deem to be 'complicated'. This includes 100% of bug fixes, plus anything that just makes me feel nervous.

  2. At the module level, I unit test the main use cases. As you have encountered, this is fairly painful since it does require somehow mocking the data. I have accomplished this by abstracting the database interfaces (i.e. no direct SQL connections within my reporting module). For some simple tests I have typed the test data by hand, for others I have written a database interface that records and/or plays back queries, so that I can bootstrap my tests with real data. In other words, I run once in record mode and it not only fetches real data but it also saves a snapshot for me in a file; when I run in playback mode, it consults this file instead of the real database tables. (I'm sure there are mocking frameworks that can do this, but since every SQL interaction in my world has the signature Stored Procedure Call -> Recordset it was quite simple just to write it myself.)

  3. I'm fortunate to have access to a staging environment with a full copy of production data, so I can perform integration tests with full regression against previous software versions.
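The record/playback interface described in point 2 could be sketched in C# along these lines. This is not the answerer's implementation: IProcedureSource, RecordingSource, PlaybackSource and SnapshotFile are hypothetical names, rows are simplified to string dictionaries, and snapshots are keyed by procedure name only (parameters are ignored) to keep the example short.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical abstraction over "stored procedure call -> recordset".
public interface IProcedureSource
{
    List<Dictionary<string, string>> Call(string procedureName);
}

// Record mode: forwards to the real source and saves a snapshot of what came back.
public sealed class RecordingSource : IProcedureSource
{
    private readonly IProcedureSource _real;
    private readonly string _snapshotDir;

    public RecordingSource(IProcedureSource real, string snapshotDir)
    {
        _real = real;
        _snapshotDir = snapshotDir;
    }

    public List<Dictionary<string, string>> Call(string procedureName)
    {
        var rows = _real.Call(procedureName);
        Directory.CreateDirectory(_snapshotDir);
        File.WriteAllText(SnapshotFile.For(_snapshotDir, procedureName),
                          JsonSerializer.Serialize(rows));
        return rows;
    }
}

// Playback mode: answers from the snapshot file instead of the database.
public sealed class PlaybackSource : IProcedureSource
{
    private readonly string _snapshotDir;

    public PlaybackSource(string snapshotDir) => _snapshotDir = snapshotDir;

    public List<Dictionary<string, string>> Call(string procedureName)
    {
        var json = File.ReadAllText(SnapshotFile.For(_snapshotDir, procedureName));
        return JsonSerializer.Deserialize<List<Dictionary<string, string>>>(json)
               ?? new List<Dictionary<string, string>>();
    }
}

internal static class SnapshotFile
{
    public static string For(string dir, string procedureName) =>
        Path.Combine(dir, procedureName + ".json");
}
```

The reporting module depends only on IProcedureSource, so tests can run against recorded snapshots while production uses the real database-backed implementation.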

婴鹅 2024-08-24 16:31:58

Have you looked into Moq?

Quote from the site:

Moq (pronounced "Mock-you" or just "Mock") is the only mocking library for .NET developed from scratch to take full advantage of .NET 3.5 (i.e. Linq expression trees) and C# 3.0 features (i.e. lambda expressions) that make it the most productive, type-safe and refactoring-friendly mocking library available.
