How to effectively test a fixed-length flat file parser using MSpec?

Posted on 2024-11-30 02:53:32

I have this method signature: List<ITMData> Parse(string[] lines)

ITMData has 35 properties.

How would you effectively test such a parser?

Questions:

  • Should I load the whole file (May I use System.IO)?
  • Should I put a line from the file into a string constant?
  • Should I test one or more lines?
  • Should I test each property of ITMData or should I test the whole object?
  • What about the naming of my test?

EDIT

I changed the method signature to ITMData Parse(string line).

Test Code:

[Subject(typeof(ITMFileParser))]
public class When_parsing_from_index_59_to_79
{
    private const string Line = ".........";
    private static ITMFileParser _parser;
    private static ITMData _data;

    private Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    private It should_get_fldName = () => _data.FldName.ShouldBeEqualIgnoringCase("HUMMELDUMM");
}

EDIT 2

I am still not sure whether I should test only one property per class. In my opinion this lets me put more information into the specification, namely that when I parse a single line, the characters from index 59 to index 79 give me fldName. If I test all properties within one class, I lose this information. Am I overspecifying my tests?

My tests now look like this:

[Subject(typeof(ITMFileParser))]
public class When_parsing_single_line_from_ITM_file
{
    const string Line = "";

    static ITMFileParser _parser;
    static ITMData _data;

    Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    ...

}

Comments (4)

香橙ぽ 2024-12-07 02:53:33

Should I load the whole file (May I use System.IO)?

If you do this, it's no longer a unit test -- it becomes an integration or regression test. You can do this, if you expect it to show possible bugs that a unit test wouldn't. But that's not too likely.

You're probably better off with unit tests, at least to start.

Should I put a line from the file into a string constant?

If you plan to write more than one test that uses the same input line, then sure. But personally, I would probably tend to write a bunch of different tests, with each one passing a different input string. At that point, there's not much reason to make a constant (unless it's a local constant, declared inside the test method).

Should I test one or more lines?

You didn't specify, but I'm going to assume that your output is one-for-one with your input -- that is, if you pass in three strings, you'll get three ITMDatas returned. In that case, the need for multi-line tests would be limited.

It's almost always worth testing the degenerate case, which in this case would be an empty string array (zero lines). And it's probably worth having at least one test that has more than one line, just so you can make sure there are no silly mistakes in your iteration.

However, if your output is one-for-one with your input, then you really have another method wanting to get out -- you should have a ParseSingleLine method. Then your Parse would be nothing more than iterating lines and calling ParseSingleLine. You would still want a handful of tests for Parse, but most of your testing would focus around ParseSingleLine.
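
The split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: ITMData is cut down to a single hypothetical property, and the position of FldName (20 characters starting at index 59) is guessed from the question's "index 59 to 79" wording, not taken from the real file layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down record: the real ITMData has 35 properties.
public class ITMData
{
    public string FldName { get; set; }
}

public class ITMFileParser
{
    // Assumption: FldName occupies 20 characters starting at index 59.
    private const int FldNameStart = 59;
    private const int FldNameLength = 20;

    // All field extraction lives here, so most tests would target this method.
    public ITMData ParseSingleLine(string line)
    {
        return new ITMData
        {
            FldName = line.Substring(FldNameStart, FldNameLength).Trim()
        };
    }

    // Parse only iterates and delegates; a zero-line test and one
    // multi-line test are enough to cover it.
    public List<ITMData> Parse(string[] lines)
    {
        return lines.Select(line => ParseSingleLine(line)).ToList();
    }
}
```

With this shape, the property-level checks go against ParseSingleLine, while Parse needs little more than the empty-array case and one input with several lines.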

自找没趣 2024-12-07 02:53:33

Here's what I would normally do if I'm facing such a problem:

One short disclaimer in advance: I would lean toward the "integration testing" or "testing the parser as a whole" route rather than testing individual lines. More than once in the past, lots of implementation details leaked into my tests and forced me to change the tests whenever I changed those details. A typical case of overspecification, I guess ;-/

  1. I wouldn't include file loading in the parser. As @mquander suggested I would rather go with a TextReader or an IEnumerable as the input parameter instead. This will result in way faster tests since you're able to specify the parser input in-memory and don't have to touch the file system.
  2. I'm not a big fan of hand-rolling test data, so in most cases I use embedded resources and the ResourceManager to load test data directly from the specification assembly via assembly.GetManifestResourceStream(). I typically have a bunch of extension methods in my solution to streamline the reading of resources (something like TextReader TextResource.Load("NAME_OF_SOME_RESOURCE")).
  3. Regarding MSpec: I use one class per file to parse. For each property that is tested in the parsed result I have a separate (It) assertion. These are normally one-liners, so the additional amount of coding isn't that big. In terms of documentation and diagnostics, imho it's a huge plus: when a property isn't parsed correctly, you can see directly which assertion failed without having to look into the source or search for line numbers. It also appears in your MSpec result file. Besides, you don't hide other failed assertions (the situation where you fix one assertion only to see the spec fail on the next line with the next assertion). This of course forces you to think more about the wording you use in your specifications, but for me that's also a huge plus, since I'm a proponent of the idea that language forms thinking. In other words, if you've no clue how to frackin name your assertion, there's probably something fishy about either your specification or your implementation.
  4. Regarding your method signature for the parser: I wouldn't return a concrete type like List<T> or an array, and I would especially avoid returning the mutable List<T> type. What you're basically saying there is: "Hey, you can muck around with the parsing result after I've finished", which in most cases is probably not what you want. I would suggest returning IEnumerable<T> instead (or ICollection<T> if you REALLY need to modify it afterwards).
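
Points 1 and 4 above might look something like this in code. It is a minimal sketch, not the real parser: ITMReaderParser is a made-up name, ITMData is reduced to one hypothetical property, and the FldName offset (20 characters from index 59) is an assumption based on the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical, trimmed-down record: the real ITMData has 35 properties.
public class ITMData
{
    public string FldName { get; set; }
}

public static class ITMReaderParser
{
    // Takes a TextReader, so specs can feed it a StringReader and never
    // touch the file system. Returns IEnumerable<ITMData>, so callers get
    // a read-only sequence rather than a mutable List<T>.
    public static IEnumerable<ITMData> Parse(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumption: FldName occupies 20 characters starting at index 59.
            yield return new ITMData
            {
                FldName = line.Substring(59, 20).Trim()
            };
        }
    }
}
```

In a spec, the input can then be built in memory, for example with new StringReader(twoLinesOfText), while production code passes a reader opened on the actual file. Note that the iterator is lazy: nothing is read until the result is enumerated, so the reader has to stay open until then.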

三生池水覆流年 2024-12-07 02:53:33

I typically try to consider common success and failure scenarios, along with edge cases. Requirements are also helpful for setting up appropriate use cases. Consider Pex for enumerating various scenarios.

你不是我要的菜∠ 2024-12-07 02:53:33

Regarding your newer questions:

Should I test each property of ITMData or should I test the whole object?

If you want to be on the safe side, you should probably have at least one test which checks that each property was matched.

What about the naming of my test?

There are quite a few discussions on this topic, such as this one. The general rule is that you would have multiple methods in your unit test class, each aimed at testing something specific. In your case, it might be things like:

public void Check_All_Properties_Parsed_Correctly(){.....}

public void Exception_Thrown_If_Lines_Is_Null(){.....}

public void Exception_Thrown_If_Lines_Is_Wrong_Length(){.....}

So, in other words, testing for the exact behaviour that you consider "correct" for the parser. Once this is done, you will feel much more at ease when making changes to the parser code, because you will have a comprehensive test suite to check that you didn't break anything. Remember to actually test often, and to keep your tests updated when you make changes! There's a fairly good guide about unit testing and Test Driven Development on MSDN.
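
To make the two exception tests named above concrete, here is one possible shape for the behaviour they would pin down. Everything specific in it is an assumption for illustration: the 80-character record width, the choice of ArgumentNullException and FormatException, the FldName offset, and the GuardedITMParser and ParserChecks names. The small Throws helper only stands in for whatever exception assertion your test framework provides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down record: the real ITMData has 35 properties.
public class ITMData
{
    public string FldName { get; set; }
}

public class GuardedITMParser
{
    // Assumption: every record is exactly 80 characters wide.
    public const int ExpectedLineLength = 80;

    public List<ITMData> Parse(string[] lines)
    {
        // Behaviour behind Exception_Thrown_If_Lines_Is_Null.
        if (lines == null) throw new ArgumentNullException(nameof(lines));

        var result = new List<ITMData>();
        foreach (var line in lines)
        {
            // Behaviour behind Exception_Thrown_If_Lines_Is_Wrong_Length.
            if (line == null || line.Length != ExpectedLineLength)
                throw new FormatException(
                    "Expected a line of " + ExpectedLineLength + " characters.");

            // Assumption: FldName occupies 20 characters starting at index 59.
            result.Add(new ITMData { FldName = line.Substring(59, 20).Trim() });
        }
        return result;
    }
}

public static class ParserChecks
{
    // Returns true when the action throws TException (or a subtype of it).
    public static bool Throws<TException>(Action action) where TException : Exception
    {
        try { action(); }
        catch (TException) { return true; }
        catch (Exception) { return false; }
        return false;
    }
}
```

A test such as Exception_Thrown_If_Lines_Is_Null then reduces to a single check like ParserChecks.Throws<ArgumentNullException>(() => parser.Parse(null)).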

In general, I think you can find answers to most of your questions by googling a bit. There are also several excellent books on Test Driven Development, which will drive home not only the how of TDD, but the why. If you are relatively programming language agnostic, I would recommend Kent Beck's Test Driven Development By Example, otherwise something like Test-Driven Development in Microsoft .NET. These should get you on the right track very quickly.

EDIT:

Am I overspecifying my tests?

In my opinion, yes. Specifically, I don't agree with your next line:

If I test all properties within one class I lose this information.

In what way do you lose information exactly? Let's say there are 2 ways to do this test, other than having a new class per test:

  1. Have different methods for each property. Your test methods could be called CheckPropertyX, CheckPropertyY, etc. When you run your tests, you will see exactly which fields passed and which fields failed. This clearly satisfies your requirements, although I would say it's still overkill. I would go with option 2:
  2. Have a few different methods, each testing one specific aspect. This is what I originally recommended, and I think what you are referring to. When one of the tests fails, you will only get information about the first thing that failed, per method, but if you coded your Assert nicely, you will know exactly which property is incorrect. Consider the following code:

Assert.AreEqual("test1", myObject.PropertyX, "Property X was incorrectly parsed");
Assert.AreEqual("test2", myObject.PropertyY, "Property Y was incorrectly parsed");

When one of those fails, you will know which line failed. When you have fixed the relevant error, and re-run your tests, you will see if any other properties have failed. This is generally the approach that most people take, because creating a class or even method per property results in too much code, and too much work to keep up to date.
