How does TDD work when there can be millions of test cases for a production functionality?
In TDD, you pick a test case and implement it, write just enough production code to make the test pass, refactor the code, and then pick a new test case; the cycle continues.
The problem I have with this process is that TDD says you write only enough code to pass the test you just wrote. What I mean is: if a method can have, say, 1 million test cases, what can you do? Obviously you don't write 1 million test cases.
Let me explain what I mean more clearly with the example below:
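    internal static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        // 0 and 1 have no prime factors; without this guard, 0 would loop forever below.
        if (number < 2)
        {
            return result;
        }
        // Strip out every factor of 2 first so that only odd divisors remain.
        while (number % 2 == 0)
        {
            result.Add(2);
            number = number / 2;
        }
        // Trial division by odd candidates; declared as ulong so it can be compared with number.
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0)
            {
                result.Add(divisor);
                number = number / divisor;
            }
            else
            {
                divisor += 2;
            }
        }
        return result;
    }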
The above code returns all the prime factors of a given number. ulong has 64 bits, which means it can accept values from 0 to 18,446,744,073,709,551,615!
So how does TDD work when there can be millions of test cases for a piece of production functionality?
I mean, how many test cases do I need to write before I can say I used TDD to arrive at this production code?
The TDD idea that you should write only enough code to pass your test seems wrong to me, as the example above shows.
When is enough enough?
My own thought is to pick only a few test cases, e.g. the upper bound, the lower bound, and a few more (say 5 test cases), but that's not TDD, is it?
Many thanks for your thoughts on TDD for this example.
Comments (11)
That's sort of the first question you face with any kind of testing. TDD is of no importance here.
Yes, there are lots and lots of cases; moreover, there are combinations of combinations of cases once you start building a system. It will indeed lead you to a combinatorial explosion.
What to do about that is a good question. Usually, you choose equivalence classes for which your algorithm will probably behave the same way, and test one value from each class.
The next step is to test boundary conditions (remember, the two most frequent errors in CS are off-by-one errors).
Next... Well, for all practical purposes, it's OK to stop here. Still, take a look at these lecture notes: http://www.scs.stanford.edu/11au-cs240h/notes/testing.html
PS. By the way, using TDD "by the book" for math problems is not a very good idea. Kent Beck demonstrates this in his TDD book by producing the worst possible implementation of a function that calculates Fibonacci numbers. If you know a closed form, or have an article describing a proven algorithm, just make sanity checks as described above and skip TDD with its whole refactoring cycle; it will save you time.
PPS. Actually, there's a nice article which (surprise!) mentions both the Fibonacci problem and the problem you have with TDD.
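As a rough sketch of the equivalence-class and boundary-value idea applied to the question's function: the choice of classes below (a prime, a power of two, and so on) and the `Expect` helper are illustrative assumptions rather than anything from the posts, and the routine is a lightly corrected copy of the question's code that is assumed to return an empty list for 0 and 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EquivalenceClassSketch
{
    // Trial division as in the question, with two assumed fixes:
    // ulong throughout, and an empty result for 0 and 1.
    public static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        if (number < 2) return result;
        while (number % 2 == 0) { result.Add(2); number /= 2; }
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0) { result.Add(divisor); number /= divisor; }
            else divisor += 2;
        }
        return result;
    }

    // Hypothetical helper: throws when the factor list differs from the expected one.
    public static void Expect(ulong input, params ulong[] expected)
    {
        List<ulong> actual = GetPrimeFactors(input);
        if (!actual.SequenceEqual(expected))
        {
            string got = string.Join(", ", actual);
            throw new Exception("Unexpected factors for " + input + ": " + got);
        }
    }

    public static void Main()
    {
        // One representative value per (assumed) equivalence class.
        Expect(7, 7);              // a prime
        Expect(8, 2, 2, 2);        // a power of two
        Expect(9, 3, 3);           // a power of an odd prime
        Expect(30, 2, 3, 5);       // several distinct primes
        Expect(60, 2, 2, 3, 5);    // repeated and distinct primes mixed

        // Boundary conditions at both ends of the ulong range.
        Expect(0);
        Expect(1);
        Expect(2, 2);
        Expect(3, 3);
        Expect(ulong.MaxValue, 3, 5, 17, 257, 641, 65537, 6700417);

        Console.WriteLine("all checks passed");
    }
}
```

Ten values stand in for the whole 64-bit input range here; a large prime input is left out on purpose because trial division would take far too long on it.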
There aren't millions of test cases. Only a few. You might like to try Pex, which will help you find the distinct test cases that actually matter for your algorithm. Of course, you need only test those.
I've never done any TDD, but what you're asking about isn't TDD: it's about how to write a good test suite.
I like to design models (on paper or in my head) of all the states each piece of code can be in. I consider each line as if it were a part of a state machine. For each of those lines, I determine all the transitions that can be made (execute the next line, branch or not branch, throw an exception, overflow any of the sub-calculations in the expression, etc.).
From there I've got a basic matrix for my test cases. Then I determine each boundary condition for each of those state transitions, and any interesting mid-points between each of those boundaries. Then I've got the variations for my test cases.
From here I try to come up with interesting and different combinations of flow or logic - "This if statement, plus that one - with multiple items in the list", etc.
Since code is a flow, you often can't interrupt it in the middle unless it makes sense to insert a mock for an unrelated class. In those cases I've often reduced my matrix quite a bit, because there are conditions you just can't hit, or because the variation becomes less interesting by being masked out by another piece of logic.
After that, I'm about done for the day, and go home :) And I probably have about 10-20 test cases per well-factored and reasonably short method, or 50-100 per algorithm/class. Not 10,000,000.
I probably come up with too many uninteresting test cases, but at least I usually overtest rather than undertest. I mitigate this by trying to factor my test cases well to avoid code duplication.
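To make the transition-matrix idea a bit more concrete, here is a minimal sketch for the question's function. The transition labels and the `RunMatrix` helper are my own assumptions about how such a matrix might be written down, and the routine is a corrected copy of the question's code with an assumed guard for 0 and 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BranchMatrixSketch
{
    // Corrected copy of the question's routine (assumed: empty list for 0 and 1).
    public static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        if (number < 2) return result;
        while (number % 2 == 0) { result.Add(2); number /= 2; }
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0) { result.Add(divisor); number /= divisor; }
            else divisor += 2;
        }
        return result;
    }

    // Runs one input per transition and returns how many rows were checked.
    public static int RunMatrix()
    {
        var cases = new (string Transition, ulong Input, ulong[] Expected)[]
        {
            ("guard returns early", 1UL, new ulong[0]),
            ("even loop runs once, odd loop never entered", 2UL, new ulong[] { 2UL }),
            ("even loop repeats", 8UL, new ulong[] { 2UL, 2UL, 2UL }),
            ("even loop skipped, 'if' branch taken repeatedly", 9UL, new ulong[] { 3UL, 3UL }),
            ("'else' branch advances the divisor", 25UL, new ulong[] { 5UL, 5UL }),
            ("both loops and both branches combined", 90UL, new ulong[] { 2UL, 3UL, 3UL, 5UL }),
        };

        foreach (var c in cases)
        {
            if (!GetPrimeFactors(c.Input).SequenceEqual(c.Expected))
                throw new Exception("Failed: " + c.Transition);
        }
        return cases.Length;
    }

    public static void Main()
    {
        Console.WriteLine(RunMatrix() + " transitions checked");
    }
}
```

Each row exists because of a path through the code, not because of the size of the input domain, which is why the matrix stays small.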
Key pieces here:
And no, you don't have to write up FSM drawings, unless you have fun doing that sort of thing. I don't :)
What you usually do is test the boundary conditions plus a few random values.
For example: ulong.MinValue, ulong.MaxValue, and some values in between. Why are you even writing GetPrimeFactors? Do you want to calculate prime factors in general, or are you writing it to do something specific? Test for the reason you're writing it.
What you could also do is assert on result.Count instead of on all the individual items. If you know how many items you're supposed to get, along with some specific cases, you can still refactor your code, and if those cases and the total count stay the same, assume the function still works.
If you really want to test that much, you could also look into white-box testing. For example, Pex and Moles are pretty good.
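A minimal sketch of asserting on the count, paired with one fully specified case. The `Check` helper and the chosen inputs are assumptions for illustration, and the routine is a corrected copy of the question's code that is assumed to return an empty list for 0 and 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountAssertionSketch
{
    // Corrected copy of the question's routine (assumed: empty list for 0 and 1).
    public static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        if (number < 2) return result;
        while (number % 2 == 0) { result.Add(2); number /= 2; }
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0) { result.Add(divisor); number /= divisor; }
            else divisor += 2;
        }
        return result;
    }

    // Hypothetical assertion helper.
    public static void Check(bool condition, string message)
    {
        if (!condition) throw new Exception(message);
    }

    public static void Main()
    {
        // Count-only assertions: cheap to write and stable across refactorings.
        Check(GetPrimeFactors(97).Count == 1, "97 is prime");
        Check(GetPrimeFactors(360).Count == 6, "360 = 2*2*2*3*3*5");
        Check(GetPrimeFactors(1UL << 63).Count == 63, "2^63 is 63 twos");
        Check(GetPrimeFactors(ulong.MaxValue).Count == 7, "2^64 - 1 has 7 prime factors");

        // A count cannot tell the right six numbers from six wrong ones,
        // so at least one case pins the individual items as well.
        Check(GetPrimeFactors(360).SequenceEqual(new ulong[] { 2, 2, 2, 3, 3, 5 }), "factors of 360");

        Console.WriteLine("count checks passed");
    }
}
```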
TDD is not a way to check that a function/program works correctly on every permutation of inputs possible. My take on it is that the probability that I write a particular test-case is proportional to how uncertain I am that my code is correct in that case.
This basically means I write tests in two scenarios: 1) some code I've written is complicated and/or makes too many assumptions, and 2) a bug happens in production.
Once you understand what causes a bug it is generally very easy to codify in a test case. In the long term, doing this produces a robust test suite.
It's an interesting question, related to the idea of falsifiability in epistemology. With unit tests, you are not really trying to prove that the system works; you are constructing experiments which, if they fail, will prove that the system doesn't work in a way consistent with your expectations/beliefs. If your tests pass, you do not know that your system works, because you may have forgotten some edge case which is untested; what you know is that as of now, you have no reason to believe that your system is faulty.
The classic example in the history of science is the question "are all swans white?". No matter how many different white swans you find, you can't say that the hypothesis "all swans are white" is correct. On the other hand, bring me one black swan, and I know the hypothesis is not correct.
A good TDD unit test is along these lines; if it passes, it won't tell you that everything is right, but if it fails, it tells you where your hypothesis is incorrect. In that frame, testing for every number isn't that valuable: one case should be sufficient, because if it doesn't work for that case, you know something is wrong.
What makes the question interesting, though, is that, unlike swans (you can't really enumerate every swan in the world, plus all their future children and their parents), you could enumerate every single integer in the input range, which is a finite set, and verify every possible situation. Also, a program is in lots of ways closer to mathematics than to physics, and in some cases you can truly verify whether a statement is true; but that type of verification is, in my opinion, not what TDD is going after. TDD is going after good experiments that aim at capturing possible failure cases, not at proving that something is true.
You're forgetting step three:
Writing your test cases gets you to red.
Writing enough code to make those test cases pass gets you to green.
Generalizing your code to work for more than just the test cases you wrote, while still not breaking any of them, is the refactoring.
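The three steps above can be sketched as follows. This is only an illustration under my own assumptions: `GreenStep`, `Refactored`, and `PassesTests` are hypothetical names, and the generalized version is a plain trial-division loop rather than the question's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RedGreenRefactorSketch
{
    // Green: just enough code to pass the four tests below, nothing more.
    public static List<ulong> GreenStep(ulong number)
    {
        if (number == 2) return new List<ulong> { 2 };
        if (number == 3) return new List<ulong> { 3 };
        if (number == 4) return new List<ulong> { 2, 2 };
        return new List<ulong>();
    }

    // Refactor: the special cases generalized into trial division.
    public static List<ulong> Refactored(ulong number)
    {
        var result = new List<ulong>();
        for (ulong divisor = 2; number > 1; divisor++)
        {
            while (number % divisor == 0) { result.Add(divisor); number /= divisor; }
        }
        return result;
    }

    // Red: these tests are written first and fail until an implementation exists.
    public static bool PassesTests(Func<ulong, List<ulong>> factorize)
    {
        return factorize(1).Count == 0
            && factorize(2).SequenceEqual(new ulong[] { 2 })
            && factorize(3).SequenceEqual(new ulong[] { 3 })
            && factorize(4).SequenceEqual(new ulong[] { 2, 2 });
    }

    public static void Main()
    {
        // Both versions satisfy the same four tests.
        Console.WriteLine(PassesTests(GreenStep));
        Console.WriteLine(PassesTests(Refactored));

        // Only the generalized version copes with an input nobody wrote a test for.
        Console.WriteLine("GreenStep(6):  [" + string.Join(", ", GreenStep(6)) + "]");
        Console.WriteLine("Refactored(6): [" + string.Join(", ", Refactored(6)) + "]");
    }
}
```

The tests do not change between the two versions; what changes is how far beyond the tested inputs the code is valid.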
You appear to be treating TDD as if it is black-box testing. It's not. If it were black-box testing, only a complete (millions of test cases) set of tests would satisfy you, because any given case might be untested, and therefore the demons in the black box would be able to get away with a cheat.
But it isn't demons in the black box in your code. It's you, in a white box. You know whether you're cheating or not. The practice of Fake It Til You Make It is closely associated with TDD, and sometimes confused with it. Yes, you write fake implementations to satisfy early test cases - but you know you're faking it. And you also know when you have stopped faking it. You know when you have a real implementation, and you've gotten there by progressive iteration and test-driving.
So your question is really misplaced. For TDD, you need to write enough test cases to drive your solution to completion and correctness; you don't need test cases for every conceivable set of inputs.
From my POV the refactoring step doesn't seem to have taken place on this piece of code...
In my book, TDD does NOT mean writing test cases for every possible permutation of every possible input/output parameter...
BUT it does mean writing all the test cases needed to ensure that the code does what it is specified to do, i.e. for such a method, all boundary cases plus a test that randomly picks a number from a list of numbers with known correct results. If need be, you can always extend this list to make the test more thorough...
TDD only works in the real world if you don't throw common sense out the window...
As to
in TDD this refers to "non-cheating programmers"... If you have one or more "cheating programmers" who, for example, just hardcode the "correct result" of the test cases into the method, I suspect you have a much bigger problem on your hands than TDD...
BTW, "test case construction" is something you get better at the more you practice it. There is no book or guide that can tell you up front which test cases are best for any given situation... experience pays off big when it comes to constructing test cases...
TDD does permit you to use common sense if you want to. There's no point defining your version of TDD to be stupid, just so that you can say "we're not doing TDD, we're doing something less stupid".
You can write a single test case that calls the function under test more than once, passing in different arguments. This prevents "write code to factorize 1", "write code to factorize 2", "write code to factorize 3" being separate development tasks.
How many distinct values to test really depends how much time you have to run the tests. You want to test anything that might be a corner case (so in the case of factorization at least 0, 1, 2, 3, LONG_MAX+1 since it has the most factors, whichever value has the most distinct factors, a Carmichael number, and a few perfect squares with various numbers of prime factors) plus as big a range of values as you can in the hope of covering something that you didn't realise was a corner case, but is. This may well mean writing the test, then writing the function, then adjusting the size of the range based on its observed performance.
You're also allowed to read the function specification, and implement the function as if more values are tested than actually will be. This doesn't really contradict "only implement what's tested", it just acknowledges that there isn't enough time before ship date to run all 2^64 possible inputs, and so the actual test is a representative sample of the "logical" test that you'd run if you had time. You can still code to what you want to test, rather than what you actually have time to test.
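One test that covers a whole range of inputs could look roughly like this. It is a sketch under assumptions: `CheckRange` and `IsPrime` are hypothetical helpers, the upper bound of 20000 is an arbitrary size to be tuned against observed run time, and the routine is a corrected copy of the question's code that is assumed to return an empty list for 0 and 1.

```csharp
using System;
using System.Collections.Generic;

public static class RangeTestSketch
{
    // Corrected copy of the question's routine (assumed: empty list for 0 and 1).
    public static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        if (number < 2) return result;
        while (number % 2 == 0) { result.Add(2); number /= 2; }
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0) { result.Add(divisor); number /= divisor; }
            else divisor += 2;
        }
        return result;
    }

    // Deliberately naive primality check, written independently of the code under test.
    public static bool IsPrime(ulong n)
    {
        if (n < 2) return false;
        for (ulong d = 2; d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // A single test that calls the function for every value in [0, upperBound].
    public static void CheckRange(ulong upperBound)
    {
        for (ulong n = 0; n <= upperBound; n++)
        {
            List<ulong> factors = GetPrimeFactors(n);
            ulong product = 1;
            ulong previous = 0;
            foreach (ulong factor in factors)
            {
                // Every factor must be prime and the list must be in non-decreasing order.
                if (!IsPrime(factor) || factor < previous)
                    throw new Exception("Bad factor list for " + n);
                product *= factor;
                previous = factor;
            }
            // 0 and 1 are assumed to have no factors; everything else must multiply back to n.
            if (n < 2 && factors.Count != 0)
                throw new Exception("Expected no factors for " + n);
            if (n >= 2 && product != n)
                throw new Exception("Factors of " + n + " multiply to " + product);
        }
    }

    public static void Main()
    {
        CheckRange(20000);
        Console.WriteLine("range check passed");
    }
}
```

Checking properties (all factors prime, product equals the input) avoids having to write out an expected list for each of the 20001 inputs.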
You could even test randomly-selected inputs (common as part of "fuzzing" by security analysts), if you find that your programmers (i.e. yourself) are determined to be perverse, and keep writing code that only solves the inputs tested, and no others. Obviously there are issues around the repeatability of random tests, so use a PRNG and log the seed. You see a similar thing with competition programming, online judge programs, and the like, to prevent cheating. The programmer doesn't know exactly which inputs will be tested, so must attempt to write code that solves all possible inputs. Since you can't keep secrets from yourself, random input does the same job. In real life programmers using TDD don't cheat on purpose, but might cheat accidentally because the same person writes the test and the code. Funnily enough, the tests then miss the same difficult corner cases that the code does.
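A seeded random test along these lines might look like the sketch below. The approach of building each input by multiplying randomly chosen small primes (so that the expected answer is known without a second factorization routine) is my own assumption, as are the `RunChecks` helper and the corrected copy of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomInputSketch
{
    static readonly ulong[] SmallPrimes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 };

    // Corrected copy of the question's routine (assumed: empty list for 0 and 1).
    public static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();
        if (number < 2) return result;
        while (number % 2 == 0) { result.Add(2); number /= 2; }
        ulong divisor = 3;
        while (divisor <= number)
        {
            if (number % divisor == 0) { result.Add(divisor); number /= divisor; }
            else divisor += 2;
        }
        return result;
    }

    // Builds `rounds` random inputs from a seeded PRNG and checks each one.
    public static int RunChecks(int seed, int rounds)
    {
        var random = new Random(seed);
        for (int round = 0; round < rounds; round++)
        {
            var expected = new List<ulong>();
            ulong input = 1;
            int count = random.Next(1, 9); // between 1 and 8 prime factors, so no overflow
            for (int i = 0; i < count; i++)
            {
                ulong prime = SmallPrimes[random.Next(SmallPrimes.Length)];
                expected.Add(prime);
                input *= prime;
            }
            expected.Sort();

            if (!GetPrimeFactors(input).SequenceEqual(expected))
                throw new Exception("seed " + seed + ", round " + round + ": wrong factors for " + input);
        }
        return rounds;
    }

    public static void Main(string[] args)
    {
        // Pass a previously logged seed as the first argument to replay a failing run.
        int seed = args.Length > 0 ? int.Parse(args[0]) : Environment.TickCount;
        Console.WriteLine("seed = " + seed);
        RunChecks(seed, 1000);
        Console.WriteLine("random checks passed");
    }
}
```

Logging the seed is what keeps the test repeatable: a failure message carries the seed and round, so the exact input can be regenerated.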
The problem is even more obvious with a function that takes a string input, there are far more than 2^64 possible test values. Choosing the best ones, that is to say ones the programmer is most likely to get wrong, is at best an inexact science.
You can also let the tester cheat, moving beyond TDD. First write the test, then write the code to pass the test, then go back and write more white box tests, that (a) include values that look like they might be edge cases in the implementation actually written; and (b) include enough values to get 100% code coverage, for whatever code coverage metric you have the time and willpower to work to. The TDD part of the process is still useful, it helps write the code, but then you iterate. If any of these new tests fail you could call it "adding new requirements", in which case I suppose what you're doing is still pure TDD. But it's solely a question of what you call it, really you aren't adding new requirements, you're testing the original requirements more thoroughly than was possible before the code was written.
When you write a test you should take meaningful cases, not every case. Meaningful cases include general cases, corner cases...
You just CAN'T write a test for every single case (otherwise you could just put the values on a table and answer them, so you'd be 100% sure your program will work :P).
Hope that helps.