Unit testing screen scraping?
I'm new to unit testing so I'd like to get the opinion of some who are a little more clued-in.
I need to write some screen-scraping code shortly. The target system is a web UI, so there'll be copious HTML parsing and similar volatile goodness involved. I'll never be notified of any changes to the target system (e.g. if they redesign their site or otherwise change functionality), so I anticipate my code breaking regularly.
So I think my real question is, how much, if any, of my unit testing should worry about or deal with the interface (the website I'm scraping) changing?
I think that, unit tests or not, I'm going to need to test heavily at runtime, since I need to ensure the data I'm consuming is pristine. Even if I ran unit tests prior to every run, the web UI could still change between the tests and runtime.
So do I focus on in-code testing and exception handling? Does that mean drawing a line in the sand and excluding this kind of testing from unit tests altogether?
Thanks
Comments (3)
Unit testing should always be designed to have repeatable known results.
Therefore, to unit test a screen-scraper, you should write the tests against a known set of HTML (you may use a mock object to represent this).
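As a rough sketch of what testing against a known set of HTML could look like, here is one possibility using only the Python standard library. The page layout, the `PriceParser` class, the `parse_prices` function and the `FIXTURE_HTML` snapshot are all made up for illustration; a real scraper would use a saved copy of the actual target page.

```python
import unittest
from html.parser import HTMLParser

# Hypothetical snapshot of the target page. In practice this would be a
# saved copy of the real page, stored in a file next to the tests.
FIXTURE_HTML = """
<html><body>
  <table id="prices">
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
  </table>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) text pairs from td cells with known classes."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None   # which cell we are currently inside, if any
        self._current = {}   # fields gathered for the row in progress

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            self.rows.append(
                (self._current.get("name"), self._current.get("price"))
            )
            self._current = {}


def parse_prices(html):
    """Parse the (assumed) price table into a list of (name, float) tuples."""
    parser = PriceParser()
    parser.feed(html)
    return [(name, float(price)) for name, price in parser.rows]


class ParsePricesTest(unittest.TestCase):
    def test_known_html_gives_known_result(self):
        # The input never changes, so the expected result never changes.
        self.assertEqual(
            parse_prices(FIXTURE_HTML),
            [("Widget", 9.99), ("Gadget", 24.5)],
        )
```

Run with `python -m unittest`. Because the input is fixed, a failure here means the parsing code changed, not the website.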
The sort of thing you are talking about doesn't really sound like a scenario for unit-testing to me - if you want to ensure your code runs as robustly as possible, then it is more, as you say, about in-code testing and exception handling.
I would also include some alerting code, so that the system makes you aware of any occasion when the HTML does not get parsed as expected.
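One way such alerting might look, as a minimal sketch: the scraper logs an error and raises a dedicated exception when the page no longer contains what it expects. The names `ScrapeFormatError` and `extract_prices`, the `min_expected` parameter and the `<td class="price">` markup are assumptions for illustration, and the regular expression only stands in for whatever parser you really use.

```python
import logging
import re

logger = logging.getLogger("scraper")


class ScrapeFormatError(Exception):
    """Raised when the page no longer matches the structure we expect."""


# Assumed markup: prices sit in <td class="price"> cells.
PRICE_RE = re.compile(r'<td class="price">\s*([0-9]+(?:\.[0-9]+)?)\s*</td>')


def extract_prices(html, min_expected=1):
    """Return all prices found, or alert and raise if too few are found."""
    prices = [float(value) for value in PRICE_RE.findall(html)]
    if len(prices) < min_expected:
        # This is the alert: hook the logger up to email, a pager, etc.
        logger.error(
            "Expected at least %d price cells, found %d; "
            "the site layout may have changed.",
            min_expected,
            len(prices),
        )
        raise ScrapeFormatError("price cells missing from page")
    return prices
```

The caller can catch `ScrapeFormatError` and stop consuming data instead of carrying on with an empty or partial result.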
You should try to separate your tests as much as possible. Test the data handling with low level tests that execute the actual code (i.e. not via a simulated browser).
In the simulated browser, just make sure that the right things happen when you click on buttons, when you submit forms, and when you follow links.
Never try to test whether the layout is correct.
I think unit tests might be useful here if you have a build server: they will give you an early warning that the code no longer works. You can't write a unit test to prove that the screen scraping will still work if the site changes its HTML (because you can't tell what they will change).
You might be able to write a unit test to check that something useful is returned from your efforts.
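A check that "something useful is returned" could be as simple as validating the shape of the result rather than its exact values. The sketch below assumes the scraper returns a list of `(name, price)` tuples; that shape and the `looks_useful` helper are hypothetical.

```python
def looks_useful(records):
    """Return True if scraped records pass basic sanity checks.

    `records` is assumed to be a list of (name, price) tuples.
    """
    if not records:
        return False  # an empty result usually means the page changed
    for name, price in records:
        if not isinstance(name, str) or not name.strip():
            return False  # missing or blank name
        if not isinstance(price, (int, float)) or price <= 0:
            return False  # non-numeric or implausible price
    return True
```

The same function can be called from a scheduled test on the build server and from the scraper itself at runtime, before the data is used.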