Testing a classifier by example
I am writing a classifier that categorizes whether a special deal is for a restaurant, a hotel, and so on. It is part of a web crawler that analyzes external sites.
To start, I wrote a meal? method, which accepts a piece of text and returns true if it thinks the text is about a meal deal. It can't be 100% accurate, since it only uses simple keyword matching.
def meal?(text)
  # "..." stands for the rest of the keyword list, elided here
  text.match?(/restaurant|meal|wine|.../i)
end
Now I am writing a test for it, and I have two questions. The first: it seems redundant to re-list all of these keywords in the unit test. What do you think?
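One way to avoid re-listing the keywords is to test the behaviour on a handful of representative descriptions, including some that should not match. The following is only a sketch: the keyword list in the question is elided, so this meal? uses just the three keywords that are shown, and the sample descriptions and the misclassified helper are made up for illustration.

```ruby
# Hypothetical stand-in for the real classifier: the original keyword list is
# elided, so only the three keywords shown in the question are used here.
def meal?(text)
  text.match?(/restaurant|meal|wine/i)
end

# Made-up sample descriptions paired with the classification we expect.
MEAL_EXAMPLES = {
  "Three-course meal for two with a bottle of wine" => true,
  "50% off at a downtown RESTAURANT"                => true,
  "Two nights in a seaside hotel with spa access"   => false,
  "Car wash and interior cleaning"                  => false
}.freeze

# Returns the sample texts whose classification differs from the expectation.
def misclassified(examples)
  examples.reject { |text, expected| meal?(text) == expected }.keys
end
```

A test in whatever framework is in use can then assert that misclassified(MEAL_EXAMPLES) is empty. The examples describe what the method should do rather than how it does it, so the keyword list can change without touching the test.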
The second question:
I have an .html file in source control that is used to test the crawler's parsing functionality. In theory every item in it should be classified as a meal deal, so I am thinking of reusing that file in this categorizing test: parse the HTML and feed the description of each deal into this method.
One drawback is that the .html file is taken from an external site. When that site changes its layout I will update the .html file, and then I will have to change this categorizing test too. But I think that is acceptable.
Is this recommended? I thought of this approach because I feel uneasy about extracting information from that .html file and placing it in the test script itself (it is not DRY, and it makes the test script quite big). Would feeding in the parsed descriptions violate any fundamental testing principle, such as "this hides necessary details from developers" or "this is bad for generating reports"?
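To make the fixture-driven idea concrete, here is a rough sketch in which the check collects every description the classifier rejects, so that a failure names the offending text instead of hiding it inside the fixture. Everything here is an assumption for illustration: the inline markup, the "deal" class name, and parse_descriptions stand in for the real checked-in .html file and the crawler's own parser, and meal? again uses only the three keywords shown in the question.

```ruby
# Hypothetical stand-in for the real classifier (keyword list elided in the question).
def meal?(text)
  text.match?(/restaurant|meal|wine/i)
end

# Invented fixture; the real test would read the checked-in .html file instead.
FIXTURE_HTML = <<~HTML
  <ul>
    <li class="deal">Dinner for two at an Italian restaurant</li>
    <li class="deal">Wine tasting with cheese platter</li>
    <li class="deal">Brunch buffet for four</li>
  </ul>
HTML

# Hypothetical stand-in for the crawler's parser; a regex is enough for this
# tiny fixture but is not a substitute for a real HTML parser.
def parse_descriptions(html)
  html.scan(%r{<li class="deal">(.*?)</li>}).flatten
end

# Collects the descriptions the classifier rejects, so a failure can name them.
def unclassified_meals(html)
  parse_descriptions(html).reject { |description| meal?(description) }
end

failures = unclassified_meals(FIXTURE_HTML)
puts "Not classified as meal deals: #{failures.inspect}" unless failures.empty?
```

The third item is deliberately one that the three keywords miss, so running the sketch prints Not classified as meal deals: ["Brunch buffet for four"]. Reporting the rejected text this way goes some distance toward the "hidden details" and "bad for reports" concerns, though the test still depends on the fixture and the parser staying in sync.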
Comments (1)
OK, I obviously misunderstood the question, so I will revise this answer completely.
Personally, I think it is simpler and preferable to take the actual text from the html file and copy/paste it into the test, rather than adding the indirection of loading an html file. Two reasons I can find...
However, I cannot find a reason why what you are trying to do would be really bad. I think it boils down to personal preference.