Need help understanding differences in raw image binary data for a PHPUnit test
So I wrote a unit test to compare cropped images (using ImageMagick) in PHP. The test works, but I've been running into problems when comparing a large number of images at a time. Depending on when the image is created, each image receives a timestamp that is embedded directly into the raw data. I've been using a regular expression to pull out that timestamp right before comparing the files, but every once in a while one of the image files has additional raw data in it even though the images are exactly the same.
To give an example, here's the result from one of my tests (note: I'm comparing the binary data of the images as strings):
ImageTest::testAutoCrop
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
?n??m?
-?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt??%tEXt
+?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt
As you can see, the only difference between these two files is that the expected image has this additional string in it: "?%tEXt".
Can someone help me understand what this random piece of data represents? That will help me figure out how to modify my unit test so that issues like this won't happen anymore.
Thanks,
Malcolm
PS: Please let me know if I need to provide more information.
Comments (2)
So I eventually came up with a solution to this issue. A couple of things to clarify:
The reason I was writing unit tests is that our image service web application (PHP) uses ImageMagick to handle all image processing, manipulation, HTML-to-image conversion, and PDF-to-image conversion (jpg, png, gif, all non-CMYK, pdf) that happens on our main website. We needed to make sure that, as we added new features to this image service application, there were enough tests in place to ensure that everything still functioned correctly.
The string data that we saw in each image (aka: ?%tEXt) is metadata embedded in the image, similar to EXIF data (http://en.wikipedia.org/wiki/Exchangeable_image_file_format); in a PNG file, tEXt is the chunk type that holds textual metadata such as comments. In order to compare pictures (suggestion taken from David Andersson's reply, https://stackoverflow.com/users/904933/david-andersson), we needed to completely strip all comment data out of the image, along with the creation date timestamp and modified-on info. That way you're dealing with simply an image and no other type of metadata.
We did that with a function that was run on each image before comparing them to each other (in string format). Hopefully this helps someone in the future who might be doing something similar.
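The function itself is not shown in this answer. As a minimal sketch of the idea, assuming the images are PNG files and that only the textual and timestamp chunks (`tEXt`, `zTXt`, `iTXt`, `tIME`) need to go, a helper along these lines could be run on both byte strings before the assertion. The name `stripPngTextChunks` is hypothetical and this is not the original poster's code; it walks the PNG chunk list in plain PHP and does not validate CRCs.

```php
<?php
// Hypothetical helper (not the original poster's function): removes the
// textual and timestamp chunks from a PNG byte string so that two files
// with the same pixel data compare equal as strings.
function stripPngTextChunks(string $png): string
{
    $signature = "\x89PNG\r\n\x1a\n";
    if (strncmp($png, $signature, 8) !== 0) {
        throw new InvalidArgumentException('Not a PNG byte string');
    }

    // Chunk types that carry comments or timestamps rather than pixels.
    // Assumption: other ancillary chunks are left alone in this sketch.
    $drop = ['tEXt', 'zTXt', 'iTXt', 'tIME'];

    $out = $signature;
    $offset = 8;
    $total = strlen($png);

    // Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while ($offset + 12 <= $total) {
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type = substr($png, $offset + 4, 4);
        $size = 12 + $length;

        if ($offset + $size > $total) {
            throw new InvalidArgumentException('Truncated PNG chunk');
        }
        if (!in_array($type, $drop, true)) {
            // Copy the chunk through unchanged, CRC included.
            $out .= substr($png, $offset, $size);
        }

        $offset += $size;
        if ($type === 'IEND') {
            break; // IEND marks the end of the PNG datastream.
        }
    }

    return $out;
}
```

In a PHPUnit test this would be used as `$this->assertSame(stripPngTextChunks($expected), stripPngTextChunks($actual));`. When working through the Imagick extension, `Imagick::stripImage()` also strips profiles and comments, though whether the date chunks disappear may depend on the ImageMagick version in use.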
I plan on writing a blog post about this in more detail to show how I took care of a number of other tests. When I do, I will update this question with the link in either the comments or this answer. Hope this helps someone.
In unit tests you should only test your units, not third party code's units.
You have not specified any details about your image resizer, but I assume you're making use of third-party functions which count as units of their own (one function is a unit, like one class is a unit).
So the question would be: Is the binary data generated by your code, your units? I guess not, otherwise you would have known why the binary data differs.
As those aren't your units, don't write tests for them. Instead, go to the project the original units come from (upstream) and check its test suite.
If you're concerned with integration tests (testing that different units work with each other), you should define stable tests that can deal with the (different) data returned by sub-components. E.g. you might need an image comparison (are the pixel dimensions and the pixel values, and maybe the file format, correct?) instead of comparing binary data, which can differ because file formats often allow more than one way to encode the same image data (plus metadata).
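As a rough sketch of such an image comparison, assuming the Imagick extension is available (the question already uses ImageMagick), the two encoded images can be decoded and compared by dimensions and pixel values instead of by raw bytes. The function name `imagesLookIdentical` and the `$tolerance` parameter are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper: decides whether two encoded images show the same
// picture, ignoring metadata and encoding differences in the files.
function imagesLookIdentical(string $expectedBlob, string $actualBlob, float $tolerance = 0.0): bool
{
    $expected = new Imagick();
    $expected->readImageBlob($expectedBlob);

    $actual = new Imagick();
    $actual->readImageBlob($actualBlob);

    // Images with different dimensions can never be pixel-identical.
    if ($expected->getImageWidth() !== $actual->getImageWidth()
        || $expected->getImageHeight() !== $actual->getImageHeight()
    ) {
        return false;
    }

    // compareImages() returns [difference image, distortion value];
    // a mean squared error of 0 means every pixel matches.
    $result = $expected->compareImages($actual, Imagick::METRIC_MEANSQUAREERROR);

    return $result[1] <= $tolerance;
}
```

A test can then assert `$this->assertTrue(imagesLookIdentical($expected, $actual));`. A small non-zero `$tolerance` may be useful for lossy formats such as JPEG, where re-encoding can change pixel values slightly.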