Need help understanding differences in raw image binary data for a PHPUnit test

Posted on 2024-11-30 23:18:27


I wrote a unit test to compare cropped images (using ImageMagick) in PHP. The test works, but I've been running into problems when comparing a large number of images at a time. Depending on when an image is created, it receives a timestamp that is embedded directly into the raw data. I've been using a regular expression to pull out that timestamp right before comparing the files, but every once in a while one of the image files has additional raw data in it even though the images themselves are exactly the same.

To give an example, here's the result from one of my tests (note that I'm comparing the binary data of the images as strings):

ImageTest::testAutoCrop
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
?n??m?
-?F sO=f??????????^???????w??>
                          ?(???/o????M)???o%tEXt??%tEXt
+?F sO=f??????????^???????w??>
                          ?(???/o????M)???o%tEXt

As you can see, the only difference between these two files is that the expected image contains this additional string: "?%tEXt".

Can someone help me understand what this random piece of data represents? That will help me figure out how to modify my unit test so that issues like this won't happen anymore.

Thanks,

Malcolm

PS: Please let me know if I need to provide more information.


软甜啾 2024-12-07 23:18:27


I eventually came up with a solution to this issue. A couple of things to clarify:

The reason I was writing unit tests is that our image service web application (PHP) uses ImageMagick to handle all of the image processing, manipulation, HTML-to-image conversion, and PDF-to-image conversion (jpg, png, gif, all non-CMYK, pdf) that happens on our main website. We needed to make sure that, as we added new features to this image service, there were enough tests in place to ensure that everything still functioned correctly.
The string data that we saw in each image (i.e. ?%tEXt) is metadata embedded in the image. We took it to be the image's EXIF data ( http://en.wikipedia.org/wiki/Exchangeable_image_file_format ); strictly speaking, tEXt is the chunk type PNG uses for textual metadata, which is where ImageMagick writes properties such as date:create and date:modify. In order to compare pictures (suggestion taken from David Andersson's reply ( https://stackoverflow.com/users/904933/david-andersson )), we needed to completely strip all comment data out of the image, along with the creation date timestamp and modification info. That way you're dealing with just the image and no other metadata. We did that with the following function:


protected static function _removeTimeStamp( $string, $pdf = false ) {

  /* Note: $string is the raw contents of the file to clean, as a string. */

  /* For a PDF, remove the CreationDate entry from the string representation with a regex. */
  if ( $pdf ) {
    return preg_replace( '/(CreationDate[^)]+)/', '', $string );
  }

  /* Path of the temporary file that will hold the metadata-free image */
  $strip_tmp = 'test/strip_tmp';

  /* Write the image contents to the temporary file */
  file_put_contents( $strip_tmp, $string );

  /* Strip profiles and comments, and unset the date:create and date:modify properties.
     The path is shell-escaped so an unusual file name cannot break the command. */
  $path = escapeshellarg( $strip_tmp );
  exec( 'convert ' . $path . ' -strip +set date:create +set date:modify ' . $path . ' 2> /dev/null' );

  /* Read back the "cleaned" image as a string */
  $result = file_get_contents( $strip_tmp );

  /* Delete the temporary file */
  unlink( $strip_tmp );

  /* Return the cleaned string */
  return $result;

} // _removeTimeStamp

We ran this on each image before comparing them to each other (as strings). Hopefully this helps someone in the future who might be doing something similar.
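For PNG input specifically, the same cleanup can be sketched without a temporary file or a shell call. A PNG is an 8-byte signature followed by chunks, each laid out as a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, so the text and timestamp chunks can simply be skipped while copying. The function name `stripPngMetadataChunks` and the list of dropped chunk types are assumptions made for this illustration; this is not the function we used, and it does not cover JPEG, GIF, or PDF.

```php
<?php
// Hypothetical helper (not part of the original answer): copy a PNG byte
// string while skipping its textual and last-modified chunks.
function stripPngMetadataChunks(string $png): string
{
    $signature = "\x89PNG\r\n\x1a\n";
    if (strncmp($png, $signature, 8) !== 0) {
        throw new InvalidArgumentException('Not a PNG byte string');
    }

    // Assumed drop list: the three PNG text chunk types plus tIME.
    $drop   = ['tEXt', 'zTXt', 'iTXt', 'tIME'];
    $out    = $signature;
    $offset = 8;
    $total  = strlen($png);

    while ($offset + 12 <= $total) {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type   = substr($png, $offset + 4, 4);
        $size   = 12 + $length;

        if ($offset + $size > $total) {
            throw new InvalidArgumentException('Truncated PNG chunk');
        }
        if (!in_array($type, $drop, true)) {
            // Keep the chunk byte-for-byte, so its CRC stays valid.
            $out .= substr($png, $offset, $size);
        }
        $offset += $size;
    }

    return $out;
}
```

In a test this would be applied to both sides, for example `$this->assertSame(stripPngMetadataChunks($expected), stripPngMetadataChunks($actual))`. It only removes whole chunks and never re-encodes pixel data, so two files that were compressed differently would still compare as different.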

I plan on writing a blog post about this in more detail to show how I took care of a number of other tests. When I do, I will update this question with the link in either the comments or this answer.
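To check what the "?%tEXt" bytes from the question's diff actually are in a given file, it can help to list the chunks of the PNG before deciding what to strip. The sketch below is a diagnostic only; `listPngChunks` is a hypothetical name, and it assumes the input really is a PNG byte string. The `%` immediately before `tEXt` in the diff is consistent with the last byte of a chunk's length field: 0x25 is 37, which is the length of the keyword `date:create`, a NUL separator, and a 25-character timestamp.

```php
<?php
// Hypothetical diagnostic (not part of the original answer): return the chunk
// types found in a PNG byte string, with the keyword added for tEXt chunks.
function listPngChunks(string $png): array
{
    $chunks = [];
    $offset = 8; // skip the 8-byte PNG signature
    $total  = strlen($png);

    while ($offset + 12 <= $total) {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type   = substr($png, $offset + 4, 4);
        $label  = $type;

        if ($type === 'tEXt') {
            // tEXt data is "keyword NUL text"; report only the keyword.
            $data    = substr($png, $offset + 8, $length);
            $keyword = strstr($data, "\0", true);
            $label  .= '(' . ($keyword === false ? '' : $keyword) . ')';
        }

        $chunks[] = $label;
        $offset  += 12 + $length;
    }

    return $chunks;
}
```

Running this on the expected and actual images from a failing test and comparing the two lists shows which chunk one file has and the other lacks, for example an extra `tEXt(date:modify)` entry.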
