How do I unit test a Python function that draws PDF graphics?

Posted on 2024-10-11 20:33:25

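I'm writing a CAD application that outputs PDF files using the Cairo graphics library. Many of the unit tests don't need to actually generate PDF files, for example those that compute the expected bounding boxes of objects. However, I want to make sure that the generated PDF files "look" correct after I change the code. Is there an automated way to do this? How can I automate as much of it as possible? Do I need to visually inspect every generated PDF? How can I solve this problem without pulling my hair out?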

却一份温柔 2024-10-18 20:33:25

(See also update below!)

I'm doing the same thing using a shell script on Linux that wraps

- ImageMagick's compare command
- the pdftk utility
- Ghostscript (optionally)

(It would be rather easy to port this to a .bat Batch file for DOS/Windows.)

I have a few reference PDFs created by my application which are "known good". Newly generated PDFs after code changes are compared to these reference PDFs. The comparison is done pixel by pixel and is saved as a new PDF. In this PDF, all unchanged pixels are painted in white, while all differing pixels are painted in red.

Here are the building blocks:

pdftk

Use this command to split multi-page PDF files into multiple single-page PDFs:

pdftk  reference.pdf  burst  output  somewhere/reference_page_%03d.pdf
pdftk  comparison.pdf burst  output  somewhere/comparison_page_%03d.pdf

compare

Use this command to create a "diff" PDF page for each of the pages:

compare \
       -verbose \
       -debug coder -log "%u %m:%l %e" \
        somewhere/reference_page_001.pdf \
        somewhere/comparison_page_001.pdf \
       -compose src \
        somewhereelse/reference_diff_page_001.pdf

Ghostscript

Because of automatically inserted metadata (such as the current date and time), PDF output does not work well for MD5-hash-based file comparisons.

If you want to automatically detect all cases where the result consists of purely white pages, you could also convert to a metadata-free bitmap format using the bmp256 output device. You can do that for the original PDFs (reference and comparison), or for the diff PDF pages:

 gs \
   -o reference_diff_page_001.bmp \
   -r72 \
   -g595x842 \
   -sDEVICE=bmp256 \
    reference_diff_page_001.pdf

 md5sum reference_diff_page_001.bmp

If the MD5 sum is what you expect for an all-white page of 595x842 PostScript points, then your unit test passes.
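The same check can be driven from a Python test instead of md5sum. The following is only a minimal sketch: the helper names md5_of_file and is_blank_diff are made up for illustration, and it assumes you keep a known all-white bitmap that was rendered once with exactly the same gs options as the diff page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_blank_diff(diff_bitmap, blank_reference):
    """True if the rendered diff page is byte-identical to a known all-white page.

    blank_reference is a hypothetical bitmap file, rendered once with the
    same resolution and page size from a page known to be completely white.
    """
    return md5_of_file(diff_bitmap) == md5_of_file(blank_reference)
```

Comparing against a stored blank bitmap rather than a hard-coded digest avoids having to update the test whenever the resolution or page size passed to gs changes.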

Update:

I don't know why I didn't previously think of generating a histogram output from the ImageMagick compare...

The following is a command pipeline chaining 2 different commands:

- The first one is the same as the compare above, which generates the 'white pixels are equal, red pixels are differences' format, only it outputs ImageMagick's internal miff format. It doesn't write to a file, but to stdout.
- The second one uses convert to read stdin, generate a histogram and output the result in text form. There will be two lines:
  - one indicating the number of white pixels
  - the other one indicating the number of red pixels.

Here it goes:

compare \
   reference.pdf \
   current.pdf \
  -compose src \
   miff:- \
| \
convert \
   - \
  -define histogram:unique-colors=true \
  -format %c \
   histogram:info:-

Sample output:

 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)

(Sample output was generated by using these reference.pdf and current.pdf files.)

I think this type of output is really well suited for automatic unit testing. If you evaluate the two numbers, you can easily compute the "red pixel" percentage and you could even decide to return PASSED or FAILED based on a certain threshold (if you don't necessarily need "zero red" for some reason).
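As a rough sketch of that evaluation in Python: the code below parses histogram text in the format of the sample output and turns it into a fraction of differing pixels. The function names and the max_changed_fraction parameter are invented for this example, and it assumes that every color other than pure white counts as a difference.

```python
import re

# One histogram line looks like:
#   56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
HISTOGRAM_LINE = re.compile(r"^\s*(\d+):\s*\([^)]*\)\s+#([0-9A-Fa-f]+)")


def is_white(hex_color):
    """True if the RGB part of a hex color (with or without alpha) is pure white."""
    channels = 4 if len(hex_color) in (8, 16) else 3
    width = len(hex_color) // channels
    rgb = hex_color[: 3 * width]  # ignore the alpha channel, if present
    return set(rgb.upper()) == {"F"}


def changed_fraction(histogram_text):
    """Return the share of pixels that are not white, i.e. that differ."""
    white = changed = 0
    for line in histogram_text.splitlines():
        match = HISTOGRAM_LINE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        count, hex_color = int(match.group(1)), match.group(2)
        if is_white(hex_color):
            white += count
        else:
            changed += count
    total = white + changed
    if total == 0:
        raise ValueError("no histogram lines found")
    return changed / total


def page_passes(histogram_text, max_changed_fraction=0.0):
    """PASSED/FAILED decision based on a threshold for differing pixels."""
    return changed_fraction(histogram_text) <= max_changed_fraction
```

For the sample output above, changed_fraction gives 56934 / (56934 + 444056), roughly 11% of the page, so the page would fail with the default threshold of zero.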

残月升风 2024-10-18 20:33:25

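You could capture the PDF as a bitmap (or at least a losslessly compressed) image, and then compare the image generated by each test with a reference image of what it is supposed to look like. Any difference would be flagged as a test error.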
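One way this could look with nothing but the standard library is sketched below. It assumes the pages have already been rasterized to 8-bit binary PPM files by some other tool (Ghostscript's ppmraw device is one option); read_ppm and count_differing_pixels are names made up for this example.

```python
def read_ppm(data):
    """Parse an 8-bit binary PPM (P6) image; return (width, height, pixel_bytes)."""
    tokens = []
    pos = 0
    while len(tokens) < 4:
        # Skip whitespace and '#' comment lines between header fields.
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            pos = data.index(b"\n", pos) + 1
            continue
        end = pos
        while end < len(data) and not data[end:end + 1].isspace():
            end += 1
        tokens.append(data[pos:end])
        pos = end
    magic, width, height, maxval = tokens
    if magic != b"P6" or int(maxval) != 255:
        raise ValueError("expected an 8-bit binary PPM (P6)")
    width, height = int(width), int(height)
    # Exactly one whitespace byte separates the header from the pixel data.
    pixels = data[pos + 1:pos + 1 + 3 * width * height]
    if len(pixels) != 3 * width * height:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def count_differing_pixels(ppm_a, ppm_b):
    """Return how many pixels differ between two PPM images of equal size."""
    width_a, height_a, pixels_a = read_ppm(ppm_a)
    width_b, height_b, pixels_b = read_ppm(ppm_b)
    if (width_a, height_a) != (width_b, height_b):
        raise ValueError("page sizes differ")
    return sum(
        pixels_a[i:i + 3] != pixels_b[i:i + 3]
        for i in range(0, len(pixels_a), 3)
    )
```

A test would then read the reference and the freshly rendered file as bytes and assert that count_differing_pixels returns 0.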

心病无药医 2024-10-18 20:33:25

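The first idea that comes to mind is to use a diff utility. These are generally used to compare the text of documents, but some may also compare the layout of a PDF. With one of them, you can compare the expected output with the output actually produced.

The first result Google gave me was this. Although it is commercial, there may be other free/open-source alternatives.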

与之呼应 2024-10-18 20:33:25

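I would try xpresser (https://wiki.ubuntu.com/Xpresser). It lets you match images against similar images rather than exact copies, which is exactly the problem in these cases.

I don't know whether xpresser is actively developed, or whether it can be used with standalone image files (I think it can). In any case, it takes its ideas from the Sikuli project (which is Java with a Jython front end, while xpresser is Python).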
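To illustrate what "similar rather than exact" can mean in a test, here is a small tolerance-based comparison over raw RGB bytes. This is not xpresser's API; the function names and the default thresholds are arbitrary choices made for this sketch.

```python
def fraction_outside_tolerance(pixels_a, pixels_b, tolerance=8):
    """Fraction of channel samples whose absolute difference exceeds tolerance.

    pixels_a and pixels_b are raw 8-bit sample buffers (bytes) of equal length,
    for example the RGB data of two rendered pages of the same size.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("buffers must have the same length")
    if not pixels_a:
        return 0.0
    outside = sum(abs(a - b) > tolerance for a, b in zip(pixels_a, pixels_b))
    return outside / len(pixels_a)


def images_similar(pixels_a, pixels_b, tolerance=8, max_fraction=0.01):
    """True if at most max_fraction of the samples differ by more than tolerance."""
    return fraction_outside_tolerance(pixels_a, pixels_b, tolerance) <= max_fraction
```

Small anti-aliasing differences then stay below the tolerance, while a moved or missing shape pushes many samples over it.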

幸福丶如此 2024-10-18 20:33:25

I wrote a tool in Python to validate PDFs for my employer's documentation. It has the capability to compare individual pages to master images. I used a library I found called swftools to export the page to PNG, then used the Python Imaging Library to compare it with the master.

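The relevant code looks something like this (it is trimmed down from a larger script and depends on the gfx module from swftools and on the Python Imaging Library, but you should get the idea):

import os

import gfx  # Python module that ships with swftools
from PIL import Image, ImageChops

# exporting

def export_page(pdfpath, pagenum, pngPath):
    """Render one PDF page to pngPath; return True if the PNG was written."""
    gfxpdf = gfx.open("pdf", pdfpath)
    if os.path.isfile(pngPath):
        os.remove(pngPath)  # drop any stale export first
    page = gfxpdf.getPage(pagenum)
    img = gfx.ImageList()
    img.startpage(page.width, page.height)
    page.render(img)
    img.endpage()
    img.save(pngPath)
    return os.path.isfile(pngPath)

# comparing

def page_differs(outpath, pngname):
    """Return True if the page differs from its master, None if no master exists."""
    outPng = os.path.join(outpath, pngname)
    masterPng = os.path.join(outpath, "_master", pngname)
    if not os.path.isfile(masterPng):
        return None
    output = Image.open(outPng).convert("RGB")  # discard alpha channel, if any
    master = Image.open(masterPng).convert("RGB")
    # getextrema() yields one (min, max) pair per band; a non-zero max means a pixel differs
    return any(x[1] for x in ImageChops.difference(output, master).getextrema())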
