How to unit test Python functions that draw PDF graphics?

Posted 2024-10-11 20:33:25

I'm writing a CAD application that outputs PDF files using the Cairo graphics library. A lot of the unit testing does not require actually generating the PDF files, such as computing the expected bounding boxes of the objects. However, I want to make sure that the generated PDF files "look" correct after I change the code. Is there an automated way to do this? How can I automate as much as possible? Do I need to visually inspect each generated PDF? How can I solve this problem without pulling my hair out?

Comments (7)

却一份温柔 2024-10-18 20:33:25

(See also update below!)

I'm doing the same thing using a shell script on Linux that wraps

  1. ImageMagick's compare command
  2. the pdftk utility
  3. Ghostscript (optionally)

(It would be rather easy to port this to a .bat Batch file for DOS/Windows.)

I have a few reference PDFs created by my application which are "known good". Newly generated PDFs after code changes are compared to these reference PDFs. The comparison is done pixel by pixel and is saved as a new PDF. In this PDF, all unchanged pixels are painted in white, while all differing pixels are painted in red.

Here are the building blocks:

pdftk

Use this command to split multi-page PDF files into multiple single-page PDFs:

pdftk  reference.pdf  burst  output  somewhere/reference_page_%03d.pdf
pdftk  comparison.pdf burst  output  somewhere/comparison_page_%03d.pdf

compare

Use this command to create a "diff" PDF page for each of the pages:

compare \
       -verbose \
       -debug coder -log "%u %m:%l %e" \
        somewhere/reference_page_001.pdf \
        somewhere/comparison_page_001.pdf \
       -compose src \
        somewhereelse/reference_diff_page_001.pdf
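The burst-and-compare steps above can also be driven from Python, for example from a test runner. This is a minimal sketch, not the answerer's shell script: it assumes `pdftk` and ImageMagick's `compare` are on the PATH, and the helper names (`burst`, `pair_pages`, `diff_pages`) are hypothetical. Pages are paired by the three-digit number that the `%03d` pattern produces, so a page that exists in only one PDF is reported instead of silently skipped.

```python
import re
import subprocess
from pathlib import Path

# Matches the file names produced by the pdftk burst commands above.
PAGE_NAME = re.compile(r"^(reference|comparison)_page_(\d{3})\.pdf$")


def burst(pdf, outdir, prefix):
    """Run `pdftk <pdf> burst output <outdir>/<prefix>_page_%03d.pdf`."""
    pattern = str(Path(outdir) / f"{prefix}_page_%03d.pdf")
    subprocess.run(["pdftk", str(pdf), "burst", "output", pattern], check=True)


def pair_pages(pagedir):
    """Return [(page_number, reference_path_or_None, comparison_path_or_None), ...]."""
    found = {}
    for path in Path(pagedir).iterdir():
        match = PAGE_NAME.match(path.name)
        if match:
            kind, number = match.groups()
            found.setdefault(number, {})[kind] = path
    return [(n, found[n].get("reference"), found[n].get("comparison")) for n in sorted(found)]


def diff_pages(pagedir, diffdir):
    """Write one reference_diff_page_NNN.pdf per page pair, like the compare command above."""
    for number, ref, cur in pair_pages(pagedir):
        if ref is None or cur is None:
            raise AssertionError(f"page {number} exists in only one of the PDFs")
        out = Path(diffdir) / f"reference_diff_page_{number}.pdf"
        # No check=True: compare may exit non-zero merely because the pages differ.
        subprocess.run(["compare", str(ref), str(cur), "-compose", "src", str(out)])
```

Usage: call `burst("reference.pdf", "somewhere", "reference")` and `burst("comparison.pdf", "somewhere", "comparison")`, then `diff_pages("somewhere", "somewhereelse")`.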

Ghostscript

Because of automatically inserted metadata (such as the current date and time), PDF output does not work well for MD5-hash-based file comparisons.

If you want to automatically discover all cases that consist of purely white pages, you can also convert to a metadata-free bitmap format using the bmp256 output device. You can do that for the original PDFs (reference and comparison) or for the diff-PDF pages:

 gs \
   -o reference_diff_page_001.bmp \
   -r72 \
   -g595x842 \
   -sDEVICE=bmp256 \
    reference_diff_page_001.pdf

 md5sum reference_diff_page_001.bmp
 

If the MD5 sum matches what you expect for an all-white page of 595x842 PostScript points, then your unit test has passed.
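In Python, the render-then-hash check might look like the sketch below. It assumes Ghostscript is installed as `gs`. `EXPECTED_BLANK_MD5` is a placeholder you would record once by hashing the BMP of a diff page you have verified to be all white, and the function names are hypothetical.

```python
import hashlib
import subprocess

# Placeholder: record this once from a diff page you have checked is all white.
EXPECTED_BLANK_MD5 = "replace-with-recorded-md5"


def render_bmp(pdf_path, bmp_path):
    """Rasterize a one-page PDF with the same gs options as above."""
    subprocess.run(
        ["gs", "-o", bmp_path, "-r72", "-g595x842", "-sDEVICE=bmp256", pdf_path],
        check=True,
    )


def file_md5(path):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_page_is_blank(bmp_path, expected_md5=EXPECTED_BLANK_MD5):
    """True if the rendered diff page hashes to the recorded all-white value."""
    return file_md5(bmp_path) == expected_md5
```

The recorded hash is only valid for one combination of resolution (`-r72`), page size (`-g595x842`) and output device; changing any of them changes the bitmap and therefore the hash.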


Update:

I don't know why I didn't think earlier of generating histogram output from ImageMagick's compare...

The following is a command pipeline chaining two different commands:

  1. The first is the same compare as above, which generates the "white pixels are equal, red pixels are differences" format, except that it outputs ImageMagick's internal MIFF format and writes it to stdout instead of to a file.
  2. The second uses convert to read stdin, generate a histogram and output the result as text. There will be two lines:
    • one indicating the number of white pixels
    • the other indicating the number of red pixels.

Here it goes:

compare \
   reference.pdf \
   current.pdf \
  -compose src \
   miff:- \
| \
convert \
   - \
  -define histogram:unique-colors=true \
  -format %c \
   histogram:info:-

Sample output:

 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)

(Sample output was generated by using these reference.pdf and current.pdf files.)

I think this type of output is really well suited for automatic unit testing. If you evaluate the two numbers, you can easily compute the "red pixel" percentage and you could even decide to return PASSED or FAILED based on a certain threshold (if you don't necessarily need "zero red" for some reason).
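The pass/fail decision can be scripted. Below is a minimal Python sketch, assuming ImageMagick 6-style `compare` and `convert` commands on the PATH (ImageMagick 7 may expose them through `magick`). The function names and the default threshold of 0 % are illustrative choices. It counts every non-white histogram entry as a differing pixel rather than looking for one specific shade of red.

```python
import re
import subprocess

# A histogram line looks like:
#   " 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)"
HIST_LINE = re.compile(r"^\s*(\d+):.*?#([0-9A-Fa-f]+)")


def histogram_text(reference_pdf, current_pdf):
    """Run the compare | convert pipeline shown above and return its text output."""
    diff = subprocess.Popen(
        ["compare", reference_pdf, current_pdf, "-compose", "src", "miff:-"],
        stdout=subprocess.PIPE,
    )
    hist = subprocess.run(
        ["convert", "-", "-define", "histogram:unique-colors=true",
         "-format", "%c", "histogram:info:-"],
        stdin=diff.stdout, capture_output=True, text=True, check=True,
    )
    diff.stdout.close()
    diff.wait()  # exit status ignored: compare may return non-zero when images differ
    return hist.stdout


def count_pixels(text):
    """Return (white_pixels, other_pixels) parsed from histogram:info output."""
    white = other = 0
    for line in text.splitlines():
        match = HIST_LINE.match(line)
        if not match:
            continue
        count, hexcolor = int(match.group(1)), match.group(2).upper()
        # 6 or 8 hex digits: 8 bits per channel; 12 or 16 digits: 16 bits per channel.
        width = 2 if len(hexcolor) in (6, 8) else 4
        rgb = [hexcolor[i * width:(i + 1) * width] for i in range(3)]
        if all(channel == "F" * width for channel in rgb):
            white += count
        else:
            other += count
    return white, other


def red_percentage(text):
    """Percentage of pixels that are not white."""
    white, other = count_pixels(text)
    if white + other == 0:
        raise ValueError("no histogram lines found")
    return 100.0 * other / (white + other)


def diff_passes(text, max_red_percent=0.0):
    return red_percentage(text) <= max_red_percent
```

With the sample output above, `count_pixels` returns `(444056, 56934)`, i.e. about 11.36 % red pixels, so the check fails unless the threshold is raised above that value.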

残月升风 2024-10-18 20:33:25

You could capture the PDF as a bitmap (or at least a losslessly-compressed) image, and then compare the image generated by each test with a reference image of what it's supposed to look like. Any differences would be flagged as an error for the test.
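Once both PDFs are rasterized, the comparison itself is simple. The sketch below is an illustration, not any particular library's API: it assumes each page has already been turned into a list of rows of `(r, g, b)` tuples by some renderer, and it mirrors ImageMagick's output by painting equal pixels white and differing pixels red.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Raster = List[List[Pixel]]

WHITE: Pixel = (255, 255, 255)
RED: Pixel = (255, 0, 0)


def diff_raster(reference: Raster, actual: Raster) -> Tuple[Raster, int]:
    """Return (diff_image, number_of_differing_pixels); sizes must match exactly."""
    if len(reference) != len(actual) or any(
        len(ref_row) != len(act_row) for ref_row, act_row in zip(reference, actual)
    ):
        raise ValueError("images must have the same dimensions")
    diff: Raster = []
    changed = 0
    for ref_row, act_row in zip(reference, actual):
        row = []
        for ref_px, act_px in zip(ref_row, act_row):
            if ref_px == act_px:
                row.append(WHITE)
            else:
                row.append(RED)
                changed += 1
        diff.append(row)
    return diff, changed
```

A test then asserts that the count is zero, and the diff image can be saved for inspection when it is not.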

心病无药医 2024-10-18 20:33:25

The first idea that pops in my head is to use a diff utility. These are generally used to compare texts of documents but they might also compare the layout of the PDF. Using it, you can compare the expected output with the output supplied.

The first result Google gives me is this. Although it is commercial, there might be other free/open-source alternatives.

与之呼应 2024-10-18 20:33:25

I would try this using xpresser (https://wiki.ubuntu.com/Xpresser). You can try to match images to similar images rather than exact copies, which is the problem in these cases.

I don't know whether xpresser is being actively developed, or whether it can be used with standalone image files (I think so). In any case, it takes its ideas from the Sikuli project (which is Java with a Jython front end, while xpresser is Python).
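The sketch below is not xpresser's matching algorithm; it is a simple stand-in for the same idea of accepting a render that is close to the reference instead of identical. It uses a hypothetical row-of-RGB-tuples representation, and `tolerance` (per-channel difference allowed) and `min_similarity` (fraction of pixels that must fall within the tolerance) are illustrative parameters. This kind of slack helps when anti-aliasing or font hinting shifts edge pixels slightly.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Raster = List[List[Pixel]]


def similarity(reference: Raster, actual: Raster, tolerance: int = 16) -> float:
    """Fraction of pixels whose every channel is within `tolerance` of the reference."""
    if len(reference) != len(actual) or any(
        len(r) != len(a) for r, a in zip(reference, actual)
    ):
        return 0.0  # different sizes never count as similar
    total = close = 0
    for ref_row, act_row in zip(reference, actual):
        for ref_px, act_px in zip(ref_row, act_row):
            total += 1
            if all(abs(x - y) <= tolerance for x, y in zip(ref_px, act_px)):
                close += 1
    return close / total if total else 1.0


def looks_like(reference: Raster, actual: Raster,
               tolerance: int = 16, min_similarity: float = 0.999) -> bool:
    """Pass if enough pixels are close to the reference."""
    return similarity(reference, actual, tolerance) >= min_similarity
```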

幸福丶如此 2024-10-18 20:33:25

I wrote a tool in Python to validate PDFs for my employer's documentation. It has the capability to compare individual pages to master images. I used a library I found called swftools to export the page to PNG, then used the Python Imaging Library to compare it with the master.

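The relevant code looks something like this (trimmed out of a larger script; it assumes the swftools gfx Python module and Pillow are installed):

import os

import gfx  # Python bindings shipped with swftools
from PIL import Image, ImageChops

# exporting

def export_page_png(pdfpath, pagenum, pngPath):
    """Render one page of the PDF to a PNG file; return True if the file was written."""
    gfxpdf = gfx.open("pdf", pdfpath)
    if os.path.isfile(pngPath):
        os.remove(pngPath)  # don't mistake a stale PNG from an earlier run for new output
    page = gfxpdf.getPage(pagenum)
    img = gfx.ImageList()
    img.startpage(page.width, page.height)
    page.render(img)
    img.endpage()
    img.save(pngPath)
    return os.path.isfile(pngPath)

# comparing

def page_mismatch(outpath, pngname):
    """True/False if the page differs from its master image, None if there is no master."""
    outPng = os.path.join(outpath, pngname)
    masterPng = os.path.join(outpath, "_master", pngname)
    if not os.path.isfile(masterPng):
        return None
    output = Image.open(outPng).convert("RGB")  # discard alpha channel, if any
    master = Image.open(masterPng).convert("RGB")
    if output.size != master.size:
        return True
    # getextrema() yields one (min, max) pair per band; a non-zero max means some pixel differs
    return any(x[1] for x in ImageChops.difference(output, master).getextrema())
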
述情 2024-10-18 20:33:25

"cmppdf" compares either the visual appearance or text content of PDFs.

It is a bash script, downloadable from https://abhweb.org/jima/cmppdf?v

It uses pdftk and compare to graphically compare PDFs, similar to what others have described in other answers. Meta data (anything which does not change the actual appearance) is not compared.

The text-comparison option uses pdftotext and diff.
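The text-comparison idea can be sketched in Python with `pdftotext` (from Poppler) and the standard-library `difflib`. This is not how cmppdf itself is written; the function names are hypothetical and `pdftotext` is assumed to be on the PATH. Passing `-` as the output file makes `pdftotext` write to stdout, and `-layout` keeps roughly the physical layout, so some layout changes also show up in the diff.

```python
import difflib
import subprocess


def pdf_text(path):
    """Extract text with Poppler's pdftotext, writing to stdout ("-")."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def text_diff(expected, actual, fromfile="reference", tofile="current"):
    """Unified diff of two texts; an empty string means they match."""
    return "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile=fromfile,
            tofile=tofile,
        )
    )
```

Usage: `assert text_diff(pdf_text("reference.pdf"), pdf_text("current.pdf")) == ""`, which prints the diff in the assertion failure context if you pass it as the message.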

归途 2024-10-18 20:33:25

I often use Python and Reportlab to generate PDFs, so I test them in a couple of ways:

  1. Test the individual components like text, Matplotlib plots, or SVG drawings before they get added to the Reportlab doc.
  2. Test the completed PDF doc by converting it to a PNG image with PyMuPDF / fitz. Here's an example that converts the first page of a PDF.
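Rendering with PyMuPDF might look like the sketch below. It assumes a recent PyMuPDF in which `Page.get_pixmap` accepts `dpi` (older releases use a `matrix` argument instead); the function names are hypothetical. `fitz` is imported inside the render function so that the pure comparison helper can be used and tested without PyMuPDF installed.

```python
from typing import Optional, Tuple

Render = Tuple[int, int, bytes]  # (width, height, packed RGB samples)


def render_page(pdf_path: str, page_index: int = 0, dpi: int = 72,
                png_path: Optional[str] = None) -> Render:
    """Rasterize one page to RGB; optionally save it as a PNG as well."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi, alpha=False)
        if png_path:
            pix.save(png_path)
        return pix.width, pix.height, bytes(pix.samples)


def changed_pixels(a: Render, b: Render, channels: int = 3) -> int:
    """Count pixels whose RGB samples differ between two renders of equal size."""
    (wa, ha, sa), (wb, hb, sb) = a, b
    if (wa, ha) != (wb, hb):
        raise ValueError("renders have different sizes")
    return sum(
        1 for i in range(0, len(sa), channels)
        if sa[i:i + channels] != sb[i:i + channels]
    )
```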

For either type of image comparison, I built an image differ class that takes in two images and highlights the differences. You can set it up as a Pytest fixture, as described in the class docs. If you use my live coding plugin, it will update the display as you edit your code.

I've generally tried to avoid comparing the test results to static images, because fonts change, and it's annoying to keep the static images up to date. Instead, I write the unit test to generate an expected image, then call the system under test to generate an image and compare the two. This works best when I write building blocks of code, tested at each level. Otherwise, the unit tests get more complicated than the real code.
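A toy version of that pattern: the test builds the expected image itself, with the most direct code possible, and compares it with what the system under test draws. `render_badge` is a hypothetical stand-in for real drawing code, and images are plain lists of grayscale rows; with real PDFs you would compare rasterized pages instead.

```python
WHITE, BLACK = 255, 0


def render_badge(width, height, margin):
    """Hypothetical system under test: a black box inset by `margin` on a white canvas."""
    canvas = [[WHITE] * width for _ in range(height)]
    for y in range(margin, height - margin):
        for x in range(margin, width - margin):
            canvas[y][x] = BLACK
    return canvas


def test_render_badge():
    # Expected image written out independently, pixel by pixel, instead of loaded from a stored PNG.
    expected = [
        [BLACK if 1 <= x < 4 and 1 <= y < 3 else WHITE for x in range(5)]
        for y in range(4)
    ]
    assert render_badge(5, 4, 1) == expected
```

Because the expected image is computed rather than stored, there is no reference file to regenerate when unrelated details change.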
