Reliable way to (programmatically) compare PDFs?

Posted on 2024-09-25 18:54:12 · 495 characters · 14 views · 0 comments

Possible Duplicate:
Tool to compare large numbers of PDF files?

I am in the classic scenario where the business hands you a bunch of new PDF forms for the new year with no revision notes whatsoever, and you are supposed to figure out what is different from the previous year's forms.

I am talking about loads of forms here, so I am trying to find a way to compare PDFs and outline the differences without having people manually go through each and every one of them.

My idea was to extract all the text from the PDFs, dump it into .txt files, and then run a diff on the text files, but it sounds horrible.

My question says programmatically, but I'd be happy with any reliable tool for comparing PDFs; I'm mainly looking to learn from people's experiences. I'm also willing to entertain any programmatic solution (preferably in C#, but please suggest any ideas).

Comments (4)

迷迭香的记忆 2024-10-02 18:54:12

There are quite a few software products that claim to diff PDFs. I've never needed to use one, but if this is going to be a recurring process, I think it would be wise for your company to invest in one of them. Just Google "pdf diff" for a bunch of potential applications.

Additionally, your situation is very similar to this question: Tool to compare large numbers of PDF files? I think its discussion may help.

扭转时空 2024-10-02 18:54:12

I am a developer of the Docotic.Pdf library. We use PDF comparison in unit tests to check that a test produces the expected PDF. A PDF is a collection of special objects, and we compare all PDF objects while ignoring some properties, such as trailer IDs and creator info. This implementation works fine.

You can try the method PdfDocument.DocumentsAreEqual. This method only tells you whether the documents are equal, without reporting specific differences. You may contact us if you need more functionality.
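
To illustrate the general idea of comparing objects while ignoring volatile properties, here is a minimal Python sketch. It is not Docotic.Pdf's implementation. It assumes some parser has already turned each document into nested dicts and lists; the key names in IGNORED_KEYS and the helper names strip_ignored and objects_equal are hypothetical, chosen only for this example.

```python
# Keys whose values usually change on every save even when the visible
# content is identical. This set is an assumption for illustration.
IGNORED_KEYS = {"ID", "CreationDate", "ModDate", "Creator", "Producer"}


def strip_ignored(value):
    """Return a copy of a parsed object tree without the ignored keys."""
    if isinstance(value, dict):
        return {
            key: strip_ignored(item)
            for key, item in value.items()
            if key not in IGNORED_KEYS
        }
    if isinstance(value, list):
        return [strip_ignored(item) for item in value]
    return value


def objects_equal(first, second):
    """Compare two parsed object trees, ignoring volatile properties."""
    return strip_ignored(first) == strip_ignored(second)
```

Like the method described above, this only answers yes or no; it does not say where two documents differ.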

江南烟雨〆相思醉 2024-10-02 18:54:12

I took the approach of getting the raw data out of the PDF and then using Word, TortoiseSVN, WinMerge, etc. to take care of the comparison piece. In my case I did the comparison in a RichTextBox in C#, coloring the differences and so on, since we wanted it all within our app.

Here is what I did...
PDF comparison, as I was trying to compare mixed documents (Word and PDF).

However, I would recommend PDFBox for the parsing, as it is a bit more elegant, although iTextSharp worked out OK.
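
The comparison step of this approach can be sketched with Python's standard difflib module. This is only an illustration under stated assumptions: the text of each page has already been extracted by some tool (PDFBox, iTextSharp, or anything else) into a list of strings, one per page, and the function name diff_pages is made up for this example.

```python
import difflib


def diff_pages(old_pages, new_pages, name="form"):
    """Return {page_number: unified diff lines} for pages whose text differs.

    old_pages and new_pages are lists of already-extracted page text,
    one string per page. Pages missing on one side count as empty.
    """
    changes = {}
    page_count = max(len(old_pages), len(new_pages))
    for index in range(page_count):
        old_text = old_pages[index] if index < len(old_pages) else ""
        new_text = new_pages[index] if index < len(new_pages) else ""
        if old_text == new_text:
            continue  # identical page text, nothing to report
        diff = difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{name} (old) page {index + 1}",
            tofile=f"{name} (new) page {index + 1}",
            lineterm="",
        )
        changes[index + 1] = list(diff)
    return changes
```

An empty result means the extracted text is identical, so with a large batch of forms only the ones with a non-empty result need a human look. Changes to layout, fonts, or images will not show up in a text-only diff.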

别靠近我心 2024-10-02 18:54:12

I wrote a blog post suggesting some approaches to comparing PDF files at https://blog.idrsolutions.com/2010/09/comparing-2-pdf-files/
