How do I redact a PDF file programmatically?
Adobe Acrobat has the ability to redact PDF files (that is, actually remove the information, rather than simply drawing a black box on top of it). I would like to use this feature programmatically. To redact using the GUI you select the Mark for Redaction Tool, draw it over the text to be redacted, then Apply Redactions.
Is there any way to do this programmatically, either through AppleScript or some other way?
I know the (x, y) location of the text to be redacted.
Thanks!
Comments (5)
In order to properly redact a PDF, you need to Alter The Content Stream. This is Very Hard.
If you can find the portion of the content stream that draws the text you want removed, you're halfway there.
The other half is figuring out how to change the content stream such that you don't modify the rest of the document. If the next text draw operator is preceded by a "Tm" command (set the text matrix, which absolutely positions the next piece of text), it's easy. If not... you have to calculate the exact width of the text you're replacing (several different PDF libraries can do this), and alter the drawing commands to skip over that much stuff.
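As a rough illustration of that width calculation, here is a minimal Python sketch. It assumes a hypothetical width table (`ASSUMED_WIDTHS`, in 1/1000 text-space units; a real tool would read the /Widths array of whatever font is in use), zero character and word spacing, and a sample line assembled from the fragments quoted in this answer. The function names are made up for the example. Since Td offsets are measured from the start of the current line, the N computed here covers the kept prefix plus the removed text.

```python
# Hypothetical glyph widths in 1/1000 text-space units (assumption: a real tool
# would read these from the /Widths array of the font resource in use).
ASSUMED_WIDTHS = {" ": 278, "R": 722, "E": 667, "D": 722, "A": 667, "C": 722, "T": 611}
DEFAULT_WIDTH = 556  # assumed fallback for glyphs missing from the table


def text_width(text, font_size, widths=ASSUMED_WIDTHS):
    """Width of `text` in text-space units, ignoring char/word spacing and kerning."""
    return sum(widths.get(ch, DEFAULT_WIDTH) for ch in text) * font_size / 1000.0


def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def split_tj(line, start, end, font_size):
    """Replace one Tj that shows `line` with operators that skip line[start:end].

    Returns three operator strings: show the prefix, move by N with Td,
    show the suffix. N is measured from the start of the line.
    """
    prefix, suffix = line[:start], line[end:]
    n = text_width(line[:end], font_size)
    return [
        f"({escape_pdf_string(prefix)}) Tj",
        f"{n:.3f} 0 Td",
        f"({escape_pdf_string(suffix)}) Tj",
    ]


prefix = "Here's some text, and you only want to"
redacted = " REDACT "
suffix = 'that upper case "redact" over there'
start = len(prefix)
for op in split_tj(prefix + redacted + suffix, start, start + len(redacted), 10):
    print(op)
```

Because Td also moves the start of the line, any later relative Td in the same text object would need N subtracted from its x offset to stay where it was.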
For Example:
So you'd have to break up that first (...)Tj line into (Here's some text, and you only want to)Tj, N 0 Td, and (that upper case "redact" over there)Tj, where the 'N' properly adjusts the position of the following text drawing operation such that it lands in EXACTLY THE SAME SPOT. So you'd need to know the precise width of " REDACT " using the font resource /F1 (whatever that turned out to be), sized to 10 points.
Just to make your life more exciting, you have to worry about kerned text too. You can provide little spacing adjustments inline with text thusly:
(This is taken from the first text drawn in the PDF Spec)
To properly redact "Incorporated", you need to determine that it's been split across two strings, and adjust the positioning of the string following it so it's in Exactly The Same Spot.
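To make the "split across two strings" problem concrete, here is a minimal Python sketch. It takes an already-parsed TJ array (string elements mixed with numeric adjustments) and reports which string elements a target word touches. The array below is a hypothetical stand-in and the adjustment value 15 is made up; `locate_in_tj` is a name invented for this example, not a library call.

```python
def locate_in_tj(elements, target):
    """Find `target` in the text shown by a parsed TJ array.

    `elements` mixes str (shown text) and int/float (spacing adjustments).
    Returns one (element_index, char_offset) pair per character of the
    match, or an empty list when the target does not occur.
    """
    joined = ""
    origin = []  # origin[k] says where the k-th character of `joined` came from
    for index, element in enumerate(elements):
        if isinstance(element, str):
            origin.extend((index, offset) for offset in range(len(element)))
            joined += element
    position = joined.find(target)
    if position < 0:
        return []
    return origin[position:position + len(target)]


# Hypothetical TJ array: one word interrupted by a kerning adjustment.
tj_array = ["Adobe Systems Incorpor", 15, "ated"]
hits = locate_in_tj(tj_array, "Incorporated")
touched = sorted({index for index, _ in hits})
print(touched)  # indices of the string elements the word spans
```

Every element index in `touched` has to be trimmed, and the numeric adjustments between them dropped or folded into the skip, before the following text is repositioned.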
And strings can be <DEADBEEF> hex values rather than (plain old ascii). Get the idea? And I haven't covered all the possibilities here, just the most common ones.
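For the hex form, a minimal Python sketch of turning a hex string token into raw bytes might look like this. It assumes the token has already been isolated from the content stream; `decode_pdf_hex_string` is a name invented here. The padding rule (an odd trailing digit is treated as if followed by 0) and the ignored whitespace follow the PDF specification's description of hexadecimal strings.

```python
def decode_pdf_hex_string(token):
    """Decode a PDF hexadecimal string token such as <DEADBEEF> into bytes."""
    if token.startswith("<<") or not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not a hex string token: {token!r}")
    # Whitespace between the digits is ignored.
    digits = "".join(token[1:-1].split())
    # An odd number of digits means the last digit is followed by an implied 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(decode_pdf_hex_string("<DEADBEEF>"))
```

The resulting bytes are character codes that still have to be interpreted through the font's encoding, so they are not necessarily readable ASCII.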
Like I said: This is Very Hard.
There's an Acrobat plug-in called Appligent Redax (no connection) that lets you draw annotations (or generate them via templates, regex, etc.) and then run their code to handle the redaction. It should be possible to programmatically create their annotations and perhaps even activate their plug-in: JS in a document can run a menu item.
You can use GroupDocs.Redaction for .NET to programmatically redact text in PDF documents. It supports exact-phrase, case-sensitive, and regular-expression redaction of text. This is how you can perform the exact phrase redaction.
Disclosure: I work as Developer Evangelist at GroupDocs.
Here's a web page that goes through what you need to do. As others mentioned, you have to do this in JavaScript, since that is Acrobat's native scripting language.
http://acrobatusers.com/tutorials/2008/07/auto_redaction_with_javascript
While I use Acrobat regularly, I've surprisingly never had a need to script it. I checked its scripting dictionary, and it looks like you'll have to write a JavaScript file, save it, and then open it with AppleScript if that's what you want to do (say, as a service).
Here's Adobe's JavaScript for Acrobat documentation:
http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat9_HTMLHelp&file=JavaScript_SectionPage.70.1.html
Within Adobe Acrobat you may be able to do this through a JavaScript action that can be invoked on a number of different events.
If you would like to do this in a separate application, there are a number of tools on a variety of platforms that can create and manipulate PDF documents, although I have yet to find a feature-rich open source library that comes close to some of these offerings.
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
http://www.aspose.com/categories/java-components/aspose.pdf-for-java/default.aspx
http://itextpdf.com/
iText is my personal favorite and worth every penny.
Redacting PDFs in general is a pretty complex task.
You can redact PDF pages for free on doXiview (https://doxiview.cib.de). The redact option is located on the right side.
Another approach is to do it programmatically with CIB pdf toolbox (https://pdftoolbox.cib.de/).