Writing metadata to a pdf using pyobjc

Posted 2024-09-30 12:37:38

I'm trying to write metadata to a pdf file using the following python code:

from Foundation import *
from Quartz import *

url = NSURL.fileURLWithPath_("test.pdf")
pdfdoc = PDFDocument.alloc().initWithURL_(url)
assert pdfdoc, "failed to create document"

print "reading pdf file"

attrs = {}
attrs[PDFDocumentTitleAttribute] = "THIS IS THE TITLE"
attrs[PDFDocumentAuthorAttribute] = "A. Author and B. Author"

PDFDocumentTitleAttribute = "test"

pdfdoc.setDocumentAttributes_(attrs)
pdfdoc.writeToFile_("mynewfile.pdf")   

print "pdf made"

This appears to work fine (no errors on the console); however, when I examine the metadata of the new file, it is as follows:

PdfID0: 242b7e252f1d3fdd89b35751b3f72d3
PdfID1: 242b7e252f1d3fdd89b35751b3f72d3
NumberOfPages: 4

and the original file had the following metadata:

InfoKey: Creator
InfoValue: PScript5.dll Version 5.2.2
InfoKey: Title
InfoValue: Microsoft Word - PROGRESS  ON  THE  GABION  HOUSE Compressed.doc
InfoKey: Producer
InfoValue: GPL Ghostscript 8.15
InfoKey: Author
InfoValue: PWK
InfoKey: ModDate
InfoValue: D:20101021193627-05'00'
InfoKey: CreationDate
InfoValue: D:20101008152350Z
PdfID0: d5fd6d3960122ba72117db6c4d46cefa
PdfID1: 24bade63285c641b11a8248ada9f19
NumberOfPages: 4

So there are two problems: it is not appending the new metadata, and it is clearing the previous metadata structure. What do I need to do to get this to work? My objective is to append metadata that reference management systems can import.

爱的十字路口 2024-10-07 12:37:38

Mark is on the right track, but there are a few peculiarities that should be accounted for.

First, he is correct that pdfdoc.documentAttributes is an NSDictionary that contains the document metadata. You would like to modify that, but note that documentAttributes gives you an NSDictionary, which is immutable. You have to convert it to an NSMutableDictionary as follows:

attrs = NSMutableDictionary.alloc().initWithDictionary_(pdfdoc.documentAttributes())

Now you can modify attrs as you did. There is no need to write PDFDocument.PDFDocumentTitleAttribute as Mark suggested; that won't work. PDFDocumentTitleAttribute is declared as a module-level constant, so just refer to it as you did in your own code.
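The point about module-level constants can be checked without PDFKit. The following is a minimal sketch in plain Python 3, not pyobjc code: quartz_like and TITLE_KEY are hypothetical stand-ins for the Quartz module and PDFDocumentTitleAttribute, and the value "Title" is only a placeholder. It suggests that a later assignment such as PDFDocumentTitleAttribute = "test" rebinds the local name only and leaves keys already stored in attrs untouched.

```python
import types

# Hypothetical stand-in for the Quartz module (not the real one).
quartz_like = types.ModuleType("quartz_like")
quartz_like.TITLE_KEY = "Title"  # placeholder value, an assumption

# Roughly what "from quartz_like import *" does: bind a local name
# to the same string object the module exposes.
TITLE_KEY = quartz_like.TITLE_KEY

attrs = {}
attrs[TITLE_KEY] = "THIS IS THE TITLE"

# Rebinding the local name afterwards, as the question's code does.
TITLE_KEY = "test"

print(attrs)                  # the stored key is unchanged
print(quartz_like.TITLE_KEY)  # the module's own name is unchanged too
```

If this reading is right, the stray assignment in the question is harmless on its own, though it is best removed. The lost metadata comes from starting attrs as an empty dictionary.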

Here is the full code that works for me:

from Foundation import *
from Quartz import *

url = NSURL.fileURLWithPath_("test.pdf")
pdfdoc = PDFDocument.alloc().initWithURL_(url)

# Start from a mutable copy of the existing metadata so nothing is lost
attrs = NSMutableDictionary.alloc().initWithDictionary_(pdfdoc.documentAttributes())
attrs[PDFDocumentTitleAttribute] = "THIS IS THE TITLE"
attrs[PDFDocumentAuthorAttribute] = "A. Author and B. Author"

# Write the merged attributes back and save to a new file
pdfdoc.setDocumentAttributes_(attrs)
pdfdoc.writeToFile_("mynewfile.pdf")
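
The same approach can be wrapped in a small helper. This is a sketch under assumptions rather than a tested recipe: merge_attributes and update_pdf_metadata are hypothetical names, the code uses Python 3 syntax, and it assumes the dictionary returned by documentAttributes() can be iterated like a Python mapping and may be None. Quartz is imported inside the function so that the merge step can be tried on any platform.

```python
def merge_attributes(existing, updates):
    """Return a new dict with the existing entries plus the updates."""
    merged = {}
    if existing is not None:
        for key in existing:
            merged[key] = existing[key]
    merged.update(updates)
    return merged


def update_pdf_metadata(src_path, dst_path, updates):
    """Write src_path to dst_path with `updates` merged into its metadata.

    Needs macOS with pyobjc; only the calls used in the answer appear here.
    """
    from Foundation import NSURL
    from Quartz import PDFDocument

    url = NSURL.fileURLWithPath_(src_path)
    pdfdoc = PDFDocument.alloc().initWithURL_(url)
    if pdfdoc is None:
        raise IOError("failed to open %s" % src_path)

    attrs = merge_attributes(pdfdoc.documentAttributes(), updates)
    pdfdoc.setDocumentAttributes_(attrs)
    return pdfdoc.writeToFile_(dst_path)


# The merge step alone, with made-up keys standing in for the PDFKit constants.
old = {"Author": "PWK", "Producer": "GPL Ghostscript 8.15"}
new = merge_attributes(old, {"Author": "A. Author and B. Author"})
print(new)
```

On a Mac this would be called as update_pdf_metadata("test.pdf", "mynewfile.pdf", {PDFDocumentTitleAttribute: "THIS IS THE TITLE"}), with PDFDocumentTitleAttribute imported from Quartz.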

一身仙ぐ女味 2024-10-07 12:37:38

DISCLAIMER: I'm utterly new to Python, but an old hand at PDF.

To avoid smashing all the existing attributes, you need to start attrs with pdfdoc.documentAttributes(), not {}. setDocumentAttributes_ is almost certainly an overwrite rather than a merge (given your output here).

Second, all the PDFDocument*Attribute constants are part of PDFDocument. My Python ignorance is undoubtedly showing, but shouldn't you be referencing them as attributes rather than as bare variables? Like this:

attrs[PDFDocument.PDFDocumentTitleAttribute] = "THIS IS THE TITLE"

That you can assign to PDFDocumentTitleAttribute leads me to believe it's not a constant.

If I'm right, your attrs will have tried to assign numerous values to a null key. My Python is weak, so I don't know how you'd check that. Examining attrs prior to calling pdfdoc.setDocumentAttributes_() should be revealing.
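
One way to do that check, sketched in Python 3: describe_attrs is a hypothetical helper, not part of pyobjc, that lists each key and value and flags a None key, which is how a null key would presumably appear from Python. The sample dictionaries are made up for illustration.

```python
def describe_attrs(attrs):
    """Return one line per entry, flagging keys that are None."""
    lines = []
    for key in attrs:
        flag = "  <-- null key" if key is None else ""
        lines.append("%r: %r%s" % (key, attrs[key], flag))
    return lines


good = {"Title": "THIS IS THE TITLE"}
bad = {None: "A. Author and B. Author"}

for line in describe_attrs(good) + describe_attrs(bad):
    print(line)
```

Printing describe_attrs(attrs) just before the call to setDocumentAttributes_ would show whether the keys are real strings or nulls.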
