How does Acrobat encode annotations added as sticky notes to a PDF?
We have been reading and writing Sticky Notes/Annotations/Comments to PDFs via an ActiveX control in our application for a number of years. We have recently upgraded to Delphi 2009, which has Unicode support. The following is causing problems.
When we call
CAcroPDAnnot.GetContents
The results seem rather strange and we lose our Unicode characters. It is not like saving as an ANSI string, which would usually result in ????? being returned; instead we get a string such as
‚És‚“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
For a string of Japanese characters.
However, if I save the comments in the PDF to a datafile via the menu in the PDF itself, they are written to file as something like
0kˆL0Oeå0k˜¨ª0’0r0D0_0‰
The latter can be exported and re-imported into an Acrobat PDF and will recreate the correct Unicode characters. However, once I call CAcroPDAnnot.GetContents in my code, it comes back as something else.
- Is CAcroPDAnnot.GetContents broken?
- Is there an encoding scheme I should be aware of?
- Is there an alternative I might be able to do?
Thanks
Comments (3)
That's the string:
に行く日に風邪をひいたら
in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.
You're currently interpreting it as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect what encoding the document is in and convert it manually.
I don't know what Delphi provides for reading encodings, but have you got the encodings for Shift-JIS installed in Windows, from the Control Panel -> Regional Options -> "Install files for East Asian languages" option? If not, that might explain why it'd be failing to convert automatically, perhaps.
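If it helps to see the byte-level mechanics, here is a short sketch (in Python rather than Delphi, purely for illustration): Shift-JIS bytes misread as CP-1252 reproduce the first garbled string, and — an inference from the byte patterns, not something stated in the thread — UTF-16BE bytes misread the same way reproduce the exported-datafile sample.

```python
text = "に行く日に風邪をひいたら"

# GetContents case: Shift-JIS (CP-932) bytes misread as CP-1252.
# Bytes with no CP-1252 mapping (e.g. 0x8D, part of 行) are simply lost,
# which is why the garbled text cannot always be repaired afterwards.
sjis = text.encode("shift_jis")
print(sjis.decode("cp1252", errors="ignore"))   # starts with ‚És…

# Exported-datafile case (an inference from the byte patterns):
# UTF-16BE bytes misread as CP-1252. に is U+306B -> 0x30 0x6B -> "0k".
utf16 = text.encode("utf-16-be")
print(utf16.decode("cp1252", errors="ignore"))  # starts with 0kˆL…

# Decoding with the correct codec recovers the text.
print(sjis.decode("shift_jis") == text)         # True
```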
You're not exactly giving us a lot of information to work with.
I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?
Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).
Third: how are you calling the method? What type of variable are you assigning the result to?
What might also help, is if you could provide a sample of an annotation (preferably including non-ASCII chars); and for that annotation:
(preferably the HEX byte codes of the (Ansi/Wide)Strings; but output from the Ctrl-F7 inspector should do)
Then maybe someone could provide a more meaningful answer.
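One quick way to make that comparison, sketched in Python (the project itself is Delphi; this just shows how the hex bytes of the same character differ per encoding, so a dump from the inspector can be matched against a candidate encoding):

```python
text = "に"  # any non-ASCII sample character from the annotation

# Print the hex bytes of the sample under a few candidate encodings;
# whichever line matches the inspector's dump identifies the encoding.
for codec in ("shift_jis", "utf-16-be", "utf-16-le", "utf-8"):
    print(f"{codec:10} {text.encode(codec).hex(' ')}")
# shift_jis  82 c9
# utf-16-be  30 6b
# utf-16-le  6b 30
# utf-8      e3 81 ab
```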
OK, one of the main differences between Delphi 2009 and the earlier versions is that the default string type is a Unicode string. That means that if you use the same ActiveX component as in previous versions, you are passing Unicode strings where ANSI strings are expected, and that is usually not a good idea.
There are a couple of solutions for this problem:
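As an illustration of the kind of conversion such a fix involves — reversing a wrong CP-1252 decode — here is a minimal sketch in Python (not the author's Delphi solution; in Delphi 2009 the analogue would be routing the raw bytes through an AnsiString with the right code page before they become a UnicodeString). The sample にほんご is chosen because all of its Shift-JIS bytes happen to be valid CP-1252 code points, so nothing is lost in the round trip:

```python
original = "にほんご"

# Simulate the failure: Shift-JIS bytes decoded with the wrong codec.
garbled = original.encode("shift_jis").decode("cp1252")
print(garbled)               # ‚É‚Ù‚ñ‚²

# Repair: re-encode with the wrong codec to recover the raw bytes,
# then decode them as Shift-JIS. This only works when no byte fell on
# one of CP-1252's undefined code points (0x81, 0x8D, 0x8F, 0x90, 0x9D).
repaired = garbled.encode("cp1252").decode("shift_jis")
print(repaired == original)  # True
```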