How does Acrobat encode comments added to a PDF as sticky notes?

Posted on 2024-07-11 01:04:57

We have been reading and writing sticky notes/annotations/comments to PDFs via an ActiveX control in our application for a number of years. We have recently upgraded to Delphi 2009 with Unicode support. The following is causing problems.

When we call

CAcroPDAnnot.GetContents

the results are rather strange and we lose our Unicode characters. This is not like saving to an ANSI string, which would usually return ?????; instead we get a string such as

‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç

For a string of Japanese characters.

However, if I save the comments in the PDF to a data file via the menu in the PDF itself, they are written to the file as something like

0kˆL0Oeå0k˜¨ª0’0r0D0_0‰

The latter can be exported and reimported into an Acrobat PDF and will recreate the correct Unicode characters. However, once I call CAcroPDAnnot.GetContents in my code, it comes back as something else.

  1. Is CAcroPDAnnot.GetContents broken?
  2. Is there an encoding scheme I should be aware of?
  3. Is there an alternative approach I could use?

Thanks

Comments (3)

北笙凉宸 2024-07-18 01:04:57

‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç

That's the string:

に行く日に風邪をひいたら

in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.

You're currently interpreting it as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect what encoding the document is in and convert it manually.

I don't know what Delphi provides for reading encodings, but have you got the encodings for Shift-JIS installed in Windows, from the Control Panel -> Regional Options -> "Install files for East Asian languages" option? If not, that might explain why it'd be failing to convert automatically, perhaps.
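
To illustrate the diagnosis above, here is a minimal Python sketch (not Delphi, and not part of the Acrobat API) that reproduces the garbling by reading Shift-JIS bytes as CP-1252 and then reverses it. The helper names are hypothetical. It assumes the five bytes that CP-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) survive as the control characters U+0081, U+008D and so on, which is typically what Windows does; if those characters were stripped along the way, the original text cannot be fully recovered.

```python
def bytes_to_cp1252_mojibake(raw: bytes) -> str:
    """Simulate the bug: read raw bytes as CP-1252, one byte at a time."""
    chars = []
    for b in raw:
        # Bytes with no CP-1252 character decode to "" here, so fall back
        # to the control character with the same code point.
        chars.append(bytes([b]).decode("cp1252", errors="ignore") or chr(b))
    return "".join(chars)


def cp1252_mojibake_to_bytes(text: str) -> bytes:
    """Undo the misreading: map each character back to the byte it came from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Control characters passed through for the undefined bytes.
            if ord(ch) > 0xFF:
                raise
            out.append(ord(ch))
    return bytes(out)


original = "に行く日に風邪をひいたら"
garbled = bytes_to_cp1252_mojibake(original.encode("cp932"))
recovered = cp1252_mojibake_to_bytes(garbled).decode("cp932")

print(garbled)
print(recovered == original)
```

The same idea should carry over to Delphi: get back to the raw bytes, then decode them with code page 932 instead of the system default. The exact calls depend on how the component hands the string back, so treat this only as a way to confirm the diagnosis.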

两仪 2024-07-18 01:04:57

You're not exactly giving us a lot of information to work with.

I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?

Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).

Third: how are you calling the method? What type of variable are you assigning the result to?

It might also help if you could provide a sample of an annotation (preferably including non-ASCII chars); and for that annotation:

  • what it should look like (and what it does look like inside Reader)
  • what it returns when using a pre-2009 version of Delphi*
  • what it returns when using Delphi 2009*

(* preferably the HEX byte codes of the (ansi/wide)strings; but output from the Ctrl-F7 inspector should do)

Then maybe someone could provide a more meaningful answer.
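
For the hex dumps requested above, a small Python sketch like the following shows what the same text looks like as bytes in a few candidate encodings, which makes it easier to match against what the debugger shows. The sample text and the helper name are illustrative only.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ").upper()


sample = "に行く"
for enc in ("cp932", "utf-16-be", "utf-16-le", "utf-8"):
    print(f"{enc:10} {hex_bytes(sample, enc)}")
```

As an aside, the exported data in the question begins with `0k`, which matches the UTF-16BE bytes 30 6B for に, so the export file may simply hold big-endian UTF-16, one of the text string encodings the PDF format allows.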

空宴 2024-07-18 01:04:57

OK, one of the main differences between Delphi 2009 and the earlier versions is that the default string type is a Unicode string. That means that if you use the same ActiveX component as in previous versions, you are passing Unicode strings to ASCII strings, and that is usually not a good idea.

There are a couple of solutions for this problem:

  • See whether you can upgrade your ActiveX component so that it supports full Unicode strings.
  • Use AnsiString rather than string to communicate with the ActiveX component. In this case you can still use the old interface, but you are still bound to the same limitations.
  • Use another control that creates PDFs. There are many to choose from, but be prepared to change a big chunk of your software. (Some controls are XML-based and use encoding.)
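
The limitation mentioned in the second option can be seen in a minimal Python sketch: narrowing Unicode text to a legacy code page only keeps the characters that code page contains, and everything else becomes `?`. This illustrates code page behaviour in general, not what AnsiString or the Acrobat control does internally.

```python
text = "に行く日に風邪をひいたら"

# Narrowing to Western European CP-1252: no Japanese character fits.
as_western = text.encode("cp1252", errors="replace")

# Narrowing to Japanese CP-932: this text round-trips without loss...
as_japanese = text.encode("cp932")
round_trip = as_japanese.decode("cp932")

# ...but a character outside CP-932 (here Korean) is still lost.
korean_lost = "한".encode("cp932", errors="replace")

print(as_western)
print(round_trip == text)
print(korean_lost)
```

So an ANSI string can carry the Japanese text only if both sides agree on code page 932; any mismatch gives either question marks or garbled characters.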