Flattening FDF / XFDF forms to PDF in PHP with UTF-8 characters
My scenario:
- A PDF template with form fields: template.pdf
- An XFDF file that contains the data to be filled in: fieldData.xfdf
Now I need to have these two files combined and flattened.
pdftk does the job easily from within PHP:
exec("pdftk template.pdf fill_form fieldData.xfdf output flatFile.pdf flatten");
Unfortunately, this does not work with full UTF-8 support.
For example, Cyrillic and Greek letters get scrambled. I used Arial for this, with a Unicode character set.
- How can I flatten my Unicode files?
- Is there any other PDF tool that offers Unicode support?
- Does pdftk have a Unicode switch that I am missing?
EDIT 1: As this question has not been solved for more than 9 months, I decided to start a bounty for it. If there are options to sponsor a feature or a bug fix in pdftk, I'd be glad to donate.
EDIT 2: I am not working on this project anymore, so I cannot verify new answers. If anyone has a similar problem, I would be glad if they responded on my behalf.
I found that by using Jon's template with DOMDocument, the numeric encoding was handled for me and it worked well. My slight variation is below:
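The DOMDocument approach described above might look roughly like the following. This is only a minimal sketch under assumptions, not the answerer's code: the function name buildXfdf() and the field name are hypothetical. It relies on libxml usually serializing non-ASCII characters as numeric character references when the document has no declared encoding.

```php
<?php
// Sketch only: buildXfdf() is a hypothetical helper, not part of any library.
function buildXfdf(array $fields): string
{
    $ns = 'http://ns.adobe.com/xfdf/';

    // No encoding is passed on purpose. Without a declared encoding,
    // libxml typically writes non-ASCII characters as character references.
    $doc = new DOMDocument('1.0');

    $root = $doc->createElementNS($ns, 'xfdf');
    $root->setAttributeNS('http://www.w3.org/XML/1998/namespace', 'xml:space', 'preserve');
    $doc->appendChild($root);

    $fieldsNode = $doc->createElementNS($ns, 'fields');
    $root->appendChild($fieldsNode);

    foreach ($fields as $name => $value) {
        $field = $doc->createElementNS($ns, 'field');
        $field->setAttribute('name', (string) $name);

        // A text node lets DOM escape &, < and > correctly.
        $valueNode = $doc->createElementNS($ns, 'value');
        $valueNode->appendChild($doc->createTextNode((string) $value));

        $field->appendChild($valueNode);
        $fieldsNode->appendChild($field);
    }

    return $doc->saveXML();
}
```

The returned string could be written to fieldData.xfdf with file_put_contents() before calling pdftk.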
You could try the trial version of http://www.adobe.com/products/livecycle/designer/ and see what PDF files it generates.
Another commercial software you could try is http://www.appligent.com/fdfmerge. See page 16 in http://146.145.110.1/docs/userguide/FDFMergeUserGuide.pdf for how it handles xFDF with UTF-8.
I also had a look at the XFDF specification: http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf
On page 12 it states:
I looked through pdftk-1.44-dist/java/com/lowagie/text/pdf/XfdfReader.java. It doesn't seem to do anything special with the input.
Maybe pdftk will do what you want if you encode the non-ASCII characters as character references in your XFDF input.
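Encoding values as character references can be tried with PHP's mbstring extension. The sketch below assumes mbstring is available; xfdfEscape() is a hypothetical helper name. Whether pdftk honors such references is exactly what this suggestion leaves open.

```php
<?php
// Hypothetical helper: XML-escape a UTF-8 value, then replace every
// non-ASCII character with a decimal character reference such as &#324;.
function xfdfEscape(string $utf8): string
{
    // Escape &, <, > and quotes first, so the references added below
    // are not escaped a second time.
    $escaped = htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}
```

The result would be placed between the value tags of a field in the XFDF file.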
Using pdftk 1.44 on a Win7 machine, I encounter the same problems with XFDF files, whereas FDF works fine. I made an XFDF file without any special characters (ANSI only), but pdftk crashed again. I emailed the developer. Unfortunately, there has been no answer so far.
Unfortunately, UTF-8 character encoding works with neither decimal nor hexadecimal references to non-ASCII characters in the source .xfdf file (pdftk v1.44).
I made some progress on this. Starting with code from http://koivi.com/fill-pdf-form-fields/, I modified the value encoding to output numeric codes for any characters outside the ASCII range.
Now with pitulski's special strings:
Poznań Śródmieście Ćwiartka Ósma
outputs
Pozna ródmiecie wiartka Ósma
with some box shapes superimposed, and
ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
outputs
óÓ
with more box shapes. I think the box shapes may be characters my server doesn't recognize.
I tried it with some French characters:
ùûüÿ€’“”«»àâæçéèêëïôœÙÛÜŸÀÂÆÇÉÈÊËÏÎÔ
and they all came out OK, but some of them were overlapping.
--edit-- I just tried entering these manually into the form and got the same result minus the box shapes (using Evince). I then tried with a different form (created by someone else). After entering
ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
the form displayed
ółÓŁ
It looks like it depends on which characters are included in the document's embedded fonts.

While pdftk doesn't appear to support UTF-8 in the FDF file, I found that with
in the pipeline converting that FDF file to ISO-Latin-1, then at least those characters that are in the Latin-1 code page will still be represented properly.
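The same Latin-1 conversion can be done in PHP before the FDF file is written. This is a sketch assuming the mbstring extension; toLatin1() and fitsLatin1() are hypothetical helper names, and the check is only there to spot values that would lose characters.

```php
<?php
// Hypothetical helpers, assuming the mbstring extension is loaded.

// Convert a UTF-8 value to ISO-8859-1 bytes. Characters outside Latin-1
// are replaced by mbstring's substitute character.
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// True when the value survives a round trip through Latin-1 unchanged,
// i.e. every character exists in the Latin-1 code page.
function fitsLatin1(string $utf8): bool
{
    return mb_convert_encoding(toLatin1($utf8), 'UTF-8', 'ISO-8859-1') === $utf8;
}
```

With the strings from this thread, a value such as "Ósma" fits in Latin-1, while "Poznań" does not, which matches the observation that only Latin-1 characters come through.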
I had been trying to solve this issue for a long time, and I have finally found the solution!
So, let's start.
pdftk
Helvetica
, download this font.
flatten
option
xfdf.xml example:
Enjoy :)
Which version of pdftk are you using?
I tried the same thing with Polish characters (UTF-8).
It does not work for me.
pdftk.exe, libiconv2.dll from: http://www.pdflabs.com/docs/install-pdftk/
Windows 7, cmd, file.pdf + file.fdf -> new.pdf
pdftk file.pdf fill_form file.xfdf output new.pdf flatten
However, with an FDF file with the same content, it worked properly.
But the characters in new.pdf are bad.
pdftk file.pdf fill_form file.fdf output new.pdf flatten
---FDF---
---XFDF---
---PDF---
You can introduce UTF-8 characters by giving their Unicode code in octal with \ddd.
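In a PDF string literal, \ddd stands for one byte written in octal, so this works directly for characters whose code fits in a single byte. A sketch of such an escaper follows; pdfOctalEscape() is a hypothetical name, and the input is assumed to be already encoded as single bytes (for example Latin-1).

```php
<?php
// Hypothetical helper: escape a byte string for use inside the
// parentheses of a PDF/FDF string literal.
function pdfOctalEscape(string $bytes): string
{
    $out = '';
    $length = strlen($bytes);

    for ($i = 0; $i < $length; $i++) {
        $char = $bytes[$i];
        $code = ord($char);

        if ($char === '\\' || $char === '(' || $char === ')') {
            // Delimiters and the backslash itself get a backslash prefix.
            $out .= '\\' . $char;
        } elseif ($code < 0x20 || $code > 0x7E) {
            // Everything outside printable ASCII becomes \ddd (octal).
            $out .= sprintf('\\%03o', $code);
        } else {
            $out .= $char;
        }
    }

    return $out;
}
```

For example, the Latin-1 byte for Ó (0xD3) comes out as \323.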
To solve this, I wrote PdfFormFillerUTF-8: http://sourceforge.net/projects/pdfformfiller2/
There is a drop-in replacement for the pdftk tool, Mcpdf (https://github.com/m-click/mcpdf), that solves Unicode issues when filling forms. It works for me with CP1250 characters (Central Europe).
From project page:
Note that you need to have JRE installed.
I managed to make it work with pdftk by creating an XFDF file with UTF-8 encoding.
It took several tries, but what made it work as expected was adding 'need_appearances'.
Here is an example:
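A call with the need_appearances output option could be assembled as in the sketch below. It is an illustration under assumptions rather than the answerer's example: buildFillCommand() is a hypothetical helper, the file names are taken from the question, and the flatten option is left out because the answer does not say whether the two were combined.

```php
<?php
// Hypothetical helper: build the pdftk command line with shell-safe arguments.
function buildFillCommand(string $template, string $xfdf, string $output): string
{
    return sprintf(
        'pdftk %s fill_form %s output %s need_appearances',
        escapeshellarg($template),
        escapeshellarg($xfdf),
        escapeshellarg($output)
    );
}

// Usage (commented out so the sketch has no side effects):
// exec(buildFillCommand('template.pdf', 'fieldData.xfdf', 'filled.pdf'), $lines, $status);
// A non-zero $status would indicate that pdftk reported an error.
```

Building the command in one place keeps the quoting consistent and makes it easy to check the exit status after exec().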
pdftk supports encoding in UTF-16BE. It's not that difficult to convert from UTF-8 to UTF-16BE.
See: Weird characters when filling PDF with PDFTk
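The UTF-16BE conversion could be sketched as follows, assuming the mbstring extension. The names pdfEscape(), pdfUtf16String() and buildFdf() are hypothetical, field names are assumed to be plain ASCII, and the FDF skeleton is the usual minimal structure rather than anything taken from this thread.

```php
<?php
// Hypothetical helpers for writing UTF-16BE values into an FDF file.

// Escape the bytes that have a special meaning inside a PDF string literal.
function pdfEscape(string $bytes): string
{
    return strtr($bytes, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)', "\r" => '\\r']);
}

// Convert UTF-8 to UTF-16BE, prefixed with the byte order mark FE FF
// that marks a PDF text string as Unicode.
function pdfUtf16String(string $utf8): string
{
    return pdfEscape("\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8'));
}

// Build a minimal FDF document from a name => value map.
function buildFdf(array $fields): string
{
    $entries = '';
    foreach ($fields as $name => $value) {
        $entries .= sprintf(
            "<< /T (%s) /V (%s) >>\n",
            pdfEscape((string) $name),
            pdfUtf16String((string) $value)
        );
    }

    return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n" . $entries
        . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}
```

The escaping step matters because UTF-16BE bytes can coincide with parentheses or backslashes, for example the second byte of U+0128.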