What's wrong with this PDF file?
I have to work with a PDF form created by a person unknown to me. Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF? There is no visual break between the three parts.
/TT1 1 Tf
11.04 0 0 11.04 59.16 476.1203 Tm
(Datum)Tj
/C2_1 1 Tf
<0003>Tj
/TT1 1 Tf
(der)Tj
0.424 -1.315 Td
(Tätigkeit)Tj
-0.0022 Tc 0 11.04 -11.04 0 261.24 437.7203 Tm
[(Ve)-4.6<7267fc74>-4.2(ungssat)-4.2(z)]TJ
/C2_1 1 Tf
0 Tc <0003>Tj
/TT1 1 Tf
-0.0021 Tc 0.935 -1.315 Td
[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ % <<< the important line
0 Tc 11.04 0 0 11.04 340.92 468.8003 Tm
(Anlass/Art)Tj
/C2_1 1 Tf
resulting in [screenshot not preserved]
To get the source code above, I decoded the PDF file as described here. I have no know-how concerning the PDF file format.
Background: I had to replace the word "Stunde". Finding the place in the source code where "Stunde" was written (in parts) drove me crazy, since no free PDF editor seems to handle rotated (vertical) text without problems.
Academic bonus questions: Is it possible to set the sum over a column as the default value of a form field? (Modifiable; updated every time the column changes.) And why was I able to replace "Stunde" with "Einsatz" without corrupting the PDF file, even though the offsets in the file no longer match?
Answers (2)
As @gettalong mentioned in his answer, in your case this was most likely done to apply kerning.
If you start looking into the outputs of some other PDF producers, you'll see that this export from Word is actually very unobtrusive when it comes to splitting words: [examples not preserved]
And this doesn't cover all the variants you'll find, not by a long shot...
Thus, in your case the replacement actually was a fairly trivial task...
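In the question's stream, the pieces of "Stunde" sit inside one TJ array in reading order, so a search only has to join the string operands of each array before comparing. The following is a minimal Python sketch of that idea; `tj_text` and `find_word` are hypothetical helper names. It assumes a simple single-byte font whose codes map to text via WinAnsiEncoding (approximated here by Python's `cp1252` codec), which fits the `/TT1` strings shown; two-byte codes such as the `<0003>` under `/C2_1`, octal escapes and unbalanced parentheses are not handled.

```python
import re

# Simplified tokenizer for TJ operands (assumption: literal strings without
# octal escapes or unbalanced parentheses, hex strings, and numbers only).
TOKEN = re.compile(rb"\((?:\\.|[^\\)])*\)|<[0-9A-Fa-f\s]*>|[-+]?\d*\.?\d+", re.DOTALL)
# A whole "[ ... ] TJ" array; literal strings inside it may contain "]".
TJ_ARRAY = re.compile(rb"\[(?:\((?:\\.|[^\\)])*\)|[^\]])*\]\s*TJ", re.DOTALL)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _literal(body: bytes) -> bytes:
    # Resolve backslash escapes such as \) \( \\ \n.
    return re.sub(rb"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), body, flags=re.DOTALL)


def _hex(body: bytes) -> bytes:
    digits = re.sub(rb"\s", b"", body).decode("ascii")
    if len(digits) % 2:
        digits += "0"  # a missing final hex digit counts as 0
    return bytes.fromhex(digits)


def tj_text(tj_array: bytes) -> str:
    """Join the string operands of one TJ array, skipping the kerning numbers."""
    parts = []
    for tok in TOKEN.findall(tj_array):
        if tok.startswith(b"("):
            parts.append(_literal(tok[1:-1]))
        elif tok.startswith(b"<"):
            parts.append(_hex(tok[1:-1]))
        # anything else is a positioning number, not text
    # Assumption: the font uses WinAnsiEncoding, approximated by cp1252.
    return b"".join(parts).decode("cp1252", errors="replace")


def find_word(stream: bytes, word: str):
    """Yield (byte offset, joined text) for each TJ array containing word."""
    for m in TJ_ARRAY.finditer(stream):
        text = tj_text(m.group(0))
        if word in text:
            yield m.start(), text


if __name__ == "__main__":
    stream = (rb"[(Ve)-4.6<7267fc74>-4.2(ungssat)-4.2(z)]TJ" b"\n"
              rb"[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ")
    print(stream.find(b"Stunde"))  # -1: the word is never stored contiguously
    for offset, text in find_word(stream, "Stunde"):
        print(offset, text)
```

On the two TJ lines from the question, the plain byte search fails, while `find_word` reports only the second array, whose joined text is "(€/Stunde)"; the first one joins to "Vergütungssatz".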
Regarding the column sum: if all the column values in question are stored in form fields, you can use JavaScript to recalculate the sum after each form change. To have it serve only as a "default", you can use some other (hidden) field as a flag recording whether the sum field has already been edited by hand. Beware, though: not all PDF viewers support JavaScript. Furthermore, the JavaScript object model for PDF is not specified in an independent (e.g. ISO) specification but in an Adobe one, which can bias how the specification is interpreted.
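The flag idea can be illustrated with a small Python model. This is not code that runs inside a PDF: in an actual form the same logic would be written in Acrobat JavaScript, e.g. as the total field's calculate script reading the column fields via `this.getField(...)` and assigning `event.value`. The class `SumForm` and its methods are hypothetical stand-ins for the form fields.

```python
from dataclasses import dataclass


@dataclass
class SumForm:
    """Toy model of a column of number fields plus a 'total' field.

    total_touched plays the role of the hidden flag field: once the user
    has typed into the total, it is no longer recalculated.
    """
    column: list[float]
    total: float = 0.0
    total_touched: bool = False

    def recalculate(self) -> None:
        # Corresponds to the total field's calculate script in a real form.
        if not self.total_touched:
            self.total = sum(self.column)

    def set_cell(self, index: int, value: float) -> None:
        # Any change in the column triggers a recalculation.
        self.column[index] = value
        self.recalculate()

    def user_edits_total(self, value: float) -> None:
        # A manual edit overrides the default and sets the flag.
        self.total = value
        self.total_touched = True


if __name__ == "__main__":
    form = SumForm([1.5, 2.0])
    form.recalculate()
    print(form.total)  # 3.5: default is the column sum
    form.set_cell(0, 4.0)
    print(form.total)  # 6.0: still tracks the column
    form.user_edits_total(10.0)
    form.set_cell(1, 1.0)
    print(form.total)  # 10.0: the user's value is kept
```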
As for why replacing "Stunde" with "Einsatz" did not corrupt the file: as we don't know exactly how you applied the changes, this is obviously hard to tell.
Most likely, though, you did corrupt the PDF, and the PDF viewers you opened it in merely repaired the corruption under the hood. PDF viewers have a strong tendency to do such repairs silently, without informing the user; as a result, a large part of the PDFs in the wild are actually broken.
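To make "corrupt" concrete: a classic PDF ends with a cross-reference (xref) table listing the byte offset of every object, and `startxref` gives the byte offset of that table. Changing the length of anything earlier in the file shifts everything after it, so those offsets (and the edited stream's `/Length`) go stale. The Python sketch below uses hypothetical helpers `build_pdf` and `check_xref`: it builds a toy two-object file (not a document a viewer would display) and checks whether each xref entry still points at `N G obj`. It only handles a single uncompressed xref table, not xref streams or incremental updates, and it does not check `/Length`.

```python
import re

SUBSECTION = re.compile(rb"(\d+) (\d+)[ \t]*\r?\n")
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def build_pdf(objects: list[bytes]) -> bytes:
    """Toy file with a correct classic xref table (not a displayable document)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f\r\n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n\r\n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d >>\nstartxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_pos)
    return bytes(out)


def check_xref(pdf: bytes) -> list[str]:
    """List problems: a stale startxref and objects whose xref offset is wrong."""
    problems = []
    declared = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    actual = [m.start() for m in re.finditer(rb"(?m)^xref[ \t]*\r?\n", pdf)][-1]
    if declared != actual:
        problems.append("startxref")
    pos = pdf.index(b"\n", actual) + 1  # first subsection header
    while (sub := SUBSECTION.match(pdf, pos)) is not None:
        first, count = int(sub.group(1)), int(sub.group(2))
        pos = sub.end()
        for num in range(first, first + count):
            entry = ENTRY.match(pdf, pos)
            pos += 20
            if entry and entry.group(3) == b"n":
                header = re.compile(rb"%d\s+\d+\s+obj" % num)
                if not header.match(pdf, int(entry.group(1))):
                    problems.append(f"object {num}")
    return problems


if __name__ == "__main__":
    content = rb"[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ"
    stream_obj = b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"
    pdf = build_pdf([stream_obj, b"<< /Type /Catalog >>"])
    print(check_xref(pdf))  # []
    # Replace the split word in place, shortening object 1 by 10 bytes.
    edited = pdf.replace(b"(S)0.8(t)-4.1(unde)", b"(Einsatz)")
    print(check_xref(edited))  # ['startxref', 'object 2']
```

The untouched toy file passes, while the edited copy reports the stale `startxref` and object 2, whose entry now points 10 bytes past its real start. An edit of exactly the same length would leave every offset intact.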
You don't see a visual break, but the standard distance between "S", "t" and "unde" has been changed nonetheless. PDF writers that support, e.g., kerning do this so that the word looks nicer. This is why the word is split that way.
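The numbers inside the TJ array quantify that change. Per the PDF specification, each number is subtracted from the current horizontal position in thousandths of text space units, so a negative value adds space before the next string and a positive value removes some. The Python sketch below (hypothetical `kerning_shifts`) converts the numbers on the important line into distances, taking the font size 1 from `/TT1 1 Tf` and the scale 11.04 from the `Tm` operator. It assumes default horizontal scaling and an otherwise unscaled page, and it leaves out the character spacing `Tc` that is applied to every glyph.

```python
import re

# Values taken from the question's content stream; H_SCALE is an assumption.
FONT_SIZE = 1.0   # Tfs, from "/TT1 1 Tf"
TM_SCALE = 11.04  # from "0 11.04 -11.04 0 261.24 437.7203 Tm" (rotated text)
H_SCALE = 1.0     # Th, horizontal scaling (Tz); assumed to be the default 100 %

# Simplified tokenizer: literal strings, hex strings and numbers.
TOKEN = re.compile(r"\((?:\\.|[^\\)])*\)|<[0-9A-Fa-f\s]*>|[-+]?\d*\.?\d+")


def kerning_shifts(tj_array: str) -> list[tuple[float, float]]:
    """Return (number, extra_gap) for every positioning number in a TJ array.

    A TJ number is subtracted from the horizontal position in thousandths of
    text space units, so the adjustment is tx = -number / 1000 * Tfs * Th.
    A positive extra_gap widens the gap before the following string; a
    negative one tightens it. extra_gap is tx scaled by the text matrix and
    ignores the per-glyph character spacing Tc.
    """
    shifts = []
    for tok in TOKEN.findall(tj_array):
        if tok[0] in "(<":
            continue  # a string operand: glyphs, not a positioning adjustment
        number = float(tok)
        tx = -number / 1000 * FONT_SIZE * H_SCALE
        shifts.append((number, tx * TM_SCALE))
    return shifts


if __name__ == "__main__":
    line = r"[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]"
    for number, gap in kerning_shifts(line):
        print(f"{number:+5.1f} -> {gap:+.4f}")
```

For the line in question every shift stays below 0.1 units (the largest, before "S", is about 0.08), a tiny fraction of the 11.04-unit type size, which is why no gap is visible.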