What is wrong with this PDF file?

Posted on 2025-01-23 16:21:09 · 1062 words · 0 views · 0 comments

I have to work with a PDF form created by a person unknown to me. Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF? There is no visual break between the three parts.

/TT1 1 Tf
11.04 0 0 11.04 59.16 476.1203 Tm
(Datum)Tj
/C2_1 1 Tf
<0003>Tj
/TT1 1 Tf
(der)Tj
0.424 -1.315 Td
(Tätigkeit)Tj
-0.0022 Tc 0 11.04 -11.04 0 261.24 437.7203 Tm
[(Ve)-4.6<7267fc74>-4.2(ungssat)-4.2(z)]TJ
/C2_1 1 Tf
0 Tc <0003>Tj
/TT1 1 Tf
-0.0021 Tc 0.935 -1.315 Td
[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ   % <<< the important line
0 Tc 11.04 0 0 11.04 340.92 468.8003 Tm
(Anlass/Art)Tj
/C2_1 1 Tf

resulting in

[The resulting document part of the source code.]

To get the source code above, I decoded the PDF file as described here. I have no know-how concerning the PDF file format.

Background: I had to replace the word "Stunde", and it drove me crazy trying to find the place where "Stunde" was written (in parts) within the source code, since no free PDF editor seems to be able to work with horizontal text without problems.

Academic Bonus questions: Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.) Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?


Comments (2)

烟凡古楼 2025-01-30 16:21:10

Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF?

As @gettalong mentioned in his answer, in your case this most likely has been done to apply kerning.

If you start looking into the outputs of some other PDF producers, you'll see that this export from Word actually is very unobtrusive in regard to splitting words:

  • there are PDF producers that draw each character individually after explicitly setting the text matrix for it, and
  • there also are PDF producers that have the width information for the characters of the used fonts set to zero and use the numbers in TJ instructions to forward the current text matrix between characters accordingly.

And this doesn't cover all the variants to be found, not by far...

Thus,

I had to replace the word "Stunde", it drove me crazy to find the place where "Stunde" was written (in parts) within the source code

in your case replacing actually was a fairly trivial task...
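To illustrate why the search is manageable here, the following is a minimal Python sketch that joins the string pieces of a TJ array while skipping the kerning numbers. The function names are hypothetical, the tokenizer only handles simple backslash escapes, and decoding hex strings as cp1252 is an assumption about the font's encoding (it would not hold for a composite font such as the one selected by /C2_1).

```python
import re

# A token is a literal string (...), a hex string <...>, or a number.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>|[-+]?(?:\d+\.?\d*|\.\d+)")


def parse_tj_array(array_body):
    """Split a TJ array into ('text', str) and ('adjust', float) items."""
    items = []
    for token in TOKEN.findall(array_body):
        if token.startswith("("):
            # Strip the parentheses and resolve simple escapes such as \) or \\.
            # Octal escapes and \n style escapes are NOT handled in this sketch.
            items.append(("text", re.sub(r"\\(.)", r"\1", token[1:-1])))
        elif token.startswith("<"):
            # ASSUMPTION: single-byte font with a WinAnsi-like encoding (cp1252).
            items.append(("text", bytes.fromhex(token[1:-1]).decode("cp1252")))
        else:
            # Numbers are glyph position adjustments, not text.
            items.append(("adjust", float(token)))
    return items


def joined_text(array_body):
    """Concatenate only the string pieces, ignoring the adjustments."""
    return "".join(value for kind, value in parse_tj_array(array_body) if kind == "text")


# The "important line" from the question, without the trailing TJ operator.
IMPORTANT_LINE = r"[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]"

if __name__ == "__main__":
    print(joined_text(IMPORTANT_LINE))
    print("Stunde" in joined_text(IMPORTANT_LINE))
```

Searching the joined text instead of the raw content stream finds the word even though it is stored as "S", "t" and "unde". For producers that position every glyph individually, a sketch like this would not be enough.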


Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.)

If all the column values in question are stored in form fields, you can use JavaScript to recalculate sums after form changes. To have the sum serve as a "default" only, you can use another (hidden) field as a flag indicating whether the field has already been touched. Beware, though: not all PDF viewers support JavaScript. Furthermore, the JavaScript object model for PDF is not specified in an independent (e.g. ISO) specification but in an Adobe one, which can bias how the specification is interpreted.


Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?

As we don't know how exactly you applied the changes, this obviously is hard to tell.

Most likely, though, you did corrupt the PDF, and the PDF viewers you opened it in merely repaired the corruption under the hood. There is a strong tendency in PDF viewers to do such under-the-hood repairs without informing the user; the result is that a large part of the PDFs in the wild are actually broken.
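As a rough illustration of what "irregular offsets" means, here is a toy Python sketch. It assumes the edit was a plain byte replacement in an uncompressed file, which the question does not confirm. The two-object body is made up for the example and is not a valid PDF, and `object_offsets` is a hypothetical helper. A real cross-reference table stores the byte offset of each object, so making the file one byte longer leaves every stored offset after the edit stale, and the stream's /Length entry no longer matches either.

```python
import re


def object_offsets(data):
    """Map each object number to the byte offset of its 'N 0 obj' marker."""
    return {int(m.group(1)): m.start() for m in re.finditer(rb"(\d+) 0 obj", data)}


# Toy body with a 10-byte content stream followed by a second object.
ORIGINAL = (
    b"1 0 obj\n<< /Length 10 >>\nstream\n(Stunde)Tj\nendstream\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
)

# "Einsatz" is one byte longer than "Stunde".
EDITED = ORIGINAL.replace(b"Stunde", b"Einsatz")

if __name__ == "__main__":
    before = object_offsets(ORIGINAL)
    after = object_offsets(EDITED)
    for number in sorted(before):
        print(number, before[number], after[number])
    # /Length still claims 10 bytes although the stream now holds 11.
    print(b"/Length 10" in EDITED)
```

A viewer that notices the mismatch can rebuild the cross-reference data by scanning for the object markers, much as `object_offsets` does here, which is one way such a file can still open without complaint.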

记忆之渊 2025-01-30 16:21:10

You don't see a visual break, but the standard distance between "S", "t" and "unde" has been changed nonetheless. This is done by PDF writers that support e.g. kerning, so that the word appears nicer. This is the reason why it is split that way.
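To put numbers on how small these changes are, here is a short Python sketch. In a TJ array each number is expressed in thousandths of a unit of text space and is subtracted from the current position along the writing direction, so a negative number widens the gap and a positive one narrows it. The helper name is hypothetical. The font size of 1 (from `/TT1 1 Tf`) and the scale of 11.04 (from the `Tm` operators) are taken from the question, and the conversion to points assumes the page's coordinate system is otherwise unscaled.

```python
def tj_shift(adjust, font_size=1.0, horizontal_scale=1.0):
    """Shift in text space units caused by one number inside a TJ array."""
    return -adjust / 1000.0 * font_size * horizontal_scale


# Scale factor of the text matrix set by the Tm operators in the question.
# ASSUMPTION: no further scaling applies, so scaled units equal points.
TEXT_MATRIX_SCALE = 11.04

# Adjustments around "S", "t" and "unde" in the important line.
ADJUSTMENTS = {"between S and t": 0.8, "between t and unde": -4.1}

if __name__ == "__main__":
    for label, adjust in ADJUSTMENTS.items():
        shift_in_points = tj_shift(adjust) * TEXT_MATRIX_SCALE
        print(f"{label}: {shift_in_points:+.4f} pt")
```

Both shifts come out at a few hundredths of a point or less, which is why no break is visible. The character spacing set by `Tc` is applied in addition to these values.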
