Why is text highlighting in PDFs so funky?

Posted 2024-12-04 01:43:37

jJeQQOZ5 2024-12-11 01:43:37

It depends on how the PDF was generated. Some programs write out full lines of text, and text in those files is easy to highlight because the PDF viewer (Acrobat, etc.) knows that the text is linear. Other programs write out each glyph (letter) one by one. Basically: "draw letter a at position (100,100), draw letter b at (110,99.99)". In this example, letter b is 10 units further along in the x direction and almost exactly the same in the y direction. Almost. Visually they look identical, but mathematically the viewer has to guess that the two letters are on the same "line" and "next to each other". Sometimes it guesses right, sometimes it doesn't.
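To make the "guess" concrete, here is a minimal Python sketch of the kind of heuristic a viewer might use. It assumes glyphs have already been extracted as `(char, x, y)` positions; the `Glyph` class, `group_into_lines`, and the default `y_tolerance` of 2 units are illustrative assumptions, not what Acrobat or any particular viewer actually does.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float  # horizontal position in PDF user-space units
    y: float  # baseline position; in PDF, y grows upward


def group_into_lines(glyphs, y_tolerance=2.0):
    """Guess which glyphs share a line: baselines within y_tolerance are merged.

    Lines are returned in the order they are first encountered; glyphs inside
    each line are sorted left to right.
    """
    lines = []  # each entry: (reference baseline y, glyphs on that line)
    for g in glyphs:
        for ref_y, members in lines:
            if abs(ref_y - g.y) <= y_tolerance:
                members.append(g)
                break
        else:
            # No existing line is close enough, so start a new one.
            lines.append((g.y, [g]))
    return ["".join(g.char for g in sorted(members, key=lambda g: g.x))
            for _, members in lines]
```

With the example above, `group_into_lines([Glyph("a", 100, 100), Glyph("b", 110, 99.99)])` yields `["ab"]`, but with `y_tolerance=0` the 0.01 difference is enough to split the letters onto two "lines". Choosing that tolerance is exactly where viewers sometimes get it wrong.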

Why do programs write things out letter by letter? When advanced formatting is used (letter spacing, kerning, ligatures, etc.), some design programs write out how something should look rather than what it actually is. A graphic designer generally doesn't care whether two letters are physically next to each other in the file; they only care whether they look like they are.
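As a rough illustration, a run of text can be drawn with a single `(ab) Tj` operator, and PDF also offers `Tc` (character spacing) and `TJ` (per-glyph kerning adjustments) for spacing within a run. A design tool can instead bake the spacing into an absolute position for every glyph. The sketch below produces that second style of content stream; `per_glyph_stream`, the font resource name `/F1`, and the advance widths are hypothetical inputs chosen for the example.

```python
def per_glyph_stream(text, x, y, advances, tracking=0.0, font="/F1", size=12):
    """Emit a content-stream fragment that positions every glyph individually.

    advances: hypothetical per-character advance widths in user-space units.
    tracking: extra letter spacing added after each glyph.
    """
    ops = ["BT", f"{font} {size} Tf"]
    for ch in text:
        # Escape characters that are special inside PDF literal strings.
        escaped = ch.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        # Each glyph gets its own absolute text matrix ('a b c d e f Tm').
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm ({escaped}) Tj")
        x += advances[ch] + tracking
    ops.append("ET")
    return "\n".join(ops)
```

For example, `per_glyph_stream("ab", 100, 100, {"a": 6.0, "b": 6.5}, tracking=4.0)` places `a` at (100, 100) and `b` at (110, 100). Nothing in the output says the two letters form one word; a viewer only sees two unrelated placements.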

Why do some programs mess up the y coordinate when writing letters next to each other? Remember, they're describing how the text should look, not what the actual text is. Fonts have different heights, and some design programs may nudge the position slightly so that text visually lines up better.
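Because of these nudges, comparing baselines exactly is fragile. One alternative heuristic (an assumption here, not something described in this answer) is to compare how much two glyph boxes overlap vertically, relative to the shorter box. `same_line` and the `min_overlap` threshold of 0.5 are illustrative.

```python
def same_line(box_a, box_b, min_overlap=0.5):
    """Heuristic: do two glyph boxes (x0, y0, x1, y1) sit on the same line?

    Vertical overlap is measured against the shorter box's height, so a small
    shift or a taller font does not split the line.
    """
    _, a_bottom, _, a_top = box_a
    _, b_bottom, _, b_top = box_b
    overlap = min(a_top, b_top) - max(a_bottom, b_bottom)
    shorter = min(a_top - a_bottom, b_top - b_bottom)
    return shorter > 0 and overlap / shorter >= min_overlap
```

A small glyph at `(100, 100, 106, 108)` and a taller, slightly lowered glyph at `(110, 99.5, 120, 114)` count as the same line even though their bottoms differ, while a glyph on the next line down at `(100, 86, 106, 94)` does not.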

Lastly, there's no guarantee that text is written linearly within the file, left to right or top to bottom. Some programs might write line 1, then line 3, and then line 2. It looks fine when displayed, but the order in the file is different. Why do they do this? Who knows. Maybe line 2 was indented a bit (or started with a letter whose ink sits slightly to the right), so a left-to-right scan didn't pick it up right away.
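A viewer that trusts file order would show "line 1, line 3, line 2". A minimal sketch of re-sorting by position instead, assuming text fragments come as `(x, y, text)` tuples; `reading_order` and `row_tolerance` are hypothetical names for this example.

```python
def reading_order(fragments, row_tolerance=2.0):
    """Sort (x, y, text) fragments into reading order, ignoring file order.

    PDF y grows upward, so larger y is higher on the page. Fragments whose
    baselines are within row_tolerance form one row, read left to right.
    """
    rows = []
    for frag in sorted(fragments, key=lambda f: -f[1]):
        # Compare against the first fragment of the current row.
        if rows and abs(rows[-1][0][1] - frag[1]) <= row_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [text for row in rows for _, _, text in sorted(row, key=lambda f: f[0])]
```

Given the file order `[(72, 700, "line 1"), (72, 672, "line 3"), (80, 686, "line 2")]`, where line 2 is slightly indented, this returns `["line 1", "line 2", "line 3"]`.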

Hopefully that helps a bit!

情痴 2024-12-11 01:43:37

It has to do with how text is represented in PDF files. The PDF format doesn't really have the concept of lines of text (at least in its basic form); it just puts letters (or rather glyphs) on the page at specific positions.
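To show what "just puts glyphs on the page" looks like from a reader's side, here is a toy parser for a tiny subset of content-stream syntax. It understands only `BT`, `tx ty Td`, and `(string) Tj`, assumes an identity text matrix, ignores glyph advance widths, and does not handle string escapes or nested parentheses. Real extraction libraries handle far more. `glyph_runs` is a hypothetical name.

```python
import re

# A literal string, a name like /F1, a number, or an operator.
TOKEN = re.compile(
    r"\((?P<str>[^()]*)\)"
    r"|(?P<name>/[^\s/\[\]()<>{}%]+)"
    r"|(?P<num>-?\d+(?:\.\d+)?)"
    r"|(?P<op>[A-Za-z*']+)"
)


def glyph_runs(stream):
    """Return (text, x, y) for each Tj in a toy subset of PDF content-stream syntax."""
    runs, operands = [], []
    line_x = line_y = 0.0
    for m in TOKEN.finditer(stream):
        if m.group("str") is not None:
            operands.append(m.group("str"))
        elif m.group("name") is not None:
            operands.append(m.group("name"))
        elif m.group("num") is not None:
            operands.append(float(m.group("num")))
        else:
            op = m.group("op")
            if op == "BT":
                line_x = line_y = 0.0  # a text object starts with an identity matrix
            elif op == "Td":
                tx, ty = operands[-2:]
                # Td moves relative to the start of the current line.
                line_x, line_y = line_x + tx, line_y + ty
            elif op == "Tj":
                runs.append((operands[-1], line_x, line_y))
            operands = []  # operators consume their operands
    return runs
```

For `"BT /F1 12 Tf 100 100 Td (a) Tj 10 -0.01 Td (b) Tj ET"` the output is just two positioned glyphs, `("a", 100, 100)` and `("b", 110, ~99.99)`. Nothing in the stream says they belong to the same line; that has to be inferred.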

The application that displays the PDF often has to guess the order in which the text is supposed to be read. This can be harder than it sounds for complex multi-column layouts, pull quotes, etc., and it can even be difficult for "regular" text if there are footnotes or fonts with different metrics on the same line.
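For a two-column page, reading row by row across the page interleaves the columns. A minimal sketch of reading column by column instead, assuming the column boundary is already known: `column_reading_order` and `gutter_x` are hypothetical, and a real extractor would have to detect the gutter itself (for example, from a consistent vertical gap in the text).

```python
def column_reading_order(fragments, gutter_x):
    """Read a two-column page: left column top to bottom, then the right column.

    fragments: (x, y, text) tuples; gutter_x: assumed x coordinate between columns.
    """
    left = [f for f in fragments if f[0] < gutter_x]
    right = [f for f in fragments if f[0] >= gutter_x]

    def top_first(fragment):
        return -fragment[1]  # PDF y grows upward, so larger y comes first

    ordered = sorted(left, key=top_first) + sorted(right, key=top_first)
    return [text for _, _, text in ordered]
```

With fragments `L1`, `R1` at y=700 and `L2`, `R2` at y=686 (left column at x=72, right at x=320, gutter at 300), this gives `["L1", "L2", "R1", "R2"]`, whereas a naive top-to-bottom sweep across the whole page gives `L1, R1, L2, R2`. If the gutter guess is wrong, or a pull quote spans both columns, the order breaks again.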

Some PDFs also represent umlauts and other accented characters as multiple glyphs (e.g. a ¨ on top of an "a"), in which case it can be difficult to determine which line the character belongs to.
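One way an extractor might handle this is to attach each floating accent to the closest base glyph beneath it and then compose the pair with Unicode normalization. The sketch below assumes glyphs come as `(char, x0, x1, baseline_y)` tuples; `attach_accents` and the small `COMBINING` table are illustrative assumptions. `unicodedata.normalize` is the standard-library function that turns `"a"` plus U+0308 COMBINING DIAERESIS into `"ä"`.

```python
import unicodedata

# Spacing accent glyphs mapped to their combining counterparts (small sample).
COMBINING = {"\u00a8": "\u0308", "\u00b4": "\u0301", "`": "\u0300", "^": "\u0302"}


def attach_accents(glyphs):
    """Merge standalone accent glyphs into the base letter underneath them.

    glyphs: (char, x0, x1, y) tuples, with y as the baseline (PDF y grows upward).
    Accents with no base glyph below them are dropped in this sketch.
    """
    bases = [[c, x0, x1, y] for c, x0, x1, y in glyphs if c not in COMBINING]
    for c, x0, x1, y in glyphs:
        if c not in COMBINING:
            continue
        center = (x0 + x1) / 2
        # Candidates: horizontally under the accent and not above it.
        below = [b for b in bases if b[1] <= center <= b[2] and b[3] <= y]
        if below:
            target = max(below, key=lambda b: b[3])  # closest baseline below
            target[0] += COMBINING[c]
    return [(unicodedata.normalize("NFC", c), x0, x1, y) for c, x0, x1, y in bases]
```

Given an `a` on a line at y=100, a `¨` floating at y=104, and a `b` directly below on the next line at y=86, the accent lands on the `a` (the nearest baseline below it), producing `"ä"`. Pick the wrong candidate, and the umlaut ends up on the wrong line.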
