PDF text direction
How is text direction for right-to-left languages, like Arabic, encoded in PDF? My understanding is that since PDF is fundamentally a graphical format, the concept of text direction doesn't really need to be encoded. Rather, the glyphs simply need to be painted on screen from right to left. However, the PDF reference manual mentions an attribute called WritingMode, where you can specify combinations of left-to-right, right-to-left, top-to-bottom and bottom-to-top.
So my questions are:
(1) If my understanding is correct, and RTL or LTR is merely expressed by the way the glyphs are painted on screen, what is the point of the WritingMode attribute?
(2) If there is no actual directionality information encoded in the PDF file, other than the order in which the glyphs are painted, how does a PDF-to-text program know whether a given line is supposed to be read right-to-left or left-to-right? (I suppose the program could just check whether the Unicode code points extracted from a ToUnicode map fall into a range that corresponds to an RTL language.)
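Rather than hard-coding code point ranges, the check suggested in (2) can lean on the Unicode Bidi_Class property, which Python exposes as `unicodedata.bidirectional`. A minimal sketch, assuming the line's text has already been recovered through the ToUnicode map: it is a simplified version of rules P2/P3 of the Unicode Bidirectional Algorithm (the first strongly directional character decides), ignoring embeddings and isolates, and `base_direction` is a hypothetical helper name.

```python
import unicodedata
from typing import Optional


def base_direction(text: str) -> Optional[str]:
    """Return "rtl" or "ltr" from the first strongly directional character.

    Simplified form of rules P2/P3 of the Unicode Bidirectional Algorithm
    (UAX #9): embeddings, overrides and isolates are ignored.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)  # Bidi_Class, e.g. "L", "R", "AL", "EN"
        if bidi in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
            return "rtl"
        if bidi == "L":  # Latin letters are L
            return "ltr"
    return None  # only weak/neutral characters: digits, spaces, punctuation


print(base_direction("123 مرحبا"))  # rtl: digits are weak, first strong char is Arabic
print(base_direction("PDF 1.7"))  # ltr
print(base_direction("2024-01-01"))  # None
```

A line made only of digits and punctuation gets no direction of its own; an extractor would typically fall back to the direction of the surrounding lines.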
Comments (2)
WritingMode is only for Tagged PDF, if I'm reading the spec correctly. If a PDF doesn't contain the appropriate logical structure, you don't get WritingMode.
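In a Tagged PDF, WritingMode is a layout attribute (owner /Layout) of a structure element, so it describes logical layout rather than how glyphs are painted. ISO 32000-1 defines LrTb (the default, typical for Western scripts), RlTb (typical for Arabic and Hebrew) and TbRl (typical for Chinese and Japanese vertical text); the first half of each name is the inline progression (within a line) and the second half the block progression (from line to line). A minimal sketch, assuming the name has already been read from the structure tree by other means; `parse_writing_mode` and `WritingMode` are hypothetical, and accepting other combinations such as LrBt is an assumption based on the naming pattern, not on the spec.

```python
from typing import NamedTuple

# Two-letter direction codes used inside WritingMode names such as RlTb.
_DIRECTIONS = {
    "Lr": "left-to-right",
    "Rl": "right-to-left",
    "Tb": "top-to-bottom",
    "Bt": "bottom-to-top",
}
_HORIZONTAL = {"Lr", "Rl"}


class WritingMode(NamedTuple):
    inline: str  # how glyphs progress within a line
    block: str  # how successive lines progress


def parse_writing_mode(name: str) -> WritingMode:
    """Split a WritingMode name (e.g. "RlTb") into inline and block progression."""
    inline, block = name[:2], name[2:]
    if len(name) != 4 or inline not in _DIRECTIONS or block not in _DIRECTIONS:
        raise ValueError(f"unrecognised WritingMode: {name!r}")
    # One axis must be horizontal and the other vertical.
    if (inline in _HORIZONTAL) == (block in _HORIZONTAL):
        raise ValueError(f"inline and block axes must differ: {name!r}")
    return WritingMode(_DIRECTIONS[inline], _DIRECTIONS[block])


print(parse_writing_mode("RlTb"))
# WritingMode(inline='right-to-left', block='top-to-bottom')
```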
The general answer, as I understand it, is "it depends". In R-L writing, you probably have the text advance information encoded in the font, and a single text placement will advance the text to the right place. I say "probably" because the generating software might ignore this and place each glyph on its own, regardless of the text advance in the font. Then there are fun languages like Arabic and Hebrew, which aren't strictly R-L, because numbers are still L-R within an R-L line.
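When glyphs are placed individually, an extractor only has their positions to work with. A rough sketch of turning one RTL line back into reading order, assuming the glyphs of a single horizontal line have already been extracted as hypothetical `(x, text)` pairs (x along the baseline in device space, text from ToUnicode) and that the line is already known to be RTL: read from the rightmost glyph, then flip runs of digits back so numbers read left-to-right. This is not the full Unicode Bidirectional Algorithm; embedded Latin words, mirrored brackets and numbers with separators such as 12.5 are not handled.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical glyph record: (x position on the baseline, text from ToUnicode)
Glyph = Tuple[float, str]


def _is_number(text: str) -> bool:
    # European (EN) and Arabic-Indic (AN) digits keep left-to-right order.
    return text != "" and all(unicodedata.bidirectional(c) in ("EN", "AN") for c in text)


def rtl_logical_order(glyphs: List[Glyph]) -> str:
    """Rebuild the reading order of one right-to-left line from glyph positions."""
    # An RTL line is read starting from the rightmost glyph.
    right_to_left = [text for _, text in sorted(glyphs, key=lambda g: g[0], reverse=True)]
    out: List[str] = []
    digits: List[str] = []
    for text in right_to_left:
        if _is_number(text):
            digits.append(text)  # collect the digit run (currently right to left)
            continue
        out.extend(reversed(digits))  # flip it so the number reads left-to-right
        digits.clear()
        out.append(text)
    out.extend(reversed(digits))  # a digit run may end the line
    return "".join(out)


# "שלום 123" as painted: the number sits at the left end of the line.
line = [(0, "1"), (10, "2"), (20, "3"), (30, " "), (40, "ם"), (50, "ו"), (60, "ל"), (70, "ש")]
print(rtl_logical_order(line))  # שלום 123
```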
Text direction will be set in the text rendering matrix (Trm).
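The PDF reference defines Trm, which maps text space to device space, as a parameter matrix built from the font size Tfs, horizontal scaling Th and text rise Trise, multiplied by the text matrix Tm and then the CTM. A minimal sketch of that product, assuming you have already tracked Tm, the CTM, the font size (Tf), the horizontal scaling (Tz, given here as a fraction, so 100 becomes 1.0) and the rise (Ts) while interpreting the content stream; the function names are hypothetical. For horizontal writing, the image of text space's +x axis is the direction in which glyph advances move on the page, so a mirrored or rotated Tm shows up as a baseline angle of 180° or 90°.

```python
import math
from typing import List, Sequence, Tuple

# A PDF matrix [a b c d e f] stands for the 3x3 matrix
# [[a, b, 0], [c, d, 0], [e, f, 1]] (row-vector convention).
Matrix6 = Tuple[float, float, float, float, float, float]


def _to3x3(m: Sequence[float]) -> List[List[float]]:
    a, b, c, d, e, f = m
    return [[a, b, 0.0], [c, d, 0.0], [e, f, 1.0]]


def _matmul(p: List[List[float]], q: List[List[float]]) -> List[List[float]]:
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def text_rendering_matrix(tm: Sequence[float], ctm: Sequence[float], font_size: float,
                          h_scale: float = 1.0, rise: float = 0.0) -> Matrix6:
    """Trm = [[Tfs*Th, 0, 0], [0, Tfs, 0], [0, Trise, 1]] x Tm x CTM."""
    params = [[font_size * h_scale, 0.0, 0.0], [0.0, font_size, 0.0], [0.0, rise, 1.0]]
    r = _matmul(_matmul(params, _to3x3(tm)), _to3x3(ctm))
    return (r[0][0], r[0][1], r[1][0], r[1][1], r[2][0], r[2][1])


def baseline_angle(trm: Sequence[float]) -> float:
    """Angle in degrees of text space's +x axis (the advance direction) in device space."""
    return math.degrees(math.atan2(trm[1], trm[0]))


trm = text_rendering_matrix(tm=(1, 0, 0, 1, 72, 700), ctm=(1, 0, 0, 1, 0, 0), font_size=12)
print(trm)  # (12.0, 0.0, 0.0, 12.0, 72.0, 700.0)
print(baseline_angle(trm))  # 0.0
```

Keep in mind that a producer can paint RTL text with an ordinary, unmirrored Tm and simply position each glyph from right to left, so an angle of 0° does not by itself mean the line reads left-to-right.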