PDF - Why is there no standard structure element for pages?
The PDF Spec defines standard structure types, used to define a structure tree for the document. As far as I can see, there is no element related to pages. Here are the standard structure types for grouping elements:
Document
Part
Art
Sect
Div
...and so on...
Why is there no Page item in this list?
If you want your structure to use pages, what should be used? Part? Sect? Div?
Comments (3)
PDF tags exist so that the content type or meaning of elements can be identified. They should be considered a kind of "meta" information for the PDF, simply providing context for the content in a file (so that content can be easily extracted, converted, processed, made accessible, etc.). Think of it as a table of contents for a book. Just because the book has x pages doesn't mean that the content structure would change if the book's page height were cut in half and it now had 2x pages.
A Page Object in the PDF Document Structure already groups elements (by nature of each element being on a given page), so doing so in this structure would be a little redundant.
Also, consider this case:
etc...
In this example, Section 1 and Section 2 couldn't both be direct parents of page 3 (not to mention that Section 1 spans two different pages). Additionally, trying to solve this problem really isn't necessary, because each of the elements being grouped here is already a child of its respective Document Structure's Page node in the actual file format.
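To make the point concrete, here is a minimal Python sketch. It is not a real PDF library: `StructElem`, `pages_of`, and the page layout are all made up for illustration (the layout is an assumed one, not the example from the answer above). Each leaf just remembers which page its content sits on, loosely like the `Pg` entry a structure element can carry, and the per-page grouping falls out of that without any Page element in the tree.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical stand-in for a structure element (not a real PDF API)."""
    tag: str                                  # structure type, e.g. "Sect" or "P"
    page: Optional[int] = None                # page holding the content (leaves only)
    kids: List["StructElem"] = field(default_factory=list)


def pages_of(elem: StructElem) -> List[int]:
    """Return the sorted pages touched by elem and all of its descendants."""
    pages = set()
    if elem.page is not None:
        pages.add(elem.page)
    for kid in elem.kids:
        pages.update(pages_of(kid))
    return sorted(pages)


# Assumed layout: the first section runs over pages 2-3,
# the second section starts on page 3 and continues on page 4.
doc = StructElem("Document", kids=[
    StructElem("Sect", kids=[StructElem("P", page=2), StructElem("P", page=3)]),
    StructElem("Sect", kids=[StructElem("P", page=3), StructElem("P", page=4)]),
])

# Derive the per-page view from the leaves instead of storing it in the tree.
sections_by_page = defaultdict(list)
for index, sect in enumerate(doc.kids, start=1):
    for page in pages_of(sect):
        sections_by_page[page].append(f"Sect {index}")
```

With this layout, page 3 ends up listed under both sections, which is exactly why a page can't be a single node sitting inside a tree of sections.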
Appendix G of the PDF Specification gives examples that demonstrate use of the Page object.
The PDF has a tree structure (which is what allows it to load any page so fast). The content does not have any structure unless you choose to use the marked content feature of the format, which then allows metadata to be included in the data.
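As a rough illustration of what marked content looks like, the Python sketch below scans a hand-written, uncompressed content stream for `BDC ... EMC` sequences that carry an `MCID`. The sample stream and the names `CONTENT_STREAM` and `marked_content` are made up for this example, and the regular expressions are a toy: they assume no nesting, no escaped parentheses in strings, and only the `Tj` operator, so this is not a real content stream parser.

```python
import re

# Hand-written sample content stream (an assumption, not taken from a real file).
CONTENT_STREAM = """\
/H1 <</MCID 0>> BDC
BT /F1 18 Tf 72 720 Td (Title) Tj ET
EMC
/P <</MCID 1>> BDC
BT /F1 12 Tf 72 690 Td (Body text) Tj ET
EMC
"""

# A tag, a property list holding an MCID, then everything up to the next EMC.
MARKED = re.compile(r"/(\w+)\s*<<\s*/MCID\s+(\d+)\s*>>\s*BDC(.*?)EMC", re.DOTALL)
# Text shown with the Tj operator, e.g. "(Title) Tj".
SHOWN_TEXT = re.compile(r"\((.*?)\)\s*Tj")


def marked_content(stream):
    """Return (mcid, tag, text) for each simple marked-content sequence."""
    result = []
    for tag, mcid, body in MARKED.findall(stream):
        text = "".join(SHOWN_TEXT.findall(body))
        result.append((int(mcid), tag, text))
    return result


sequences = marked_content(CONTENT_STREAM)
```

Without the `BDC`/`EMC` wrappers the stream would just be drawing operators. The tags and MCIDs are what let a structure tree refer back to specific pieces of page content.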