Generating a PDF
I want to learn how to generate a PDF. I don't want to use any third-party tools; I want to create it myself in code. The only examples I have seen so far are code I looked at by opening up Reflector on a third-party DLL to see what is happening. Unfortunately, the DLLs I have seen so far seem to be calling into user32.dll and gdi32.dll to help create the PDF document. My issue is that I have no idea what they are doing and, more importantly, why.
Does anyone have any good tutorials or references that might point me in the right direction?
Thanks in advance.
Comments (6)
We ran a series of tutorials on our blog about creating a basic PDF, at http://www.jpedal.org/PDFblog/?s=%22Make+your+own+PDF+file%22
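I know you have said you don't want to use third-party tools, but please at least take a look at iTextSharp. Unless there is a genuine reason you can't use a tool like this, it should do exactly what you want.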
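For PDF:
If you don't mind paying a bit and want the best, then I would recommend Aspose.Pdf for .NET.
Edit: I see now that you don't want to use third-party tools, but I would still strongly recommend one. Writing your own will take time, and with so many libraries already out there, there is little point in reinventing the wheel.
But if you really want to spend the time on this, I made a fix in nfop and learned a bit about how they do it by reading the code.
Also read up on the Portable Document Format. When building something like this from scratch, it is important to know which standard is used and how the format is structured.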
Adobe provides a copy of the ISO PDF specification as a free download. For something like this it will be invaluable:
http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
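There is a fine line between using a ready-made tool and studying its code before writing your own. If you are comfortable with the latter, just pick a decent open-source tool, such as http://www.pdfforge.org/, and look at the code.
Warning: if you intend to distribute your tool, taking too much inspiration from an open-source tool may force you to open-source your own tool as well. I am not a lawyer, and I don't know how much is too much.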
The spec is the ultimate guide. Here is what you will ultimately have to do:
The header is easy - it defines that the file is PDF and the version.
Objects are the data types in PDF. These include boolean, number, string, list/array, dictionary, and stream.
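As a rough illustration of how those data types look on disk, here is a minimal Python sketch that serializes plain Python values as PDF direct objects. The function name `pdf_literal` is made up for this example. It assumes dictionary keys are simple ASCII names with no spaces or delimiters, and it leaves out streams, which need a length entry and are normally written as indirect objects.

```python
def pdf_literal(value):
    """Serialize a Python value as a PDF direct object (simplified sketch)."""
    if isinstance(value, bool):
        # Test bool before int, because bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF reals cannot use exponent notation, so format as fixed-point
        # and trim trailing zeros.
        return f"{value:.4f}".rstrip("0").rstrip(".")
    if isinstance(value, str):
        # Literal strings are wrapped in parentheses; escape the backslash
        # first, then the parentheses themselves.
        escaped = (value.replace("\\", "\\\\")
                        .replace("(", "\\(")
                        .replace(")", "\\)"))
        return f"({escaped})"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(pdf_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        # Dictionary keys are PDF names, written with a leading slash.
        # Assumption: keys are plain ASCII without whitespace or delimiters.
        items = " ".join(f"/{k} {pdf_literal(v)}" for k, v in value.items())
        return f"<< {items} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(pdf_literal({"Count": 1, "Box": [0, 0, 612.0, 792.0], "Title": "a (b)"}))
```

A real writer would also need name objects as values, null, hexadecimal strings, and proper text-string encoding, all of which the spec covers.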
Objects are either written directly or indirectly.
Direct objects are written as is.
Indirect objects are written like this:
<id> <generation> obj <direct object> endobj
For example, I could write:
1 0 obj (this is a string) endobj
And whenever I want to use that string elsewhere, I just have to use an indirect reference, which is defined as:
<id> <generation> R
In this case, I could refer to my string as:
1 0 R
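To make the indirect-object idea concrete, here is a small Python sketch. The helper names `indirect_object` and `reference` are invented for illustration; the `obj`/`endobj` wrapper and the `R` reference keyword are the syntax the PDF spec defines. The body is assumed to be an already-serialized direct object.

```python
def indirect_object(obj_id, generation, body):
    """Wrap an already-serialized direct object as an indirect object."""
    return f"{obj_id} {generation} obj\n{body}\nendobj\n"


def reference(obj_id, generation=0):
    """Return an indirect reference to the object with this id and generation."""
    return f"{obj_id} {generation} R"


# Define a string once as object 1, generation 0 ...
print(indirect_object(1, 0, "(this is a string)"), end="")
# ... and refer to it from a dictionary elsewhere in the file.
print(f"<< /Title {reference(1)} >>")
```

For a freshly written file the generation is normally 0; it only changes when object numbers are reused by later incremental updates.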
To quickly find an object, there is a cross-reference table that records where the object with a particular id and generation lives in the file.
So, in addition to simply writing objects to the file, you have to keep track of the file position where indirect objects have been defined.
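The bookkeeping described above can be sketched in Python roughly as follows. This is a minimal example under several assumptions, not a complete writer: the function name `build_minimal_pdf` is made up, the version in the header (1.4) and the US Letter page size are arbitrary choices, object ids are contiguous starting at 1, and every generation is 0. It also writes the trailer, which the spec requires after the cross-reference table.

```python
import io


def build_minimal_pdf():
    """Build a one-page blank PDF, recording where each object starts."""
    out = io.BytesIO()
    offsets = {}  # object id -> byte offset of its "N 0 obj" line

    def write_obj(obj_id, body):
        # Remember the file position before writing the object.
        offsets[obj_id] = out.tell()
        out.write(f"{obj_id} 0 obj\n{body}\nendobj\n".encode("ascii"))

    # Header: marks the file as PDF and states the version.
    out.write(b"%PDF-1.4\n")

    # Body: catalog -> page tree -> a single empty page.
    write_obj(1, "<< /Type /Catalog /Pages 2 0 R >>")
    write_obj(2, "<< /Type /Pages /Kids [3 0 R] /Count 1 >>")
    write_obj(3, "<< /Type /Page /Parent 2 0 R "
                 "/MediaBox [0 0 612 792] /Resources << >> >>")

    # Cross-reference table: one 20-byte entry per object, plus the
    # mandatory free entry for object 0.
    xref_pos = out.tell()
    size = len(offsets) + 1
    out.write(f"xref\n0 {size}\n".encode("ascii"))
    out.write(b"0000000000 65535 f \n")
    for obj_id in sorted(offsets):
        out.write(f"{offsets[obj_id]:010d} 00000 n \n".encode("ascii"))

    # Trailer: names the root object and says where the table starts.
    out.write(f"trailer\n<< /Size {size} /Root 1 0 R >>\n"
              f"startxref\n{xref_pos}\n%%EOF\n".encode("ascii"))
    return out.getvalue(), offsets


pdf_bytes, offsets = build_minimal_pdf()
with open("minimal.pdf", "wb") as f:
    f.write(pdf_bytes)
```

Writing into a byte buffer and calling `tell()` before each object keeps the offsets exact. Computing them from text strings instead is an easy way to end up with a table that points at nothing useful once any non-ASCII data is involved.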
All of this is doable, but you will quickly find that as you write these files, it becomes really challenging to make changes in your output stream and keep things neat and tidy. What's worse is that other people have done this too, so there is now a pile of garbage PDFs out in the wild that Acrobat somehow manages to cope with. For example, GhostScript (hopefully this is fixed) produced PDFs whose cross-reference tables were complete garbage: they pointed at nothing useful. Then there are producers that flat-out violate the spec by using the wrong data type for dictionary entries, and others that leave out information the spec requires.
It's fairly nightmarish to consume PDF.
Still, it's an interesting exercise. But if you want to do anything significant, you need to start by writing good tools that manage all the indirect references, cross-reference tables, dictionaries, type checking, and so on for you. In the end, you may find that an existing library would serve you better.
And as the author of tools that consume and generate PDF, I plead with you not to let any non-compliant PDFs out into the wild.