A better file format than PDF or EPUB?
My client wants us to build a custom document viewer for their app. (It really, truly needs to be custom, because there are a ton of application-specific features they need.)
We built one for them last year that took PDFs, generated page images, and backed the images using a hidden layer of text that could be selected and copied. We did it in Flex. It was a nightmare. PDF is horrid.
This year, we need to build one in HTML 5 with similar requirements, except that most of the documents now are in Word or HTML, that is, they have reflowable text, instead of the fixed layout and glyphs of PDF. But they still want to do PDF in the same viewer.
I'm thinking that we need to convert all documents to some common file format that can handle both reflowable text and also the fixed-position glyphs of PDF. (Each document would probably support one or the other, but not both). It would be nice if it were an XML-like markup language that would say:
<text>here's some text</text>
-- or --
<glyph letter="a" name="my_a_glyph" position="10,10"/>
<image src="my_image" position="20,20"/>
or something like that.
Is there any existing file format out there that can handle it? EPUB won't do the fixed-position text, and PDF sucks in too many ways to describe.
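To make the idea concrete, here is a minimal Python sketch of how a viewer back end might load such a markup into typed nodes. Everything in it is an assumption made for illustration: the `<document>` root element, the class names, and the reading of `position` as an "x,y" pair are not part of any existing format. The sample mixes both kinds of content only to keep it short; as described above, a real document would probably use one or the other.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FlowText:
    """Reflowable text (the hypothetical <text> element)."""
    text: str


@dataclass
class Glyph:
    """A single fixed-position glyph (the hypothetical <glyph> element)."""
    letter: str
    name: str
    position: Tuple[float, float]


@dataclass
class Image:
    """A fixed-position image (the hypothetical <image> element)."""
    src: str
    position: Tuple[float, float]


Node = Union[FlowText, Glyph, Image]


def parse_position(value: str) -> Tuple[float, float]:
    """Parse an 'x,y' attribute value; the comma-separated form is assumed."""
    x, y = value.split(",")
    return float(x), float(y)


def parse_document(xml_source: str) -> List[Node]:
    """Turn the hypothetical markup into a flat list of typed nodes."""
    root = ET.fromstring(xml_source)  # assumes a single wrapping root element
    nodes: List[Node] = []
    for element in root:
        if element.tag == "text":
            nodes.append(FlowText(element.text or ""))
        elif element.tag == "glyph":
            nodes.append(Glyph(element.get("letter", ""),
                               element.get("name", ""),
                               parse_position(element.get("position", "0,0"))))
        elif element.tag == "image":
            nodes.append(Image(element.get("src", ""),
                               parse_position(element.get("position", "0,0"))))
        else:
            raise ValueError(f"unknown element: {element.tag}")
    return nodes


SAMPLE = """<document>
  <text>here's some text</text>
  <glyph letter="a" name="my_a_glyph" position="10,10"/>
  <image src="my_image" position="20,20"/>
</document>"""

if __name__ == "__main__":
    for node in parse_document(SAMPLE):
        print(node)
```

Keeping flow and fixed-position content as separate node types would let the viewer pick a layout strategy per document without guessing from the data.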
Comments (2)
I think you can look at the FB2 (FictionBook 2) format. It is an XML-based format designed for publishing books. It includes images, though I am not sure whether they can be positioned absolutely.
Also, you can simply go with HTML and do HTML-to-PDF rendering when needed (there are various components and libraries for this). I don't see (or you have not listed) any reason why this approach wouldn't work.
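As a rough illustration of why plain HTML could cover both cases, the Python sketch below emits ordinary paragraphs for reflowable text and absolutely positioned `<span>` elements for fixed-position glyphs. The function names, the `page` class, and the use of pixel units are assumptions made for the example; the question does not say what units its coordinates use.

```python
from html import escape
from typing import Iterable, Tuple


def render_flow(paragraphs: Iterable[str]) -> str:
    """Reflowable text: plain <p> elements that the browser lays out itself."""
    return "\n".join(f"<p>{escape(p, quote=False)}</p>" for p in paragraphs)


def render_fixed_page(glyphs: Iterable[Tuple[str, float, float]],
                      width: float, height: float) -> str:
    """Fixed layout: one absolutely positioned <span> per glyph.

    Each glyph is a (character, x, y) tuple; pixel units are assumed.
    """
    spans = "\n".join(
        f'<span style="position:absolute;left:{x}px;top:{y}px">'
        f"{escape(ch, quote=False)}</span>"
        for ch, x, y in glyphs
    )
    # position:relative makes the page box the origin for the glyph offsets.
    return (f'<div class="page" style="position:relative;'
            f'width:{width}px;height:{height}px">\n{spans}\n</div>')


if __name__ == "__main__":
    print(render_flow(["here's some text"]))
    print(render_fixed_page([("a", 10, 10)], 600, 800))
```

Because the fixed-position glyphs remain real text in the DOM, selection and copying would work without a separate hidden text layer, although per-glyph spans can get heavy on large pages.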
GROFF? Maybe build a macro library to customize it, as needed.
Groff/troff/nroff, the "run off" programs of Unix, can output to PostScript or HTML. The jump from PostScript to PDF is built into some PDF viewers; there are also several existing programs for it, pstopdf, for example.
GROFF has some fixed layout options and some flow-like options. With GROFF, it's almost easier to base most of the printout on flowing text, within prescribed bounds.