A.nnotate.com converts PDF pages to PNG images on the server, at a given zoom level, using xpdf; these images are what get displayed in the browser.
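A minimal sketch of that rendering step, assuming the `pdftoppm` tool (shipped with xpdf 4.x and with poppler) is on the PATH, and assuming a zoom of 1.0 maps to 72 DPI so that one PDF point becomes one pixel. The function names and the output naming scheme are hypothetical; the exact output file suffix differs between the xpdf and poppler builds of `pdftoppm`.

```python
import subprocess
from pathlib import Path

PDF_POINTS_PER_INCH = 72  # PDF user space: 1 pt = 1/72 inch


def zoom_to_dpi(zoom: float) -> int:
    """Map a zoom factor (assumption: 1.0 = 100% = 72 DPI) to a render resolution."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return round(PDF_POINTS_PER_INCH * zoom)


def pdftoppm_args(pdf: str, page: int, zoom: float, out_root: str) -> list[str]:
    """Build a pdftoppm command line that renders a single page to PNG."""
    return [
        "pdftoppm",
        "-f", str(page), "-l", str(page),  # first and last page: only this one
        "-r", str(zoom_to_dpi(zoom)),      # resolution in DPI
        "-png",                            # PNG output instead of PPM
        pdf,
        out_root,                          # pdftoppm appends a page suffix + .png
    ]


def render_page(pdf: str, page: int, zoom: float, out_dir: str) -> None:
    """Render one page; requires pdftoppm to be installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    out_root = str(Path(out_dir) / f"page{page}")
    subprocess.run(pdftoppm_args(pdf, page, zoom, out_root), check=True)
```

Caching one PNG per (page, zoom) pair on the server means the browser only ever downloads plain images, which is what makes this approach work without any client-side PDF support.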
The text highlighting is done by extracting the text positions from the PDF, then adding a transparent overlay on top of the page images, with absolutely positioned HTML divs on top of the words. Annotations then use an Ajax GUI to attach notes to the highlighted text.
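The overlay described above can be sketched as follows, assuming the extracted word boxes are in PDF points with a top-left origin (as in the `xMin`/`yMin`/`xMax`/`yMax` attributes that poppler's `pdftotext -bbox` emits) and that the page image was rendered at a known DPI. The `Word` type, `overlay_html` function, and `data-word` attribute are hypothetical; the word index is one way an Ajax front end could refer to a highlighted range when attaching a note.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Word:
    text: str
    x_min: float  # PDF points from the left edge of the page
    y_min: float  # PDF points from the top edge (top-left origin assumed)
    x_max: float
    y_max: float


def overlay_html(words: list[Word], dpi: int, page_png: str) -> str:
    """Page image plus one transparent, absolutely positioned div per word."""
    scale = dpi / 72  # PDF points -> pixels in the rendered PNG
    divs = []
    for i, w in enumerate(words):
        style = (
            "position:absolute;"
            f"left:{w.x_min * scale:.1f}px;top:{w.y_min * scale:.1f}px;"
            f"width:{(w.x_max - w.x_min) * scale:.1f}px;"
            f"height:{(w.y_max - w.y_min) * scale:.1f}px;"
            "color:transparent;"  # invisible text that can still be selected
        )
        divs.append(
            f'<div class="w" data-word="{i}" style="{style}">{escape(w.text)}</div>'
        )
    # The relative container makes the word divs position against the image.
    return (
        '<div style="position:relative">'
        f'<img src="{escape(page_png)}" alt="">'
        + "".join(divs)
        + "</div>"
    )
```

Because each div sits exactly over its word in the image, highlighting can be done purely in CSS (for example by giving selected divs a semi-transparent background) without re-rendering the page.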
Other formats (MS Word, PPT, etc.) are first converted to PDF using OpenOffice, then to images and text overlays, just as for PDFs.
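A sketch of that conversion front end, assuming LibreOffice's `soffice` binary (the maintained successor of OpenOffice) and its `--headless --convert-to pdf --outdir` options; older OpenOffice installs may need a different mechanism. The function names are hypothetical. PDFs skip the conversion and go straight to rendering.

```python
import subprocess
from pathlib import Path


def to_pdf_args(src: str, out_dir: str) -> list[str]:
    """Headless office-to-PDF command line (LibreOffice soffice assumed)."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]


def ensure_pdf(src: str, out_dir: str) -> Path:
    """Return a PDF path for src, converting Word/PowerPoint/etc. first."""
    path = Path(src)
    if path.suffix.lower() == ".pdf":
        return path  # already a PDF: no conversion needed
    subprocess.run(to_pdf_args(src, out_dir), check=True)
    # soffice names its output <input stem>.pdf inside out_dir
    return Path(out_dir) / (path.stem + ".pdf")
```

Normalising everything to PDF first keeps a single rendering and text-extraction path for every input format.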
I think the other HTML document sites do something similar when rendering PDFs as HTML (i.e. page images plus a word overlay of transparent divs). An alternative trick is to convert the PDF's embedded fonts to HTML5 CSS web fonts and use absolutely positioned divs for the text (extracting and positioning the images too).
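The alternative approach can be sketched as HTML generation, assuming the embedded fonts have already been extracted and converted to WOFF files and that text-run and image positions are known in pixels. Font extraction itself is out of scope here, and all function names and file paths below are hypothetical.

```python
from html import escape


def font_face_css(family: str, font_url: str) -> str:
    """@font-face rule for one font extracted from the PDF (WOFF assumed)."""
    return (
        "@font-face {"
        f" font-family: '{family}';"
        f" src: url('{font_url}') format('woff');"
        " }"
    )


def text_div(text: str, family: str, size_px: float, left_px: float, top_px: float) -> str:
    """Visible text run, absolutely positioned where the PDF places it."""
    style = (
        f"position:absolute;left:{left_px:.1f}px;top:{top_px:.1f}px;"
        f"font-family:'{family}';font-size:{size_px:.1f}px;white-space:pre"
    )
    return f'<div style="{style}">{escape(text)}</div>'


def image_tag(src: str, left_px: float, top_px: float, w_px: float, h_px: float) -> str:
    """Image extracted from the PDF, placed at its original position."""
    style = (
        f"position:absolute;left:{left_px:.1f}px;top:{top_px:.1f}px;"
        f"width:{w_px:.1f}px;height:{h_px:.1f}px"
    )
    return f'<img src="{escape(src)}" style="{style}" alt="">'
```

Unlike the image-plus-overlay approach, the text here is real, visible HTML text, so the output stays sharp at any zoom level; the trade-off is that fidelity depends on the browser rendering the converted fonts the same way the PDF did.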