Extracting text from a PDF and saving it to a database - preserving spacing
I have a PDF document containing only text that needs to be saved into a varchar column in MSSQL. The first catch is that the spacing of the text in the PDF needs to be preserved as well, which can't be done simply by copy-pasting from the PDF into SSMS.
Okay, so I need an application to read the PDF as text, while preserving spacing. But now the second catch comes in: the PDF is rendered in Helvetica font, but the text saved into the DB will be displayed in Arial on a Crystal Report (Crystal 8... bleh), and when displayed, it needs to look like the PDF (i.e. same alignment) as far as possible.
The solution that I've proposed is to convert the PDF to a vector image, save the resulting byte stream into the DB, and pull the bytes in via Crystal. Unfortunately, due to time constraints this can't be implemented now, so I need a quick-and-dirty solution.
Essentially, once I've got the Helvetica version from the PDF, I have to muck around with the spacing to convert it to look correct in Arial. I need a tool that can do this for me, as I don't have the time to write one - any suggestions?
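To make the "muck around with the spacing" step concrete, here is a minimal Python sketch of one possible approach, not an existing tool. It assumes that alignment in the extracted text is done with runs of two or more spaces, and that per-character advance widths for both fonts are available (for example from their font metrics files). The `FontMetrics` class, the `respace_line` function, and the width numbers are all hypothetical placeholders, not real Helvetica or Arial metrics.

```python
import re
from dataclasses import dataclass


@dataclass
class FontMetrics:
    """Advance widths in 1/1000 em; `default` covers unlisted characters."""
    widths: dict
    default: int

    def width(self, text: str) -> int:
        # Total advance width of `text` in this font.
        return sum(self.widths.get(ch, self.default) for ch in text)


def respace_line(line: str, source: FontMetrics, target: FontMetrics) -> str:
    """Re-pad alignment gaps so text starts near the same x-position in `target`."""
    out = ""
    source_x = 0  # pen position in the source font after each run
    # Split the line into alternating runs of spaces and non-spaces.
    for run in re.findall(r" +|[^ ]+", line):
        source_x += source.width(run)
        if run.startswith("  "):
            # Alignment gap (2+ spaces): pick the number of target-font
            # spaces that lands closest to the source pen position.
            gap = source_x - target.width(out)
            count = max(1, round(gap / target.width(" ")))
            out += " " * count
        else:
            # Words and single spaces are copied unchanged.
            out += run
    return out


# Placeholder numbers only, NOT real Helvetica or Arial metrics.
source_font = FontMetrics(widths={" ": 250}, default=500)
target_font = FontMetrics(widths={" ": 500}, default=500)

print(respace_line("ab    cd", source_font, target_font))
```

Because gaps are rounded to whole spaces, the result can only be as precise as one space width in the target font, so columns may still be off by a fraction of a character.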
Comments (2)
I'm afraid that this is a user-education problem: output in Arial font is spaced differently to output in Helvetica font. This needs to be explained to the users.
A reference to Rathergate - http://en.wikipedia.org/wiki/Rathergate - may help convince them; essentially, Dan Rather's career was ended because he didn't understand the significance of character spacing in different fonts. (/over-simplification)
An alternative might be to use a font editor, to save a version of Arial font that has Helvetica spacing properties, then use this new font in your report - this really is a kludge, it will look terrible and may well violate the font's copyright (presumably Microsoft-owned). I really wouldn't recommend it.
Does your version of Crystal handle dynamic image locations? If so, you could save an image of the PDF (I'm sure there's a utility for that somewhere), and in your Crystal Report, create an image object with the image location set to the image of whichever PDF you want.