Converting a PDF file into a nice table
I have a PDF file whose content is laid out in 5 columns.
I have looked and looked through Stack Overflow (and Googled like crazy) and tried every solution I found (including the last resort of trying Adobe Acrobat itself).
However, I cannot get those 5 columns into CSV/XLS format with the columns preserved, which is what I need. Whenever I export them, the formatting is terrible: all the entries come out line by line, and some data is lost.
http://www.2shared.com/document/PagE4A1T/ex1.html
The link above points to an excerpt of the file. I am really getting frustrated and running out of options.
iText (or iTextSharp) could do this, if you can give it the boundaries of those 5 columns and are willing to accept some overhead (namely, reparsing the page's text once for each column).
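The per-column idea can be modelled without any PDF library. This is a minimal sketch under stated assumptions, not iText code: `Chunk` is a hypothetical stand-in for the positioned text a PDF parser reports, and the column edges are assumed to be known in advance (in iText 5 they would roughly correspond to one region filter per column). Each column gets its own pass over the page, its lines are joined with `\n`, and the columns are then split and zipped row by row into CSV.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ColumnsToCsv {
    // Hypothetical stand-in for one piece of text reported by a PDF parser,
    // positioned at (x, y) in PDF user space, where y grows upward.
    static final class Chunk {
        final double x, y;
        final String text;

        Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // One pass over the page for one column: keep chunks whose x lies in
    // [left, right), order them top-to-bottom then left-to-right, and start a
    // new line whenever y changes. Assumes chunks on one line share the same y.
    static String extractColumn(List<Chunk> page, double left, double right) {
        List<Chunk> inColumn = new ArrayList<>();
        for (Chunk c : page) {
            if (c.x >= left && c.x < right) {
                inColumn.add(c);
            }
        }
        inColumn.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));

        StringBuilder sb = new StringBuilder();
        boolean first = true;
        double currentY = 0;
        for (Chunk c : inColumn) {
            if (!first && c.y != currentY) {
                sb.append('\n');
            }
            sb.append(c.text);
            currentY = c.y;
            first = false;
        }
        return sb.toString();
    }

    // Split each column's text on '\n' and zip the columns row by row into CSV.
    static String toCsv(List<String> columns) {
        List<String[]> split = new ArrayList<>();
        int rows = 0;
        for (String col : columns) {
            String[] lines = col.isEmpty() ? new String[0] : col.split("\n", -1);
            split.add(lines);
            rows = Math.max(rows, lines.length);
        }
        StringBuilder csv = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < split.size(); c++) {
                if (c > 0) {
                    csv.append(',');
                }
                String[] lines = split.get(c);
                // Shorter columns are padded with empty cells at the bottom.
                csv.append(quote(r < lines.length ? lines[r] : ""));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    // Minimal CSV quoting: wrap fields containing a comma, quote or line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        List<Chunk> page = List.of(
                new Chunk(10, 700, "Name"), new Chunk(110, 700, "Qty"),
                new Chunk(10, 680, "Apple"), new Chunk(110, 680, "3"),
                new Chunk(10, 660, "Pear, green"), new Chunk(110, 660, "5"));
        double[] edges = {0, 100, 200}; // hypothetical column boundaries
        List<String> columns = new ArrayList<>();
        for (int i = 0; i + 1 < edges.length; i++) {
            columns.add(extractColumn(page, edges[i], edges[i + 1]));
        }
        System.out.print(toCsv(columns));
    }
}
```

Running `main` prints three CSV rows, with `Pear, green` quoted because it contains a comma. Zipping by line index assumes every column has text on every row: an empty cell in one column shifts all later values of that column up by one row.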
Each line of text should be separated by `\n`, so it becomes a simple matter of string parsing.

If you want to avoid reparsing the whole page for each column, you could probably write a custom implementation of `FilteredTextRenderListener` that takes multiple listener/filter pairs. You could then parse the whole page once rather than once per column.
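The single-pass idea can be modelled the same way. Again, this is a hypothetical sketch rather than iText's API: instead of one filtered listener per column, a single object receives every chunk once, routes it to the column whose x-range contains it, and groups cells into rows by y coordinate. Grouping rows by y across the whole page, rather than splitting each column's text on `\n`, has the side benefit of keeping rows aligned when a cell is empty. It assumes every chunk on a row has exactly the same y (real PDFs may need a tolerance) and that chunks within one cell arrive left to right.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class SinglePassColumns {
    // Hypothetical text chunk at (x, y) in PDF user space, where y grows upward.
    static final class Chunk {
        final double x, y;
        final String text;

        Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Column edges: {0, 100, 200} means column 0 is [0, 100), column 1 is [100, 200).
    private final double[] edges;
    // Rows keyed by y, highest y (top of the page) first; one cell per column.
    private final TreeMap<Double, StringBuilder[]> rows = new TreeMap<>(Comparator.reverseOrder());

    SinglePassColumns(double... edges) {
        this.edges = edges;
    }

    // Called once per chunk: route it to its column and append it to that cell.
    // Chunks outside every column are ignored.
    void accept(Chunk c) {
        int col = columnOf(c.x);
        if (col < 0) {
            return;
        }
        StringBuilder[] row = rows.computeIfAbsent(c.y, k -> newRow());
        row[col].append(c.text);
    }

    private int columnOf(double x) {
        for (int i = 0; i + 1 < edges.length; i++) {
            if (x >= edges[i] && x < edges[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    private StringBuilder[] newRow() {
        StringBuilder[] row = new StringBuilder[edges.length - 1];
        for (int i = 0; i < row.length; i++) {
            row[i] = new StringBuilder();
        }
        return row;
    }

    // One CSV line per distinct y, top to bottom; missing cells stay empty.
    String toCsv() {
        StringBuilder csv = new StringBuilder();
        for (StringBuilder[] row : rows.values()) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    csv.append(',');
                }
                csv.append(quote(row[i].toString()));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    // Minimal CSV quoting: wrap fields containing a comma, quote or line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        SinglePassColumns table = new SinglePassColumns(0, 100, 200);
        // Chunks may arrive in any order; the Apple row has no Qty cell.
        table.accept(new Chunk(10, 680, "Apple"));
        table.accept(new Chunk(10, 700, "Name"));
        table.accept(new Chunk(110, 700, "Qty"));
        table.accept(new Chunk(10, 660, "Pear"));
        table.accept(new Chunk(110, 660, "5"));
        System.out.print(table.toCsv());
    }
}
```

Here `main` prints `Name,Qty`, `Apple,` and `Pear,5`: the missing cell on the Apple row stays an empty field instead of pulling the next value up, which the split-on-`\n` approach cannot guarantee.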