Extract highlights with their page numbers from a PDF on the command line

Posted 2025-02-12 17:51:16 · 460 characters · 1 view · 0 comments

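Is there a way to extract highlighted passages, together with their page numbers, from a PDF on the command line? I found two tools, but neither quite meets my needs: pdf-highlights-extractor lets me extract highlighted passages with page numbers, but it has only a graphical interface and no command-line interface. dyAnnotationExtractor has a command-line interface, but it gives me only the highlighted passages, not the page numbers. Is there a tool that does what I need? By the way, I'm on Linux.

Thanks in advance for your help!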


你在看孤独的风景 2025-02-19 17:51:16

I would recommend the nifty little Python library pdfannots, which has exactly the capability you are looking for.

$ pdfannots document.pdf

If combined with some other Bash commands, it can produce nicely formatted output. For example:

$ pdfannots document.pdf --no-condense | \
# Removing duplicate lines:
cat -n | sort -uk2 | sort -nk1 | cut -f2- | \
# Improving output formatting:
awk '{$1=$1};1' | sed 's/^\(> \)//g' | sed 's/* Page #/\n&/'
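
For readers who find the shell pipeline above hard to follow, here is a rough Python sketch of the same post-processing steps: drop repeated lines while keeping the original order, collapse whitespace, strip the leading "> " quote marker, and put a blank line before each "* Page #" entry. The function name `tidy` and the sample lines are made up for illustration; the sample is not real pdfannots output, and the dedup is only an approximation of what `sort -u` does under a given locale.

```python
def tidy(lines):
    """Approximate the shell pipeline's clean-up on a list of text lines."""
    seen = set()
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Order-preserving dedup (the cat -n | sort -uk2 | sort -nk1 | cut -f2- step).
        if line in seen:
            continue
        seen.add(line)
        # Collapse runs of whitespace and trim both ends (the awk '{$1=$1};1' step).
        line = " ".join(line.split())
        # Drop a leading blockquote marker (the first sed step).
        if line.startswith("> "):
            line = line[2:]
        # Start each page entry after a blank line (the second sed step).
        line = line.replace("* Page #", "\n* Page #", 1)
        out.append(line)
    return out


# Hypothetical input, only shaped like the pattern the sed command looks for.
sample = [
    " * Page #1:",
    "   > first highlight",
    "   > first highlight",
    " * Page #2:",
    "   > second   highlight",
]
print("\n".join(tidy(sample)))
```

To run this on real output, the lines could instead be read from `sys.stdin` after piping `pdfannots document.pdf --no-condense` into the script.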
