Ghostscript does not extract all text from a PDF file

Posted on 2024-09-17 00:48:52

I am using Ghostscript 8.71 to extract text from PDF pages.

The command I am using is:

gswin32c -q -sFONTPATH=c:\\fonts -dNODISPLAY -dSAFER -dDELAYBIND \
         -dWRITESYSTEMDICT -dSIMPLE -fps2ascii.ps -dFirstPage=1  \
         -dLastPage=1 input.pdf -dQUIET

I redirect stdout to another file to capture the extracted text.
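
For reference, the invocation and the stdout redirection could be scripted roughly as follows. This is only a minimal Python sketch: the helper names `build_gs_command` and `extract_text` and the output file name `output.txt` are assumptions made for illustration, and the flags are copied unchanged from the command above.

```python
import subprocess


def build_gs_command(pdf_path, first_page=1, last_page=1, font_path=r"c:\\fonts"):
    """Return the argument list for the ps2ascii.ps invocation shown above.

    The flags are taken verbatim from the question; font_path keeps the
    doubled backslash exactly as it was typed there.
    """
    return [
        "gswin32c",
        "-q",
        f"-sFONTPATH={font_path}",
        "-dNODISPLAY",
        "-dSAFER",
        "-dDELAYBIND",
        "-dWRITESYSTEMDICT",
        "-dSIMPLE",
        "-fps2ascii.ps",
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        str(pdf_path),
        "-dQUIET",
    ]


def extract_text(pdf_path, out_path, **kwargs):
    """Run Ghostscript and write whatever it prints on stdout to out_path."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if Ghostscript exits non-zero.
        subprocess.run(build_gs_command(pdf_path, **kwargs), stdout=out, check=True)


# Example call (requires gswin32c on PATH; output.txt is an assumed name):
# extract_text("input.pdf", "output.txt")
```

Building the argument list separately from running it makes it easy to print the exact command when comparing the output for different pages or font paths.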

The problem is that Ghostscript does not extract some of the searchable text.

Text in certain fonts is missing from the output, for example text set in bold Verdana, even though Ghostscript does open the font files.

I can upload the PDF file, but I could not find an upload option here. If there is one, please let me know.
