Python subprocess

I wrote a Python function that extracts text from a PDF, and I need help handling this error

Posted 2025-02-09 10:58:57

def extract_clean_text_lisible(path_input, path_output):

    if spawn.find_executable("pdftotext"):
        path_input = current_app.config['PROJECT_PATH']+path_input
        path_output = current_app.config['PROJECT_PATH']+path_output
        pdftotext = current_app.config['POPPLER_PATH']+"/pdftotext.exe"
        out, err = sp.Popen([pdftotext, "-layout", "-enc", "UTF-8", path_input, "-"], stdout=sp.PIPE, stderr=sp.PIPE, stdin=sp.PIPE)
        fichier = open(path_output, "w", encoding="utf-8")
        s = out.decode("utf-8")
        fichier.write(s)
        fichier.close()
    
    else:
        raise EnvironmentError(
            "pdftotext not installed. can be downloaded from https://poppler.freedesktop.org/"
        )
    return out.decode("utf-8")

out, err = sp.Popen([pdftotext, "-layout", "-enc", "UTF-8", path_input, "-"], stdout=sp.PIPE, stderr=sp.PIPE, stdin=sp.PIPE)

TypeError: cannot unpack non-iterable Popen object

Comments (2)

好久不见√ 2025-02-16 10:58:57

I found a solution for this function:

# Presumably relies on: import subprocess as sp; from distutils import spawn; from flask import current_app
def extract_clean_text_lisible(path_input, path_output):
    if spawn.find_executable("pdftotext"):
        path_input = current_app.config['PROJECT_PATH']+path_input
        path_output = current_app.config['PROJECT_PATH']+path_output
        pdftotext = current_app.config['POPPLER_PATH']+"/pdftotext.exe"
        # Popen returns a process object; communicate() waits for it and returns (stdout, stderr)
        proc = sp.Popen([pdftotext, "-layout", "-enc", "UTF-8", path_input, "-"], stdout=sp.PIPE, stderr=sp.PIPE, stdin=sp.PIPE)
        text = proc.communicate()[0].decode('utf-8')
        with open(path_output, 'w', encoding='utf-8') as fichier:
            fichier.write(text)
    else:
        raise EnvironmentError(
            "pdftotext not installed. can be downloaded from https://poppler.freedesktop.org/"
        )
    # Return the decoded text; the Popen object itself has no decode() method
    return text

You can use this function to extract clean text from PDFs and store it at a specified path.

Be sure to install Poppler and Tesseract to use this function.
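
Some background on why the original line fails, as a minimal sketch rather than the poster's exact code: `sp.Popen(...)` returns a `Popen` object, not an `(out, err)` pair, so it cannot be unpacked. The pair comes from calling `communicate()` on that object. The helper names `run_capture` and `extract_text` and the `pdftotext` parameter are made up for illustration, the Flask config lookups are left out, and `shutil.which` stands in for `spawn.find_executable`.

```python
import shutil
import subprocess as sp
from pathlib import Path


def run_capture(cmd):
    """Run cmd, wait for it to finish, and return its stdout decoded as UTF-8."""
    proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE)
    # communicate() is what returns the (stdout, stderr) tuple, not Popen()
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


def extract_text(pdf_path, txt_path, pdftotext="pdftotext"):
    """Hypothetical wrapper: write the PDF's text to txt_path and return it."""
    exe = shutil.which(pdftotext)  # None when the program is not on PATH
    if exe is None:
        raise EnvironmentError(
            "pdftotext not installed. can be downloaded from https://poppler.freedesktop.org/"
        )
    # "-" as the output file tells pdftotext to write the text to stdout
    text = run_capture([exe, "-layout", "-enc", "UTF-8", str(pdf_path), "-"])
    Path(txt_path).write_text(text, encoding="utf-8")
    return text
```

Checking the return code and raising with the captured stderr means a failed conversion surfaces as an error instead of as an empty text file.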

梦明 2025-02-16 10:58:57

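For a basic command like this on Windows you don't need Python. Simply create a shortcut to pdftotext.exe anywhere; the desktop is easiest.

Then edit the shortcut's properties as needed. In this case, append `-layout -enc UTF-8` to the target. You can also assign any icon you like. After that, drag and drop any single PDF onto the icon and you immediately get the text file in the same folder.

[image: https://i.sstatic.net/hvc8c.png]

For multiple files or a folder of PDFs you need a more complex shortcut (still possible as one line), so it is easier to write a line or two in a CMD file to pass the variables.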
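
If you would rather keep the multi-file case in Python than in a CMD file, a rough sketch could look like the following. The function names `build_command` and `convert_folder` are hypothetical, and the code assumes `pdftotext` is on PATH. It passes an explicit output path next to each PDF, which mirrors the drag-and-drop result of a text file in the same folder.

```python
import subprocess
from pathlib import Path


def build_command(pdf, exe="pdftotext"):
    """Return the argument list that writes <name>.txt next to <name>.pdf."""
    pdf = Path(pdf)
    return [exe, "-layout", "-enc", "UTF-8", str(pdf), str(pdf.with_suffix(".txt"))]


def convert_folder(folder, exe="pdftotext"):
    """Convert every *.pdf directly inside folder and return the .txt paths."""
    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        cmd = build_command(pdf, exe)
        # check=True raises CalledProcessError if pdftotext exits with an error
        subprocess.run(cmd, check=True)
        outputs.append(Path(cmd[-1]))
    return outputs
```

Calling `convert_folder` with a folder path would then process each PDF in turn and stop at the first one that fails.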
