How do I use PyPDF2 in a script?

Posted on 2025-02-02 17:12:34

import PyPDF2
from PyDF2 import PdfFileReader, PdfFileWriter


file_path="sample.pdf"

pdf = PdfFileReader(file_path)


with open("sample.pdf", "w") as f:'

for page_num in range(pdf.numPages):
   
   pageObj = pdf.getPage(page_num)



   try:
       txt = pageObj.extractText()
       txt = DocumentInformation.author

   except:
       pass

   else:

       f.write(txt)
f.close()

Error Received:
ModuleNotFoundError: No module named 'PyPDF2'

I'm writing my first ever script, where I want to read in a PDF, extract the text, and write it to a txt file. I was trying to use PyPDF2, but I'm not sure how to use it in a script like this.

EDIT: I had success importing the os & sys like so.

import os
import sys

Comments (1)

梦晓ヶ微光ヅ倾城 2025-02-09 17:12:34

There are multiple issues:

  1. from PyDF2 import ...: A typo. You meant PyPDF2 instead of PyDF2
  2. PdfFileWriter was imported, but never used (side-note: It's PdfReader and PdfWriter in the latest version of PyPDF2)
  3. with open("sample.pdf", "w") as f:': A syntax error
  4. Lacking indentation of the next lines
  5. Side-note: Did you know that you can simply write for page in pdf.pages?
  6. DocumentInformation.author is wrong. I guess you meant pdf.metadata.author
  7. You overwrite the txt variable - I don't understand why you don't use it before you re-assign it.
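
As a rough sketch of item 6: the author is read from the reader object's metadata rather than from the DocumentInformation class itself. This assumes a PyPDF2 release in which PdfReader exposes a metadata attribute; author_or_default is a hypothetical helper name, and the fallback to a default string is my own assumption, for PDFs that carry no author entry.

```python
from typing import Any, Optional


def author_or_default(reader: Any, default: str = "unknown") -> str:
    """Return the author stored in the PDF metadata, or `default` if absent.

    `reader` is expected to look like a PyPDF2 PdfReader, i.e. to have a
    `metadata` attribute (assumption; hypothetical helper for illustration).
    """
    info = getattr(reader, "metadata", None)  # None if the PDF has no metadata
    author: Optional[str] = getattr(info, "author", None) if info is not None else None
    return author if author else default


# Intended use (requires PyPDF2 to be installed):
# from PyPDF2 import PdfReader
# print(author_or_default(PdfReader("sample.pdf")))
```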

Maybe this is what you want:

from PyPDF2 import PdfReader

def get_text(pdf_file_path: str) -> str:
    text = ""
    reader = PdfReader(pdf_file_path)
    for page in reader.pages:
        text += page.extract_text()
    return text


text = get_text("example.pdf")

with open("example.txt", "w") as f:
    f.write(text)
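
A possible variation on the snippet above, offered only as a sketch: it writes each page's text to the output file as it goes instead of building one large string, and it treats a page that yields no text as an empty string. write_pages_text and pdf_to_txt are hypothetical helper names, the explicit UTF-8 encoding is my own choice, and PyPDF2 is imported inside pdf_to_txt so the first helper can be tried without it.

```python
from typing import Any, Iterable


def write_pages_text(pages: Iterable[Any], out_path: str) -> int:
    """Write the text of each page to `out_path`, one page after another.

    Each item in `pages` only needs an `extract_text()` method, which is
    what PyPDF2 page objects provide. Returns the number of pages written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for page in pages:
            text = page.extract_text() or ""  # guard against a page with no text
            f.write(text)
            f.write("\n")  # separate pages with a newline
            count += 1
    return count


def pdf_to_txt(pdf_path: str, out_path: str) -> int:
    """Extract all text from `pdf_path` into `out_path` (needs PyPDF2)."""
    from PyPDF2 import PdfReader  # imported here so a missing install fails only in this call

    reader = PdfReader(pdf_path)
    return write_pages_text(reader.pages, out_path)


# Intended use:
# pdf_to_txt("example.pdf", "example.txt")
```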

Installation issues

In case you have installation issues, maybe the docs on installing PyPDF2 can help you?

If you execute your script in the console as python your_script_name.py you might want to check the output of

python -c "import PyPDF2; print(PyPDF2.__version__)"

That should show your PyPDF2 version. If it doesn't, the Python environment you're using doesn't have PyPDF2 installed. Please note that your system might have arbitrarily many Python environments.
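
The same check can be run from inside a script, which also shows which interpreter is actually executing it. This is a minimal sketch using only the standard library; module_available and report are hypothetical helper names, and the suggested pip command assumes pip is available for that interpreter.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


def report(name: str = "PyPDF2") -> str:
    """Describe whether `name` is importable, and for which interpreter."""
    if module_available(name):
        return f"{name} is importable from {sys.executable}"
    return (
        f"{name} is NOT installed for {sys.executable}; "
        f"try: {sys.executable} -m pip install {name}"
    )


print(report())
```

Running this with the same command you use for your script (for example python your_script_name.py) tells you whether that particular environment can see PyPDF2.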
