How do I use PyPDF2 in a script?

Posted on 2025-02-02 17:12:34

import PyPDF2
from PyDF2 import PdfFileReader, PdfFileWriter


file_path="sample.pdf"

pdf = PdfFileReader(file_path)


with open("sample.pdf", "w") as f:'

for page_num in range(pdf.numPages):
   
   pageObj = pdf.getPage(page_num)



   try:
       txt = pageObj.extractText()
       txt = DocumentInformation.author

   except:
       pass

   else:

       f.write(txt)
f.close()

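Error received: ModuleNotFoundError: No module named 'PyPDF2'

I'm writing my first ever script: I want to read in a PDF, extract the text, and write it to a txt file. I was trying to use PyPDF2, but I'm not sure how to use it in a script like this.

EDIT: I had success importing os and sys like so: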

import os
import sys

梦晓ヶ微光ヅ倾城 2025-02-09 17:12:34

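There are multiple issues:

from PyDF2 import ...: a typo. You meant PyPDF2 instead of PyDF2.
PdfFileWriter is imported but never used (side note: the classes are named PdfReader and PdfWriter in the latest version of PyPDF2).
with open("sample.pdf", "w") as f:': a syntax error, because of the stray quote at the end of the line.
The lines that follow it lack indentation.
Side note: did you know that you can simply write for page in pdf.pages?
DocumentInformation.author is wrong. I guess you meant pdf.metadata.author.
You overwrite the txt variable. I don't understand why you don't use it before you reassign it.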
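To illustrate the pdf.metadata.author point, here is a minimal sketch of reading the author field. The function names read_author and author_from_metadata are hypothetical, chosen for this example only, and the sketch assumes a PyPDF2 version recent enough to provide PdfReader and its metadata property.

```python
from typing import Optional


def author_from_metadata(metadata) -> Optional[str]:
    """Hypothetical helper: return the author, or None when it is unavailable."""
    # A PDF may carry no metadata at all, in which case there is nothing to read.
    if metadata is None:
        return None
    return getattr(metadata, "author", None)


def read_author(pdf_file_path: str) -> Optional[str]:
    """Hypothetical wrapper: open the PDF and look up its author."""
    # Imported here so the helper above stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_file_path)
    return author_from_metadata(reader.metadata)
```

The author is a property of the document, not of a page, so it only needs to be read once, outside the page loop.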

Maybe this is what you want:

from PyPDF2 import PdfReader

def get_text(pdf_file_path: str) -> str:
    text = ""
    reader = PdfReader(pdf_file_path)
    for page in reader.pages:
        text += page.extract_text()
    return text


text = get_text("example.pdf")

with open("example.txt", "w") as f:
    f.write(text)
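A slightly more defensive variant of the same idea might look like the sketch below. The names join_page_texts and pdf_to_txt are hypothetical, and the choices to skip pages that yield no text, to separate pages with a newline, and to write UTF-8 are assumptions made for illustration, not something the answer above prescribes.

```python
from typing import Iterable


def join_page_texts(pages: Iterable) -> str:
    """Hypothetical helper: join the text of each page, skipping empty pages."""
    parts = []
    for page in pages:
        # Treat a missing result as an empty string.
        text = page.extract_text() or ""
        if text.strip():
            parts.append(text)
    return "\n".join(parts)


def pdf_to_txt(pdf_file_path: str, txt_file_path: str) -> None:
    """Hypothetical wrapper: extract all text from a PDF into a text file."""
    # Imported here so join_page_texts stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_file_path)
    # An explicit encoding avoids depending on the platform default.
    with open(txt_file_path, "w", encoding="utf-8") as f:
        f.write(join_page_texts(reader.pages))
```

A call such as pdf_to_txt("example.pdf", "example.txt") keeps the output path separate from the input. The code in the question opened sample.pdf itself in "w" mode, which would truncate the PDF.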

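Installation issues

In case you have installation issues, maybe the docs on installing PyPDF2 can help you?

If you run your script in the console as python your_script_name.py, you might want to check the output of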

python -c "import PyPDF2; print(PyPDF2.__version__)"

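That should show your PyPDF2 version. If it doesn't, the Python environment you're using doesn't have PyPDF2 installed. Note that your system might have arbitrarily many Python environments.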
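The same check can be done from inside a script, which also shows which interpreter is actually running. This is a minimal sketch using only the standard library. The function name describe_environment is hypothetical.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "PyPDF2") -> str:
    """Hypothetical helper: report whether a module is importable here."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    status = "importable" if spec is not None else "NOT installed"
    # sys.executable is the path of the interpreter running this script.
    return f"{sys.executable}: {module_name} is {status}"


if __name__ == "__main__":
    print(describe_environment())
```

If the module is reported as not installed, running python -m pip install PyPDF2 with that same interpreter installs it into the environment the script uses.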
