Empty lines appear in the CSV file when converting a PDF document to CSV

Posted on 2025-02-11 06:10:41


I am new to Python. I have an issue while converting a PDF file into CSV format. I used tabula to convert my PDF file into CSV, but the resulting CSV file contains empty lines.

Sample PDF file to be converted:
(image: sample PDF format)

This is what I have tried:

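import tabula

pdf_path = "/home/niranjan/code/html_spikes/statewise/cin/pdfreader/Manipur_company_1.pdf"

# Read the tables from every page (used here only for inspection)
doc = tabula.read_pdf(pdf_path, pages="all")
# Write the tables from every page to a CSV file
tabula.convert_into(pdf_path, "manipur.csv", output_format="csv", pages="all")
print(doc)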

This is what the result looks like:
(image: converted CSV format)

The result I was expecting:
(image: expected CSV output)

The converted CSV file leaves some cells empty, but I need the rows in the correct order. I can't figure out how to do it.

Can anyone suggest a better way to do this?


∞觅青森が 2025-02-18 06:10:41

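I solved this.
Here is the code:

import csv

# Assumes reader iterates over the converted manipur.csv and every row has at least two columns
with open("manipur.csv", newline="") as f:
    reader = csv.reader(f)
    rows = []
    for row in reader:
        # An empty first cell means this row holds the start of a name that continues in the next row
        if len(row) > 1 and not row[0]:
            next_row = next(reader, None)
            if next_row is not None:
                next_row[1] = row[1] + " " + next_row[1]
                row = next_row
        rows.append(row)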
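
The merge from the answer above can also be written as a small standalone function, which makes it easier to try on a few rows before running it on the real file. This is only a sketch under some assumptions: the function name `merge_fragment_rows`, the `key_col` and `text_col` parameters, and the sample rows are all made up for illustration, and it assumes that a row with an empty first cell holds the beginning of a wrapped name that continues in the next row (the same convention the answer uses). It also drops rows that are entirely empty.

```python
import csv
import io


def merge_fragment_rows(rows, key_col=0, text_col=1):
    """Fold rows whose key cell is empty into the next row that has a key.

    Assumption: a fragment row carries the leading part of a wrapped cell
    in text_col, and the row that follows holds the rest of the record.
    """
    width = max(key_col, text_col) + 1
    merged = []
    pending = []  # text fragments waiting for the next complete row
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # skip rows that are entirely empty
        if len(row) >= width and not row[key_col].strip():
            pending.append(row[text_col].strip())
            continue
        row = list(row)
        if pending and len(row) >= width:
            # Prefix the collected fragments onto this row's text cell
            row[text_col] = " ".join(pending + [row[text_col].strip()])
            pending = []
        merged.append(row)
    if pending:
        # Fragments with no following row are kept instead of being lost
        leftover = [""] * width
        leftover[text_col] = " ".join(pending)
        merged.append(leftover)
    return merged


# Hypothetical sample data, not taken from the original PDF
sample = (
    "1,ALPHA TRADING,Imphal\n"
    ",BETA NORTH,\n"
    "2,HOLDINGS LTD,Imphal\n"
    ",,\n"
    "3,GAMMA,Imphal\n"
)
rows = list(csv.reader(io.StringIO(sample)))
cleaned = merge_fragment_rows(rows)
print(cleaned)
```

To apply it to the converted file, the rows could be read with `csv.reader` from `manipur.csv`, passed through the function, and written back with `csv.writer`. If the fragments in the real output come after the main row rather than before it, the merge direction would need to be reversed.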
