Empty lines appear in the CSV file when converting a PDF document to CSV
I am new to Python. I have an issue converting a PDF file into CSV format. I used tabula to convert my PDF file into CSV, but the resulting CSV file contains empty lines.
Sample PDF file to be converted:
[image: sample PDF format]
This is what I have tried:
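import tabula

pdf_path = "/home/niranjan/code/html_spikes/statewise/cin/pdfreader/Manipur_company_1.pdf"
# Read all pages; recent tabula-py versions return a list of DataFrames
doc = tabula.read_pdf(pdf_path, pages="all")
# Write every detected table directly to a CSV file
tabula.convert_into(pdf_path, "manipur.csv", output_format="csv", pages="all")
print(doc)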
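One way to get rid of the blank lines is to post-process the file that `tabula.convert_into` writes. The sketch below uses only Python's standard `csv` module and assumes that an "empty line" is a row in which every cell is empty or whitespace. The function name `drop_blank_rows` is hypothetical and is not part of tabula.

```python
import csv
import io


def drop_blank_rows(csv_text):
    """Return CSV text with rows whose cells are all empty removed.

    Assumption: a "blank line" from the conversion is a row where every
    field is empty or contains only whitespace.
    """
    reader = csv.reader(io.StringIO(csv_text))
    # Keep a row only if at least one cell has visible text.
    kept = [row for row in reader if any(cell.strip() for cell in row)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(kept)
    return out.getvalue()


if __name__ == "__main__":
    sample = "CIN,Name\n,\nU123,Acme Ltd\n\nU456,Beta Ltd\n"
    print(drop_blank_rows(sample), end="")
```

To apply it to the question's output, you would read `manipur.csv` as text, pass it through the function, and write the result to a new file.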
This is what the result looks like:
[image: converted CSV format]
The result I was expecting:
[image: expected CSV output]
The converted CSV file leaves some cells empty, but I need the rows to be in the correct order. I can't figure out how to do it.
Can anyone suggest a better way to do it?
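If the expected output differs because long cell values wrap onto several lines in the PDF, tabula may split one logical row into several rows, where the extra rows leave the first column empty. That is an assumption here, since the screenshots are not available. Under that assumption, a minimal sketch that folds each such continuation row into the row above it (joining text with a space) could look like this; `merge_continuation_rows` and its `key_col` parameter are hypothetical names.

```python
import csv
import io


def merge_continuation_rows(csv_text, key_col=0):
    """Fold rows with an empty key column into the preceding row.

    Assumption: a wrapped PDF cell shows up as an extra row whose key
    column (the first one by default) is empty; its non-empty cells are
    appended, space-separated, to the matching cells of the previous row.
    """
    merged = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip completely blank rows
        if merged and key_col < len(row) and not row[key_col].strip():
            prev = merged[-1]
            # Pad the previous row if this row has more columns.
            prev.extend([""] * (len(row) - len(prev)))
            for i, cell in enumerate(row):
                if cell.strip():
                    prev[i] = f"{prev[i]} {cell.strip()}".strip()
        else:
            merged.append(list(row))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(merged)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "CIN,Company Name,Address\n"
        "U123,Acme Private,12 Main Road\n"
        ",Limited,Imphal\n"
        "U456,Beta Ltd,Thoubal\n"
    )
    print(merge_continuation_rows(sample), end="")
```

This heuristic would also merge a genuine row whose key cell is legitimately empty, so `key_col` should point at a column that is always filled in for real rows.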
I solved this.
Here is the code: