Python library Camelot doesn't read all the tables on a page

Posted on 2025-01-18 06:42:07


I'm using the Camelot Python library to read all the tables on one page of a PDF document.

I'm trying to read all the tables on page 10 of this PDF.

I tried to debug by plotting the page, and I noticed that the result changes with the flavor:

This is with flavor lattice

This is with flavor stream

The problem is that with the lattice flavor the tables are not read properly; an example is here.

If I use flavor='stream', the data is read properly, but only for one table. The output is something like this.

I tried to use table_area/table_regions to detect the two tables with flavor='stream', but it didn't work.
I've pasted the code below.

Code with lattice:

import camelot

file = "2022/Auto-trend0122.pdf" 
tables = camelot.read_pdf(file,pages='10',flavor='lattice',edge_tool=1500) 
print("Total tables extracted:", tables.n) 
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')
print(tables[1].df)

Code with stream, without table_area/table_regions:

import camelot

file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream', edge_tool=1500)
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')

Code with stream, with table_area:

import camelot

file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream',edge_tool=1500,table_area=['10,450,550,50','10,750,550,450'])
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')

Code with stream, with table_regions:

import camelot

file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream',edge_tool=1500,table_regions=['10,450,550,50','10,750,550,450'])
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')

The output is the same with table_regions, with table_area, and without either.
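
For comparison, here is a minimal sketch of the stream call using the parameter names listed in Camelot's documentation: `table_areas` (plural) and `edge_tol`, rather than the `table_area` and `edge_tool` used in the code above. If misspelled keywords are silently ignored (an assumption, not verified here), that could explain why the output does not change. The helper names `area_string`, `stream_kwargs`, and `read_two_tables` are hypothetical, and the coordinates are just the ones from the question; whether they really enclose the two tables on page 10 would need to be checked against a plot of the page.

```python
def area_string(left, top, right, bottom):
    """Format one region as Camelot's "x1,y1,x2,y2" string.

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner,
    in PDF coordinates, where the origin is the bottom-left of the page.
    """
    if not (left < right and bottom < top):
        raise ValueError("expected left < right and bottom < top")
    return f"{left},{top},{right},{bottom}"


def stream_kwargs(areas, edge_tol=1500):
    """Build keyword arguments for camelot.read_pdf with flavor='stream'."""
    return {
        "flavor": "stream",
        "edge_tol": edge_tol,  # documented spelling (not edge_tool)
        "table_areas": [area_string(*a) for a in areas],  # plural name
    }


def read_two_tables(path, page, areas):
    """Hypothetical wrapper: run Camelot's stream parser on the given areas."""
    import camelot  # imported lazily so the helpers above work without it

    return camelot.read_pdf(path, pages=page, **stream_kwargs(areas))
```

It could be tried with `tables = read_two_tables("2022/Auto-trend0122.pdf", "10", [(10, 750, 550, 450), (10, 450, 550, 50)])`, then checking `tables.n` and each `tables[i].df` as in the code above.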