I'm using the Camelot Python library to read all the tables on a page of a PDF document.
I'm trying to read all the tables on page 10 of this PDF.
I tried debugging by plotting the page, and I noticed that the result changes depending on the flavor:
This is with flavor lattice
This is with flavor stream
The problem is that with the lattice flavor the tables are not read properly.
An example here
If I use flavor='stream', the data is read properly, but only for one table:
The output is something like this.
I tried to use table_area/table_regions to detect the two tables with flavor='stream', but it didn't work.
I've pasted the code below.
Code with lattice:
import camelot
file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='lattice',edge_tool=1500)
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')
print(tables[1].df)
Code with stream, without table_area/table_regions:
import camelot
file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream', edge_tool=1500)
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')
Code with stream, with table_area:
import camelot
file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream',edge_tool=1500,table_area=['10,450,550,50','10,750,550,450'])
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')
Code with stream, with table_regions:
import camelot
file = "2022/Auto-trend0122.pdf"
tables = camelot.read_pdf(file,pages='10',flavor='stream',edge_tool=1500,table_regions=['10,450,550,50','10,750,550,450'])
print("Total tables extracted:", tables.n)
print(tables[0].df)
camelot.plot(tables[0],filename="try_plot.png", kind='contour')
The output is the same with table_regions, with table_area, and without either of them.
Comments (1)
The problem is that you are using table_area instead of the correct parameter, table_areas (read the docs). The following command works perfectly:
tables = camelot.read_pdf(file,pages='10', flavor='stream', edge_tool=1500, table_areas=['10,450,550,50','10,750,550,450'])
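A likely reason the misspelling raised no error, consistent with the identical output reported in the question, is that read_pdf accepts extra keyword arguments and unrecognised names are simply dropped. The sketch below is one way to catch such typos before calling Camelot. The helper check_options and the KNOWN_STREAM_OPTIONS set are hypothetical, and the option names are listed from the Camelot documentation as an assumption, so verify them against your installed version.

```python
import difflib

# Assumption: option names taken from the Camelot docs for flavor='stream';
# check them against the version you have installed.
KNOWN_STREAM_OPTIONS = {
    "pages", "password", "flavor", "suppress_stdout", "layout_kwargs",
    "table_areas", "table_regions", "columns", "split_text", "flag_size",
    "strip_text", "edge_tol", "row_tol", "column_tol",
}


def check_options(options, known=KNOWN_STREAM_OPTIONS):
    """Return {unknown_name: closest_known_name_or_None} for the given options."""
    problems = {}
    for name in options:
        if name not in known:
            # Suggest the most similar known option name, if there is one.
            close = difflib.get_close_matches(name, sorted(known), n=1)
            problems[name] = close[0] if close else None
    return problems


options = {"pages": "10", "flavor": "stream", "table_area": ["10,450,550,50"]}
print(check_options(options))  # {'table_area': 'table_areas'}
```

The same check would also flag edge_tool: the Stream option documented by Camelot is spelled edge_tol, so edge_tool=1500 is probably being ignored as well.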
Difference between table_areas and table_regions
table_areas should be used when you know the exact position of the table. Conversely, table_regions makes the detection engine look for tables only in those generic page regions.
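Both parameters take the same kind of string. According to the Camelot docs, each entry has the form x1,y1,x2,y2, where (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner in PDF coordinate space, whose origin is at the bottom-left of the page. A minimal sketch of building the two strings from the question follows; area_string is a hypothetical helper, not part of Camelot.

```python
def area_string(left, top, right, bottom):
    """Format one area as 'x1,y1,x2,y2' with (x1, y1) top-left, (x2, y2) bottom-right.

    PDF coordinates start at the bottom-left corner of the page, so top > bottom.
    """
    if not (left < right and bottom < top):
        raise ValueError("expected left < right and bottom < top in PDF coordinates")
    return f"{left},{top},{right},{bottom}"


# The two areas used in the question, one below the other on the page.
lower = area_string(left=10, top=450, right=550, bottom=50)
upper = area_string(left=10, top=750, right=550, bottom=450)
table_areas = [lower, upper]
print(table_areas)  # ['10,450,550,50', '10,750,550,450']
```

The resulting list can be passed as table_areas=... when the coordinates are the exact table boundaries, or as table_regions=... when they only mark where Camelot should search.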