The DocumentProcessorServiceAsyncClient.process_document method fails with the following error message: 400 Document pages exceed the limit: "PAGE_LIMIT_EXCEEDED". According to the API documentation, the processor should be able to handle a maximum of 200 pages. By using the DocumentProcessorServiceAsyncClient rather than the DocumentProcessorServiceClient, I assumed that I would be able to leverage the asynchronous maximum page limit. This does not appear to be the case.
The sample code I am testing:
from google.cloud import documentai

api_path = f'projects/{project_id}/locations/{gcloud_region}/processors/{processor_id}'
documentai_client = documentai.DocumentProcessorServiceAsyncClient()  # maybe pass some client_options here?

async def invoke_invoice_processor(self, filebytes):
    # Wrap the raw PDF bytes for an online (synchronous-style) process request
    raw_document = documentai.RawDocument(
        content=filebytes,
        mime_type="application/pdf",
    )
    request = documentai.ProcessRequest(
        name=api_path,
        raw_document=raw_document,
    )
    response = await documentai_client.process_document(request=request)
    return response.document
The above code block works with PDFs of 10 pages or fewer. It only fails with PDFs larger than 10 pages.
My question: what do I need to change in the above code to successfully process PDFs of more than 10 pages?
Comments (2)
This comment from yan-hic@ is correct.
To add more detail: follow the code sample in "Send a processing request" for batch processing, which lets you send multiple documents at once and more pages per document than online processing allows. The async client does not affect the page limits of the processor or the platform.
https://cloud.google.com/document-ai/quotas#content_limits
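To make the batch route concrete, here is a rough sketch using the Python client library rather than the code from the linked sample. It assumes the PDF has already been uploaded to Cloud Storage and that results should be written to another Cloud Storage prefix. The function names needs_batch and batch_process_pdf, the default limit of 10 pages (taken from the behaviour described in the question), and the 600 second timeout are my own assumptions, so check the quotas page for the limit that applies to your processor.

```python
def needs_batch(page_count: int, online_page_limit: int = 10) -> bool:
    """Return True when a document is too long for an online process_document call."""
    return page_count > online_page_limit


def batch_process_pdf(
    api_path: str,
    gcs_input_uri: str,
    gcs_output_uri: str,
    timeout: float = 600.0,
) -> None:
    """Submit one PDF stored in Cloud Storage for batch processing and wait for it."""
    # Imported here so needs_batch() can be used without the client library installed.
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient()

    # Batch requests read their input from Cloud Storage, not from raw bytes.
    input_config = documentai.BatchDocumentsInputConfig(
        gcs_documents=documentai.GcsDocuments(
            documents=[
                documentai.GcsDocument(
                    gcs_uri=gcs_input_uri,
                    mime_type="application/pdf",
                )
            ]
        )
    )
    # Results are written as JSON files under this Cloud Storage prefix.
    output_config = documentai.DocumentOutputConfig(
        gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
            gcs_uri=gcs_output_uri
        )
    )
    request = documentai.BatchProcessRequest(
        name=api_path,
        input_documents=input_config,
        document_output_config=output_config,
    )

    # batch_process_documents returns a long-running operation; block until it finishes.
    operation = client.batch_process_documents(request=request)
    operation.result(timeout=timeout)
```

A caller could keep the existing online path for short files and switch to batch_process_pdf only when needs_batch(page_count) is true. The processed Document is not returned directly in this sketch; it has to be read back from the output location.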
If one does not necessarily need all pages of the document to be processed, there is a set of options that can be passed with the request: ProcessOptions.
Among those options is a page-range setting, fromStart.
Thus one can set fromStart to whatever the current maximum is (e.g. 10), and the request will be processed based on just those first pages, avoiding the PAGE_LIMIT_EXCEEDED error.
Example code would look something like this:
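As a minimal sketch of that idea (not the original poster's code), the helper below builds the JSON body for a REST process call that only looks at the first pages of a PDF. The function name build_process_request_body and the default of 10 pages are my own assumptions; rawDocument, processOptions and fromStart are the field names used in the REST reference.

```python
import base64


def build_process_request_body(filebytes: bytes, max_pages: int = 10) -> dict:
    """Build a REST ProcessRequest body limited to the first max_pages pages."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return {
        "rawDocument": {
            # The REST API expects the file content as a base64 string.
            "content": base64.b64encode(filebytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        # fromStart asks the service to process only the first N pages.
        "processOptions": {"fromStart": max_pages},
    }
```

With the Python client, recent versions of google-cloud-documentai appear to expose the same setting as a process_options field on ProcessRequest, for example documentai.ProcessOptions(from_start=10), but it is worth confirming that against the installed version.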
See https://cloud.google.com/document-ai/docs/reference/rest/v1/ProcessOptions for more information.