Google Drive API / GUI exports invalid PDF - iText7 cannot read it
I'm trying to export multiple Google Docs files to PDF via the Google Drive API and merge them into one using iText7, but it throws the exception iText.IO.Exceptions.IOException: 'PDF header not found.' because of the odd PDF format of the Google export.
The PDF content generated by Google Drive (read with Notepad) does not look like a valid PDF.
The file content starts with 倥䙄ㄭ㐮┊ㄊ instead of something like %PDF-1.4
The uploaded PDF file is readable from Google Drive without any problem, and it is readable even if I write the Stream directly to disk. The file content is exactly the same when I download the file manually through the Google Docs GUI.
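A side note on the garbled characters: they match what an editor shows when it reads plain ASCII bytes as UTF-16LE. A minimal sketch, assuming that is what Notepad did here (the class and method names are made up for illustration), re-encodes the pasted characters and reads the bytes back as ASCII:

```csharp
using System;
using System.Text;

public static class GarbledHeaderDemo
{
    // Hypothetical helper: turn the displayed characters back into the
    // UTF-16LE bytes they came from, then read those bytes as ASCII.
    public static string RecoverAscii(string garbled)
    {
        byte[] raw = Encoding.Unicode.GetBytes(garbled); // Encoding.Unicode is UTF-16LE, no BOM
        return Encoding.ASCII.GetString(raw);
    }

    public static void Main()
    {
        string recovered = RecoverAscii("倥䙄ㄭ㐮┊ㄊ");
        // Show the recovered text with line feeds made visible.
        Console.WriteLine(recovered.Replace("\n", "\\n"));
    }
}
```

If the characters were copied faithfully, the recovered text begins with %PDF-1.4, which would suggest the file itself is an ordinary PDF that the editor merely displayed with the wrong encoding.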
Here is my code to export files via API:
var mimeType = "application/pdf";
var file = GetFile(sourceFile);
var pdfRequest = _driveService.Files.Export(sourceFile, mimeType);
var stream = pdfRequest.ExecuteAsStream();
Then I upload the PDF back to Google Drive via its API:
var newFile = new Google.Apis.Drive.v3.Data.File();
newFile.MimeType = mimeType;
newFile.Parents = new List<string>() { targetFolder };
var createRequest = _driveService.Files.Create(newFile, stream, mimeType);
createRequest.SupportsAllDrives = true;
var createResult = createRequest.Upload();
Weirdly enough, the format of the exported PDF is fine when I use
var text = pdfRequest.Execute();
instead of pdfRequest.ExecuteAsStream() (it then starts with %PDF-1.7). But Execute() returns a string instead of a Stream.
Is there any way to get a standard PDF from the Google Drive API, or to convert the exported file in some way?
Comments (1)
The problem was in iText7 itself. It considered the PDF invalid, but it probably just does not support PDFs in iso8859_2 encoding.
I tried PDFSharp instead and everything went smoothly.
I used ExecuteAsStream() from the Google Drive API to get the PDF Stream with no problems at all, so it wasn't at fault. Thanks for all your tips.
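For anyone who wants to rule out the stream before blaming the PDF library, one option is to buffer it and look at the first bytes. This is a minimal sketch using only System.IO: the helper name BufferAndCheck is hypothetical, a byte array stands in for the stream returned by ExecuteAsStream(), and it assumes the header sits at offset 0.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class PdfHeaderCheck
{
    // Hypothetical helper: copies the source into a seekable MemoryStream,
    // reports whether it starts with the ASCII bytes "%PDF-", and returns
    // the buffer rewound to position 0 so a reader can consume it.
    public static MemoryStream BufferAndCheck(Stream source, out bool looksLikePdf)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0;

        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        byte[] head = new byte[magic.Length];
        int read = buffer.Read(head, 0, head.Length);
        looksLikePdf = read == magic.Length && head.SequenceEqual(magic);

        buffer.Position = 0; // rewind after peeking at the header
        return buffer;
    }

    public static void Main()
    {
        // Stand-in for an exported stream; real data would come from the API.
        var fake = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7\n"));
        MemoryStream buffered = BufferAndCheck(fake, out bool ok);
        Console.WriteLine($"looks like PDF: {ok}, position: {buffered.Position}");
    }
}
```

Buffering also yields a seekable stream that starts at position 0, which removes one more variable when the same data is passed to both an upload call and a PDF reader.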