iTextSharp bug in PdfTextExtractor?

Posted on 2024-11-14 05:19:50

I am just starting to use iTextSharp to manipulate PDF documents. As a simple exercise, I tried to extract the text from a simple PDF using the code below.

    protected void btnUpload_Click(object sender, EventArgs e)
    {
        if (fuPDFUpload.HasFile)
        {
            PdfReader reader = new PdfReader(fuPDFUpload.FileBytes);
            for (int i = 0; i < reader.NumberOfPages; i++)
            {
                lblPdfText.Text += PdfTextExtractor.GetTextFromPage(reader, i);
            }
        }
    }

The code above throws a NullReferenceException. reader is not null, and i obviously cannot be null because it is an int; if reader were null, I would expect an ArgumentNullException instead. reader has pages, which is why execution enters the loop. I can only think this is some kind of bug. iTextSharp is open source, so I could try to fix it, but I really don't have the time. Does anyone know what might be going on here, or how I might work around it?

往日情怀 2024-11-21 05:19:50

OK, so PDFs do not have a page 0; page numbers start at 1. The code below works fine:

    protected void btnUpload_Click(object sender, EventArgs e)
    {
        if (fuPDFUpload.HasFile)
        {
            PdfReader reader = new PdfReader(fuPDFUpload.FileBytes);
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                lblPdfText.Text += PdfTextExtractor.GetTextFromPage(reader, i);
            }
        }
    }

That is a very unhelpful exception. You would think there would be some kind of check that throws a more helpful one. Maybe I will submit a patch when I have time.
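For anyone who wants the friendlier failure in their own code in the meantime, here is a minimal sketch of a range check wrapped around a 1-based page loop. The PageText class and both of its methods are hypothetical names made up for this post and are not part of iTextSharp. The getPageText delegate stands in for the real extraction call, so the sketch compiles without the library.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not part of iTextSharp.
public static class PageText
{
    // Rejects page numbers outside 1..numberOfPages with a descriptive
    // exception, then delegates to the supplied extraction function.
    public static string GetPageTextChecked(
        int pageNumber, int numberOfPages, Func<int, string> getPageText)
    {
        if (getPageText == null)
        {
            throw new ArgumentNullException(nameof(getPageText));
        }
        if (pageNumber < 1 || pageNumber > numberOfPages)
        {
            throw new ArgumentOutOfRangeException(
                nameof(pageNumber),
                pageNumber,
                "Page numbers are 1-based; expected a value from 1 to " + numberOfPages + ".");
        }
        return getPageText(pageNumber);
    }

    // Concatenates the text of every page, counting from 1 up to and
    // including numberOfPages. StringBuilder avoids repeated string copies.
    public static string ExtractAll(int numberOfPages, Func<int, string> getPageText)
    {
        var text = new StringBuilder();
        for (int page = 1; page <= numberOfPages; page++)
        {
            text.Append(GetPageTextChecked(page, numberOfPages, getPageText));
        }
        return text.ToString();
    }
}
```

In the upload handler this could be called as lblPdfText.Text = PageText.ExtractAll(reader.NumberOfPages, p => PdfTextExtractor.GetTextFromPage(reader, p)); so the label is assigned once and a call with page 0 fails with an ArgumentOutOfRangeException that names the valid range.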