How to convert a PDF file to a DataTable

Posted on 2024-11-06 05:11:29

Is there any way to convert a PDF file to a DataTable? The PDF file consists mainly of tables. Any help will be highly appreciated.

Answers (2)

池予 2024-11-13 05:11:29
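// Requires the iTextSharp 5.x package; PdfImporter is an illustrative class name.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class PdfImporter
{
    public DataTable ImportPDF(string Filename)
    {
        // Extract the text of every page. LocationTextExtractionStrategy orders
        // the text by its position on the page, one visual line per '\n'.
        StringBuilder strText = new StringBuilder();
        PdfReader reader = new PdfReader(Filename);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                ITextExtractionStrategy its = new LocationTextExtractionStrategy();
                strText.Append(PdfTextExtractor.GetTextFromPage(reader, page, its));
                strText.Append('\n');
            }
        }
        finally
        {
            reader.Close(); // release the file even if extraction throws
        }

        // One row per non-empty line, one cell per space-separated token
        // (a cell that itself contains a space becomes two cells).
        List<string[]> list = strText.ToString()
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(x => x.Length > 0)
            .ToList();

        DataTable dtTemp = new DataTable();
        if (list.Count == 0)
            return dtTemp;

        // The first line is the header row; DataTable column names must be unique.
        foreach (string header in list[0])
        {
            string name = header;
            for (int n = 2; dtTemp.Columns.Contains(name); n++)
                name = header + "_" + n;
            dtTemp.Columns.Add(name);
        }

        // The remaining lines are data rows. Rows.Add throws if a row has more
        // values than there are columns, so extra unnamed columns are added first.
        foreach (string[] cells in list.Skip(1))
        {
            while (dtTemp.Columns.Count < cells.Length)
                dtTemp.Columns.Add();
            dtTemp.Rows.Add(cells);
        }
        return dtTemp;
    }
}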
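The answer above splits every line on a single space, so a header cell such as `Unit Price` turns into two columns. If your extractor keeps wider gaps between columns than between words (not guaranteed: depending on the extraction strategy, a column gap may come out as a single space, and then this heuristic does not help), splitting on runs of two or more whitespace characters keeps such cells intact. A minimal sketch under that assumption; `PdfTextTable` and `FromLines` are hypothetical names, and it only needs `System.Data` and `Regex`, not iTextSharp:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a DataTable from already-extracted text lines,
// assuming columns are separated by 2+ whitespace characters and words inside
// a cell by a single space.
public static class PdfTextTable
{
    private static readonly Regex ColumnGap = new Regex(@"\s{2,}");

    public static DataTable FromLines(IEnumerable<string> lines)
    {
        // Split each non-blank line into cells at the wide gaps.
        List<string[]> rows = lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => ColumnGap.Split(line.Trim()))
            .ToList();

        var table = new DataTable();
        if (rows.Count == 0)
            return table;

        // First row becomes the header; repeated names get a numeric suffix
        // because DataTable column names must be unique.
        foreach (string header in rows[0])
        {
            string name = header;
            for (int n = 2; table.Columns.Contains(name); n++)
                name = header + "_" + n;
            table.Columns.Add(name);
        }

        foreach (string[] cells in rows.Skip(1))
        {
            // Add unnamed columns when a row is wider than the header.
            while (table.Columns.Count < cells.Length)
                table.Columns.Add();
            table.Rows.Add(cells);
        }
        return table;
    }
}
```

For example, `PdfTextTable.FromLines(new[] { "Item   Unit Price   Qty", "Apple   1.20   3" })` yields three columns, the second named `Unit Price`.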
椵侞 2024-11-13 05:11:29

If the PDF contains marked content (you can see how to find this in my blog article http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/) you can extract it from the PDF file. Otherwise you will need to extract the text and try to guess the structure.
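A minimal sketch of one way to guess the structure, assuming the text was extracted with its layout preserved in monospaced form (columns aligned and padded with spaces), which not every extractor produces. Character positions that are blank in every line form vertical gaps, and a gap at least `minGap` characters wide is treated as a column boundary. `ColumnGuesser`, `GuessColumns`, `Split` and the default `minGap = 2` are hypothetical choices for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: guesses column boundaries in layout-preserved text.
public static class ColumnGuesser
{
    // Returns [Start, End) character ranges of the guessed columns. A position
    // is blank if every non-blank line has a space there (or has already ended);
    // a run of at least minGap blank positions separates two columns.
    public static List<(int Start, int End)> GuessColumns(IList<string> lines, int minGap = 2)
    {
        List<string> rows = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        int width = rows.Count == 0 ? 0 : rows.Max(l => l.Length);
        bool[] blank = new bool[width];
        for (int x = 0; x < width; x++)
            blank[x] = rows.All(l => x >= l.Length || l[x] == ' ');

        var columns = new List<(int Start, int End)>();
        int pos = 0;
        while (pos < width)
        {
            while (pos < width && blank[pos]) pos++; // skip the gap before a column
            if (pos >= width) break;
            int start = pos, end = pos;
            while (pos < width)
            {
                if (!blank[pos]) { end = ++pos; continue; }
                int gapStart = pos;
                while (pos < width && blank[pos]) pos++;
                if (pos - gapStart >= minGap || pos >= width) break; // real column gap
                // A narrower gap (e.g. inside "Unit Price") stays inside the column.
            }
            columns.Add((start, end));
        }
        return columns;
    }

    // Cuts each non-blank line at the guessed boundaries and trims the cells.
    public static List<string[]> Split(IList<string> lines, int minGap = 2)
    {
        List<(int Start, int End)> columns = GuessColumns(lines, minGap);
        return lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => columns
                .Select(c => c.Start >= l.Length
                    ? ""
                    : l.Substring(c.Start, Math.Min(c.End, l.Length) - c.Start).Trim())
                .ToArray())
            .ToList();
    }
}
```

Each returned `string[]` can then be added to a `DataTable` with `Rows.Add`, using the first one for the column names. Requiring a two-character gap keeps a single-spaced header such as `Unit Price` in one column; with `minGap: 1` it would be split in two.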
