从 pdf 中提取选定区域或坐标中的文本和图像

发布于 2024-08-24 08:06:33 字数 239 浏览 6 评论 0原文

我有一个从 pdf 文件中的特定区域提取文本和图像的特定要求。该区域可能是选定的或突出显示的,也可能是来自给定的一组坐标。

当我经历过时,所有方法都是从 PDF 中完全提取图像和文本,而不是在指定位置。 我尝试使用 iTextSharp、Syncfussion、Apose,但无法找到更好的方法。

如果有人能在这方面帮助我,那就太好了。您能否分享您关于如何在 .net 中实现这一点的想法和建议?

问候, 阿伦·M

I have a specific requirement of extracting text and images from a specific area in a pdf file.The area might be a selected or highlighted or from a given set of coordinates.

When i went through, all the approaches are to extract images and text entirely from the PDF on not in a specified location.
I tried with iTextSharp,Syncfussion,Apose but couldn figure out a better approach for this.

If somebody could help me out in this it would be greatfull. Can you share your ideas and suggestion on how to implement this in .net.

Regards,
Arun.M

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

把昨日还给我 2024-08-31 08:06:33

此代码从pdf中提取图像

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Drawing.Imaging;
using System.IO;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using Bytescout.PDFExtractor;

namespace ExtractAllImages
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // This test file will be copied to the project directory on the pre-build event (see the project properties).
            String inputFile = Server.MapPath("sample1.pdf");

            // Create Bytescout.PDFExtractor.ImageExtractor instance
            ImageExtractor extractor = new ImageExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample1.pdf");

            Response.Clear();

            int i = 0;

            // Initialize image enumeration
            if (extractor.GetFirstImage())
            {
                do
                {
                    if (i == 0) // Write the fist image to the Response stream
                    {
                        string imageFileName = "image" + i + ".png";

                        Response.Write("<b>" + imageFileName + "</b>");

                        Response.ContentType = "image/png";
                        Response.AddHeader("Content-Disposition", "inline;filename=" + imageFileName);

                        // Write the image bytes into the Response output stream
                        Response.BinaryWrite(extractor.GetCurrentImageAsArrayOfBytes());
                    }

                    i++;

                } while (extractor.GetNextImage()); // Advance image enumeration
            }

            Response.End();
        }
    }
}

this code extract images from pdf

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Drawing.Imaging;
using System.IO;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using Bytescout.PDFExtractor;

namespace ExtractAllImages
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // This test file will be copied to the project directory on the pre-build event (see the project properties).
            String inputFile = Server.MapPath("sample1.pdf");

            // Create Bytescout.PDFExtractor.ImageExtractor instance
            ImageExtractor extractor = new ImageExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample1.pdf");

            Response.Clear();

            int i = 0;

            // Initialize image enumeration
            if (extractor.GetFirstImage())
            {
                do
                {
                    if (i == 0) // Write the fist image to the Response stream
                    {
                        string imageFileName = "image" + i + ".png";

                        Response.Write("<b>" + imageFileName + "</b>");

                        Response.ContentType = "image/png";
                        Response.AddHeader("Content-Disposition", "inline;filename=" + imageFileName);

                        // Write the image bytes into the Response output stream
                        Response.BinaryWrite(extractor.GetCurrentImageAsArrayOfBytes());
                    }

                    i++;

                } while (extractor.GetNextImage()); // Advance image enumeration
            }

            Response.End();
        }
    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文