ITextSharp HTML 转 PDF?

发布于 2024-09-01 08:01:59 字数 203 浏览 12 评论 0原文

我想知道 ITextSharp 是否具有将 HTML 转换为 PDF 的功能。我要转换的所有内容都只是纯文本,但不幸的是,关于 ITextSharp 的文档很少甚至没有,所以我无法确定这对我来说是否是一个可行的解决方案。

如果它不能做到这一点,有人可以向我指出一些好的、免费的 .net 库,可以将简单的纯文本 HTML 文档并将其转换为 pdf 吗?

蒂亚。

I'd like to know if ITextSharp has the capability of converting HTML to PDF. Everything I will convert will just be plain text but unfortunately there is very little to no documentation on ITextSharp so I can't determine if that will be a viable solution for me.

If it can't do it, can someone point me to some good, free .net libraries that can take a simple plain text HTML document and convert it to a pdf?

tia.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

牵你手 2024-09-08 08:01:59

几周前我遇到了同样的问题,这是我发现的结果。此方法将 HTML 快速转储为 PDF。该文档很可能需要进行一些格式调整。

private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // step 1: creation of a document-object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // we create a writer that listens to the document
    // and directs a XML-stream to a file
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

    // step 3: we create a worker parse the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open document and start the worker on the document
    document.Open();
    worker.StartDocument();

    // step 5: parse the html into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();

    return msOutput;
}

I came across the same question a few weeks ago and this is the result from what I found. This method does a quick dump of HTML to a PDF. The document will most likely need some format tweaking.

private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // step 1: creation of a document-object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // we create a writer that listens to the document
    // and directs a XML-stream to a file
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

    // step 3: we create a worker parse the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open document and start the worker on the document
    document.Open();
    worker.StartDocument();

    // step 5: parse the html into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();

    return msOutput;
}
﹎☆浅夏丿初晴 2024-09-08 08:01:59

经过一番挖掘后,我找到了一种用 ITextSharp 完成我需要的好方法。

这是一些示例代码,如果它将来对其他人有帮助的话:

protected void Page_Load(object sender, EventArgs e)
{
    Document document = new Document();
    try
    {
        PdfWriter.GetInstance(document, new FileStream("c:\\my.pdf", FileMode.Create));
        document.Open();
        WebClient wc = new WebClient();
        string htmlText = wc.DownloadString("http://localhost:59500/my.html");
        Response.Write(htmlText);
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        for (int k = 0; k < htmlarraylist.Count; k++)
        {
            document.Add((IElement)htmlarraylist[k]);
        }

        document.Close();
    }
    catch
    {
    }
}

after doing some digging I found a good way to accomplish what I need with ITextSharp.

Here is some sample code if it will help anyone else in the future:

protected void Page_Load(object sender, EventArgs e)
{
    Document document = new Document();
    try
    {
        PdfWriter.GetInstance(document, new FileStream("c:\\my.pdf", FileMode.Create));
        document.Open();
        WebClient wc = new WebClient();
        string htmlText = wc.DownloadString("http://localhost:59500/my.html");
        Response.Write(htmlText);
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        for (int k = 0; k < htmlarraylist.Count; k++)
        {
            document.Add((IElement)htmlarraylist[k]);
        }

        document.Close();
    }
    catch
    {
    }
}
夜光 2024-09-08 08:01:59

这是我在 5.4.2 版本(来自 nuget 安装)上能够从 asp.net mvc 控制器返回 pdf 响应的内容。如果需要的话,可以将其修改为使用 FileStream 而不是 MemoryStream 进行输出。

我将其发布在这里是因为它是当前 iTextSharp 在 html 中的使用的完整示例 -> pdf 转换(忽略图像,我没有看过它,因为我的使用不需要它)

它使用 iTextSharp 的 XmlWorkerHelper,因此传入的 hmtl 必须是有效的 XHTML,因此您可能需要根据您的输入进行一些修复。

using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using System.IO;
using System.Web.Mvc;

namespace Sample.Web.Controllers
{
    public class PdfConverterController : Controller
    {
        [ValidateInput(false)]
        [HttpPost]
        public ActionResult HtmlToPdf(string html)
        {           

            html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
                 <!DOCTYPE html 
                     PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
                    ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
                 <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
                    <head>
                        <title>Minimal XHTML 1.0 Document with W3C DTD</title>
                    </head>
                  <body>
                    " + html + "</body></html>";

            var bytes = System.Text.Encoding.UTF8.GetBytes(html);

            using (var input = new MemoryStream(bytes))
            {
                var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                var writer = PdfWriter.GetInstance(document, output);
                writer.CloseStream = false;
                document.Open();

                var xmlWorker = XMLWorkerHelper.GetInstance();
                xmlWorker.ParseXHtml(writer, document, input, null);
                document.Close();
                output.Position = 0;

                return new FileStreamResult(output, "application/pdf");
            }
        }
    }
}

Here's what I was able to get working on version 5.4.2 (from the nuget install) to return a pdf response from an asp.net mvc controller. It could be modfied to use a FileStream instead of MemoryStream for the output if that's what is needed.

I post it here because it is a complete example of current iTextSharp usage for the html -> pdf conversion (disregarding images, I haven't looked at that since my usage doesn't require it)

It uses iTextSharp's XmlWorkerHelper, so the incoming hmtl must be valid XHTML, so you may need to do some fixup depending on your input.

using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using System.IO;
using System.Web.Mvc;

namespace Sample.Web.Controllers
{
    public class PdfConverterController : Controller
    {
        [ValidateInput(false)]
        [HttpPost]
        public ActionResult HtmlToPdf(string html)
        {           

            html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
                 <!DOCTYPE html 
                     PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
                    ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
                 <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
                    <head>
                        <title>Minimal XHTML 1.0 Document with W3C DTD</title>
                    </head>
                  <body>
                    " + html + "</body></html>";

            var bytes = System.Text.Encoding.UTF8.GetBytes(html);

            using (var input = new MemoryStream(bytes))
            {
                var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                var writer = PdfWriter.GetInstance(document, output);
                writer.CloseStream = false;
                document.Open();

                var xmlWorker = XMLWorkerHelper.GetInstance();
                xmlWorker.ParseXHtml(writer, document, input, null);
                document.Close();
                output.Position = 0;

                return new FileStreamResult(output, "application/pdf");
            }
        }
    }
}
智商已欠费 2024-09-08 08:01:59

如果我有声誉的话,我会比 mightymada 的答案更胜一筹——我刚刚使用 Pechkin 实现了一个 asp.net HTML 到 PDF 的解决方案。结果很棒。

Pechkin 有一个 nuget 包,但正如上面的海报在他的博客中提到的那样(http ://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/ - 我希望她不介意我重新发布它),此分支中已修复内存泄漏:

https://github.com/tuespetre/Pechkin

上面的博客有关于如何包含这个包的具体说明(它是一个 32位 dll 并需要 .net4)。这是我的代码。传入的 HTML 实际上是通过 HTML Agility pack 组装的(我正在自动生成发票):

public static byte[] PechkinPdf(string html)
{
  //Transform the HTML into PDF
  var pechkin = Factory.Create(new GlobalConfig());
  var pdf = pechkin.Convert(new ObjectConfig()
                          .SetLoadImages(true).SetZoomFactor(1.5)
                          .SetPrintBackground(true)
                          .SetScreenMediaType(true)
                          .SetCreateExternalLinks(true), html);

  //Return the PDF file
  return pdf;
}

再次感谢您 mightymada - 您的回答非常棒。

I would one-up'd mightymada's answer if I had the reputation - I just implemented an asp.net HTML to PDF solution using Pechkin. results are wonderful.

There is a nuget package for Pechkin, but as the above poster mentions in his blog (http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/ - I hope she doesn't mind me reposting it), there's a memory leak that's been fixed in this branch:

https://github.com/tuespetre/Pechkin

The above blog has specific instructions for how to include this package (it's a 32 bit dll and requires .net4). here is my code. The incoming HTML is actually assembled via HTML Agility pack (I'm automating invoice generations):

public static byte[] PechkinPdf(string html)
{
  //Transform the HTML into PDF
  var pechkin = Factory.Create(new GlobalConfig());
  var pdf = pechkin.Convert(new ObjectConfig()
                          .SetLoadImages(true).SetZoomFactor(1.5)
                          .SetPrintBackground(true)
                          .SetScreenMediaType(true)
                          .SetCreateExternalLinks(true), html);

  //Return the PDF file
  return pdf;
}

again, thank you mightymada - your answer is fantastic.

雅心素梦 2024-09-08 08:01:59

我更喜欢使用另一个名为 Pechkin 的库,因为它能够转换重要的 HTML(也有 CSS 类)。这是可能的,因为该库使用 WebKit 布局引擎,Chrome 和 Safari 等浏览器也使用该引擎。

我在博客上详细介绍了我使用 Pechkin 的经历: http://codeutil.wordpress.com /2013/09/16/convert-html-to-pdf/

I prefer using another library called Pechkin because it is able to convert non trivial HTML (that also has CSS classes). This is possible because this library uses the WebKit layout engine that is also used by browsers like Chrome and Safari.

I detailed on my blog my experience with Pechkin: http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/

盛装女皇 2024-09-08 08:01:59

上面的代码肯定有助于将 HTML 转换为 PDF,但如果 HTML 代码具有带有相对路径的 IMG 标签,则会失败。 iTextSharp 库不会自动将相对路径转换为绝对路径。

我尝试了上面的代码并添加了代码来处理 IMG 标签。

您可以在这里找到代码供您参考:
http://www.am22tech.com/html-to-pdf/

The above code will certainly help in converting HTML to PDF but will fail if the the HTML code has IMG tags with relative paths. iTextSharp library does not automatically convert relative paths to absolute ones.

I tried the above code and added code to take care of IMG tags too.

You can find the code here for your reference:
http://www.am22tech.com/html-to-pdf/

公布 2024-09-08 08:01:59

它能够将 HTML 文件转换为 pdf。

转换所需的命名空间为:

using iTextSharp.text;
using iTextSharp.text.pdf;

转换和下载文件所需的命名空间为:

// Create a byte array that will eventually hold our final PDF
Byte[] bytes;

// Boilerplate iTextSharp setup here

// Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
    // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document())
    {
        // Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // Open the document for writing
            doc.Open();

            string finalHtml = string.Empty;

            // Read your html by database or file here and store it into finalHtml e.g. a string
            // XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(finalHtml))
            {
                // Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            doc.Close();
        }
    }

    // After all of the PDF "stuff" above is done and closed but **before** we
    // close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

// Clear the response
Response.Clear();
MemoryStream mstream = new MemoryStream(bytes);

// Define response content type
Response.ContentType = "application/pdf";

// Give the name of file of pdf and add in to header
Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
Response.Buffer = true;
mstream.WriteTo(Response.OutputStream);
Response.End();

It has ability to convert HTML file in to pdf.

Required namespace for conversions are:

using iTextSharp.text;
using iTextSharp.text.pdf;

and for conversion and download file :

// Create a byte array that will eventually hold our final PDF
Byte[] bytes;

// Boilerplate iTextSharp setup here

// Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
    // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document())
    {
        // Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // Open the document for writing
            doc.Open();

            string finalHtml = string.Empty;

            // Read your html by database or file here and store it into finalHtml e.g. a string
            // XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(finalHtml))
            {
                // Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            doc.Close();
        }
    }

    // After all of the PDF "stuff" above is done and closed but **before** we
    // close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

// Clear the response
Response.Clear();
MemoryStream mstream = new MemoryStream(bytes);

// Define response content type
Response.ContentType = "application/pdf";

// Give the name of file of pdf and add in to header
Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
Response.Buffer = true;
mstream.WriteTo(Response.OutputStream);
Response.End();
东风软 2024-09-08 08:01:59

2020 更新:

将 HTML 转换为 PDF 现在非常简单。您所要做的就是使用 NuGet 安装 itext7itext7.pdfhtml。您可以在 Visual Studio 中通过转到“Project”>“来完成此操作”。 “管理 NuGet 包...”

确保包含此依赖项:

using iText.Html2pdf;

现在只需粘贴这一行即可完成:

HtmlConverter.ConvertToPdf(new FileInfo(@"temp.html"), new FileInfo(@"report.pdf"));

如果您在 Visual Studio 中运行此示例,则您的 html 文件应位于 / bin/Debug 目录。

如果您感兴趣,这里有一个很好的 资源。另请注意,itext7 已获得 AGPL 许可。

2020 UPDATE:

Converting HTML to PDF is very simple to do now. All you have to do is use NuGet to install itext7 and itext7.pdfhtml. You can do this in Visual Studio by going to "Project" > "Manage NuGet Packages..."

Make sure to include this dependency:

using iText.Html2pdf;

Now literally just paste this one liner and you're done:

HtmlConverter.ConvertToPdf(new FileInfo(@"temp.html"), new FileInfo(@"report.pdf"));

If you're running this example in visual studio, your html file should be in the /bin/Debug directory.

If you're interested, here's a good resource. Also, note that itext7 is licensed under AGPL.

庆幸我还是我 2024-09-08 08:01:59

如果您要在 html 服务器端将 html 转换为 pdf,您可以使用 Rotativa:

Install-Package Rotativa

这是基于 wkhtmltopdf 但它比 iTextSharp 具有更好的 css 支持,并且与 MVC(主要使用)集成非常简单,因为您只需返回pdf 格式的视图:

public ActionResult GetPdf()
{
    //...
    return new ViewAsPdf(model);// and you are done!
} 

If you are converting html to pdf on the html server side you can use Rotativa :

Install-Package Rotativa

This is based on wkhtmltopdf but it has better css support than iTextSharp has and is very simple to integrate with MVC (which is mostly used) as you can simply return the view as pdf:

public ActionResult GetPdf()
{
    //...
    return new ViewAsPdf(model);// and you are done!
} 
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文