C# .NET: converting HTML to RTF

Posted on 2024-11-05 12:12:10


Comments (4)

忘羡 2024-11-12 12:12:10

Create a WebBrowser control, load the HTML into it, select all, and copy. Then paste into a RichTextBox and read its Rtf property.

// Requires a reference to System.Windows.Forms and an STA thread (the clipboard is used).
string html = "...."; // html content
RichTextBox rtbTemp = new RichTextBox();
WebBrowser wb = new WebBrowser();
wb.Navigate("about:blank"); // creates an empty document to write into

wb.Document.Write(html);
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null); // overwrites the clipboard

rtbTemp.SelectAll();
rtbTemp.Paste();

Now rtbTemp.Rtf holds the RTF converted from the HTML.

油饼 2024-11-12 12:12:10

TL;DR: If possible, I recommend using the OpenXml format and the HtmlToOpenXml NuGet package.


Microsoft Word COM

I haven't really looked into this option, because my use case is to run the conversion on a server, which makes COM components a poor choice.


XHTML2RTF

As @IAmTimCorey mentioned, you can use this CodeProject library.

Disadvantages are:

  • Limited supported HTML and CSS
  • Not really .NET
  • ...

Windows Forms Web Browser

As @Jerry mentioned you can use the Windows Forms WebBrowser control.

Disadvantages are:

  • Reference to System.Windows.Forms
  • Uses copy & paste (problematic for multithreading)
  • Only works in an STA thread

Not supported features include:

  • Fonts
  • Colors
  • Numbered lists
  • Strikethrough (del element)
  • ...
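The STA restriction listed above can be worked around by running the conversion on a dedicated thread. Below is a minimal sketch, assuming Windows; StaRunner is a hypothetical helper name and is not part of any library mentioned here. It only handles the apartment state. It does not pump messages, so whether the WebBrowser code behaves well inside it is something you would need to verify yourself.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class StaRunner
{
    // Runs 'work' on a new STA thread, blocks until it finishes,
    // and returns its result (or rethrows a wrapped exception).
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });

        // Must be set before Start(); throws on non-Windows platforms.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("STA work failed.", error);
        return result;
    }
}
```

Usage would look like StaRunner.Run(() => ConvertWithWebBrowser(html)), where ConvertWithWebBrowser is a placeholder for your own wrapper around the WebBrowser code. Because the clipboard is still shared process-wide, concurrent calls can still interfere with each other.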

DevExpress

Code sample by "Paul V" from the DevExpress Support Center (03.02.2015).

public String ConvertRTFToHTML(String RTF)
{
    // Write the RTF text into a stream that the importer can read.
    MemoryStream ms = new MemoryStream();
    StreamWriter writer = new StreamWriter(ms);
    writer.Write(RTF);
    writer.Flush();
    ms.Position = 0;
    String output = "";
    HtmlEditorExtension.Import(HtmlEditorImportFormat.Rtf, ms, (s, enumerable) => output = s);

    return output;
}

public String ConvertHTMLToRTF(String html)
{
    MemoryStream ms = new MemoryStream();
    var editor = new ASPxHtmlEditor { Html = html };

    // Export to RTF, then rewind the stream and read the result back as text.
    editor.Export(HtmlEditorExportFormat.Rtf, ms);

    ms.Position = 0;
    StreamReader reader = new StreamReader(ms);

    return reader.ReadToEnd();
}

Or you could use the RichEditDocumentServer type as shown in this example.

It is unclear which HTML features are actually supported.

Disadvantages are:

  • Price
  • Quite a lot of references for one small thing
  • More?

Not supported features include:

  • Strikethrough (del element)

Sautinsoft

public string ConvertHTMLToRTF(string html)
{
    SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
    return h.ConvertString(html);
}

public string ConvertRTFToHTML(string rtf)
{
    SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
    byte[] bytes = Encoding.ASCII.GetBytes(rtf);
    r.OpenDocx(bytes);
    return r.ToHtml();
}

More examples and configuration options can be found here and here.

The following are supported:

  • HTML 3.2
  • HTML 4.01
  • HTML 5
  • CSS
  • XHTML

Disadvantages are:

  • I'm not sure how active the development is
  • Price

Usage knowledgebase:


DIY

If you only need to support limited functionality, you could write your own converter. I would not recommend this if the required feature set is large. (Sautinsoft claims to have written over 20,000 lines of code.)

I have a small sample project here, but in its current state it is for educational purposes only.
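To give a feel for what a limited DIY converter involves, here is a minimal sketch. It is not the sample project mentioned above, and MiniHtmlToRtf is a hypothetical name. It assumes well-formed HTML and handles only b/strong, i/em, u, del, br and p; every other tag is silently dropped, and unbalanced tags would produce unbalanced RTF groups.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MiniHtmlToRtf
{
    // HTML tag name -> RTF group opener; the closing tag emits "}".
    private static readonly Dictionary<string, string> Open =
        new Dictionary<string, string>
        {
            ["b"] = @"{\b ", ["strong"] = @"{\b ",
            ["i"] = @"{\i ", ["em"] = @"{\i ",
            ["u"] = @"{\ul ",
            ["del"] = @"{\strike ",
        };

    public static string Convert(string html)
    {
        var rtf = new StringBuilder(@"{\rtf1\ansi ");
        // Each match is either a tag (group 2 = tag name) or a run of text.
        foreach (Match m in Regex.Matches(html, @"<(/?)(\w+)[^>]*>|[^<]+"))
        {
            if (!m.Groups[2].Success)
            {
                AppendText(rtf, WebUtility.HtmlDecode(m.Value));
                continue;
            }
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool closing = m.Groups[1].Value == "/";
            if (name == "br") rtf.Append(@"\line ");
            else if (name == "p" && closing) rtf.Append(@"\par ");
            else if (Open.TryGetValue(name, out string open))
                rtf.Append(closing ? "}" : open);
        }
        return rtf.Append('}').ToString();
    }

    // Escapes RTF special characters and encodes non-ASCII as \uN?.
    private static void AppendText(StringBuilder rtf, string text)
    {
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}') rtf.Append('\\').Append(c);
            else if (c > 127) rtf.Append(@"\u").Append((short)c).Append('?');
            else rtf.Append(c);
        }
    }
}
```

For example, MiniHtmlToRtf.Convert("<p>Hello <b>world</b></p>") returns {\rtf1\ansi Hello {\b world}\par }. Fonts, colors, lists, tables and CSS are exactly the parts that make a real converter grow large.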


OpenXml

If the OpenXml format is also acceptable for your use case, you can use the HtmlToOpenXml NuGet package. It's free, and it supported all the features I tested the other solutions against.

The project is based on the Open XML SDK by Microsoft and seems to be actively maintained.

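// using System.IO;
// using DocumentFormat.OpenXml;
// using DocumentFormat.OpenXml.Packaging;
// using DocumentFormat.OpenXml.Wordprocessing;
// using HtmlToOpenXml;
public static byte[] ConvertHtmlToOpenXml(string html)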
{
    using (var generatedDocument = new MemoryStream())
    {
        using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            var mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            var converter = new HtmlConverter(mainPart);
            converter.ParseHtml(html);

            mainPart.Document.Save();
        }

        return generatedDocument.ToArray();
    }
}

寄人书 2024-11-12 12:12:10

The ExpertsExchange article is a poor one at best. Basically the OP gave up because they couldn't give a good answer. They list a link to the CodeProject article ( http://www.codeproject.com/KB/HTML/XHTML2RTF.aspx ) that shows you how to convert HTML to RTF but it isn't really a .NET solution. Instead, it would be something that would need to be highly adapted.

In my experience, there isn't a good open-source converter out there. The pieces all seem to exist, but someone still has to do the legwork of putting them together. So the immediate answer to your question is that no ready-made converter exists.

是你 2024-11-12 12:12:10

There seems to be a new open-source solution based on the WPF RichTextBox. The only caveat is that its core only supports STA-threaded applications, so to use it in, e.g., ASP.NET you need to call it from an STA thread (there is a sample for that in the writeup).

It is confirmed to work in VSTO add-ins (e.g. for Outlook's RTFBody).

Nuget:
https://www.nuget.org/packages/MarkupConverter/

Project:
https://github.com/figuemon/MarkupConverter

Writeup:
https://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e
