How to create a Word document from HTML in C#
I'm creating a C# application that has to create a Word document.
I'm using Microsoft.Office.Interop.Word to do this, and I've successfully managed to output some Word documents, but creating the content through code is very time-consuming work.
I noticed that Word is able to open HTML pages and show them as normal content, so I created a simple test table in HTML and inserted it into the Word document. But when I output the document, the obvious happened: the tags were still there! Word did not interpret the tags as HTML; it just output exactly what I put in there.
How can I tell Word to reformat the text as HTML?
Edit: (through the C# code, of course)
Edit 2: Please note that I'm parsing through some data to build this, so I will end up with about 4 pages of the same table/HTML. I need to be able to tell Word to start on the next page each time I've finished a loop, so an HTML-only method will probably not work.
If you only want to output simple HTML content as a Word document, you could always cheat and write out the HTML content with a .doc extension. Word will open that just fine.
If you need to add a page break, you can use the CSS page-break-before property.
If you're set on using Interop: having read up a little, this post states that you need a converter to insert HTML, and the converters are only accessible when:
So, this answer looks like it provides a clipboard-based solution: Adding html text to Word using Interop
However, if there's any money to spend on the project, I can heartily recommend Aspose.Words, which will do all of this for you.
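A minimal C# sketch of the .doc trick combined with the OP's "one table per loop, each on a new page" requirement. It assumes each record can be rendered as a simple one-column HTML table; the HtmlDocWriter class and its method names are hypothetical. Tables after the first are preceded by a br element carrying the CSS page-break-before property. This has not been checked against every Word version.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: renders each record as a simple HTML table and forces
// every table after the first onto a new page via CSS page-break-before.
public static class HtmlDocWriter
{
    public static string BuildHtml(IReadOnlyList<string[]> records)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<html><head>");
        // Declare the charset so Word decodes non-ASCII text correctly
        sb.AppendLine("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">");
        sb.AppendLine("</head><body>");
        for (int i = 0; i < records.Count; i++)
        {
            if (i > 0)
            {
                // Page break before each table except the first
                sb.AppendLine("<br style=\"page-break-before: always;\">");
            }
            sb.AppendLine("<table border=\"1\">");
            foreach (string cell in records[i])
            {
                // Encode the data so characters such as < and & cannot break the markup
                sb.AppendLine("<tr><td>" + WebUtility.HtmlEncode(cell) + "</td></tr>");
            }
            sb.AppendLine("</table>");
        }
        sb.AppendLine("</body></html>");
        return sb.ToString();
    }

    // Writes the HTML to the given path; pass a path ending in .doc so Word opens it
    public static void SaveAsDoc(string path, IReadOnlyList<string[]> records)
    {
        File.WriteAllText(path, BuildHtml(records), Encoding.UTF8);
    }
}
```

Usage would be something like HtmlDocWriter.SaveAsDoc(@"C:\temp\report.doc", records), where records is the parsed data. Building the whole file as one string keeps the loop free of any Word automation, which is the point of the trick.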
As requested by the OP, and to make it easier for others to find this solution, here is the answer I posted as a comment (plus extra results from testing):
When opening an HTML file, MS Word honors the CSS properties page-break-before and page-break-after. There is a caveat, however: in "Web Layout" view, page breaks are never shown (this doesn't mean they aren't there), just as browsers don't "show" them. And Word opens HTML files in Web Layout view by default (which makes sense). You need to print the document or switch to another view (typically "Print Layout") to see your breaks in all their glory.
So, saving an HTML file with a .doc extension is a viable solution (also tested: Word opens it properly despite the extension). Note: all the testing was done on MS Word 2003 using this snippet:
<html>asdf<br style="page-break-before: always;">new page!</html>
Don't build the document in code; create it in Word as a template or mail merge template, and then use code to merge or replace the field data.
See this answer here:
MS Word Office Automation - Filling Text Form Fields And Check Box Form Fields And Mail Merge
And see this from the mothership:
http://msdn.microsoft.com/en-us/library/ff433638.aspx
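The answer above relies on Word's own templates and mail merge. If the template is instead saved from Word as HTML (an assumption, and a simpler technique than real mail merge), the "replace the field data" step can be sketched in C# as plain placeholder substitution. The {{FieldName}} syntax and the TemplateFiller class are hypothetical. Word may wrap parts of a placeholder in extra tags when it saves HTML (for example around words it flags as misspelled), so check the saved file before relying on this.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces {{FieldName}} placeholders in template text
// with HTML-encoded values; placeholders without a value are left untouched.
public static class TemplateFiller
{
    private static readonly Regex Placeholder = new Regex(@"\{\{(\w+)\}\}");

    public static string Fill(string template, IReadOnlyDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
            values.TryGetValue(match.Groups[1].Value, out string value)
                ? WebUtility.HtmlEncode(value)
                : match.Value);
    }
}
```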
If you don't want to use an external library, Interop is too slow for you, and neither pure HTML nor a mail merge template is flexible enough, you could write your content as text or HTML into one or more files (using C#), then create a VBA macro in a Word document that itself creates a second Word document, reads the content files, and applies whatever formatting you want afterwards.
You can run this macro programmatically by starting Word with the command-line switch /m.
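A minimal C# sketch of launching Word with the /m switch. It assumes the macro (here the hypothetical FormatContent) is stored where Word can find it at startup, such as the Normal template, and that winword.exe can be resolved by the shell (UseShellExecute = true); check both on your machine. Word expects the macro name directly after /m, with no space.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for starting Word and running a named VBA macro via /m.
public static class WordMacroRunner
{
    // Builds e.g. "/mFormatContent"; VBA macro names cannot contain spaces
    public static string BuildArguments(string macroName)
    {
        if (string.IsNullOrWhiteSpace(macroName) || macroName.IndexOf(' ') >= 0)
            throw new ArgumentException("Macro name must be non-empty and contain no spaces.", nameof(macroName));
        return "/m" + macroName;
    }

    public static Process Start(string macroName)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "winword.exe",   // assumption: resolvable via the Windows App Paths registration
            Arguments = BuildArguments(macroName),
            UseShellExecute = true      // lets the shell locate winword.exe
        };
        return Process.Start(startInfo);
    }
}
```

After your C# code has written the content files, WordMacroRunner.Start("FormatContent") would hand control to the macro.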
Another possible approach: if your HTML is XHTML (i.e. XML-compliant), you could use XSLT to convert it to the Word XML format. But this would take a LOOOOOOOOOOONG time to code.
If you don't have to use HTML as the starting point, you could simply build the Word XML document yourself rather than using XSLT, which would be easier. It's time-consuming but possible; it's something I do quite a lot in my work.
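For the "build the Word XML yourself" route, here is a minimal C# sketch using System.Xml.Linq. It assumes the single-file Word 2003 XML (WordprocessingML) format, chosen only because it needs no zip packaging. The WordXmlBuilder class is hypothetical, and it emits only plain paragraphs plus a page break between records, not tables or styles.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical builder for a minimal Word 2003 XML document:
// one paragraph per line of text, with a page break between records.
public static class WordXmlBuilder
{
    private static readonly XNamespace W = "http://schemas.microsoft.com/office/word/2003/wordml";

    private static XElement Paragraph(string text) =>
        new XElement(W + "p", new XElement(W + "r", new XElement(W + "t", text)));

    private static XElement PageBreak() =>
        new XElement(W + "p", new XElement(W + "r",
            new XElement(W + "br", new XAttribute(W + "type", "page"))));

    public static XDocument Build(IReadOnlyList<string[]> records)
    {
        var body = new XElement(W + "body");
        for (int i = 0; i < records.Count; i++)
        {
            if (i > 0) body.Add(PageBreak());   // start each record after the first on a new page
            foreach (string line in records[i]) body.Add(Paragraph(line));
        }
        return new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            // Processing instruction that marks the file as a Word document
            new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
            new XElement(W + "wordDocument",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                body));
    }
}
```

WordXmlBuilder.Build(records).Save("report.xml") would write the file; the mso-application processing instruction is what is meant to let Windows hand a .xml file to Word rather than to a browser.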
If a third-party component is an option, I would recommend the stuff from Aspose.
I have been pretty happy with their tools so far. The API is a little messy but everything works as one would expect.