How to form a Word document from a byte stream
I have a stream of bytes that, if assembled correctly, forms a valid Word file. I need to turn this stream into a Word document without writing it to disk. The original stream comes from a SQL Server database table:
    ID   Name    FileData
    ------------------------------------------------------------------------
    1    Word1   292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)
The FileData field carries the data.
    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    // Documents.Open returns the Document, so no separate "new Document()" is needed.
    Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
    doc.Activate();
The code above opens and fills a Word document from the file system, which is not what I want. I want to define a new Microsoft.Office.Interop.Word.Document and fill its content manually from the byte stream.
After getting the in-memory Word document, I want to do some parsing of keywords.
Any ideas?
Comments (4)
One important thing to note: storing files in a database is generally not good design.
You could look at how SharePoint solves this. They have created a web interface for documents stored in their database.
It's not that hard to create or embed a web server in your application that can serve pages to Word. You don't even have to use the standard ports.
There probably isn't any straightforward way of doing this. I found a couple of solutions while searching for it:
Interop
I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).
There are really only two ways to open a Word document programmatically: as a physical file or as a stream. There's also a "package", but that's not really applicable.
The stream method is covered here: https://learn.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream
But even that relies on there being a physical file in order to form the stream:
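A minimal sketch of that pattern, assuming the Open XML SDK (the DocumentFormat.OpenXml package) is referenced. The class name, method name, and path parameter are made up for illustration. Here the stream is built from a file on disk, as the paragraph above describes; the parameter type of this Open overload is System.IO.Stream.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class StreamOpenSketch
{
    // Opens the .docx at docxPath through a stream and returns the body text.
    // InnerText concatenates all text nodes, so paragraph breaks are lost.
    public static string ReadBodyText(string docxPath)
    {
        using (FileStream stream = File.Open(docxPath, FileMode.Open, FileAccess.Read))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, false))
        {
            return doc.MainDocumentPart.Document.Body.InnerText;
        }
    }
}
```

The second argument to Open is isEditable; false is enough when the document is only being read.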
The best solution I can offer would be to write the file out to a temp location where the service account for the application has permission to write:
If the account didn't have permissions on the "temp" folder in my example, you would simply grant your application's service account (the application pool identity, if it's a website) Full Control of the folder.
You'd use this WriteFile() function:
From there, you can open it with OpenXML and edit the file. There's no way to open a Word document in byte[] form directly into an instance of Word (Interop, OpenXML, or otherwise), because you need a documentPath, or the stream method mentioned earlier that relies on there being a physical file. You can edit the bytes you would get by reading them into a string, and XML afterwards, or just edit the string directly:
Reference:
https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part
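The temp-file route described above could be sketched roughly as follows, assuming the Open XML SDK is referenced. This WriteFile() is a hypothetical stand-in with an assumed signature, not the answer's own helper, and ReplaceInFile is likewise an illustrative name. The edit step follows the string-based approach of the referenced article: read the main document part as a string, change it, and write it back.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class TempFileSketch
{
    // Hypothetical stand-in for a WriteFile() helper: writes the bytes to
    // directory/fileName and returns the full path.
    public static string WriteFile(byte[] fileData, string directory, string fileName)
    {
        Directory.CreateDirectory(directory); // no-op if it already exists
        string path = Path.Combine(directory, fileName);
        File.WriteAllBytes(path, fileData);
        return path;
    }

    // Reads the main document part's XML as a string, replaces every literal
    // occurrence of search, and writes the XML back into the part.
    // The replacement is inserted as-is, so it must already be valid inside XML.
    public static void ReplaceInFile(string docxPath, string search, string replacement)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            string docText;
            using (var reader = new StreamReader(doc.MainDocumentPart.GetStream()))
            {
                docText = reader.ReadToEnd();
            }

            docText = docText.Replace(search, replacement);

            // FileMode.Create truncates the part before the new XML is written.
            using (var writer = new StreamWriter(doc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                writer.Write(docText);
            }
        }
    }
}
```

After the edit, File.ReadAllBytes(path) returns the updated bytes for the database, and the temp file can be deleted. A phrase that Word has split across several runs will not match a plain string search.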
I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it in Word for the edits, then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); prior to re-reading the file will corrupt them, as would any other Encoding I tried (UTF7, Default, Unicode, ASCII), as I found when I tried to write them back out using my WriteFile() function, above, in that last line. When not encoded and simply collected using File.ReadAllBytes(), and then writing the bytes back out using WriteFile(), it worked fine.
Update:
It might be possible to manipulate the bytes like this:
Reference:
https://learn.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)
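As a hedged illustration of working on the bytes in memory, assuming the Open XML SDK is referenced and the bytes are a .docx package: the sketch below opens the document from a MemoryStream and only reads it, which covers the keyword parsing the question asks about. The class and method names are invented for the example.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class InMemoryReadSketch
{
    // Returns true if the body text of the .docx held in fileData contains
    // keyword (case-insensitive). Nothing is written to disk.
    public static bool ContainsKeyword(byte[] fileData, string keyword)
    {
        using (var mem = new MemoryStream(fileData))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(mem, false))
        {
            // InnerText joins all text nodes without separators.
            string text = doc.MainDocumentPart.Document.Body.InnerText;
            return text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }
}
```

With a row read from the table in the question, a call might look like InMemoryReadSketch.ContainsKeyword(fileData, "invoice"), where fileData is the FileData column value and "invoice" is an arbitrary keyword.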
But note that even this method will require saving the document, then reading it back in, in order to save it to bytes for the database. It will also fail if the document is in .doc format instead of .docx, on the line where the document is being opened.
Instead of that last section for saving the file to the file system, you could just take the memory stream and save that back into bytes once you are outside of the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream()) { ... } statement:
This will have your Word document byte[].
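A minimal sketch of that last step, assuming the Open XML SDK is referenced and the input is a .docx package. The class name, method name, and the text replacement used as the "edit" are illustrative only; the point is where mem.ToArray() is called.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class InMemoryBytesSketch
{
    // Edits the .docx held in fileData entirely in memory and returns the
    // updated document as a new byte[].
    public static byte[] ReplaceText(byte[] fileData, string search, string replacement)
    {
        using (var mem = new MemoryStream())
        {
            // Copy into an expandable stream; a MemoryStream created over an
            // existing array cannot grow if the document gets larger.
            mem.Write(fileData, 0, fileData.Length);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(mem, true))
            {
                // Only matches text that sits inside a single Text element.
                foreach (Text t in doc.MainDocumentPart.Document.Body.Descendants<Text>())
                {
                    t.Text = t.Text.Replace(search, replacement);
                }
                doc.MainDocumentPart.Document.Save();
            }

            // Outside the Open() block, still inside the MemoryStream using:
            // the package has been flushed to the stream at this point.
            return mem.ToArray();
        }
    }
}
```

The returned array can be written straight back to the FileData column, with no file system involved.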