How to form a Word document from a byte stream

Posted on 2024-11-29 18:42:37

I have a stream of bytes that, if assembled correctly, forms a valid Word file. I need to turn this stream into a Word document without writing it to disk. The original stream comes from a SQL Server database table:

ID   Name    FileData
----------------------------------------
1    Word1   292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)

The FileData field carries the data.

Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
// Documents.Open returns the document, so a separate "new Document()" is not needed
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();

The above code opens and fills a Word document from the file system. I don't want that: I want to define a new Microsoft.Office.Interop.Word.Document and fill its content manually from the byte stream.

Once I have the in-memory Word document, I want to parse it for keywords.

Any ideas?

Comments (4)

神经大条 2024-12-06 18:42:37
  1. Create an in-memory file system; there are drivers for that.
  2. Give Word a path to an FTP server (or something similar), which you then use to push the data.

One important thing to note: storing files in a database is generally not good design.

梦旅人picnic 2024-12-06 18:42:37

You could look at how SharePoint solves this. They have created a web interface for documents stored in their database.

It's not that hard to create or embed a web server in your application that can serve pages to Word. You don't even have to use the standard ports.

心凉怎暖 2024-12-06 18:42:37

There probably isn't any straightforward way of doing this. I found a couple of solutions while searching for it:

I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).

写下不归期 2024-12-06 18:42:37

There are really only two ways to open a Word document programmatically: as a physical file or as a stream. There's also a "package", but that's not really applicable.

The stream method is covered here: https://learn.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream

But the example there still builds the stream from a physical file:

string strDoc = @"C:\Users\Public\Public Documents\Word13.docx";
Stream stream = File.Open(strDoc, FileMode.Open);

The best solution I can offer would be to write the file out to a temp location where the service account for the application has permission to write:

string newDocument = @"C:\temp\test.docx";
WriteFile(byteArray, newDocument);

If the account doesn't have permissions on the "temp" folder in my example, give the service account of your application (the application pool identity, if it's a website) Full Control of the folder.

You'd use this WriteFile() function:

/// <summary>
/// Write a byte[] to a new file at the location where you choose
/// </summary>
/// <param name="byteArray">byte[] that consists of file data</param>
/// <param name="newDocument">Path to where the new document will be written</param>
public static void WriteFile(byte[] byteArray, string newDocument)
{
    // File.WriteAllBytes creates (or overwrites) the file and writes the whole
    // array, so no intermediate MemoryStream is needed
    File.WriteAllBytes(newDocument, byteArray);
}
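
If you do go through a temporary file (for example because Interop only accepts a path), a variation on WriteFile() could pick a unique file name and clean up afterwards. This is only a sketch: TempDocument and WithTempFile are hypothetical names, not part of any library, and it assumes the caller closes the document before the callback returns.

```csharp
using System;
using System.IO;

public static class TempDocument
{
    // Writes the bytes to a uniquely named file in the current user's temp
    // folder, passes the path to the caller's action, and deletes the file
    // afterwards, even if the action throws.
    public static void WithTempFile(byte[] fileData, string extension, Action<string> useFile)
    {
        string path = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + extension);

        File.WriteAllBytes(path, fileData);
        try
        {
            useFile(path);
        }
        finally
        {
            // Throws an IOException if another process (e.g. Word) still has the file open.
            File.Delete(path);
        }
    }
}
```

The callback would be the place to call word.Documents.Open(path), do the parsing, and close the document again. Using Path.GetTempPath() avoids having to grant permissions on a fixed folder such as C:\temp, as long as the service account has a usable temp directory.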

From there, you can open it with OpenXML and edit the file. As far as I could find, there's no way to open a Word document in byte[] form directly into an instance of Word (Interop, OpenXML, or otherwise), because you need either a documentPath or the stream method mentioned earlier, whose example relies on a physical file. You can edit the document by reading the main document part into a string and then loading that into XML, or just edit the string directly:

// Requires: using System.IO; using System.Xml; using DocumentFormat.OpenXml.Packaging;
string documentPath = @"C:\temp\test.docx";  // the file written by WriteFile() above
string docText = null;
byte[] byteArray = null;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentPath, true))
{
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();  // reads the XML of the main document part into a string
    }

    // Play with the XML
    XmlDocument xml = new XmlDocument();
    xml.LoadXml(docText);  // the string contains the XML of the Word document

    XmlNodeList nodes = xml.GetElementsByTagName("w:body");
    XmlNode chiefBodyNode = nodes[0];
    // add paragraphs with AppendChild...
    // remove a node by getting a ChildNode and removing it, like this...
    XmlNode thirdChild = chiefBodyNode.ChildNodes[2];  // index 2 is the third child of w:body
    chiefBodyNode.RemoveChild(thirdChild);

    // Or play with the string form
    docText = docText.Replace("John","Joe");

    // The XML edits above only take effect if you copy them back into the string:
    //docText = xml.OuterXml;  // comment out the Replace line above if XML edits are all you want to do, and uncomment this line

    // Write the text back into the main document part (the package is backed by the file on disk here)
    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
    {
        sw.Write(docText);
    }
}

// Read it back in as bytes
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving

Reference:

https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it for the edits, then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); instead of re-reading the file produces a corrupt document, as did every other Encoding I tried (UTF7, Default, Unicode, ASCII), which I found when I wrote the bytes back out with my WriteFile() function, above. That makes sense: docText holds only the XML of the main document part, whereas a .docx file is a ZIP package containing several parts. When the bytes are not encoded and simply collected using File.ReadAllBytes(), and then written back out using WriteFile(), it works fine.

Update:

It might be possible to manipulate the bytes like this:

// Requires: using System.IO; using DocumentFormat.OpenXml.Packaging;
//byte[] byteArray = File.ReadAllBytes("Test.docx"); // you might be able to assign your bytes here, instead of from a file?
byte[] byteArray = GetByteArrayFromDatabase(fileId); // function you have for getting the document from the database
string documentPath = @"C:\temp\newDoc.docx"; // declared here so it is still in scope after the using block
using (MemoryStream mem = new MemoryStream())
{
    // Write into an expandable stream so the document can grow when edited
    mem.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(mem, true))
    {
        // do your updates -- see string or XML edits, above

        // Once done, you may need to save the changes....
        //wordDoc.MainDocumentPart.Document.Save();
    }

    // But you will still need to save it to the file system here....
    // FileMode.CreateNew throws if the file already exists, so use a new name each time
    using (FileStream fileStream = new FileStream(documentPath,
            System.IO.FileMode.CreateNew))
    {
        mem.WriteTo(fileStream);
    }
}

// And then read the bytes back in, to save it to the database
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving

Reference:

https://learn.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)

But note that, as written, even this method saves the document and then reads it back in to get the bytes for the database. It will also fail, on the line where the document is opened, if the document is in .doc format instead of .docx.
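
One way to avoid hitting that failure is to look at the first few bytes before choosing a code path. The sketch below is an assumption-laden illustration, not part of the Open XML SDK: WordFormatSniffer and ContainerKind are hypothetical names, and the check is coarse, since the same signatures are shared by other OLE compound files (e.g. .xls) and by any ZIP archive.

```csharp
using System;

public enum ContainerKind { Unknown, OleCompound, ZipPackage }

public static class WordFormatSniffer
{
    // Signature of an OLE2 compound file, the container used by legacy .doc files.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature ("PK" 03 04), the container used by .docx packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies the data by its leading bytes only; it does not validate the content.
    public static ContainerKind Detect(byte[] data)
    {
        if (StartsWith(data, OleSignature)) return ContainerKind.OleCompound;
        if (StartsWith(data, ZipSignature)) return ContainerKind.ZipPackage;
        return ContainerKind.Unknown;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A result of ZipPackage suggests the bytes can be handed to WordprocessingDocument.Open(); OleCompound suggests a legacy document that would need the temp-file route with Interop instead.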

Instead of that last section for saving the file to the file system, you could just take the memory stream and save it back into bytes once you are outside of the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream()) { ... } statement:

// Convert
byteArray = mem.ToArray();

byteArray will then hold your Word document as a byte[].
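
For the keyword parsing the question asks about, the bytes never need to leave memory if the document is a .docx. The sketch below swaps the Open XML SDK for the base class library's ZipArchive and LINQ to XML, so it runs without extra packages. DocxKeywordSearch is a hypothetical name, and the code assumes the main document part lives at the conventional word/document.xml path; it ignores headers, footers, and footnotes, and text inside text boxes may be counted twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxKeywordSearch
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated run text of each w:p element in the main document part.
    public static string[] ReadParagraphs(byte[] docxBytes)
    {
        using (var mem = new MemoryStream(docxBytes, writable: false))
        using (var zip = new ZipArchive(mem, ZipArchiveMode.Read))
        {
            // Assumption: the main part is stored at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx?");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part);
                return xml.Descendants(W + "p")
                          .Select(p => string.Concat(
                              p.Descendants(W + "t").Select(t => t.Value)))
                          .ToArray();
            }
        }
    }

    // Case-insensitive search across paragraph text.
    public static bool ContainsKeyword(byte[] docxBytes, string keyword)
    {
        return ReadParagraphs(docxBytes)
            .Any(p => p.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Joining the w:t runs per paragraph matters because Word often splits a single word or phrase across several runs, so searching the raw XML string for a keyword can miss matches.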
