使用MVC打开xml替换word文件中的文本并返回内存流

发布于 2024-11-01 10:20:53 字数 1424 浏览 4 评论 0原文

我有一个包含指定模式文本 {pattern} 的 Word 文件，我想用从数据库读取的新字符串替换这些模式。因此，我使用从 docx 模板文件中打开 xml 读取流来替换我的模式字符串，然后返回到支持下载文件而无需创建临时文件的流。但是当我打开它时，它在 docx 文件上生成了错误。下面是我的示例代码，

public ActionResult SearchAndReplace(string FilePath)
{
    MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

//Instead using this code below to write text back the original file. I write new string back to memory stream and return to a stream download file
        //using (StreamWriter sw = new //StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        //{
        //    sw.Write(docText);
        //}

        using (StreamWriter sw = new StreamWriter(mem))
                    {
                        sw.Write(docText);
                    }
    }
    mem.Seek(0, SeekOrigin.Begin); 

    return File(mem, "application/octet-stream","download.docx"); //Return to download file
}

请建议我任何解决方案，而不是从 Word 文件中读取文本并替换那些预期的模式文本，然后将数据写回原始文件。是否有任何解决方案用 WordprocessingDocument 库替换文本？如何使用验证 docx 文件格式返回内存流？

原文

I have an word file that contain my specified pattern text {pattern} and I want to replace those pattern with new my string which was read from database. So I used open xml read stream from my docx template file the replace my pattern string then returned to stream which support to download file without create a temporary file. But when I opened it generated me error on docx file. Below is my example code

public ActionResult SearchAndReplace(string FilePath)
{
    MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

//Instead using this code below to write text back the original file. I write new string back to memory stream and return to a stream download file
        //using (StreamWriter sw = new //StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        //{
        //    sw.Write(docText);
        //}

        using (StreamWriter sw = new StreamWriter(mem))
                    {
                        sw.Write(docText);
                    }
    }
    mem.Seek(0, SeekOrigin.Begin); 

    return File(mem, "application/octet-stream","download.docx"); //Return to download file
}

Please suggest me any solutions instead read a text from a word file and replace those expected pattern text then write data back to the original file. Are there any solutions replace text with WordprocessingDocument libary? How can I return to memory stream with validation docx file format?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

策马西风 2024-11-08 10:20:53

您所采取的方法是不正确的。如果您正在搜索的模式偶然与某些 Open XML 标记相匹配，则会损坏文档。如果您要搜索的文本被分割多次运行，您的搜索/替换代码将找不到该文本并且无法正确运行。如果您想要搜索并替换 WordprocessingML 文档中的文本，可以使用一个相当简单的算法：

将所有运行分解为单个运行
特点。这包括运行
有特殊字符，例如
换行符、回车符或硬符
选项卡。
然后就很容易找到一个
与字符匹配的一组运行
在您的搜索字符串中。
一旦您确定了一组匹配的运行，
然后你可以替换那组运行
与新创建的运行（其中有
运行的运行属性
包含第一个字符
与搜索字符串匹配）。
替换单字符运行后
通过新创建的运行，您可以
然后合并相邻的运行
相同的格式。

我写了一篇博文并录制了一个演示该算法的截屏视频。

博客文章： http://openxmldeveloper.org/archive/2011/05/12/ 148357.aspx
屏幕截图：http://www.youtube.com/watch?v=w128hJUu3GM

-埃里克

回复收藏 0 原文

沉默的熊 2024-11-08 10:20:53

string sourcepath = HttpContext.Server.MapPath("~/File/Form/s.docx");            
string targetPath = HttpContext.Server.MapPath("~/File/ExportTempFile/" + DateTime.Now.ToOADate() + ".docx");
System.IO.File.Copy(sourcepath, targetPath, true);
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(targetPath, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDocument.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
    Regex regexText = new Regex("Hello world!");
    docText = regexText.Replace(docText, "Hi Everyone!");
    byte[] byteArray = Encoding.UTF8.GetBytes(docText); 
    MemoryStream stream = new MemoryStream(byteArray);
    wordDocument.MainDocumentPart.FeedData(stream);
}
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(targetPath));
return File(mem, "application/octet-stream", "download.docx");

string sourcepath = HttpContext.Server.MapPath("~/File/Form/s.docx");            
string targetPath = HttpContext.Server.MapPath("~/File/ExportTempFile/" + DateTime.Now.ToOADate() + ".docx");
System.IO.File.Copy(sourcepath, targetPath, true);
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(targetPath, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDocument.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
    Regex regexText = new Regex("Hello world!");
    docText = regexText.Replace(docText, "Hi Everyone!");
    byte[] byteArray = Encoding.UTF8.GetBytes(docText); 
    MemoryStream stream = new MemoryStream(byteArray);
    wordDocument.MainDocumentPart.FeedData(stream);
}
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(targetPath));
return File(mem, "application/octet-stream", "download.docx");

回复收藏 0 原文

就像说晚安 2024-11-08 10:20:53

直接写入Word文档流确实会损坏它。
您应该改为写入 MainDocumentPart 流，但您应该首先截断它。
看起来 MainDocumentPart.FeedData(Stream sourceStream) 方法就可以做到这一点。

我还没有测试过，但这应该有效。

public ActionResult SearchAndReplace(string FilePath)
{
    MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter sw = new StreamWriter(ms))
            {
                sw.Write(docText);
            }
            ms.Seek(0, SeekOrigin.Begin);
            wordDoc.MainDocumentPart.FeedData(ms);
        }
    }
    mem.Seek(0, SeekOrigin.Begin); 

    return File(mem, "application/octet-stream","download.docx"); //Return to download file
}

Writing directly to the word document stream will indeed corrupt it.
You should instead write to the MainDocumentPart stream, but you should first truncate it.
It looks like MainDocumentPart.FeedData(Stream sourceStream) method will do just that.

I haven't tested it but this should work.

public ActionResult SearchAndReplace(string FilePath)
{
    MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter sw = new StreamWriter(ms))
            {
                sw.Write(docText);
            }
            ms.Seek(0, SeekOrigin.Begin);
            wordDoc.MainDocumentPart.FeedData(ms);
        }
    }
    mem.Seek(0, SeekOrigin.Begin); 

    return File(mem, "application/octet-stream","download.docx"); //Return to download file
}

回复收藏 0 原文

~没有更多了~