Using MVC and Open XML to replace text in a Word file and return a memory stream

Posted on 2024-11-01 10:20:53

I have a Word file that contains text in a specified pattern, {pattern}, and I want to replace each pattern with a new string read from a database. I used Open XML to read the stream from my docx template file, replaced the pattern string, and then returned the result as a stream so that the file can be downloaded without creating a temporary file. But opening the downloaded docx file produces an error. Below is my example code:

public ActionResult SearchAndReplace(string FilePath)
{
    MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        // Instead of using the code below to write the text back to the original file,
        // I write the new string back to the memory stream and return it as a download.
        //using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        //{
        //    sw.Write(docText);
        //}

        using (StreamWriter sw = new StreamWriter(mem))
        {
            sw.Write(docText);
        }
    }
    mem.Seek(0, SeekOrigin.Begin); 

    return File(mem, "application/octet-stream","download.docx"); //Return to download file
}

Please suggest any solution other than reading the text from the Word file, replacing the expected pattern text, and writing the data back to the original file. Is there a way to replace text using the WordprocessingDocument library? How can I return a memory stream that contains a valid docx file?

策马西风 2024-11-08 10:20:53

The approach you are taking is not correct. If, by chance, the pattern you are searching for matches some Open XML markup, you will corrupt the document. If the text you are searching for is split over multiple runs, your search/replace code will not find the text and will not operate correctly. If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:

  • Break all runs into runs of a single
    character. This includes runs that
    have special characters such as a
    line break, carriage return, or hard
    tab.
  • It is then pretty easy to find a
    set of runs that match the characters
    in your search string.
  • Once you have identified a set of runs that match,
    then you can replace that set of runs
    with a newly created run (which has
    the run properties of the run
    containing the first character that
    matched the search string).
  • After replacing the single-character runs
    with a newly created run, you can
    then consolidate adjacent runs with
    identical formatting.

I've written a blog post and recorded a screen-cast that walks through this algorithm.

Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM

-Eric
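
The four steps in the answer above could be sketched roughly as follows. This is not the code from the linked blog post: it is a simplified illustration that works on a single `w:p` element with `System.Xml.Linq`, and the names `RunReplaceSketch` and `ReplaceInParagraph` are hypothetical. It assumes plain text runs that are direct children of the paragraph; runs containing tabs or breaks are left untouched, and runs nested in hyperlinks are not searched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RunReplaceSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces each occurrence of 'search' in one w:p element (hypothetical helper).
    public static void ReplaceInParagraph(XElement paragraph, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search)) return;

        // Step 1: split every plain text run into single-character runs,
        // copying the original run properties (w:rPr) onto each piece.
        foreach (XElement run in paragraph.Elements(W + "r").ToList())
        {
            if (!IsTextRun(run)) continue; // simplification: tabs, breaks, etc. are skipped
            XElement props = run.Element(W + "rPr");
            List<XElement> pieces = run.Element(W + "t").Value
                .Select(c => MakeRun(props, c.ToString())).ToList();
            run.ReplaceWith(pieces);
        }

        // Steps 2 and 3: find a sequence of single-character runs that spells the
        // search string and replace it with one run that takes the run properties
        // of the first matched character.
        List<XElement> runs = paragraph.Elements(W + "r").ToList();
        for (int i = 0; i + search.Length <= runs.Count; i++)
        {
            List<XElement> window = runs.Skip(i).Take(search.Length).ToList();
            bool match = window.Select(SingleChar)
                .SequenceEqual(search.Select(c => c.ToString()));
            if (!match) continue;

            window[0].ReplaceWith(MakeRun(window[0].Element(W + "rPr"), replacement));
            foreach (XElement r in window.Skip(1)) r.Remove();
            runs = paragraph.Elements(W + "r").ToList(); // scanning resumes after the new run
        }

        // Step 4: merge directly adjacent text runs whose run properties are identical.
        XElement previous = null;
        foreach (XElement run in paragraph.Elements(W + "r").ToList())
        {
            if (previous != null && IsTextRun(run) && previous.NextNode == run &&
                XNode.DeepEquals(previous.Element(W + "rPr"), run.Element(W + "rPr")))
            {
                previous.Element(W + "t").Value += run.Element(W + "t").Value;
                run.Remove();
            }
            else
            {
                previous = IsTextRun(run) ? run : null;
            }
        }
    }

    // A "text run" here is a run whose only content besides w:rPr is one w:t.
    static bool IsTextRun(XElement run)
    {
        List<XElement> content = run.Elements().Where(e => e.Name != W + "rPr").ToList();
        return content.Count == 1 && content[0].Name == W + "t";
    }

    // Returns the character of a single-character text run, or null for anything else.
    static string SingleChar(XElement run) =>
        IsTextRun(run) && run.Element(W + "t").Value.Length == 1
            ? run.Element(W + "t").Value
            : null;

    // Builds a run with a copy of the given properties; xml:space="preserve"
    // keeps leading and trailing spaces in the text.
    static XElement MakeRun(XElement props, string text) =>
        new XElement(W + "r",
            props == null ? null : new XElement(props),
            new XElement(W + "t",
                new XAttribute(XNamespace.Xml + "space", "preserve"), text));
}
```

For example, in a paragraph whose two runs contain `Dear {na` and `me}, hello`, a search of the raw XML for `{name}` finds nothing because run markup sits between the two halves, whereas `RunReplaceSketch.ReplaceInParagraph(p, "{name}", "Alice")` leaves the paragraph text as `Dear Alice, hello`.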

沉默的熊 2024-11-08 10:20:53
string sourcepath = HttpContext.Server.MapPath("~/File/Form/s.docx");            
string targetPath = HttpContext.Server.MapPath("~/File/ExportTempFile/" + DateTime.Now.ToOADate() + ".docx");
System.IO.File.Copy(sourcepath, targetPath, true);
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(targetPath, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDocument.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
    Regex regexText = new Regex("Hello world!");
    docText = regexText.Replace(docText, "Hi Everyone!");
    byte[] byteArray = Encoding.UTF8.GetBytes(docText); 
    MemoryStream stream = new MemoryStream(byteArray);
    wordDocument.MainDocumentPart.FeedData(stream);
}
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(targetPath));
return File(mem, "application/octet-stream", "download.docx");
就像说晚安 2024-11-08 10:20:53

Writing directly to the word document stream will indeed corrupt it.
You should instead write to the MainDocumentPart stream, but you should first truncate it.
It looks like MainDocumentPart.FeedData(Stream sourceStream) method will do just that.

I haven't tested it but this should work.

public ActionResult SearchAndReplace(string FilePath)
{
    // Copy the template into an expandable stream: a MemoryStream created
    // from a byte[] has a fixed capacity and cannot grow when the package is saved.
    byte[] template = System.IO.File.ReadAllBytes(FilePath);
    MemoryStream mem = new MemoryStream();
    mem.Write(template, 0, template.Length);
    mem.Seek(0, SeekOrigin.Begin);

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        // Encode the text directly (requires System.Text). Disposing a StreamWriter
        // would also close the MemoryStream before FeedData could read it.
        using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(docText)))
        {
            wordDoc.MainDocumentPart.FeedData(ms);
        }
    }
    mem.Seek(0, SeekOrigin.Begin);

    return File(mem, "application/octet-stream", "download.docx"); // Return the document as a download
}