Programmatically editing RTF to be compatible with WordPad
I am in the process of transferring data from one document management system to another.
In the old system there was a bookmark button for inserting replacement fields. I need to rewrite the syntax of those replacement fields so they work with the new system (that is not the problem I am having).
Old RTF
{\rtf1\ansi\deflang1033\ftnbj\uc1\deff1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\f1\fs20\cf2\cb1\ulc2 Normal;}{\cs1\cf2\cb1\ulc2 Default Paragraph Font;}}
{\*\revtbl{Unknown;}}
\paperw12240\paperh15840\margl1440\margr1440\margt1440\margb1440\headery720\footery0\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn12240\pghsxn15840\marglsxn1440\margrsxn1440\margtsxn1440\margbsxn1440\headery720\footery0\sbkpage\pgncont\pgndec
\plain\plain\f1\fs20\ql\plain\f1\fs20 TEST\lang1033\f1 {\field\fldlock{\*\fldinst MERGEFIELD ID}{\fldrslt}} TEST\plain\f1\fs20\par}
This prints in the old system as:
TEST {ID} TEST
and {ID} would be replaced with the correct ID number when printed.
However, here is my problem: if I just open the RTF in WordPad, it looks like
TEST TEST
and after saving, the RTF looks like
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss Arial;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\f0\fs20 TEST TEST\par
}
I really don't care about most of the other metadata, but what I don't understand is why it is stripping out the {ID}. From what I can tell from MSDN, there is nothing malformed about \field\fldlock{\*\fldinst MERGEFIELD ID}{\fldrslt}.
Should I just write a regular expression to match the field tags and strip them out, or is there a better solution?
EDIT
This also happens if I open the RTF in Word, but the file Word saves is too long to post here.
Comments (1)
I ended up using regex, if anyone is curious, here is the pattern
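For anyone wondering what a regex of this kind might look like, here is a minimal Python sketch. It is an illustration only and not the pattern mentioned above. The names `FIELD_RE` and `flatten_merge_fields` are made up for this example, and it assumes the fields are simple, non-nested MERGEFIELD groups shaped like the one in the question. One plausible reason WordPad shows nothing is that it renders only the field result, and `{\fldrslt}` is empty here; treat that as an assumption rather than a confirmed diagnosis.

```python
import re

# Hypothetical pattern: matches a simple, non-nested merge field such as
#   {\field\fldlock{\*\fldinst MERGEFIELD ID}{\fldrslt}}
# and captures the field name ("ID" in this case).
FIELD_RE = re.compile(
    r"\{\\field(?:\\[a-z]+)*\s*"                          # {\field plus optional flags like \fldlock
    r"\{\\\*\\fldinst\s+MERGEFIELD\s+([^{}\\\s]+)\s*\}"   # {\*\fldinst MERGEFIELD <name>}
    r"\s*\{\\fldrslt[^{}]*\}\s*"                          # {\fldrslt ...} with no nested groups
    r"\}"                                                 # closing brace of the \field group
)


def flatten_merge_fields(rtf, render=lambda name: "\\{" + name + "\\}"):
    """Replace each matched merge field with plain RTF text.

    By default the field becomes a literal {NAME}. Braces are escaped
    as \\{ and \\} because bare braces delimit groups in RTF.
    Pass a different `render` callable to emit another placeholder syntax.
    """
    # A function is used as the replacement so that backslashes in the
    # rendered text are inserted literally instead of being parsed by re.
    return FIELD_RE.sub(lambda m: render(m.group(1)), rtf)


if __name__ == "__main__":
    old_body = (
        r"\plain\f1\fs20 TEST\lang1033\f1 "
        r"{\field\fldlock{\*\fldinst MERGEFIELD ID}{\fldrslt}}"
        r" TEST\plain\f1\fs20\par}"
    )
    print(flatten_merge_fields(old_body))
```

The `render` argument is where the new system's placeholder syntax would go, for example `render=lambda name: "<<" + name + ">>"`. Fields whose result group contains nested formatting groups will not match this pattern and would need a real RTF parser or a more careful expression.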