Concatenating RTF files in PHP (REGEX)

Posted 2024-07-06 14:16:09

I've got a script that takes a user-uploaded RTF document and merges some person data (name, address, etc.) into the letter, and does this for multiple people. I merge the data into one copy of the letter, then combine that with the next merged copy, for every person record.

Effectively, I'm combining a single RTF document with itself once for every person record I need to merge into the letter. However, I first need to remove the closing RTF markup of each merged copy and the opening RTF markup of the next one, or else the RTF won't render correctly. This sounds like a job for regular expressions.

Essentially I need a regex that will remove the entire string:

}\n\page ANYTHING \par

For example, the regex would match this:

crap
}
\page{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20 September 30, 2008\par
more crap

So I could make it just:

crap
\page
more crap

Is RegEx the best approach here?

UPDATE: Why do I have to use RTF?

I want to enable the user to upload a form letter that the system will then use to create the merged letters. Since RTF is plain text, I can do this pretty easily in code. I know, RTF is a disaster of a spec, but I don't know any other good alternative.
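
The update's point that RTF is plain text means each person's copy can be filled with ordinary string functions. Below is a minimal sketch of that fill step, not the poster's code: the `%%FIELD%%` placeholder syntax and the helper names `rtf_escape()` and `fill_letter()` are hypothetical. What it shows is that merged values need escaping, because `\`, `{` and `}` are RTF control characters.

```php
<?php
/**
 * Escape plain text for insertion into an RTF body.
 * Backslash and braces are RTF control characters; line breaks become \line.
 * Assumption: values are ASCII. Other characters would need \uN? escapes,
 * which this sketch does not produce.
 */
function rtf_escape(string $text): string
{
    return strtr($text, [
        '\\'   => '\\\\',
        '{'    => '\\{',
        '}'    => '\\}',
        "\r\n" => '\\line ',
        "\n"   => '\\line ',
    ]);
}

/**
 * Fill one copy of the form letter. Placeholders use a hypothetical
 * %%FIELD%% syntax (braces are avoided because they delimit RTF groups).
 */
function fill_letter(string $template, array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        $pairs['%%' . $name . '%%'] = rtf_escape((string) $value);
    }
    // strtr() with an array replaces all placeholders in one pass, and
    // replaced text is not searched again, so values are never re-scanned.
    return strtr($template, $pairs);
}
```

One caution, stated as an assumption to check against real uploads: an editor such as Word may split a typed placeholder with extra control words when it saves, in which case the literal `%%NAME%%` never appears in the file and `strtr()` finds nothing to replace.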

Comments (2)

醉酒的小男人 2024-07-13 14:16:09

I would question the use of RTF in this case. It's not entirely clear to me what you're trying to do overall, so I can't necessarily suggest anything better, but if you can try to explain your project more broadly, maybe I can help.

If this is really the way you want to go though, this regex gave me the correct output given your input:

$output = preg_replace("/}\s?\n\\\\page.*?\\\\par\s?\n/ms", "\\page\n", $input);
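
To try the answer's regex end to end, here is a sketch that wraps it in a hypothetical `merge_rtf_letters()` helper and reproduces the concatenation from the question (each letter's closing `}`, a newline, then `\page` and the next letter's `{\rtf1` header). The pattern is copied unchanged; the sketch assumes every letter ends in `}` and that its header runs up to a `\par` followed by a newline.

```php
<?php
/**
 * Sketch: join one filled-in RTF letter per person into a single document,
 * then apply the answer's regex to drop each letter's closing "}" together
 * with the next letter's header, up to its first "\par" + newline.
 * Assumes every letter ends in "}" and its header ends with "\par" and a newline.
 */
function merge_rtf_letters(array $letters): string
{
    // Reproduce the concatenation described in the question:
    // "...}" newline "\page{\rtf1..." between consecutive letters.
    $joined = implode("\n\\page", array_map('trim', $letters));

    // The answer's pattern, unchanged.
    $merged = preg_replace("/}\s?\n\\\\page.*?\\\\par\s?\n/ms", "\\page\n", $joined);
    if ($merged === null) {
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $merged;
}
```

As in the question's example, the regex removes everything up to the first `\par` plus newline of each following letter, so the text on that first line (the date, in the question) is dropped from every letter after the first. If a merged field sits on that line, it disappears too.
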
静谧幽蓝 2024-07-13 14:16:09

To this I can say ick ick ick. Nevertheless, rcar's kludge will probably work, barring some weird edge case where the RTF doesn't actually end in that form, or the document-wide styles include important information that utterly messes up the formatting, or any of the many other failure modes.
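
The failure modes listed above can at least be made loud rather than silent. A sketch of a stricter variant (hypothetical name `merge_rtf_letters_checked()`, same regex) checks that every letter looks like a complete `{\rtf ... }` document and uses `preg_replace()`'s `$count` argument to confirm that exactly one boundary was removed per pair of adjacent letters. It does not fix the document-wide style problem; it only assumes all letters come from the same template, so keeping the first letter's header is acceptable.

```php
<?php
/**
 * Stricter sketch of the merge (hypothetical helper name). It checks the
 * assumptions the answer's regex relies on and throws instead of returning
 * a document that may render incorrectly.
 */
function merge_rtf_letters_checked(array $letters): string
{
    if ($letters === []) {
        throw new InvalidArgumentException('No letters to merge');
    }
    $letters = array_map('trim', $letters);
    foreach ($letters as $i => $letter) {
        // Each letter must start with "{\rtf" and end with the closing "}".
        if (strncmp($letter, '{\rtf', 5) !== 0 || substr($letter, -1) !== '}') {
            throw new InvalidArgumentException("Letter $i is not a complete RTF document");
        }
    }

    $joined = implode("\n\\page", $letters);
    $merged = preg_replace("/}\s?\n\\\\page.*?\\\\par\s?\n/ms", "\\page\n", $joined, -1, $count);
    if ($merged === null) {
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }

    // Exactly one boundary per pair of adjacent letters should have been removed.
    $expected = count($letters) - 1;
    if ($count !== $expected) {
        throw new RuntimeException("Expected $expected joins, but the regex made $count");
    }
    return $merged;
}
```

The count check catches, for example, a letter whose header is never followed by `\par` and a newline: the lazy `.*?` then runs on into the next letter and swallows it, so fewer boundaries are replaced than expected.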
