How to split an RTF file

Posted 2024-11-28 05:09:51

I want to split an RTF file (with C# or VB.Net) into two or more parts at the string [BreakPage]. For example, I have this file, which contains a [BreakPage] and needs to be split into two parts:

{\rtf1\ansi\ansicpg1251\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1049\deflangfe1049{\fonttbl{\f0\froman\fcharset204\fprq2{*\panose
02020603050405020304}Times New Roman;}{\f38\froman\fcharset0\fprq2
Times New Roman;} {\f36\froman\fcharset238\fprq2 Times New Roman
CE;}{\f39\froman\fcharset161\fprq2 Times New Roman
Greek;}{\f40\froman\fcharset162\fprq2 Times New Roman
Tur;}{\f41\froman\fcharset177\fprq2 Times New Roman (Hebrew);}
{\f42\froman\fcharset178\fprq2 Times New Roman
(Arabic);}{\f43\froman\fcharset186\fprq2 Times New Roman
Baltic;}{\f44\froman\fcharset163\fprq2 Times New Roman
(Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;
\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql
\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
\fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 \snext0
Normal;}{*\cs10 \additive \ssemihidden Default Paragraph
Font;}{*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv
\ql
\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
\fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11
\ssemihidden Normal
Table;}}{*\latentstyles\lsdstimax156\lsdlockeddef0}{*\rsidtbl
\rsid2111663\rsid7154806 \rsid15558346}{*\generator Microsoft Word
11.0.5604;}{\info{\author Programmer}{\operator
Programmer}{\creatim\yr2011\mo8\dy2\hr12\min45}{\revtim\yr2011\mo8\dy5\hr12\min34}{\version3}{\edmins1}{\nofpages1}{\nofwords5}{\nofchars34}{\nofcharsws38}
{\vern24689}}\margl1701\margr850\margt1134\margb1134
\widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3
\jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot15558346
\fet0\sectd \linex0\sectdefaultcl\sftnbj
{*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta
.}}{*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta
.}}{*\pnseclvl3 \pndec\pnstart1\pnindent720\pnhang {\pntxta
.}}{*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta
)}}{*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta
)}}{*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb
(}{\pntxta )}} {*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta
)}}{*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb
(}{\pntxta )}}{*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta )}}\pard\plain \ql
\li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0
\fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049
{\b\insrsid7154806\charrsid7154806 Line 1 \par }{\insrsid7154806 \par
}{\i\insrsid7154806\charrsid7154806
Line3}{\lang1048\langfe1049\langnp1048\insrsid7154806 \par
}{\lang1048\langfe1049\langnp1048\insrsid2111663 [BreakPage] \par
}{\insrsid7154806 Line4 \par \par Line5 \par }}

Can anyone help me?

Thanks!

Comments (1)

国产ˉ祖宗 2024-12-05 05:09:51

The problem is that RTF keeps some (but not necessarily all) of its formatting information in a global header, such as the font table, color table, and stylesheet. To split the RTF text so that each result is again valid RTF with the formatting applied, you essentially need to know where the header information is and replicate it across the splits.

There are two ways of doing this:

  1. Write an RTF parser
  2. Use an existing RTF parser

(1) is doable, but will take time. Luckily, RTF parsers already exist, for example this one on CodeProject.

Alternatively, you can load the RTF text into a RichTextBox, search for the split text "[BreakPage]" inside it, programmatically select the first and second parts, and retrieve the RTF for each part using the SelectedRtf property (http://msdn.microsoft.com/en-us/library/system.windows.forms.richtextbox.selectedrtf.aspx).
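
The RichTextBox approach could look roughly like the sketch below. It is a minimal, untested-in-production example, not a definitive implementation: the class name RtfSplitter and the method SplitRtf are made up for illustration, the project is assumed to reference System.Windows.Forms and run on Windows, and it assumes that character positions in the Text property line up with the positions used by Select, which should hold for plain paragraphs but may not for tables or embedded objects.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper (name and shape are assumptions, not from the original post).
static class RtfSplitter
{
    // Splits an RTF document at every occurrence of the marker text and
    // returns one RTF string per part. The marker itself is dropped.
    public static List<string> SplitRtf(string rtf, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("Marker must not be empty.", "marker");

        var parts = new List<string>();
        using (var box = new RichTextBox())
        {
            // Let the control parse the RTF, including its header.
            box.Rtf = rtf;
            string text = box.Text;

            int start = 0;
            while (true)
            {
                // Locate the next marker in the plain text of the document.
                int hit = text.IndexOf(marker, start, StringComparison.Ordinal);
                int end = hit < 0 ? text.Length : hit;

                // Select the range before the marker; SelectedRtf returns it
                // as a complete RTF document, header included.
                box.Select(start, end - start);
                parts.Add(box.SelectedRtf);

                if (hit < 0)
                    break;
                start = hit + marker.Length;
            }
        }
        return parts;
    }
}
```

A call might then be `var parts = RtfSplitter.SplitRtf(File.ReadAllText("input.rtf"), "[BreakPage]");`, where the file name is a placeholder. Because the marker in the sample sits in its own paragraph, the second part will begin with the leftover paragraph break; trimming that is left out here to keep the sketch short.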
