How do I split an RTF file into lines?

Posted 2024-08-02 22:59:53 · 142 words · 6 views · 0 comments

I am trying to split an RTF file into lines (in my code) and I am not quite getting it right, mostly because I am not really grokking the entirety of the RTF format. It seems that lines can be split by \par or \pard or \par\pard or any number of fun combinations.

I am looking for a piece of code that splits the file into lines in any language really.

Comments (3)

⒈起吃苦の倖褔 2024-08-09 22:59:53

You could try the RTF specification (version 1.9.1); see External Links on the Wikipedia page, which also has a couple of links to examples or modules in several programming languages.

That would most likely give you an idea of the control "words" that insert line breaks, so you can split the file into lines using a well-defined set of rules rather than taking a guess at it.
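
As a rough illustration of what such rules look like: in the specification, a control word is a backslash followed by letters, optionally a numeric parameter, and optionally one space that acts as a delimiter. A minimal Python sketch under that reading could split on `\par` without tripping over longer control words such as `\pard`. The name `split_rtf_paragraphs` is made up for this example, and the sketch assumes the text contains no escaped backslash directly followed by the letters "par".

```python
import re

# \par that is not the start of a longer control word (\pard, \pararsid, ...),
# followed by the single optional space that delimits a control word.
PAR = re.compile(r"\\par(?![A-Za-z]) ?")


def split_rtf_paragraphs(rtf: str) -> list[str]:
    # Split raw RTF source at each paragraph mark; other control words are kept.
    return PAR.split(rtf)


if __name__ == "__main__":
    sample = r"{\rtf1\pard one\par two\par\pard three}"
    for piece in split_rtf_paragraphs(sample):
        print(repr(piece))
```

The pieces still contain the RTF header and formatting control words; only the paragraph marks are removed.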

书间行客 2024-08-09 22:59:53

Have you come across O'Reilly's RTF Pocket Guide, by Sean M. Burke?

On page 13, it says

Here are some rules of thumb for putting linebreaks in RTF:

  • Put a newline before every \pard or \par (commands that are explained in the "Paragraphs" section).
  • Put a newline before and after the RTF font-table, stylesheet, and other similar constructs (like the color table, described later).
  • You can put a newline after every Nth space, {, or }. (Alternatively: put a newline after every space, {, or } that's after the 60th column.)
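
The first rule of thumb can be sketched in Python roughly as follows. These rules concern the layout of the RTF source itself: a raw newline that is not preceded by a backslash is ignored by RTF readers, so the added breaks should not change the rendered text. The helper name `add_source_linebreaks` is hypothetical, and the sketch assumes the input contains no `\bin` binary data.

```python
import re

# Either an escaped literal (\\, \{ or \}), which is passed through untouched,
# or a \par / \pard control word that is not part of a longer control word.
TOKEN = re.compile(r"(\\[\\{}])|(\\pard?(?![A-Za-z]))")


def add_source_linebreaks(rtf: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1):
            # Escaped character: keep it, so "\\par" in text is never split.
            return m.group(1)
        at_line_start = m.start() == 0 or rtf[m.start() - 1] == "\n"
        return m.group(2) if at_line_start else "\n" + m.group(2)

    return TOKEN.sub(repl, rtf)


if __name__ == "__main__":
    print(add_source_linebreaks(r"{\rtf1\pard one\par two\par three}"))
```

Matching the escaped pairs first keeps the newline from landing right after a lone backslash, which a reader would interpret as a paragraph mark.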

Or were you thinking of extracting the plaintext as lines, and doing it whatever the language of the plaintext?
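
If extracting the plain text is the goal, a small tokenizer is one possible approach. The Python sketch below is only that, a sketch under stated assumptions: `rtf_to_lines` is a made-up name, the list of skipped destinations is partial, hex escapes are assumed to be Windows-1252, and `\uN` Unicode escapes are not decoded (only their fallback character is kept).

```python
import re

# Destinations whose contents are not document text (assumed, partial list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"   # control word, optional parameter, optional space
    r"|\\'([0-9a-fA-F]{2})"   # hex-escaped byte
    r"|\\([^a-z])"            # control symbol: backslash plus one non-letter
    r"|([{}])"                # group open / close
    r"|[\r\n]+"               # raw line breaks in the source are ignored
    r"|(.)",                  # ordinary text character
    re.DOTALL,
)


def rtf_to_lines(rtf: str, encoding: str = "cp1252") -> list[str]:
    out = []          # pieces of plain text
    stack = []        # saved "skipping" flag for each open group
    skipping = False  # True while inside a destination we ignore
    for m in TOKEN.finditer(rtf):
        word, _param, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif symbol == "*" or word in SKIP_DESTINATIONS:
            skipping = True
        elif skipping:
            continue
        elif word in ("par", "line") or symbol in ("\n", "\r"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexbyte:
            out.append(bytes.fromhex(hexbyte).decode(encoding, errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)
        elif char:
            out.append(char)
    lines = "".join(out).split("\n")
    if lines and lines[-1] == "":
        lines.pop()   # drop the empty piece after a final \par
    return lines


if __name__ == "__main__":
    print(rtf_to_lines(r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\pard Hello\par World\par}"))
```

Tracking the skip flag per group is what keeps font-table and stylesheet entries out of the result.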

假扮的天使 2024-08-09 22:59:53

I coded up a quick and dirty routine and it seems to work for pretty much anything I've been able to throw at it. It's in VB6, but easily translatable into anything else.

Private Function ParseRTFIntoLines(ByVal strSource As String) As Collection
    Dim colReturn As Collection
    Dim lngPosStart As Long
    Dim lngSearchFrom As Long
    Dim strLine As String
    Dim strNext As String
    Dim sSplitters(1 To 4) As String
    Dim nIndex As Long

    ' Returns a collection of lines.

    ' The lines can be split by any of the following:
    '   "\par \pard"
    '   "\par\pard"
    '   "\par "
    '   "\par"

    ' Add the splitters longest first so that we do not miss any combination:
    ' if "\par" were tested before "\par\pard", the trailing "\pard" would be
    ' left at the start of the next line.
    sSplitters(1) = "\par \pard"
    sSplitters(2) = "\par\pard"
    sSplitters(3) = "\par "
    sSplitters(4) = "\par"

    Set colReturn = New Collection
    lngSearchFrom = 1

    ' Look for "\par" and then evaluate which type of separator is there.
    Do
        lngPosStart = InStr(lngSearchFrom, strSource, "\par", vbTextCompare)
        If lngPosStart > 0 Then
            ' A letter right after "\par" means a longer control word such as
            ' "\pard" on its own, which is not a line break: skip past it.
            strNext = Mid$(strSource, lngPosStart + 4, 1)
            If strNext Like "[A-Za-z]" Then
                lngSearchFrom = lngPosStart + 4
            Else
                strLine = Left$(strSource, lngPosStart - 1)

                For nIndex = 1 To 4
                    If StrComp(sSplitters(nIndex), Mid$(strSource, lngPosStart, Len(sSplitters(nIndex))), vbTextCompare) = 0 Then
                        ' Remove the first line and its splitter from strSource.
                        strSource = Mid$(strSource, lngPosStart + Len(sSplitters(nIndex)))

                        ' Add the line to the collection and restart the search.
                        colReturn.Add strLine
                        lngSearchFrom = 1

                        Exit For
                    End If
                Next
            End If
        End If

    Loop While lngPosStart > 0

    ' Whatever remains after the last splitter is the last line.
    If Len(strSource) > 0 Then colReturn.Add strSource

    Set ParseRTFIntoLines = colReturn
End Function