Regular expression for parsing JSON-like text

Posted 2024-11-01 16:49:19

I have text of the following form:

Field1:Value
Field2:Value
Field3:Value

Field1:Value
Field2:Value
Field3:Value

Field1:Value
Field2:Value
Field3:Value

Field1:Value
Field2:Value
Field3:Value

The names to the left of the colon are made of standard alphabetical characters ([a-zA-Z]) and always start with a capital letter. They can't be anything other than Field1, Field2, or Field3. The value to the right, however, can span multiple lines and can contain any character: [a-zA-Z], white space, $, %, ^, etc. I am looking for a regular expression that can match {Field1:value}, {Field2:value}, and {Field3:value} separately in Tcl.

岁吢 2024-11-08 16:49:20

In general, I'd work by parsing the data first into lines, then assigning to each line an interpretation (e.g., start line or continuation line), then combining the start lines with their following continuations (forming “logical” lines). Only once that was done would I then use an RE to split the key from the value. As a suggestion for format, try having the line be a continuation if it starts with a space. That's dead easy to implement and looks good in a file.

As code:

# Read the data from a file and split into lines
set f [open "filename"]
set lines [split [read $f] "\n"]
close $f

# Recombine into logical lines
set logicalLines {}
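foreach realline $lines {
    if {[regexp {^ (.*)} $realline -> tail]} {
        # A leading space marks a continuation of the previous line
        append current "\n$tail"
    } else {
        # Any other line starts a new logical line; store the finished one
        if {[info exists current]} {
            lappend logicalLines $current
        }
        set current $realline
    }
}
lappend logicalLines $current         ;# Assume at least one line :-)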

# Parse the logical lines
foreach line $logicalLines {
    if {[regexp {^([A-Z]\w+):(.*)$} $line -> key value]} {
        # OK, got $key mapping to $value
    } else {
        # It's a bogus line; waaaah!
    }
}

OK, you might have different rules for combining the lines, but by splitting things up into two stages like this, you make your life much easier. Similarly, it's possible to use a tighter test for line validity (replacing ([A-Z]\w+) with (Field[123]) for example) but I'm not convinced it's actually sensible.
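
As one illustration of a different combining rule, here is a minimal sketch that treats any line starting with Field1:, Field2:, or Field3: as the start of a new logical line and every other line as a continuation, which is closer to the format in the question. The proc name parseFields, the sample text, and the decision to trim trailing whitespace from values are my own assumptions, not something the question specifies. It needs Tcl 8.5 or later for lassign.

```tcl
# Hypothetical helper: split text into a list of {key value} pairs.
# Assumed rule: a line matching Field1:, Field2: or Field3: begins a new
# logical line; any other line is appended to the current value.
proc parseFields {text} {
    set pairs {}
    set haveKey 0
    foreach line [split $text "\n"] {
        if {[regexp {^(Field[123]):(.*)$} $line -> newKey rest]} {
            # Store the pair collected so far before starting a new one
            if {$haveKey} {
                lappend pairs [list $key [string trimright $value]]
            }
            set key $newKey
            set value $rest
            set haveKey 1
        } elseif {$haveKey} {
            # Continuation line: keep it as part of the current value
            append value "\n$line"
        }
        # Lines before the first key are silently ignored in this sketch
    }
    if {$haveKey} {
        lappend pairs [list $key [string trimright $value]]
    }
    return $pairs
}

# Made-up sample data with a multi-line value and a blank separator line
set sample "Field1:alpha\nField2:beta\n  spans \$two% lines\nField3:gamma^\n\nField1:delta"
foreach pair [parseFields $sample] {
    lassign $pair key value
    puts "$key => $value"
}
```

Note that string trimright also drops trailing spaces that might be meaningful, and that a value line which itself happens to start with something like Field2: would be taken as a new key. That ambiguity is part of why I prefer an explicit continuation marker such as a leading space.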
