Handling commas in quoted strings in Tcl

Posted on 2024-10-11 14:41:37

I'm using the following line in Tcl to parse a comma-separated line of fields. Some of the fields may be quoted so that they can contain commas:

set line {12,"34","56"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "56"

(It's a bit strange that the last field is quoted instead of braced but that's not the problem here)

However, when there is a comma in the quote, it does not work:

set line {12,"34","56,78"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "{56} 78"

I would expect:
{12} {34} {56,78}

Is there something wrong with my regexp, or is there something Tcl-ish going on?


Comments (4)

無心 2024-10-18 14:41:37

One option that comes to mind is using the CSV functionality in Tcllib. (No reason to reinvent the wheel unless you have to...)

http://tcllib.sourceforge.net/doc/csv.html

Docs Excerpt

::csv::split ?-alternate? line {sepChar ,} {delChar "}

Converts a line in CSV format into a list of the values contained in the line. The character used to separate the values from each other can be defined by the caller, via sepChar, but this is optional. The default is ",". The quoting character can be defined by the caller, but this is optional. The default is '"'. If the option -alternate is specified a slightly different syntax is used to parse the input. This syntax is explained below, in the section FORMAT.
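
As a rough illustration of the suggestion above, here is a minimal sketch that feeds the line from the question to csv::split. It assumes Tcllib is installed so that package require csv succeeds; the variable names are only illustrative.

```tcl
# Assumes the Tcllib csv package is available on this system.
package require csv

set line {12,"34","56,78"}

# csv::split returns a Tcl list; the quoted comma stays inside its field.
set fields [csv::split $line]

puts [llength $fields]   ;# number of fields found
puts [lindex $fields 2]  ;# the third field, comma included
```

With the default separator and quote characters this should yield three fields, the last one being 56,78.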

安稳善良 2024-10-18 14:41:37

The problem seems to be an extra comma: you only accept quoted strings if they have a comma after them, and you do the same for non-quoted tokens. This works:

set fresult [regsub -all {(\")([^\"]+)(\")|([^,\"]+)} $line {{\2\4} } fields]
                                        ^(no commas)^

Working Example: http://ideone.com/O2hss

You can safely keep the commas out of the pattern: the regex engine will keep searching for new matches. It will skip a comma it cannot match and start again at the next character.
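
To make the matching behaviour described here visible, the following sketch lists what each pattern actually matches on the failing input. The helper wholeMatches and the variable names original and relaxed are made up for this illustration; the two patterns are copied from the question and from this answer.

```tcl
set line {12,"34","56,78"}

# Pattern from the question: every field must be followed by a comma.
set original {(\")([^\"]+)(\",)|([^,\"]+),}
# Pattern from this answer: no commas required.
set relaxed  {(\")([^\"]+)(\")|([^,\"]+)}

# Hypothetical helper: return only the whole-match text of every match.
# Both patterns have four capture groups, so regexp -all -inline
# yields five list elements per match.
proc wholeMatches {pattern line} {
    set result {}
    foreach {whole g1 g2 g3 g4} [regexp -all -inline $pattern $line] {
        lappend result $whole
    }
    return $result
}

# Matches 12, then "34", then 56, (the last quoted field is never matched
# as a whole because no comma follows its closing quote).
puts [wholeMatches $original $line]

# Matches 12 then "34" then "56,78" (the commas in between are skipped).
puts [wholeMatches $relaxed $line]
```

The first call shows why the question's output contains {56} inside the leftover quotes: only 56, is matched, while the surrounding quotes and 78 are left untouched.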

Bonus: this version also handles escaped quotes written as \" (if your data doubles the quote instead, you should be able to adapt it easily by using "" in place of \\.):

set fresult [regsub -all {"((?:[^"\\]|\\.)+)"|([^,"]+)} $line {{\1\2} } fields]

Example: http://ideone.com/ztkBh
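
One point worth checking when these patterns are used with regsub: text that no match covers is copied to the result unchanged, so the separating commas remain in the output unless the pattern consumes them. A minimal sketch of one way around that is to let each alternative swallow an optional trailing comma. The ,? parts are an addition for illustration and not part of the answer above, and the sketch assumes fields contain no braces, backslashes, or escaped quotes.

```tcl
set line {12,"34","56,78"}

# Each alternative ends in ,? so the field separator is consumed by the
# match instead of being copied through to the result.
regsub -all {"([^"]+)",?|([^,"]+),?} $line {{\1\2} } fields

puts $fields             ;# {12} {34} {56,78} followed by a trailing space
puts [lindex $fields 2]  ;# 56,78
```

Because every field is wrapped in braces, the result can be read as a Tcl list as long as the assumptions above hold.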

儭儭莪哋寶赑 2024-10-18 14:41:37

Use the following regsub:

% set line {12,"34","56,78"}

% regsub -all {(,")|(",)|"} $line " " line

% set line

12 34  56,78  <<< Result

Here every occurrence of ," or ", or " (tried in that order) is replaced by a space.
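
A small sketch of how the space-separated result behaves when it is read as a Tcl list, including one case where the approach breaks down. The second input line and the variable names spaced and spaced2 are invented for this illustration.

```tcl
set pattern {(,")|(",)|"}

# The line from the question: the result reads as a three-element list.
set line {12,"34","56,78"}
regsub -all $pattern $line " " spaced
puts [llength $spaced]    ;# 3
puts [lindex $spaced 2]   ;# 56,78

# A hypothetical line whose quoted field contains a space: once the quotes
# are gone, nothing marks "a b" as a single field any more.
set line2 {12,"a b",34}
regsub -all $pattern $line2 " " spaced2
puts [llength $spaced2]   ;# 4, because "a b" was split in two
```

So this substitution is only safe when the fields themselves never contain whitespace.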

怀里藏娇 2024-10-18 14:41:37

As you said to @Kobi, if you allow for empty fields, you should also allow for empty quoted strings (""):
{((\")([^\"]*)(\")|([^,\"]*))(,|$)} where the fields of interest shift to capture groups 3 and 5.

Expanded: { ( (\")([^\"]*)(\") | ([^,\"]*) ) (,|$) } I admit, I don't know if Tcl allows (?:) non-capture grouping.
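
On the open question above: Tcl's advanced regular expressions do accept (?:...) as a non-capturing group. The following is a minimal sketch of the same idea (quoted or bare field, possibly empty, followed by a comma or the end of the line), written as a loop rather than a single regsub. The proc name splitFields is hypothetical, and the sketch assumes quoted fields contain no escaped quotes.

```tcl
# Hypothetical helper: split one CSV-like line into a Tcl list.
proc splitFields {line} {
    set fields {}
    set pos 0
    while 1 {
        # \A anchors the match at the -start offset. The (?:...) group is
        # non-capturing, so the quoted text is group 1, the bare text is
        # group 2, and the terminator (comma or end of string) is group 3.
        if {![regexp -start $pos {\A(?:"([^"]*)"|([^,"]*))(,|\Z)} \
                $line whole quoted bare term]} {
            error "cannot parse field at offset $pos"
        }
        # At most one of the two groups is non-empty.
        lappend fields $quoted$bare
        # No comma after the field means the end of the line was reached.
        if {$term ne ","} break
        incr pos [string length $whole]
    }
    return $fields
}

puts [splitFields {12,"34","56,78"}]   ;# 12 34 56,78
puts [splitFields {12,,""}]            ;# 12 {} {}
```

Walking the line field by field avoids having to reason about empty matches at the end of the string, which a single regsub -all with (,|$) would need to handle.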
