Handling commas in quoted strings in Tcl
I'm using the following line in Tcl to parse a comma-separated line of fields. Some of the fields may be quoted so that they can contain commas:
set line {12,"34","56"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "56"
(It's a bit strange that the last field is quoted instead of braced but that's not the problem here)
However, when there is a comma in the quote, it does not work:
set line {12,"34","56,78"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "{56} 78"
I would expect:
{12} {34} {56,78}
Is there something wrong with my regexp, or is there something Tcl-ish going on?
Comments (4)
One option that comes to mind is using the CSV functionality in TclLib. (No reason to reinvent the wheel unless you have to...)
http://tcllib.sourceforge.net/doc/csv.html
Docs Excerpt
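As a rough illustration of the tcllib suggestion, here is a minimal sketch. It assumes tcllib is installed so that `package require csv` succeeds; `csv::split` is the tcllib command that splits a single CSV line into a Tcl list.

```tcl
# Assumes tcllib is installed; the csv package is part of it.
package require csv

set line {12,"34","56,78"}

# csv::split returns the fields of one CSV line as a Tcl list.
set fields [csv::split $line]

puts $fields
# prints: 12 34 56,78
```

The quoted comma survives because the CSV parser, unlike the regexp in the question, tracks whether it is inside quotes.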
The problem seems to be an extra comma: you only accept quoted strings if they are followed by a comma, and you do the same for unquoted tokens. This works:
Working Example: http://ideone.com/O2hss
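To make the diagnosis concrete, the following sketch lists what the pattern from the question actually matches on the failing input. The variable names are only for illustration.

```tcl
set line {12,"34","56,78"}

# The pattern from the question, unchanged.
set re {(\")([^\"]+)(\",)|([^,\"]+),}

# regexp -all -inline returns, per match, the whole match followed by
# the four capture groups; only the whole match is kept here.
set matched {}
foreach {whole g1 g2 g3 g4} [regexp -all -inline $re $line] {
    lappend matched $whole
}

puts [join $matched " | "]
# prints: 12, | "34", | 56,
```

The last quoted field has no comma after its closing quote, so the quoted alternative cannot match it. The unquoted alternative then matches `56,` from inside the quotes, which is where the `{56}` in the output comes from.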
You can safely keep the commas out of the pattern: the regex engine will keep searching for new matches. It will skip a comma it cannot match and start again at the next character.
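The snippet this answer originally linked to is not reproduced here, so the following is only a sketch of the same idea and not necessarily the answerer's exact pattern. It uses `regexp -all -inline` rather than `regsub`, because `regsub` would copy the unmatched commas into the result.

```tcl
set line {12,"34","56,78"}

# Group 1: contents of a quoted field. Group 2: an unquoted field.
# No comma appears in the pattern; the engine steps over separators
# because nothing can match at a comma.
set re {"([^"]+)"|([^,"]+)}

set fields {}
foreach {whole quoted plain} [regexp -all -inline $re $line] {
    # Exactly one of the two groups is non-empty for each match.
    lappend fields $quoted$plain
}

puts $fields
# prints: 12 34 56,78
```

Note that this sketch silently drops empty fields, since neither alternative can match an empty string.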
Bonus: this will also handle escaped quotes, using \" (if you need to, you should be able to adapt it easily by using "" instead of \\.).
Example: http://ideone.com/ztkBh
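For the doubled-quote variant mentioned above, a possible adaptation might look like this. The helper name `splitQuoted` is hypothetical, and the pattern is an assumption about how the idea carries over, not the code behind the linked example.

```tcl
# Hypothetical helper: split one line whose quoted fields escape a
# literal quote by doubling it.
proc splitQuoted {line} {
    # Group 1: body of a quoted field, where a doubled quote is allowed.
    # Group 2: an unquoted field.
    set re {"((?:[^"]|"")*)"|([^,"]+)}
    set fields {}
    foreach {whole quoted plain} [regexp -all -inline $re $line] {
        if {[string index $whole 0] eq "\""} {
            # Collapse each doubled quote back to a single quote.
            lappend fields [string map [list "\"\"" "\""] $quoted]
        } else {
            lappend fields $plain
        }
    }
    return $fields
}

set fields [splitQuoted {a,"say ""hi"", ok",b}]
puts [lindex $fields 1]
# prints: say "hi", ok
```

Building the result with `lappend` instead of wrapping each field in braces keeps the output a valid Tcl list even when a field contains braces or spaces. Empty unquoted fields are still skipped by this pattern.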
Use the following regsub
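The command that originally followed is missing from this copy. A reconstruction consistent with the answer's own description might look like the sketch below; the exact pattern is an assumption.

```tcl
set line {12,"34","56,78"}

# Replace each comma-quote pair, quote-comma pair, or lone quote
# with a single space.
regsub -all {,"|",|"} $line { } fields

# Read as a Tcl list, the result has three elements.
puts [llength $fields]
# prints: 3
puts [lindex $fields 2]
# prints: 56,78
```

This only works while every field that follows a comma is quoted and no field contains a space: two adjacent unquoted fields such as `12,34` would stay joined, and a space inside a field would split it.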
Here all the occurrences of ," or ", or " (in order) are replaced by a space.
As you said to @Kobi, if you allow for empty fields, you should also allow for empty strings "":
{((\")([^\"]*)(\")|([^,\"]*))(,|$)}
where the fields of interest have shifted to groups 3 and 5.
Expanded:
{ ( (\")([^\"]*)(\") | ([^,\"]*) ) (,|$) }
I admit, I don't know if tcl allows (?:) non-capture grouping.
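For what it is worth, Tcl's advanced regular expressions (the default flavor since Tcl 8.1) do accept `(?:...)` non-capturing groups. A sketch of the same pattern with the bookkeeping groups made non-capturing, so that the fields of interest become groups 1 and 2 instead of 3 and 5:

```tcl
set line {12,,"","56,78"}

# Group 1: body of a quoted field, possibly empty.
# Group 2: an unquoted field, possibly empty.
# Each field must be followed by a comma or the end of the line.
set re {(?:"([^"]*)"|([^,"]*))(?:,|$)}

set fields {}
foreach {whole quoted plain} [regexp -all -inline $re $line] {
    lappend fields $quoted$plain
}

puts $fields
# prints: 12 {} {} 56,78
```

Both the empty unquoted field and the empty quoted string come through as empty list elements. Edge cases such as a line ending in a comma are not covered by this sketch.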