How to use Ruby gsub with a Regexp for multiple matches?

Posted 2025-01-01 14:17:19 · 405 words · 2 views · 0 comments

I have CSV file contents with unescaped double quotes inside quoted fields:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not immediately preceded or followed by a comma with two double quotes (""):

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

That is, each inner " is replaced by "".

I tried

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.

Comments (2)

野の 2025-01-08 14:17:19

Your regex needs to handle a few more cases, since a quote can also occur at the start of the first value or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The above regex uses negative lookbehind and negative lookahead assertions (zero-width, like anchors), available in Ruby 1.9 and later.

  • (?<!^|,) — immediately preceding this spot there must not be either a start of line (^) or a comma
  • " — find a double quote
  • (?!,|$) — immediately following this spot there must not be either a comma or end of line ($)

As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.
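
To make the zero-width behaviour concrete, here is a minimal sketch using the regex above on one line from the question. The variable names (inner_quote, line, matches, fixed) are illustrative assumptions, not part of the original answer.

```ruby
# The pattern from the answer, stored in a local variable
# (the name inner_quote is an assumption made for this sketch).
inner_quote = /(?<!^|,)"(?!,|$)/

line = 'again,second,li,"my "boss" is you",good'

# Lookarounds do not consume text, so each match is just the quote itself.
# Only the two quotes around boss qualify; the field delimiters are skipped.
matches = line.scan(inner_quote)
puts matches.length

# Because nothing but the quote is matched, the replacement needs no \1 or \2.
fixed = line.gsub(inner_quote, '""')
puts fixed
```

The neighbouring characters are only inspected, never replaced, which is why the replacement string can be a plain '""'.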


However, for the case where you do need to refer to the captured text in your replacement, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use String interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"
 #=> "h<previousmatch>ll<previousmatch>"

…because that string interpolation happens once, before the gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
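
The timing difference can be observed directly. The sketch below first sets $1 from an unrelated match (an assumption made purely for the demonstration), then compares the interpolated replacement string with the block form.

```ruby
# Populate $1 with a known value from an unrelated match.
"x" =~ /(x)/

# The double-quoted string is evaluated before gsub is called, so the
# replacement is the fixed string "<x>" for every match.
eager = "hello".gsub(/([aeiou])/, "<#{$1}>")

# The block runs once per match, after $1 has been set for that match.
lazy = "hello".gsub(/([aeiou])/) { "<#{$1}>" }

puts eager  # h<x>ll<x>
puts lazy   # h<e>ll<o>
```

In the question's code, $1 and $2 were evaluated the same way, before any match had happened, so they contributed whatever a previous match had left behind (or nothing at all).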


Edit: For Ruby 1.8 (why on earth are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
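
One caveat with the capture-based pattern, sketched below on a made-up input: each match consumes the character on either side of the quote, so when a single character sits between two inner quotes (as in "x"), the closing quote is left unmatched. The third variant is an assumption of this sketch, not part of the answer: it keeps the leading capture but checks the following character with a lookahead, and it only helps where the target Ruby version supports lookahead.

```ruby
line = 'a,"say "x" now",b'   # made-up sample, not from the question

# Pattern from the answer: consumes one character on each side of the quote.
capture_based = line.gsub(/([^,\n\r])"([^,\n\r])/, '\1""\2')

# Lookaround pattern from the answer: consumes only the quote.
lookaround = line.gsub(/(?<!^|,)"(?!,|$)/, '""')

# Hypothetical variant: capture the preceding character, look ahead at the next.
hybrid = line.gsub(/([^,\n\r])"(?=[^,\n\r])/, '\1""')

puts capture_based  # a,"say ""x" now",b    (closing quote after x is missed)
puts lookaround     # a,"say ""x"" now",b
puts hybrid         # a,"say ""x"" now",b
```
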
萌无敌 2025-01-08 14:17:19

Assuming s is a string, this will work:

puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")
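
Note that [^,] also matches a newline, so on multi-line input this pattern can double a quote that opens the first field of a line. A small sketch with a made-up two-line string (the names s and result are illustrative):

```ruby
s = "a,b\n\"c d\",e\n"   # made-up sample; line 2 starts with a quoted field

# The newline before the opening quote satisfies [^,], so that quote is doubled.
result = s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")

puts result
# a,b
# ""c d",e
```

Excluding \n and \r from the character classes, or using the lookaround pattern from the other answer, avoids this case.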