Replacing double quotes in a CSV file

Posted on 2024-11-06 23:54:07

I have the following problem and haven't found a solution. My CSV file looks roughly like this:

1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B

As you can see, inside the enclosing " there are additional " characters: some stand for inches, and some surround the letter L ("L").

Now I'm looking for a UNIX shell script that replaces these inner double quotes, i.e. the " (inch) and the "L" ones, with 2 single quotes, along the lines of the following example:

sed "s/$OLD/$NEW/g" "$QFILE" > "$TFILE" && mv "$TFILE" "$QFILE"

Can anyone help me?

Comments (3)

毁梦 2024-11-13 23:54:08

For the "L" try this:

 sed "s/\"L\"/'L'/g"

For inches you can try:

sed "s/\([0-9]\)\"\"/\1''\"/g" 

I am not sure it is the best option, but I have tried and it works. I hope this is helpful.
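
To see what the two commands above do together, here is a minimal sketch that chains them with -e and feeds in the first sample line from the question. The function name fix_quotes_simple is made up for this illustration. Note that the inch rule only fires when a digit is followed by two double quotes, so the inch mark in "2" width" is left untouched.

```shell
# Hypothetical helper: applies both substitutions from this answer in one sed call.
fix_quotes_simple() {
  sed -e "s/\"L\"/'L'/g" \
      -e "s/\([0-9]\)\"\"/\1''\"/g" "$@"
}

# First sample line from the question.
sample='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

printf '%s\n' "$sample" | fix_quotes_simple
# Prints: 1223;"B630521 ('L' fixed bracket)";"2" width";"length: 5''";2;alternate A
```

If the inch mark can also appear in the middle of a field, the rule needs to be widened, as the other answers in this thread do.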

亣腦蒛氧 2024-11-13 23:54:07

Update: with Perl this is easy, since you get full lookahead and lookbehind support.

perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file

Output

1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B

Using only sed and grep

Using only grep and sed (no Perl, PHP, Python, etc.), a less elegant solution could be:

grep -o '[^;]*' file | sed  's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g' 

Output (one field per line) for your input file:

1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
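  • grep -o '[^;]*' splits the input at each ; and prints one field per line
  • sed first replaces the first " on each line (the field's opening quote) with `
  • then it replaces the " at the end of the line (the closing quote) with another `
  • it then replaces all remaining double quotes " with a single quote '
  • finally it turns every ` back into ", restoring the opening and closing quotes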
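
The same backtick trick can probably be applied without splitting the lines first, so that each record stays on one line. The following is a sketch of that variation, not the command from above: it protects every " that sits at the start or end of the line or directly next to a ; and then converts the rest. It assumes the data contains no backticks and that no field contains ;" or "; in its text. The function name fix_quotes_inline is made up for this example.

```shell
# Hypothetical helper: a whole-line variation of the backtick trick.
# 1-2: protect a " at the very start or very end of the line
# 3-4: protect the opening " after a ; and the closing " before a ;
# 5:   every " still left is an inner quote -> single quote
# 6:   turn the protected backticks back into "
fix_quotes_inline() {
  sed 's/^"/`/; s/"$/`/; s/;"/;`/g; s/";/`;/g; s/"/'"'"'/g; s/`/"/g' "$@"
}

sample='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

printf '%s\n' "$sample" | fix_quotes_inline
# Prints: 1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
```

This gives the same result as the Perl command, at the cost of relying on a placeholder character that must not occur in the file.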

静谧 2024-11-13 23:54:07

Maybe this is what you want:

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"

That is: find a double quote (") that follows a digit ([0-9]) and is followed by any character other than a semicolon ([^;]), and replace it with two single quotes.

Edit:
I can extend my command (it's becoming quite long now):

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"

As you are using SunOS, I guess you cannot use extended regular expressions (sed -r), so I did it this way. The first s command replaces every inch " with ''. The second and third s commands are identical: they replace every " that is not directly next to a ; with a single '. I have to run this substitution twice to catch the second " of e.g. "L", because there is only one character between the two " and that character has already been consumed by the \([^;]\) of the first match. This way "" is also replaced with ''. If you have """ or """" etc., you have to add one more (but only one more) s.
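
Instead of repeating the second s command by hand, the repetition could also be written as a loop. The sketch below assumes your sed supports labels and the t command (both are specified by POSIX, but I have not checked this on SunOS). The label name again and the function name fix_quotes_loop are made up for this example.

```shell
# Hypothetical helper: same idea as the long command above, with a loop.
fix_quotes_loop() {
  # 1st expression: inch marks (digit, ", non-semicolon) -> two single quotes.
  # Then loop: replace one inner " per round with a single quote and jump
  # back to the label for as long as a substitution was made.
  sed -e "s/\([0-9]\)\"\([^;]\)/\1''\2/g" \
      -e ':again' \
      -e "s/\([^;]\)\"\([^;]\)/\1'\2/" \
      -e 't again' "$@"
}

sample='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

printf '%s\n' "$sample" | fix_quotes_loop
# Prints: 1223;"B630521 ('L' fixed bracket)";"2'' width";"length: 5''";2;alternate A
```

Each round removes one double quote, so the loop always terminates, and runs such as """" need no extra s commands.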
