KSH scripting: how to split on ',' when values have escaped commas?

Posted on 2024-07-06 08:27:55

I'm trying to write a ksh script to process a file consisting of name-value pairs, several of them on each line.

The format is:

NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc

Suppose I write:

read l
IFS=","
set -A nvls $l
echo "${nvls[1]}"   # second pair; ksh arrays are zero-based

This gives me the second name-value pair, nice and easy. Now suppose the task is extended so that values can include commas. They would be escaped, like this:

NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc

Obviously, my code no longer works: "read" strips the backslashes, so the escaped comma is split on like any other, and the second element of the array ends up as just "NAME2 VALUE2_1".
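To make the failure concrete, here is a small sketch of what "read" does to the escaped comma with and without -r. The variable names (line, plain, raw) are just for illustration, and the sample record is the one from the question without the trailing "etc".

```shell
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'

# Plain read treats the backslash as an escape character and removes it,
# so the escaped comma becomes indistinguishable from a separator.
plain=$(printf '%s\n' "$line" | { read l; printf '%s\n' "$l"; })

# With -r the backslash survives, so the \, sequence can still be recognized.
raw=$(printf '%s\n' "$line" | { read -r l; printf '%s\n' "$l"; })

printf 'plain read: %s\n' "$plain"
printf 'read -r:    %s\n' "$raw"
```

So any approach that splits after a plain "read" has already lost the information it needs; the escaping has to be dealt with before, or instead of, that step.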

I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.

Does anyone have a useful trick up their sleeve for me?

PS
I know I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)

Comments (2)

待天淡蓝洁白时 2024-07-13 08:27:55

As often happens, I devised an answer minutes after asking the question in a public forum :(

I worked around the quoting/unquoting issue by piping the input file through the following sed script:

sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'

It converts the input into:

NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>

Now, I can parse this input like this:

while read name value ; do
  echo "$name => $value"
done

The value gets its commas unescaped by "read" (which drops the backslashes), and I can stuff "name" and "value" into an associative array if I like.
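Put together, the two steps above might look like the sketch below. The function name split_pairs and the sample record are made up for illustration; the sed script is the one shown above, and the loop skips the empty line that the script appends as an end-of-record marker.

```shell
# Hypothetical helper: turn every unescaped comma into a newline and
# append an empty line after each input record.
split_pairs() {
  sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
}

out=$(
  printf '%s\n' 'NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3' |
  split_pairs |
  while read name value; do
    # An empty line marks the end of an input record; skip it here.
    [ -n "$name" ] || continue
    printf '%s => %s\n' "$name" "$value"
  done
)
printf '%s\n' "$out"
```

Note that the regular expression needs a non-backslash character right before the comma, so this sketch assumes no empty fields (two commas in a row) and no record that starts with a comma.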

PS
Since I can't accept my own answer, should I delete the question, or ...?

自控 2024-07-13 08:27:55

You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. The ksh builtin pattern-substitution syntax can do this, so you don't need sed, awk, or anything else.

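read -r l                # -r keeps the backslashes so \, can still be found
l=${l//\\,/!!}           # hide escaped commas behind a placeholder
IFS=","
set -A nvls $l
unset IFS
echo "${nvls[1]//!!/,}"  # second pair (index 1), placeholders turned back into commas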
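One caveat: as far as I know, the ${var//pattern/replacement} syntax arrived with ksh93, so it may not exist in the older ksh the question mentions. Below is a sketch of the same placeholder idea using only sed and the positional parameters, which should also run in older shells. The function name split_record is hypothetical, and "!!" is still assumed never to occur in the data.

```shell
# Hypothetical helper: print each name-value pair of one record on its own line.
split_record() {
  # Hide escaped commas behind the placeholder before splitting.
  masked=$(printf '%s\n' "$1" | sed 's/\\,/!!/g')

  set -f            # no filename globbing during the unquoted expansion
  oldIFS=$IFS
  IFS=,
  set -- $masked    # positional parameters now hold one pair each
  IFS=$oldIFS
  set +f

  for pair in "$@"; do
    printf '%s\n' "$pair"
  done | sed 's/!!/,/g'   # turn the placeholders back into plain commas
}

split_record 'NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'
```

Using "set --" instead of "set -A" sidesteps arrays entirely, and "$#" gives the number of pairs, which is handy when that number is not known in advance.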