Using gsub to replace a variable with another variable whose value comes from a function call
I have a function that substitutes actual values with a pattern read from a file. What I'm trying to achieve is to call a function that uses gsub to find and replace a string, where the substitution value comes from another function call.
$ cat pat-file
name 10101010
phone 10101010
code 10101010
bankaccount 1010101010101
$ cat data_sub.sh
abc()
{
    awk '
    function mask(str, str_masked) {
        for (j=1; j<=length(str); j++) {
            if (substr(masks[i], j, 1)==1) {
                c = substr(str, j, 1)
            } else {
                c = "*"
            }
            str_masked = str_masked c
        }
        return str_masked
    }
    FNR == NR {
        tags[NR-1] = $1
        masks[NR-1] = $2
    }
    FNR != NR {
        line = $0
        for (i in tags) {
            regex = "<"tags[i]">[^<]+</"tags[i]">"
            masked_line = ""
            l = length(tags[i])
            while (match(line, regex) > 0) {
                fulltag = substr(line, RSTART, RLENGTH)
                tagval = substr(fulltag, l+3, RLENGTH-l-l-5)
                fulltag_masked = "<"tags[i]">" mask(tagval) "</"tags[i]">"
                masked_line = masked_line substr(line, 1, RSTART-1) fulltag_masked
                line = substr(line, RSTART + RLENGTH)
            }
            line = masked_line line
        }
        print line
    }' "$@" pat-file file-1 > output_file
}
abc
The tagval variable stores the value of the XML tag, which gets masked inside the XML. Because the same value is also present outside the XML, I need to mask those occurrences too. See the input file file-1:
This is a demo data = ABCD
This is a demo data = XYCD
This is a demo data = ABCD
This is a demo data = BLAH
This is a demo data = ABCD
This is a demo data = MEH
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD and MEH
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
The logic is simple and pretty straightforward, i.e. store all the extracted tag values that get masked, then apply the same masking algorithm to those values where they appear outside the XML. How can I achieve this?
Current output file
This is a demo data = ABCD
This is a demo data = XYCD
This is a demo data = ABCD
This is a demo data = BLAH
This is a demo data = ABCD
This is a demo data = MEH
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD and MEH
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
Expected output file
This is a demo data = A*C*
This is a demo data = XYCD
This is a demo data = A*C*
This is a demo data = BLAH
This is a demo data = A*C*
This is a demo data = M*H
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C* and M*H
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
Comments (1)
Assumptions:
- if the same value appears under more than one tag (e.g., name=ABCD and code=ABCD), then the first mask found by awk will be used to mask the string (i.e., we won't prioritize the order in which tag/mask pairs are processed)
- strings outside the XML are matched on awk word boundaries (e.g., when masking ABCD we'll also mask ABCD-XYZ, but we won't mask ABCDABCD or ABCD_XYZ)
- if a mask is 111111111... (all 1's), we'll go ahead and perform the (effectively) no-op masking operation

General operation:
- process the input file (file-1) looking for 'tag' entries, masking them and saving each line and each masked value
- END processing runs through our array of lines again, looking for any (word-boundaried) strings that were previously masked and, if found, replaces them with the saved mask value
- with a mask of 11111111... (all 1's), this END processing will re-mask the 'tag' entries too (still, effectively, a no-op)

Adding some lines to the sample input file:
One idea building on OP's current awk code:
This generates:
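As a rough sketch of the general operation described above (not the answerer's original code), the two passes might look like the following. The function name mask_everywhere, the helper replace_word and the arrays saved, order and lines are hypothetical names chosen for illustration. Word boundaries are checked by hand (letters, digits and underscore count as word characters) so the sketch does not rely on gawk-only regex operators, and tags are processed in pat-file order so that "the first mask found" is deterministic.

```shell
#!/bin/sh
# Sketch only: mask_everywhere PATFILE DATAFILE (hypothetical name).
mask_everywhere() {
    awk '
    # Keep the characters where the mask has a 1; replace the others with "*".
    function mask(str, m,    j, out) {
        out = ""
        for (j = 1; j <= length(str); j++)
            out = out (substr(m, j, 1) == "1" ? substr(str, j, 1) : "*")
        return out
    }

    # Replace whole-word occurrences of "from" in "s" with "to".
    # A word character is assumed to be a letter, a digit or an underscore.
    function replace_word(s, from, to,    out, p, n, before, after) {
        out = ""; before = ""; n = length(from)
        while ((p = index(s, from)) > 0) {
            if (p > 1) before = substr(s, p - 1, 1)
            after = substr(s, p + n, 1)
            if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/) {
                out = out substr(s, 1, p - 1) to      # whole word: substitute
                s = substr(s, p + n)
                before = substr(from, n, 1)
            } else {
                out = out substr(s, 1, p)             # not a whole word: step past it
                s = substr(s, p + 1)
                before = substr(from, 1, 1)
            }
        }
        return out s
    }

    # First file: tag names and masks, kept in file order.
    FNR == NR { if (NF >= 2) { ntags++; tags[ntags] = $1; masks[ntags] = $2 }; next }

    # Second file: mask values inside tags and remember each value (first mask wins).
    {
        line = $0
        for (i = 1; i <= ntags; i++) {
            otag = "<" tags[i] ">"; ctag = "</" tags[i] ">"
            masked_line = ""
            while (match(line, otag "[^<]+" ctag)) {
                val = substr(line, RSTART + length(otag), RLENGTH - length(otag) - length(ctag))
                if (!(val in saved)) { saved[val] = mask(val, masks[i]); order[++nvals] = val }
                masked_line = masked_line substr(line, 1, RSTART - 1) otag mask(val, masks[i]) ctag
                line = substr(line, RSTART + RLENGTH)
            }
            line = masked_line line
        }
        lines[++nlines] = line
    }

    # END: run through the saved lines again and mask the values outside the XML.
    END {
        for (k = 1; k <= nlines; k++) {
            for (v = 1; v <= nvals; v++)
                lines[k] = replace_word(lines[k], order[v], saved[order[v]])
            print lines[k]
        }
    }' "$1" "$2"
}
```

With the files from the question it could be called as mask_everywhere pat-file file-1 > output_file. Because the values are only known once the whole file has been read, every line is held in memory until END. One caveat of this sketch: a short value that happens to appear as a whole word inside an already masked string (for example a value 7 inside 9*7*2*2*) would be masked again.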