Can anyone help me find the number of occurrences of an XML tag using awk or sed?

Posted 2024-12-22 07:37:17

I have to write a shell script that counts the occurrences of an XML tag (say, Code) in an XML file. The XML file can be in any of the following formats:

Format #1: 
<Code>value1</Code> <Code>value2</Code>

 Format #2: 
<Code Attr1=va>value1</Code> <Code Attr1=va
Attr2=va>value1</Code>

Format #3: 
<Code>value1</Code><Code>value2</Code> (All Codes can be in
a single line or multiple lines)

Format #4 
   <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code>

Format #5: 
<Cod 
e>Value1</Code
<Code Attr=1> </C
ode>

In short, the XML file can be in any format and can have newlines anywhere.
Please help me, I need to do this soon.

Thanks in advance.

新雨望断虹 2024-12-29 07:37:17

Regular expressions are a bad way to parse XML; using some sort of XML parser is better.

If you really want to use sed/awk/shell/grep etc, the first thing I can think of is:

 cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>' | wc -l

I don't know awk very well, but I'm sure there are awk ninjas out there who can do it more elegantly than this.

It only counts occurrences of <Code> (and variations), not the closing tag, so if you have (for example) 10 <Code> in your file but only 9 </Code>, it will return 10 and not 9.

Basically:

cat tst | xargs joins the whole of 'tst' onto a single line (so I don't have to worry about newlines);
grep -o '<\s*C\s*o\s*d\s*e[^>]*>' prints every match of <Code{optional other stuff}>, allowing whitespace between the letters of Code (-o prints only the part that matches the regex, one match per line);
wc -l counts the lines.

Try each stage in turn to see what I mean.

For me, tst was just a copy-paste of what you have above.

[foo@bar ~]$cat tst
Format #1: 
<Code>value1</Code> <Code>value2</Code>

 Format #2: 
<Code Attr1=va>value1</Code> <Code Attr1=va
Attr2=va>value1</Code>

Format #3: 
<Code>value1</Code><Code>value2</Code> (All Codes can be in
a single line or multiple lines)

Format #4 
   <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code>

Format #5: 
<Cod 
e>Value1</Code
<Code Attr=1> </C
ode>

[foo@bar ~]$cat tst | xargs
Format #1: <Code>value1</Code> <Code>value2</Code> Format #2: <Code Attr1=va>value1</Code> <Code Attr1=va Attr2=va>value1</Code> Format #3: <Code>value1</Code><Code>value2</Code> (All Codes can be in a single line or multiple lines) Format #4 <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code> Format #5: <Cod e>Value1</Code <Code Attr=1> </C ode>

[foo@bar ~]$cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>'
<Code>
<Code>
<Code Attr1=va>
<Code Attr1=va Attr2=va>
<Code>
<Code>
<Code Attr1=va>
<Code Attr2=va>
<Cod e>
<Code Attr=1>

[foo@bar ~]$cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>' | wc -l
10
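
A variation on the pipeline above, offered as a minimal sketch rather than the answerer's exact method: xargs strips quotes and fails on an unmatched quote, so this version joins the lines with tr instead. The function name count_open_tags is hypothetical, and grep -o is assumed to be available (GNU and BSD grep both provide it). Like the original, it counts opening tags only and tolerates whitespace between the letters of the tag name.

```shell
# Hypothetical helper: count opening tags named $1 in the text on stdin.
count_open_tags() {
    # Allow whitespace after every letter: Code -> C[[:space:]]*o[[:space:]]*d...
    pattern=$(printf '%s' "$1" | sed 's/./&[[:space:]]*/g')
    # Turn newlines into spaces so a tag split across lines sits on one line.
    tr '\n' ' ' |
        grep -o "<[[:space:]]*${pattern}[^>]*>" |
        wc -l |
        tr -d ' '    # some wc implementations pad the number with spaces
}
```

It would be called as `count_open_tags Code < tst`. As with the original regex, an element whose name merely starts with Code would also be counted.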


终止放荡 2024-12-29 07:37:17

Load the XML into a document tree with DOMParser or XMLDOM, as appropriate. Then use jQuery's $(xml).find("code") to get an array of the occurrences. The length of the array gives the count.


冷…雨湿花 2024-12-29 07:37:17

Quick and Dirty way:

Since the XML file can contain tags in many different layouts, here is a quick and dirty way to get an approximate count of the XML tags in your file.

awk -v FS="" '
BEGIN{rc=lc=0} 
{for (i=1;i<=NF;i++) if ($i~/</) {lc++} else if ($i~/>/) {rc++}}
END{print "< = "lc " and > = "rc}' xmlfile

Sample File:

[jaypal:~/Temp] cat xmlfile
Format #1: 
<Code>value1</Code> <Code>value2</Code>

 Format #2: 
<Code Attr1=va>value1</Code> <Code Attr1=va
Attr2=va>value1</Code>

Format #3: 
<Code>value1</Code><Code>value2</Code> (All Codes can be in
a single line or multiple lines)

Format #4 
   <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code>

Format #5: 
<Cod 
e>Value1</Code>
<Code Attr=1> </C
ode>

Execution:

[jaypal:~/Temp] awk -v FS="" '
    BEGIN{rc=lc=0} 
    {for (i=1;i<=NF;i++) if ($i~/</) {lc++} else if ($i~/>/) {rc++}}
    END{print "< = "lc " and > = "rc}' xmlfile
< = 20 and > = 20

This tells us there are 20 < characters and 20 > characters. So, as an approximation, there are 10 XML tags in your file, since <Code> and </Code> together make up one tag.

I call it an approximation because your file may contain > or < characters that are not part of an XML tag. This can be a start, but it is certainly not the final solution.
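
The same bracket count can be sketched without an empty FS, which splits a record into single characters in gawk and mawk but is not behavior that POSIX specifies. The helper name count_brackets is made up for illustration, and the result carries the same caveat as the awk version: it counts every < and >, whether or not it belongs to a tag.

```shell
# Hypothetical helper: report how many '<' and '>' characters file $1 contains.
count_brackets() {
    # tr -cd deletes everything except the listed character; wc -c counts the rest.
    # The trailing tr -d removes padding that some wc implementations add.
    lt=$(tr -cd '<' < "$1" | wc -c | tr -d ' ')
    gt=$(tr -cd '>' < "$1" | wc -c | tr -d ' ')
    echo "< = $lt and > = $gt"
}
```

Called as `count_brackets xmlfile`, it prints a line in the same shape as the awk output above.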


伤痕我心 2024-12-29 07:37:17

This might(?) work for you:

sed -n ':a;N;$!ba;s/\n//g;s/<\s*\/[[:alpha:]][[:alnum:]_-]*\s*>/\n&\n/gp' example |
sed -n 's/^<\//</p' | 
sort | 
uniq -c
9 <Code>

If you have more exotic element names, you will need to amend [[:alpha:]][[:alnum:]_-]* to match them.
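
For readers who find the sed loop hard to follow, roughly the same tally can be sketched with tr and grep -o. This is an alternative under stated assumptions, not the answer's own command: the function name tally_closing_tags is hypothetical, grep -o must be available, and the element-name pattern is the same limited one used above. It prints the bare element name rather than the <Code> form.

```shell
# Hypothetical helper: tally closing tags per element name from stdin.
tally_closing_tags() {
    # Delete newlines so a closing tag broken across lines is rejoined.
    tr -d '\n' |
        # Emit each closing tag, such as </Code>, on its own line.
        grep -o '<[[:space:]]*/[[:alpha:]][[:alnum:]_-]*[[:space:]]*>' |
        # Reduce "</Code>" to "Code" so spacing variants are counted together.
        tr -d '</> \t' |
        sort |
        uniq -c |
        # uniq -c pads its counts; print "count name" in a fixed shape.
        awk '{ print $1, $2 }'
}
```

It reads the file on standard input, for example `tally_closing_tags < example`.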


三月梨花 2024-12-29 07:37:17

If XML gawk is an option:

xmlgawk -lxml 'END { print c }
XMLSTARTELEM == "Code" { c++ }
  ' input.xml
