How to remove XML tags from the Unix command line?

Posted on 2024-10-23 21:31:59

I am grepping an XML file, which gives me output like this:

<tag>data</tag>
<tag>more data</tag>
...

Note that this is a flat file, not an XML tree. I want to remove the XML tags and display only the data between them. I'm doing all of this from the command line and wondered whether there is a better way than piping it through awk twice:

cat file.xml | awk -F'>' '{print $2}' | awk -F'<' '{print $1}'

Ideally, I would like to do this in a single command.
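
The two awk passes in the question split first on > and then on <. Giving -F a bracket expression that matches either character does both splits in one pass. This is a minimal sketch that assumes every line has exactly the form <tag>text</tag>, with no attributes containing < or > and no nesting. The variable names input and out are illustrative only.

```shell
# Sample input in the flat one-element-per-line format from the question.
input='<tag>data</tag>
<tag>more data</tag>'

# FS is the bracket expression [<>], so "<tag>data</tag>" splits into
# $1="" $2="tag" $3="data" $4="/tag" $5="", and the text is field 3.
out=$(printf '%s\n' "$input" | awk -F'[<>]' '{print $3}')
printf '%s\n' "$out"

# Against the real file this is simply:  awk -F'[<>]' '{print $3}' file.xml
```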

ゃ人海孤独症 2024-10-30 21:31:59

If your file looks just like that, then sed can help you:

sed -e 's/<[^>]*>//g' file.xml

Of course, you should not use regular expressions to parse XML, because it's hard.
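
Here is one concrete way the regex goes wrong. The pattern <[^>]* assumes that no tag contains a literal >, but XML allows an unescaped > inside a quoted attribute value. The attribute in this made-up sample line is hypothetical and is there only to trigger the failure.

```shell
# Well-formed XML: '>' may appear unescaped inside an attribute value.
line='<tag note="a>b">data</tag>'

# [^>]* stops at the first '>', so the first "tag" match ends inside the
# attribute and the tail of the attribute leaks into the output.
out=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')
printf '%s\n' "$out"   # prints: b">data
```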

三生路 2024-10-30 21:31:59

Using awk:

awk '{gsub(/<[^>]*>/,"")};1' file.xml
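
In the one-liner, the trailing 1 is a pattern that is always true. A pattern with no action prints the current record, so every line is printed after gsub() has stripped its tags. This is the same program with the print written out, run on the question's sample lines (input and out are illustrative names):

```shell
input='<tag>data</tag>
<tag>more data</tag>'

# gsub() without a target argument edits $0 in place, deleting every
# "<...>" run; the explicit print does what the bare "1" pattern does.
out=$(printf '%s\n' "$input" | awk '{ gsub(/<[^>]*>/, ""); print }')
printf '%s\n' "$out"
```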

小情绪 2024-10-30 21:31:59

Give this a try:

grep -Po '<.*?>\K.*?(?=<.*?>)' inputfile

Explanation:

Using Perl Compatible Regular Expressions (-P) and outputting only the specified matches (-o):

<.*?> - Non-greedy match of any characters within angle brackets
\K - Don't include the preceding match in the output (reset match start - similar to positive look-behind, but it works with variable-length matches)
.*? - Non-greedy match stopping at the next match (this part will be output)
(?=<.*?>) - Non-greedy match of any characters within angle brackets and don't include the match in the output (positive look-ahead - works with variable-length matches)
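
This sketch runs the pattern on the question's sample lines. It assumes a grep that accepts -P. That option is a GNU grep extension, and even GNU grep can be built without PCRE support. The names input and out are illustrative.

```shell
input='<tag>data</tag>
<tag>more data</tag>'

# -o prints only the matched part: it starts after \K (the opening tag is
# dropped) and ends before the (?=<.*?>) look-ahead (the closing tag).
out=$(printf '%s\n' "$input" | grep -Po '<.*?>\K.*?(?=<.*?>)')
printf '%s\n' "$out"
```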

吖咩 2024-10-30 21:31:59

Use the html2text command-line tool, which converts HTML into plain text.

Alternatively, you may try the ex way (the \{-} non-greedy quantifier is Vim regex syntax, so this needs Vim's ex):

ex -s +'%s/<[^>].\{-}>//ge' +%p +q! file.txt

or:

cat file.txt | ex -s +'%s/<[^>].\{-}>//ge' +%p +q! /dev/stdin

吹梦到西洲 2024-10-30 21:31:59

I know this is not a "perlgolf contest", but I used to use this trick.

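Set the record separator to < or >, then print only the odd-numbered records (gawk and mawk treat a multi-character RS as a regular expression; other awks may use only its first character):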

awk -vRS='<|>' NR%2 file.xml
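
Taken literally, the trick also prints some odd records that hold no text: the empty record before the first <, and the newline left between one element and the next. As a result, its output contains blank lines. The variant below is a minimal sketch. It assumes a regex-capable RS (gawk or mawk), and it assumes that whitespace-only text between tags can be discarded. It keeps only the odd records that contain at least one field. The names input and out are illustrative.

```shell
input='<tag>data</tag>
<tag>more data</tag>'

# RS='<|>' splits at every angle bracket, so text between tags falls in
# odd-numbered records. With the default FS, blanks and newlines separate
# fields, so NF is 0 for the empty and newline-only records, and "&& NF"
# drops them.
out=$(printf '%s\n' "$input" | awk -v RS='<|>' 'NR%2 && NF')
printf '%s\n' "$out"
```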

About the author: 把时间冻结