
How to use awk to delete part of a file

Posted 2024-07-25 20:21:25

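I am writing a shell script that at some point has to take a file, search it for a specific word, and delete all the text after that word (including the word itself). I think awk is the right tool, but I don't know much about programming in it.

Can anybody help me?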

沫离伤花 2024-08-01 20:21:25

I suppose 'awk' is one tool for the job, though I think 'sed' is simpler for this particular operation. The specification is a bit vague. The simple version is:

Find the first line containing a given word.
Delete that line and all following lines.

For that, I'd use 'sed':

sed '/word/,$d' file

The more complex version is:

Find the first line containing a given word.
Delete the text on that line from the word onwards.
Delete all subsequent lines of text.

I'd probably still use 'sed':

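sed -n '1,/word/{s/word.*//;p;}' file

This inverts the logic: with -n, nothing is printed by default. For every line from line 1 through the first line containing word, it performs the substitution (which does nothing on lines that do not contain the word) and then prints the line. One caveat: if the word is on line 1, the range does not end there, because the end address is only checked from line 2 onwards; GNU sed's 0,/word/ address form handles that case.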
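To make the difference between the two readings concrete, here is a minimal sketch run against a made-up input. The file name sample.txt and the search word "stop" are assumptions for illustration only, not part of the question.

```shell
# Hypothetical sample input; "stop" plays the role of "word".
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
alpha one
beta two
gamma stop here
delta three
EOF

# Simple reading: drop the matching line and everything after it.
simple=$(sed '/stop/,$d' "$tmpdir/sample.txt")

# Second reading: keep the text before the word on the matching line.
partial=$(sed -n '1,/stop/{s/stop.*//;p;}' "$tmpdir/sample.txt")

printf '%s\n---\n%s\n' "$simple" "$partial"
rm -r "$tmpdir"
```

The first command keeps only the two lines before the match; the second also keeps "gamma " (with its trailing space) from the matching line.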

Can it be done in 'awk'? Not completely trivially because 'awk' autosplits input lines into words, and because you have to use functions to do substitutions.

awk '/word/ { if (found == 0) {
                # First line with word
                sub("word.*", "")
                print $0;
                found = 1
              }
            }
            { if (found == 0) print $0; }' file

(Edited: changed 'delete' to 'found', since 'delete' is a reserved word in 'awk'.)

In all these examples, the truncated version of the input file is written to standard output. To modify the file in place, you either need to use Perl, Python, or a similar language, or capture the output in a temporary file and copy it over the original once the command has completed. (If you try 'script file >file', the shell truncates the file before the script reads it, so you process an empty file.)
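The temporary-file approach could be sketched roughly as follows. The function name truncate_at_word is hypothetical, and the sketch assumes the word contains no "/" or other characters that are special to sed.

```shell
# Hypothetical helper: truncate FILE at the first line containing WORD,
# writing through a temporary file so the input is not emptied before it is read.
truncate_at_word() {
    word=$1
    file=$2
    tmp=$(mktemp) || return 1
    # WORD is used as a regular expression and must not contain "/".
    # Only overwrite the original if sed succeeded; cat (rather than mv)
    # keeps the original file's permissions and ownership.
    sed "/$word/,\$d" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling `truncate_at_word word file` would then rewrite file in place with the matching line and everything after it removed.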

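There are various early exit optimizations that could be applied to the sed and awk scripts, such as:

sed '/word/q' file

Note that 'q' prints the matching line before quitting, so this form still includes the line containing the word.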

And, if you assume the use of the GNU versions of awk or sed, there are various non-standard extensions that can help with in-situ modification of the file.
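As a sketch of how early exit might look for each of the two readings, assuming a made-up sample file and the search word "stop":

```shell
# Hypothetical sample input; "stop" plays the role of "word".
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha one' 'gamma stop here' 'delta three' > "$tmpdir/sample.txt"

# Drop the matching line: with -n, quit before anything is printed for it.
whole=$(sed -n '/stop/q;p' "$tmpdir/sample.txt")

# Keep the text before the word: substitute, then let q print the line and quit.
partial=$(sed '/stop/{s/stop.*//;q;}' "$tmpdir/sample.txt")

printf '%s\n---\n%s\n' "$whole" "$partial"
rm -r "$tmpdir"
```

In both cases sed stops reading at the first match instead of scanning the rest of the file.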

离不开的别离 2024-08-01 20:21:25

I'm assuming your input is something like this:

Lorem ipsum dolor sit amet,
consectetur adipiscing velit.
Nullam neque sapien, molestie vel congue non,
feugiat quis tellus. Ut quis
nulla mi. Maecenas a ligula.

and you want the output to be cut off at the word 'vel' like so:

Lorem ipsum dolor sit amet,
consectetur adipiscing velit.
Nullam neque sapien, molestie

In that case, your awk script would be:

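awk '
  # \< and \> are GNU awk word-boundary operators, so "velit" does not match.
  # The opening brace must sit on the same line as the pattern.
  /\<vel\>/ {
     print substr($0, 1, match($0, /\<vel\>/) - 1)
     exit
  }

  { print }
' lorem.txt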

The word you want to cut off at needs to replace both instances of the word vel in the script.

You can safely put the entire script on one line, too.
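The \< and \> operators are a GNU awk extension, so they may not be available in other awk implementations. As a rough sketch for those, the word boundary can be approximated with explicit character classes. The variable name w and the regular expression below are my own assumptions, and the sketch assumes the word consists only of letters, digits, or underscores.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/lorem.txt" <<'EOF'
Lorem ipsum dolor sit amet,
consectetur adipiscing velit.
Nullam neque sapien, molestie vel congue non,
feugiat quis tellus. Ut quis
nulla mi. Maecenas a ligula.
EOF

result=$(awk -v w="vel" '
    # Match w only when it is not surrounded by other word characters.
    BEGIN { re = "(^|[^A-Za-z0-9_])" w "([^A-Za-z0-9_]|$)" }
    match($0, re) {
        start = RSTART
        # The match may begin one character before the word itself.
        if (substr($0, start, length(w)) != w) start++
        print substr($0, 1, start - 1)
        exit
    }
    { print }
' "$tmpdir/lorem.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Passing the word with -v also means it only has to be changed in one place. The text kept from the matching line retains the space that preceded the word.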

何必那么矫情 2024-08-01 20:21:25

awk '/word/{exit}1' file
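For readers new to awk, the one-liner above can be spelled out as below. The trailing 1 is a pattern that is always true, and a pattern without an action prints the line. The sample input is made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha one' 'beta word two' 'gamma three' > "$tmpdir/sample.txt"

# Compact form, as posted.
short=$(awk '/word/{exit}1' "$tmpdir/sample.txt")

# Expanded form with the implicit print written out.
long=$(awk '
    /word/ { exit }     # first matching line: stop without printing it
    { print }           # every earlier line: print unchanged
' "$tmpdir/sample.txt")

printf '%s\n---\n%s\n' "$short" "$long"
rm -r "$tmpdir"
```

Both forms drop the matching line entirely, which corresponds to the simpler reading of the question.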

秋意浓 2024-08-01 20:21:25

I'm not sure how to do it with awk, but you could do it with sed:

sed -i~ -e 's/the-word-to-find.*$//' the-file

This will delete everything from the-word-to-find to the end of the line, on every line that contains the-word-to-find. If you want to delete the rest of the file starting at the first line that contains the-word-to-find (that line included), you could do:

sed -i~ -e '/the-word-to-find/,$d' the-file

执笔绘流年 2024-08-01 20:21:25

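This awk one-liner should do the trick:
{ sub(/ word.*/, ""); print }
For each line, if the line contains a pattern that starts with the word (preceded by a space) and runs to the end of the line, the pattern is replaced with an empty string, and then the updated line is printed.

[I think the question can be read either way (the rest of the line, or the rest of the file). If you want to skip the rest of the file, you can use: { skip = gsub(/ word.*/, ""); print; if (skip) exit } ]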
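A quick way to see the difference between the two programs is to run both on a small input. The sample file below is made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'keep this word drop' 'plain line' 'again word more' > "$tmpdir/sample.txt"

# Per-line reading: trim every line that contains " word".
per_line=$(awk '{ sub(/ word.*/, ""); print }' "$tmpdir/sample.txt")

# Rest-of-file reading: trim the first such line, print it, then stop.
rest=$(awk '{ skip = gsub(/ word.*/, ""); print; if (skip) exit }' "$tmpdir/sample.txt")

printf '%s\n---\n%s\n' "$per_line" "$rest"
rm -r "$tmpdir"
```

The first program prints all three lines with the tails removed; the second prints only "keep this" because it exits after the first substitution.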

零時差 2024-08-01 20:21:25

To delete part of a line with sed, e.g.:

$ echo '12345 John Smith / red black or blue it is a test' | sed -e 's/\/.*//'
12345 John Smith
