How to write a sed script to grep information from a text file

Posted 2025-01-02 11:05:22

I'm trying to do my homework that is restricted to only using sed to filter an input file to a certain format of output. Here is the input file (named stocks):

Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================

And the output needs to be:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):

/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}

The command that I run on shell is:

$ sed -nf try.sed stocks

My question is, is there a better way of using sed to get the same result? The script I wrote only works with 7 lines of data. If the data is longer, I need to re-modify my script. I'm not sure how I can make it any better, so I'm here asking for help!

Thanks for any recommendations.

Comments (4)

口干舌燥 2025-01-09 11:05:22

One more way using sed:

sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks

Output:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

Explanation:

-ne               # Process each input line without printing and execute next commands...
/^====/,/^====/   # For all lines between these...
{
  /;/             # If line has a semicolon...
  { 
    s/;.*$//      # Remove characters from first semicolon until end of line.
    H             # Append content to 'hold space'.
  }
};
$                 # In last input line...
{
  g               # Copy content of 'hold space' to 'pattern space' to work with it.
  s/\n//          # Remove first newline character.
  s/\n/, /g       # Substitute the rest with the output separator, a comma and a space in this case.
  p               # Print to output.
}

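The question's concern was that the original script only handles exactly 7 data rows. As a rough check that the hold-space approach above does not depend on the row count, here is a sketch that runs the same commands (written one per line for portability) against an input with an eighth row. The XYZ row, the temp directory, and the `result` variable are made up for illustration and are not part of the original data.

```shell
tmp=$(mktemp -d)

# Same layout as the stocks file, plus one hypothetical extra row (XYZ).
cat > "$tmp/stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
XYZ;Hypothetical Example Corp;1,000

================================================
EOF

# Collect the first field of every data row in the hold space,
# then join the collected symbols on the last input line.
result=$(sed -n '
/^====/,/^====/ {
  /;/ {
    s/;.*$//
    H
  }
}
$ {
  g
  s/\n//
  s/\n/, /g
  p
}' "$tmp/stocks")

echo "$result"
rm -rf "$tmp"
```

No part of the script mentions the number of rows, so adding or removing data lines should not require any change to it.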
生活了然无味 2025-01-09 11:05:22

Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).

sed, by design, reads every line of the input file and runs its commands on the lines that match a given address (or on every line when no address is given). If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but the general idea for one way to go about it is to write a script that does the following. The steps are listed in the order they should appear in the script.

  1. Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.
  2. For each line that isn't a blank line, do the following steps. (This would all be in a single set of curly braces.)
    1. Replace everything after and including the first semicolon (;) with a comma-and-space (", ") using the s (substitute) command.
    2. Append the current pattern space into the hold buffer (look at H).
    3. Delete the pattern space and move on to the next line, like in step 1.
  3. For each line that gets to this point in the script (should be the first blank line), retrieve the contents of the hold space into the pattern space. (This would be after the curly braces above.)
  4. Substitute all newlines in the pattern space with nothing.
  5. Next, substitute the last comma-and-space in the pattern space with nothing.
  6. Finally, quit the program so you don't process any more lines. My script worked without this, but I'm not 100% sure why.
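
The steps above can be sketched as follows. This is only one possible reading of the outline, not the answerer's actual script (which was never posted); the temp directory, the file name `steps.sed`, and the `result` variable are assumptions made for illustration.

```shell
tmp=$(mktemp -d)

# Recreate the input file from the question.
cat > "$tmp/stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

cat > "$tmp/steps.sed" <<'EOF'
# Step 1: skip the three header lines.
1,3d
# Step 2: on every non-blank line, turn ";rest of line" into ", ",
# append the result to the hold space, and move on to the next line.
/./{
  s/;.*/, /
  H
  d
}
# Steps 3-6: only the first blank line gets this far. Fetch the hold
# space, drop the newlines and the trailing ", ", then quit.
g
s/\n//g
s/, $//
q
EOF

# No -n here: q prints the pattern space before exiting.
result=$(sed -f "$tmp/steps.sed" "$tmp/stocks")
echo "$result"
rm -rf "$tmp"
```

In this sketch the final `q` also keeps the footer line from being processed at all.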

That being said, that's just one way to go about it. sed often offers varying ways of varying complexity to accomplish a task. A solution I wrote with this method is 10 lines long.

As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:

$ sed -f companies.sed companies 
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
め可乐爱微笑 2025-01-09 11:05:22

This sed command extracts the symbols you need, printing one per line (it does not join them with commas):

sed -rn '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt

OR on Mac:

sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
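
The command above prints each symbol on its own line. To get the single comma-separated line the question asks for, the same pattern can be combined with the hold space. This is a sketch, assuming a sed that accepts `-E` (recent GNU sed and BSD/macOS sed both do); the temp directory and the `result` variable are illustrative only.

```shell
tmp=$(mktemp -d)

# Recreate the input file from the question.
cat > "$tmp/stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

# On lines ending in a digit, keep only the first field and append it to
# the hold space (H) instead of printing it. On the last line, fetch the
# hold space, drop its leading newline, and join the rest with ", ".
result=$(sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/;H;}; ${g;s/\n//;s/\n/, /g;p;}' "$tmp/stocks")

echo "$result"
rm -rf "$tmp"
```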
零時差 2025-01-09 11:05:22

This might work for you:

sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stocks
  • We don't want the headings so let's delete them. 1d
  • All data items are delimited by semicolons, so let's concentrate on those lines. /;/
  • On those lines, delete everything from the first ; to the end of the line and then stash the result in the hold space (HS). {s/;.*//;H}
  • When you get to the last line, overwrite it with the HS using the g command, delete the first newline (generated by the H command), replace all subsequent newlines with a comma and a space and print out what's left. ${g;s/.//;s/\n/, /g;q}
  • Delete everything else d
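
For readers who prefer a script file to a one-liner, the bullets above map onto a commented sed script roughly like this. It is a sketch of the same commands laid out one per line; the file name `join.sed`, the temp directory, and the `result` variable are assumptions for illustration.

```shell
tmp=$(mktemp -d)

# Recreate the input file from the question.
cat > "$tmp/stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

cat > "$tmp/join.sed" <<'EOF'
# Drop the heading line.
1d
# On data lines, keep only the symbol and append it to the hold space.
/;/{
  s/;.*//
  H
}
# On the last line, fetch the hold space, delete its first character
# (the newline added by the first H), join the rest with ", ", and quit.
# Without -n, q prints the pattern space before exiting.
${
  g
  s/.//
  s/\n/, /g
  q
}
# Discard every other line.
d
EOF

result=$(sed -f "$tmp/join.sed" "$tmp/stocks")
echo "$result"
rm -rf "$tmp"
```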

Here's a terminal session showing the incremental refinement of building a sed command:

cat <<! >stock # paste the file into a here doc and pass it on to a file
> Symbol;Name;Volume
> ================================================
> 
> BAC;Bank of America Corporation Com;238,059,612
> CSCO;Cisco Systems, Inc.;28,159,455
> INTC;Intel Corporation;22,501,784
> MSFT;Microsoft Corporation;23,363,118
> VZ;Verizon Communications Inc. Com;5,744,385
> KO;Coca-Cola Company (The) Common;3,752,569
> MMM;3M Company Common Stock;1,660,453
> 
> ================================================
> !
sed '1d;/;/!d' stock # delete headings and everything but data lines
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
sed '1d;/;/{s/;.*//p};d' stock # delete all non essential data
BAC
CSCO
INTC
MSFT
VZ
KO
MMM
sed '1d;/;/{s/;.*//;H};${g;l};d' stock # use the l command to see what's really there!
\nBAC\nCSCO\nINTC\nMSFT\nVZ\nKO\nMMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;l};d' stock # refine refine
BAC, CSCO, INTC, MSFT, VZ, KO, MMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stock # all done!
BAC, CSCO, INTC, MSFT, VZ, KO, MMM