How do I extract lines of text from a file?

Posted on 2024-07-08 08:26:11

I have a directory full of files and I need to strip the headers and footers off them. They are all variable length, so using head or tail isn't going to work. Each file does have marker lines I can search for, but I don't want those lines included in the results.

The part I want usually starts after a line like

*** Start (more text here)

and ends before a line like

*** Finish (more text here)

I want the file names to stay the same, so I need to overwrite the originals, or write to a different directory and I'll overwrite them myself.

Oh yeah, it's on a Linux server of course, so I have Perl, sed, awk, grep, etc.

Comments (7)

一身仙ぐ女味 2024-07-15 08:26:12

Try the flip-flop operator, "..".

# flip-flop.pl
use strict;
use warnings;

my $start  = qr/^\*\*\* Start/;
my $finish = qr/^\*\*\* Finish/;

while ( <> ) {
    if ( /$start/ .. /$finish/ ) {
        next  if /$start/ or /$finish/;
        print $_;
    }
}

You can then use Perl's -i switch to update your file(s) in place, like so:

 $ perl -i'copy_*' flip-flop.pl data.txt 

This rewrites data.txt in place, but first saves the original as copy_data.txt.

初心 2024-07-15 08:26:12

GNU coreutils are your friend...

csplit inputfile '%^\*\*\* Start%1' '/^\*\*\* Finish/' '%%' '{*}'

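Here %...%1 skips everything through the Start line, and /.../ ends the first output file just before the Finish line. (The patterns are quoted so the shell does not split "*** Start" at the space.) This produces your desired file as xx00. You can change the output names with the options --prefix, --suffix-format, and --digits; see the manual for details. Since csplit is designed to produce a number of files, it cannot write its output back under the original file name, so you will have to do the overwriting manually or in a script: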

csplit "$1" '%^\*\*\* Start%1' '/^\*\*\* Finish/' '%%' '{*}' &&
mv -f xx00 "$1"

Add a loop over the files as needed.
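Looping over several files with csplit could look roughly like this. It is a sketch assuming GNU csplit and bash; it uses only the first two patterns, so the body lands in the first output file and the Finish line plus footer land in the second, which is thrown away. The function name strip_with_csplit and the per-file temporary directory are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace each named file with the lines strictly
# between its "*** Start" and "*** Finish" marker lines, using GNU csplit.
strip_with_csplit() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp -d) || return 1
        # %...%1 skips through the Start line; /.../ splits just before Finish.
        # -s: don't print byte counts; -f: outputs are "$tmp/part00", "$tmp/part01".
        if csplit -s -f "$tmp/part" "$f" '%^\*\*\* Start%1' '/^\*\*\* Finish/'; then
            # part00 holds the body; part01 (Finish line to EOF) is discarded.
            cat "$tmp/part00" > "$f"
        else
            echo "skipping $f: marker lines not found" >&2
        fi
        rm -rf "$tmp"
    done
}
```

Copying with cat rather than mv keeps the original file's permissions and ownership, and a file missing either marker is left untouched because csplit exits with an error.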

咽泪装欢 2024-07-15 08:26:12

To strip the header (print everything after the Start line):

awk '{if (d > 0) print $0} /^\*\*\* Start/ {d = 1}' yourFileHere

To strip the footer (print everything before the Finish line):

awk '/^\*\*\* Finish/ {d = 1} {if (d < 1) print $0}' yourFileHere

To get just the lines between the header and the footer, as you want:

awk '/^\*\*\* Start/ {d = 1; next} /^\*\*\* Finish/ {d = 0; next} {if (d > 0) print $0}' yourFileHere

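There's one more way, with the csplit command; try something like:

csplit yourFileHere '/^\*\*\* Start/' '/^\*\*\* Finish/'

and examine the files named xxNN, where NN is a sequence number: xx00 holds the header, xx01 the Start line plus the body, and xx02 the Finish line plus the footer. Also take a look at the csplit man page.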
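Since the question also allows writing the results to a different directory, the third awk program can be applied to many files that way. A minimal sketch, assuming bash and any POSIX awk; strip_to_dir and its arguments are hypothetical, and the patterns are anchored to the start of the line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file given, write the lines strictly between
# "*** Start" and "*** Finish" to a file of the same name inside $outdir.
# The original files are not modified.
strip_to_dir() {
    local outdir=$1; shift
    local f
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        awk '
            /^\*\*\* Start/  { inside = 1; next }   # body begins after this line
            /^\*\*\* Finish/ { inside = 0; next }   # body ends before this line
            inside                                   # print body lines only
        ' "$f" > "$outdir/$(basename "$f")"
    done
}
```

For example, strip_to_dir out *.txt writes trimmed copies into out/, which you can then move over the originals yourself once they look right.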

另类 2024-07-15 08:26:12

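Maybe? Keep Start to Finish by not-deleting it:

$ sed -i '/^\*\*\* Start/,/^\*\*\* Finish/!d' *

That keeps the Start and Finish lines themselves. To remove those as well, delete from the top of the file through the Start line, and from the Finish line to the end:

$ sed -i -e '0,/^\*\*\* Start/d' -e '/^\*\*\* Finish/,$d' *

Note that the negation goes before the command (!d, not d!). The 0,/regex/ address is a GNU sed extension that also works when the Start line is the very first line; with 1,/regex/ a Start on line 1 would not end the range. Be careful with files that have no Start line: the first expression then deletes the entire file.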
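Because the second command empties any file that has no Start line, it may be worth guarding it. A sketch, assuming GNU sed (for -i and the 0,/regex/ address) and grep; strip_with_sed is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip header and footer (including the marker lines)
# in place, but only from files that contain both marker lines.
strip_with_sed() {
    local f
    for f in "$@"; do
        if grep -q '^\*\*\* Start' "$f" && grep -q '^\*\*\* Finish' "$f"; then
            # 0,/re/ : from the top through the first Start line (GNU extension)
            # /re/,$ : from the Finish line through the end of the file
            sed -i -e '0,/^\*\*\* Start/d' -e '/^\*\*\* Finish/,$d' "$f"
        else
            echo "skipping $f: marker lines not found" >&2
        fi
    done
}
```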

冰雪之触 2024-07-15 08:26:12

A quick Perl hack, not tested. I am not fluent enough in sed or awk to get this effect with them, but I would be interested in how that would be done.

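#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Strip the header and footer in place from each file named on the command line.
for my $Filename (@ARGV) {
    tie my @File, 'Tie::File', $Filename or die "could not access $Filename: $!\n";
    # Remove lines from the top up to and including the Start line; the @File
    # test ends the loop instead of spinning forever if there is no Start line.
    while (@File and shift(@File) !~ /^\*\*\* Start/) {}
    # Remove lines from the bottom up to and including the Finish line.
    while (@File and pop(@File) !~ /^\*\*\* Finish/) {}
    untie @File;
}
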
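
A file that lacks either marker line is a problem for this in-place approach, so it may help to list such files before running the script. A small sketch, assuming bash, grep -L and sort; missing_markers is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of files missing the "*** Start" line,
# the "*** Finish" line, or both (each name printed once, sorted).
missing_markers() {
    {
        grep -L '^\*\*\* Start' -- "$@"    # files with no Start line
        grep -L '^\*\*\* Finish' -- "$@"   # files with no Finish line
    } | sort -u
}
```

Empty output means every file has both marker lines.
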
活雷疯 2024-07-15 08:26:12

Some of the examples in perlfaq5: How do I change, delete, or insert a line in a file, or append to the beginning of a file? may help. You'll have to adapt them to your situation. Also, the flip-flop operator answer above is the idiomatic way to do this in Perl, although you don't have to modify the file in place to use it.

故事还在继续 2024-07-15 08:26:12

A Perl solution that overwrites the original file.

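#!/usr/bin/perl -ni
# The range operator returns 1 on the Start line, then 2, 3, ... inside the
# range, and a number with "E0" appended (e.g. "5E0") on the Finish line.
if (my $num = /^\*\*\* Start/ .. /^\*\*\* Finish/) {
    # Print only lines that are neither the Start line ($num == 1) nor the
    # Finish line (for "5E0", $num + 0 is 5, which is not eq "5E0").
    print if $num != 1 and $num + 0 eq $num;
}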