Splitting a file based on its content and pattern matching

Posted on 2024-12-18 05:01:33

I need help formatting a text file using bash/Linux. The file looks like the following: it always has a line of the form "Rate: Sth", followed by detail lines in a very specific format. I'd like to split the file so that each rate gets its own file. In this example I'd like to end up with 3 files, each containing the corresponding line that says what the Rate value was.

How would you approach this?

line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

Comments (6)

极致的悲 2024-12-25 05:01:33

This might work for you:

csplit -z -f 'temp' -b '%02d.txt' file /Rate/ {*}

This will produce files temp00.txt, temp01.txt...

If you only want to keep the Rate line in each file:

sed -i '/Rate/!d' temp*.txt
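If the pieces should be named after their rate rather than temp00.txt, temp01.txt and so on, a rename pass can follow the split. The sketch below assumes GNU csplit, a made-up sample input, and an output naming scheme (GBP.txt, USD.txt) that is only an illustration, not something the question prescribes:

```shell
#!/bin/sh
# Work in a scratch directory with a small, made-up sample input.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > file <<'EOF'
Rate: GBP
12/01/1999,90.5911501,Validated
18/01/1999,90.954996,Validated
Rate: USD
21/11/11,-0.004419534,Validated
EOF

# Split before every line starting with "Rate: " (-s: quiet, -z: drop empty pieces).
csplit -s -z -f 'temp' -b '%02d.txt' file '/^Rate: /' '{*}'

# Rename each piece after the rate named on its first line.
for piece in temp*.txt; do
    rate=$(sed -n '1s/^Rate: //p' "$piece")
    if [ -n "$rate" ]; then
        mv "$piece" "$rate.txt"
    fi
done
```

Pieces whose first line is not a Rate: line (for example, leading text before the first rate) keep their temp name.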

阳光下的泡沫是彩色的 2024-12-25 05:01:33

I'd do this in perl:

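#!/usr/bin/perl

use strict;
use warnings;

# Write to standard output until the first Rate: line is seen.
open (my $out, ">-") or die "oops";

while(<>)
{
    if (m/^Rate: (\w+)/)
    {
        # Switch output to a file named after the rate (e.g. GBP).
        # The Rate: line itself is not copied.
        close $out and open ($out, ">", $1) or die "oops";
        next;
    }

    print $out $_;
}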

Use it like this:

perl ./test.pl input.txt

不忘初心 2024-12-25 05:01:33

(g)awk to the rescue:

awk '/^Rate:/ {output_file_name=$2; getline } 
     { print $0 >> ( output_file_name ) }' INPUT_FILE

The first rule runs for lines that start with Rate:. It only sets the output file name and then reads the next line from the input file with getline. That next line is then handled by the second rule and written to the output file. Every following line is handled by the second rule only, as long as it does not match Rate:.

NOTE: The above solution might fail if the input file has a section with two consecutive Rate: lines, because getline then reads the second Rate: line and the second rule writes it as data to the first rate's file:

... DATA ...
Rate: GBP
Rate: CHF
... DATA ...

This assumes that the line numbers are not part of the original file.
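One way to sidestep the consecutive Rate: case is to drop getline and skip the header line with next instead. This is only a sketch: the sample input, the file name rates.txt and the .txt suffix on the output names are assumptions made for illustration.

```shell
#!/bin/sh
# Work in a scratch directory with a made-up input that has two consecutive Rate: lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > rates.txt <<'EOF'
Rate: GBP
Rate: CHF
12/01/1999,90.5911501,Validated
Rate: USD
21/11/11,-0.004419534,Validated
EOF

# On a Rate: line, remember the output name and move on to the next record.
# Every other line is written to the file of the most recent rate.
awk '/^Rate:/  { out = $2 ".txt"; next }
     out != "" { print > out }' rates.txt
```

Because a file is only opened when a data line is written, a rate with no data lines (GBP here) produces no file at all. Within a single awk run, `>` truncates the file on first use and then keeps appending, so rerunning the script replaces old output instead of adding to it.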

HTH

讽刺将军 2024-12-25 05:01:33

A one-liner inspired by sehe's answer:

perl -pwe '
  if (/^Rate: (.+)/) {
     open $out, ">", "Rate_$1.txt" or die $!;
     select $out;
  }' gasdata.txt

The -p option will read a line and print it after the code in -e is evaluated. select will choose a default filehandle for print. So, basically, what we are doing is simply juggling the filehandle around, depending on which Rate is currently the active one.
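The same idea of switching the default output handle can be sketched in plain POSIX shell, where `exec > file` plays the role of select. The sample data and the function name split_rates are made up for this illustration:

```shell
#!/bin/sh
# Work in a scratch directory with a small, made-up sample input.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > gasdata.txt <<'EOF'
Rate: GBP
12/01/1999,90.5911501,Validated
Rate: USD
21/11/11,-0.004419534,Validated
EOF

# split_rates (hypothetical helper): copy stdin to stdout, repointing stdout
# at Rate_<name>.txt each time a "Rate: <name>" line is read.
split_rates() {
    while IFS= read -r line; do
        case $line in
            "Rate: "*) exec > "Rate_${line#Rate: }.txt" ;;
        esac
        printf '%s\n' "$line"
    done
}

# Run in a subshell so the exec redirection does not leak into the caller.
( split_rates < gasdata.txt )
```

As with the Perl version, the Rate: line itself ends up as the first line of each output file, and anything before the first Rate: line goes to the original standard output.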

Here's the code deparsed:

>perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    if (/^Rate: (.+)/) {
        die $! unless open $out, '>', "output/Rate_$1.txt";
        select $out;
    }
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

神经暖 2024-12-25 05:01:33

Another solution: It just makes your input file into a script and then runs it:

sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash

I assumed the line numbers are not part of the file.
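Before piping anything into bash it can help to look at the script that sed generates. The sketch below assumes GNU sed (for the one-line `$aEOF` form and `\n` in the replacement) and a made-up input.txt; generated.sh is just a name chosen for this illustration:

```shell
#!/bin/sh
# Work in a scratch directory with a small, made-up sample input.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Rate: GBP' '12/01/1999,90.5911501,Validated' \
              'Rate: USD' '21/11/11,-0.004419534,Validated' > input.txt

# Same sed program as above, but the result is saved for inspection
# instead of being executed.
sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt > generated.sh
cat generated.sh
```

Each Rate: line becomes the start of a here-document that is redirected into a file named after the rate, and the data lines become the here-document body. Since the EOF delimiter is unquoted, data containing `$` or backticks would be expanded by the shell, so this approach is only safe for input you trust.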

时光匆匆的小流年 2024-12-25 05:01:33

You can use something like this in Perl:

Perl Script:

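#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input at once.
local $/;
my $text = <>;
my $n = 0;

# Split before every occurrence of "Rate" and write each chunk to temp1, temp2, ...
for my $chunk (split /(?=Rate)/, $text) {
      open(my $fh, '>', 'temp' . ++$n) or die "cannot open temp$n: $!";
      print $fh $chunk;
      close($fh);
}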

Execution:

[jaypal~/temp]$ ./spl.pl temp.file

[jaypal~/temp]$ cat temp.file
Line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

[jaypal~/temp]$ cat temp1
Line No. Main Text
1    

[jaypal~/temp]$ cat temp2
Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated

211  

[jaypal~/temp]$ cat temp3
Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated

1002 [jaypal~/temp]$ cat temp4
Rate: USD
1003 21/11/11,-0.004419534,Validated
[jaypal~/temp]$ 