How do I do a substitution with a variable in Perl?

Posted on 2024-09-10 13:28:08

I have several text files that were once tables in a database, which has since been disassembled. I'm trying to reassemble them, which will be easy once I get them into a usable form. The first file, "keys.text", is just a list of labels, inconsistently formatted. Like:

Sa 1 #
Sa 2
U 328 #*

It's always letter(s), [space], number(s), [space], and sometimes symbol(s). The lines in the text files that match these keys start with the same key, followed by a line of text, also separated (delimited) by a SPACE.

Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

What I'm trying to do in the code below is match the key from "keys.text" with the same key in the .txt files, and put a tab between the key and the text. I'm sure I'm overlooking something very basic, but the result I'm getting looks identical to the source .txt file.

Thanks in advance for any leads or assistance!

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");

my $key;

# Read each line one at a time
while ($key = <IN1>) {

# For each txt file in the current directory
foreach my $file (<*.txt>) {
  open(IN, $file) or die("Cannot open TXT file for reading: $!");
  open(OUT, ">temp.txt") or die("Cannot open output file: $!");

  # Add temp modified file into directory 
  my $newFilename = "modified\/keyed_" . $file;
  my $line;

  # Read each line one at a time
  while ($line = <IN>) {

     $line =~ s/"\$key"/"\$key" . "\/t"/;
     print(OUT "$line");

  }
  rename("temp.txt", "$newFilename");
 }   
}

EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:

Sa 1 #      Random line of text follows.
Sa 2        This text is just as random.
U 328 #*    Continuing text...

Comments (6)

坠似风落 2024-09-17 13:28:08

The regex seems quoted rather oddly to me. Wouldn't

$line =~ s/$key/$key\t/;

work better?

Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.

And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
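
A minimal sketch of that advice, for illustration only. The helper name `add_tab` and the sample strings are assumptions, not part of the original script. Note that the pattern in the question, `"\$key"`, looks for the literal characters `"$key"` (the backslash stops interpolation), which is why nothing changed. `\Q...\E` is added here because a key such as `U 328 #*` contains `*`, which would otherwise be read as a quantifier:

```perl
use strict;
use warnings;

# Hypothetical helper: put a tab between a leading $key and the rest of $line.
sub add_tab {
    my ($key, $line) = @_;
    chomp $key;                          # a key read with <IN1> still ends in "\n"
    # \Q...\E quotes regex metacharacters; \s+ consumes the separating space(s)
    $line =~ s/^\Q$key\E\s+/$key\t/;
    return $line;
}

print add_tab("Sa 1 #\n",   "Sa 1 # Random line of text follows.\n");
print add_tab("U 328 #*\n", "U 328 #* Continuing text...\n");
```

Requiring `\s+` after the key also keeps a key like `Sa 2` from matching a line that starts with `Sa 20`.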

无法言说的痛 2024-09-17 13:28:08

If Perl is not a must, you can use this awk one-liner:

$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*

$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

$ awk 'FNR==NR { k[$1,$2]; next } (($1,$2) in k) { $2 = $2 "\t" } 1' keys.txt mytext.txt
Sa 1     # Random line of text follows.
Sa 2     This text is just as random.
U 328    #* Continuing text...

梦毁影碎の 2024-09-17 13:28:08

Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.

Then for all files named on the command line, available in the special Perl array @ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.

Note that we edit the files in-place thanks to Perl's handy -i option:

-i[extension]

specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …

The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.

#! /usr/bin/perl -i.bak

use warnings;
use strict;

sub usage { "Usage: $0 text-file\n" }

sub read_keys {
  my $path = "keys.text";
  open my $fh, "<", $path
    or die "$0: open $path: $!";

  my %key;
  while (<$fh>) {
    my($text,$num) = split;
    ++$key{$text}{$num} if defined $text && defined $num;
  }

  wantarray ? %key : \%key;
}

die usage unless @ARGV;
my %key = read_keys;

while (<>) {
  my($text,$num,$line) = split " ", $_, 3;
  $_ = "$text $num\t$line" if defined $text &&
                              defined $num &&
                              $key{$text}{$num};
  print;
}

Sample run:

$ ./add-tab input

$ diff -u input.bak input
--- input.bak   2010-07-20 20:47:38.688916978 -0500
+++ input   2010-07-20 21:00:21.119531937 -0500
@@ -1,3 +1,3 @@
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1   # Random line of text follows.
+Sa 2   This text is just as random.
+U 328  #* Continuing text...
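
The script above puts the tab right after the number, so any symbols end up on the text side. If the symbols should stay with the key, as the question's edit asks, one possible variation is sketched below. The helper names `load_keys` and `tab_after_key` are made up for this illustration, and it assumes keys are compared after collapsing runs of whitespace:

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup of whole keys, whitespace-normalised.
sub load_keys {
    my %seen;
    for my $k (@_) {
        my @parts = split " ", $k;       # drops the newline and extra spaces
        $seen{"@parts"} = 1 if @parts >= 2;
    }
    return \%seen;
}

# Hypothetical helper: insert a tab after the longest key that starts the line.
sub tab_after_key {
    my ($keys, $line) = @_;
    my ($text, $num, $sym, $rest) = split " ", $line, 4;
    return $line unless defined $num;
    # Try "letters number symbols" first, then plain "letters number".
    if (defined $rest && length($rest) && $keys->{"$text $num $sym"}) {
        return "$text $num $sym\t$rest";
    }
    if ($keys->{"$text $num"}) {
        my (undef, undef, $tail) = split " ", $line, 3;
        return "$text $num\t$tail" if defined $tail && length($tail);
    }
    return $line;                        # not a known key: leave untouched
}

my $keys = load_keys("Sa 1 #\n", "Sa 2\n", "U 328 #*\n");
print tab_after_key($keys, $_)
    for "Sa 1 # Random line of text follows.\n",
        "Sa 2 This text is just as random.\n",
        "U 328 #* Continuing text...\n";
```

Because the keys are looked up as plain hash entries rather than interpolated into a pattern, characters like `*` need no escaping here.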

梦断已成空 2024-09-17 13:28:08

Fun answers:

$line =~ s/(?<=$key)/\t/;

Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.

And:

$line =~ s/$key/$key . "\t"/e;

Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.

Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
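
For anyone who wants to try them, here is a small sketch of both forms, plus `\K` (available since Perl 5.10) as a third spelling of the same idea. The sample key and line are assumptions taken from the question's data. `\Q...\E` is added because the key contains `*`, and the patterns here also consume the space after the key so that only a tab remains:

```perl
use strict;
use warnings;

my $key  = "U 328 #*";                    # sample key; note the regex-special *
my $line = "U 328 #* Continuing text...";

# 1. Lookbehind: only the space after the key is matched and replaced.
(my $via_lookbehind = $line) =~ s/(?<=^\Q$key\E) /\t/;

# 2. /e: the replacement side is evaluated as a Perl expression.
(my $via_eval = $line) =~ s/^\Q$key\E /$key . "\t"/e;

# 3. \K: everything matched before \K is kept out of the replaced span.
(my $via_keep = $line) =~ s/^\Q$key\E\K /\t/;

print "$_\n" for $via_lookbehind, $via_eval, $via_keep;
```

All three produce the same string; they differ only in how the key is kept out of the part that gets replaced.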

放肆 2024-09-17 13:28:08

How about reading the two files in separate passes? In the first pass you open the keys file and build a preliminary hash. In the second pass, all you need to do is add the text to the hash.

use strict;
use warnings;

my $keys_file    = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file  = "path to output.txt";

my %hash = ();

# letters, number, then optional symbols (e.g. "Sa 1 #", "Sa 2", "U 328 #*")
my $keys_regex = qr/^([a-zA-Z]+)\s*(\d+)\s*([^\da-zA-Z\s]*)/;

open my $keys_fh, '<', $keys_file or die "could not open $keys_file: $!";
while (my $line = <$keys_fh>) {
    if ($line =~ $keys_regex) {
        # index by letters AND number; letters alone would make
        # "Sa 1" and "Sa 2" overwrite each other
        my $key = "$1 $2";
        $hash{$key}{'symbol'} = $3;
    }
}
close $keys_fh;

open my $in_fh, '<', $content_file or die "could not open $content_file: $!";
while (my $line = <$in_fh>) {
    chomp $line;
    if ($line =~ /^([a-zA-Z]+)\s*(\d+)/ && exists $hash{"$1 $2"}) {
        my $key = "$1 $2";
        # strip the key, number and symbols from the line to leave the text
        $line =~ s/^[a-zA-Z]+\s*\d+//;
        $line =~ s/^\s*\Q$hash{$key}{'symbol'}\E//;
        $line =~ s/^\s+//;
        $hash{$key}{'text'} = $line;
    }
}
close $in_fh;

open my $out_fh, '>', $output_file or die "could not open $output_file: $!";
for my $key (sort keys %hash) {
    next unless defined $hash{$key}{'text'};
    my $symbol = $hash{$key}{'symbol'};
    my $label  = length $symbol ? "$key $symbol" : $key;
    print $out_fh "$label\t$hash{$key}{'text'}\n";
}
close $out_fh;

I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.

蓬勃野心 2024-09-17 13:28:08

This looks like the perfect place for the map function in Perl! Read the entire text file into an array, then apply map across the whole array. The only other thing you might want to do is use the quotemeta function (or \Q...\E) to escape any regex metacharacters in your keys.

Using map keeps the loop compact. I also read the keys into an array so that the keys file doesn't have to be opened and closed inside the loop. Every key is tried against every line, so it's an O(keys × lines) algorithm, but if your key list isn't that big, it shouldn't be too bad.

#! /usr/bin/env perl

use strict;
use warnings;

open (KEYS, "<", "keys.text")
    or die "Cannot open 'keys.text' for reading: $!\n";
my @keys = grep { /\S/ } <KEYS>;    # skip blank lines
close (KEYS);
chomp @keys;                        # drop the trailing newline from every key

foreach my $file (glob("*.txt")) {
    open (TEXT, "<", $file)
        or die "Cannot open '$file' for reading: $!\n";
    my @textArray = <TEXT>;
    close (TEXT);

    foreach my $key (@keys) {
        # \Q...\E quotes regex metacharacters such as the * in "U 328 #*"
        map($_ =~ s/^\Q$key\E\s+/$key\t/, @textArray);
    }

    open (NEW_TEXT, ">", "$file.new")
        or die qq(Cannot open "$file.new" for writing: $!\n);
    # the lines still carry their own newlines, so print them as they are
    print NEW_TEXT @textArray;
    close (NEW_TEXT);
}
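
If the key list does grow, one way to avoid trying every key against every line is to compile all keys into a single alternation and make one match attempt per line. This is only a sketch of that idea, not part of the answer above; the helper name `build_key_regex` and the sample data are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: compile all keys into one anchored alternation.
sub build_key_regex {
    my @keys = @_;
    chomp @keys;
    @keys = grep { length } @keys;
    die "no keys given\n" unless @keys;
    # Longest first, so "Sa 1 #" is tried before a shorter key like "Sa 1".
    my $alt = join "|",
              map  { quotemeta($_) }
              sort { length($b) <=> length($a) } @keys;
    return qr/^($alt)\s+/;
}

my $re = build_key_regex("Sa 1 #\n", "Sa 2\n", "U 328 #*\n");

for my $line ("Sa 1 # Random line of text follows.\n",
              "Sa 2 This text is just as random.\n",
              "U 328 #* Continuing text...\n") {
    (my $out = $line) =~ s/$re/$1\t/;    # one substitution attempt per line
    print $out;
}
```

The regex is built once with `qr//`, so the per-line work no longer depends on looping over the key array in Perl code.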