Comparing 2 processed keys in 2 hashes

Posted on 2025-01-08 10:24:07

I want to read in a file with some symbols like "!" and "^" and would like to remove them before I compare them with other strings from another line. If both strings are the same after removing the symbols, I want to store them in another hash called "common".
For example...
FileA:

hello!world
help?!3233
oh no^!!
yes!

FileB:

hello
help?
oh no
yes

In this case, FileA and FileB should be identical as I am comparing characters up to the place where "!" or "^" appears.
I read the files by using the following code:

open FILEA, "< script/".$fileA or die;
my %read_file;
while (my $line=<FILEA>) {
   (my $word1,my $word2) = split /\n/, $line;
   $word1 =~ s/(!.+)|(!.*)|(\^.+)|(\^.*)//;#to remove ! and ^
   $read_file{$word1} = $word1;
}
close(FILEA);

I printed out the keys in the hash, and they show the correct result (i.e., FileA is converted to "hello", "help?", "oh no", "yes"). However, when I compare FileA and FileB using the following code, it always fails:

while(($key,$value)=each(%config))
{
    $num=keys(%base_config);
    $num--;#to get the correct index
    while($num>=0)
    {
        $common{$value}=$value if exists $read_file{$key};#stored the correct matches in %common
        $num--;
    }
}

I tested my substitution and the comparison between two strings using the following example, and it works. I don't know why it does not work when the strings are read into a hash from a file.

use strict;
use warnings;

my $str="hello^vsd";
my $test="hello";
$str =~ s/(!.+)|(!.*)|(\^.+)|(\^.*)//;
my %hash=();
$hash{$str}=();
foreach my $key(keys %hash)
{
    print "$key\n";
}
print "yay\n" if exists $hash{$test};
print "boo\n" unless exists $hash{$test};

The two files can have different numbers of lines, and the lines need not be in the same order when searching, i.e., "oh no" can come before "hello".

Comments (4)

糖粟与秋泊 2025-01-15 10:24:07

Here's another solution that reads both files simultaneously (assumes both files have an equal number of lines):

use strict;
use warnings;

our $INVALID = '!\^'; #regexp character class, must escape

my $fileA = "file1.txt";
my $fileB = "file2.txt";

sub readl
{
  my $fh = shift;
  my $ln = "";

  if ($fh and $ln = <$fh>)
  {
    chomp $ln;
    $ln =~ s/[$INVALID]+.*//g;
  }

  return $ln;
}

my ($fhA, $fhB);
my ($wdA, $wdB);
my %common = ();

open $fhA, '<', $fileA or die "$fileA: $!\n";  # three-argument open, read-only
open $fhB, '<', $fileB or die "$fileB: $!\n";

while ($wdA = readl($fhA) and $wdB = readl($fhB))
{
  $common{$wdA} = undef if $wdA eq $wdB;
}

print "$_\n" foreach keys %common;

Output

andrew@gidget:comparefiles$ cat file1.txt 
hello!world
help?!3233
oh no^!!
yes!

andrew@gidget:comparefiles$ cat file2.txt 
hello
help?
oh no
yes

andrew@gidget:comparefiles$ perl comparefiles.pl 
yes
oh no
hello
help?
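
The loop condition above tests the truth of each returned string, so it ends not only at end of file but also at the first line that normalizes to an empty string or to "0" (for example, a line that begins with "!"). Below is a minimal sketch that tests definedness instead, assuming the same rule of cutting each line at the first "!" or "^". The helper names `read_normalized` and `common_pairs` and the in-memory sample data are hypothetical, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line, strip the newline and everything
# from the first "!" or "^" onward. Returns undef only at end of file.
sub read_normalized {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    $line =~ s/[!^].*//;
    return $line;
}

# Hypothetical helper: compare two handles line by line and return a
# hash ref whose keys are the normalized lines that match pairwise.
sub common_pairs {
    my ($fh_a, $fh_b) = @_;
    my %common;
    while (1) {
        my $a_word = read_normalized($fh_a);
        my $b_word = read_normalized($fh_b);
        last unless defined $a_word and defined $b_word;
        $common{$a_word} = undef if $a_word eq $b_word;
    }
    return \%common;
}

# In-memory handles stand in for the two files. The "0" line would end
# a truth-tested loop early.
my $text_a = "hello!world\n0^x\nyes!\n";
my $text_b = "hello\n0\nyes\n";
open my $fh_a, '<', \$text_a or die "in-memory open: $!";
open my $fh_b, '<', \$text_b or die "in-memory open: $!";

my $common = common_pairs($fh_a, $fh_b);
print "$_\n" for sort keys %$common;
```

This still compares lines pairwise, so it keeps the equal-length, same-order assumption stated in the answer.
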
情魔剑神 2025-01-15 10:24:07

Start by packaging up reusable segments into subroutines:

sub read_file {
    open my $fh, "<", $_[0] or die "read_file($_[0]) error: $!";
      # lexical handles auto-close when they fall out of scope
      # and detailed error messages are good
    my %file;
    while (my $line = <$fh>) {
        chomp $line;          # remove newline
        $line =~ s{[!^].*}{}; # remove everything starting from ! or ^
        $file{$line}++;
    }
    \%file
}

read_file takes an input file name and returns a hash of the line segments before any ! or ^ characters. Each line segment is a key, and the value is the number of times it appeared.

Using this, the next step is to figure out which lines match between files:

my ($fileA, $fileB) = map {read_file $_} your_file_names_here();

my %common;
$fileA->{$_} and $common{$_}++ for keys %$fileB;  # $fileA is a hash reference

print "common: $_\n" for keys %common;

Which will print:

common: yes
common: oh no
common: hello
common: help?

You could define your_file_names_here as follows if you wanted to test it:

sub your_file_names_here {\(<<'/A', <<'/B')}
hello!world
help?!3233
oh no^!!
yes!
/A
hello
help?
oh no
yes
/B
写给空气的情书 2025-01-15 10:24:07

You can use a regex character class, s/[?^]//g, to remove ^ and ?. Note that the ^ must not be the first character in the class, or you need to escape it. (It might be safer to escape it, in case you add other characters later, so that the class doesn't get negated.)
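
As a small illustration of the caret's position inside a character class, the sketch below runs the same substitution three ways. The sample string and variable names are hypothetical.

```perl
use strict;
use warnings;

my $input = "oh? no^ yes";

# Caret not first: a literal "^" inside the class, so "?" and "^" are removed.
(my $caret_last = $input) =~ s/[?^]//g;

# Caret escaped: the same class, safe wherever the caret appears.
(my $caret_escaped = $input) =~ s/[\^?]//g;

# Caret first and unescaped: the class is negated, so everything
# EXCEPT "?" is removed.
(my $caret_first = $input) =~ s/[^?]//g;

print "$caret_last\n";     # oh no yes
print "$caret_escaped\n";  # oh no yes
print "$caret_first\n";    # ?
```
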

I process all the files, using the hash to record which files each word exists in.

To compare the differences, I use 2**(file index), so I get the values 2**0=1, 2**1=2, 2**2=4, and so on. I use these values to show which files the strings belong to. With two files, 3 (2+1) means the string is in both files, 1 means FileA only, and 2 means FileB only. You check this by doing a bitwise and (&).

Edit: added the test conditions

my @files = qw(FileA.txt FileB.txt);
my %words;
foreach my $i (0 .. $#files) {
    my $file = $files[$i];
    open(FILE,$file) or die "Error: missing file $file\n$!\n";
    while (<FILE>) {
        chomp;
        next if /^$/;
        my ($word) = split /[!\^]/;
        $word =~ s/[?\^]//g; # removes ^ and ?
        $words{$word} += 2**$i;
    }
    close(FILE);
}

my %common;
foreach my $key (sort keys %words) {
    my @found;
    foreach my $i (0 .. $#files) {
        if ( $words{$key} & 2**$i ) { push @found, $files[$i] }
    }
    if ( $words{$key} & 2**$#files ) { $common{$key}++ }
    printf "%10s %d: @found\n",$key,$words{$key};
}

my @tests = qw(hello^vsd chuck help? test marymary^);
print "\nTesting Words: @tests\n";
foreach (@tests) {
    my ($word) = split /[!\^]/;
    $word =~ s/[?\^]//g; # removes ^ and ?
    if ( exists $common{ $word } ) {
        print "Found: $word\n";
    }
    else {
        print "Cannot find: $word\n";
    }
}

Output:

    bahbah 2: FileB.txt
   chucker 1: FileA.txt
     hello 3: FileA.txt FileB.txt
      help 3: FileA.txt FileB.txt
  marymary 2: FileB.txt
     oh no 3: FileA.txt FileB.txt
      test 1: FileA.txt
       yes 3: FileA.txt FileB.txt

Testing Words: hello^vsd chuck help? test marymary^
Found: hello
Cannot find: chuck
Found: help
Cannot find: test
Found: marymary
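
In the listing above, `$words{$key} & 2**$#files` tests only the bit of the last file, which is consistent with `marymary` (value 2, present only in FileB.txt) being reported as found. Below is a minimal sketch of a check that requires every file's bit, assuming the same one-bit-per-file encoding. The `%lines` sample data and the helper name `in_all_files` are hypothetical, and `|=` is swapped in for `+=` so that a word repeated within one file does not disturb the mask.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the already-normalized
# lines of each file.
my @files = qw(FileA.txt FileB.txt);
my %lines = (
    'FileA.txt' => [ 'hello', 'help', 'oh no', 'yes', 'test', 'test' ],
    'FileB.txt' => [ 'hello', 'help', 'oh no', 'yes', 'marymary' ],
);

# Set bit $i for every word seen in file $i.
my %words;
for my $i (0 .. $#files) {
    $words{$_} |= 1 << $i for @{ $lines{ $files[$i] } };
}

# A word is common only when all bits are set: (1 << number of files) - 1.
my $all_files = (1 << scalar @files) - 1;

sub in_all_files {
    my ($word) = @_;
    return exists $words{$word} && $words{$word} == $all_files;
}

for my $word (qw(hello marymary test)) {
    printf "%-8s %s\n", $word, in_all_files($word) ? 'common' : 'not common';
}
```

With this check, only words whose value equals the full mask (3 for two files) count as common.
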
和影子一齐双人舞 2025-01-15 10:24:07

First we must normalize your input. The code below creates one hash for each path. For each line in a given file, remove everything beginning with the first ! or ^ character and record its presence.

sub read_inputs {
  my @result;

  foreach my $path (@_) {
    my $data = {};

    open my $fh, "<", $path or die "$0: open $path: $!";
    while (<$fh>) {
      chomp;
      s/[!^].*//;  # don't put the caret first without escaping!
      ++$data->{$_};
    }

    push @result, $data;
  }

  wantarray ? @result : \@result;
}

Computing the intersection of two arrays is covered in the Data Manipulation section of the Perl FAQ list. Adapting the technique to your situation, we want to know the lines that are common to all inputs.

sub common {
  my %matches;
  for (@_) {
    ++$matches{$_} for keys %$_;
  }

  my @result = grep $matches{$_} == @_, keys %matches;
  wantarray ? @result : \@result;
}

Tying it together with

my @input = read_inputs "FileA", "FileB";
my @common = common @input;
print "$_\n" for sort @common;

gives output of

hello
help?
oh no
yes