I want to read in a file containing symbols such as "!" and "^", and remove them before comparing each line with the lines of another file. If two strings are the same after the symbols are removed, I want to store them in another hash called "common".
For example...
FileA:
hello!world
help?!3233
oh no^!!
yes!
FileB:
hello
help?
oh no
yes
In this case, FileA and FileB should be treated as identical, because I only compare the characters up to the place where "!" or "^" first appears.
I read the files by using the following code:
open FILEA, "< script/".$fileA or die;
my %read_file;
while (my $line=<FILEA>) {
    (my $word1,my $word2) = split /\n/, $line;
    $word1 =~ s/(!.+)|(!.*)|(\^.+)|(\^.*)//;#to remove ! and ^
    $read_file{$word1} = $word1;
}
close(FILEA);
I printed out the keys in the hash and they look correct (i.e. FileA is converted to "hello", "help?", "oh no", "yes"). However, when I compare FileA and FileB using the following code, it always fails.
while(($key,$value)=each(%config))
{
    $num=keys(%base_config);
    $num--;#to get the correct index
    while($num>=0)
    {
        $common{$value}=$value if exists $read_file{$key};#stored the correct matches in %common
        $num--;
    }
}
I tested my substitution and the comparison between two strings using the following example, and it works. I don't know why it does not work when the strings are read into a hash from a file.
use strict;
use warnings;
my $str="hello^vsd";
my $test="hello";
$str =~ s/(!.+)|(!.*)|(\^.+)|(\^.*)//;
my %hash=();
$hash{$str}=();
foreach my $key(keys %hash)
{
    print "$key\n";
}
print "yay\n" if exists $hash{$test};
print "boo\n" unless exists $hash{$test};
The two files can have different numbers of lines, and the lines need not be in the same order when searching, i.e. "oh no" can come before "hello".
Here's another solution that reads both files simultaneously (assumes both files have an equal number of lines):
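A minimal sketch of that lockstep approach, not the original listing. It assumes the sample data from the question, held in in-memory strings so it runs without any files; strip_marker and common_lines are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Sample data from the question, kept in memory so the sketch runs as-is.
my $file_a = "hello!world\nhelp?!3233\noh no^!!\nyes!\n";
my $file_b = "hello\nhelp?\noh no\nyes\n";

# Keep only the part of a line before the first "!" or "^".
sub strip_marker {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending, including a stray CR
    $line =~ s/[!\^].*//s;   # drop the marker and everything after it
    return $line;
}

# Read both inputs one line at a time, in step, and keep the pairs that
# are equal after stripping. Stops as soon as either input runs out.
sub common_lines {
    my ($text_a, $text_b) = @_;
    open my $fh_a, '<', \$text_a or die "Cannot open first input: $!";
    open my $fh_b, '<', \$text_b or die "Cannot open second input: $!";

    my %common;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last unless defined $line_a && defined $line_b;

        my $key_a = strip_marker($line_a);
        my $key_b = strip_marker($line_b);
        $common{$key_a} = $key_a if $key_a eq $key_b;
    }
    close $fh_a;
    close $fh_b;
    return \%common;
}

my $common = common_lines($file_a, $file_b);
print "$_\n" for sort keys %$common;
```

Because the comparison is line by line, this only finds matches that sit on the same line number in both files, so it does not cover the case where the lines are in a different order.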
Output
Start by packaging up reusable segments into subroutines:
read_file takes an input file name and returns a hash of the line segments before any ! or ^ characters. Each line segment is a key, and the value is the number of times it appeared.
Using this, the next step is to figure out which lines match between files:
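As a rough sketch of those two steps, not the answer's own listing: read_file below follows the description above, while common_segments is a hypothetical helper name. The sample data is passed as references to strings, which three-argument open accepts in place of a file name, so the sketch runs without files on disk.

```perl
use strict;
use warnings;

# Returns a hash ref mapping each line segment (the text before the first
# "!" or "^") to the number of times it appeared. $source is a file name;
# a reference to a string also works with three-argument open.
sub read_file {
    my ($source) = @_;
    open my $fh, '<', $source or die "Cannot open input: $!";
    my %seen;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # drop the line ending, including a stray CR
        $line =~ s/[!\^].*//s;   # drop the marker and everything after it
        $seen{$line}++;
    }
    close $fh;
    return \%seen;
}

# Segments that are keys in both hashes (hypothetical helper).
sub common_segments {
    my ($first, $second) = @_;
    my %common = map { $_ => $_ } grep { exists $second->{$_} } keys %$first;
    return \%common;
}

# Sample data from the question; the second input is reordered and has an
# extra line to show that order and length do not matter.
my $file_a = "hello!world\nhelp?!3233\noh no^!!\nyes!\n";
my $file_b = "yes\noh no\nhello\nhelp?\nextra\n";

my $common = common_segments(read_file(\$file_a), read_file(\$file_b));
print "$_\n" for sort keys %$common;
```

Looking keys up with exists keeps the comparison independent of line order, which is what the question asks for.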
Which will print:
You could define your_file_names_here as follows if you wanted to test it:
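One possible definition, as a sketch only: it writes the question's sample data to two temporary files and returns their names. Using File::Temp for this is an assumption of the sketch, not something the answer specifies.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Writes the sample FileA and FileB contents from the question to two
# temporary files and returns the two file names. The files are removed
# when the program exits (UNLINK => 1).
sub your_file_names_here {
    my @contents = (
        "hello!world\nhelp?!3233\noh no^!!\nyes!\n",    # FileA
        "hello\nhelp?\noh no\nyes\n",                    # FileB
    );
    my @names;
    for my $text (@contents) {
        my ($fh, $name) = tempfile(UNLINK => 1);
        print {$fh} $text;
        close $fh or die "Cannot close $name: $!";
        push @names, $name;
    }
    return @names;
}

my ($file_a, $file_b) = your_file_names_here();
print "$file_a\n$file_b\n";
```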
You can use a regex character class, s/[?^]//g, to remove ^ and ?. Note that the ^ must not be the first character in the class, or you need to escape it. (It might be safer to escape it, in case you add other characters later, so that they don't get negated.)
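To make the point about the position of ^ concrete, here is a small sketch; strip_with is a hypothetical helper that removes every match of the given pattern from a copy of the string.

```perl
use strict;
use warnings;

# Remove every match of $pattern from a copy of $text (hypothetical helper).
sub strip_with {
    my ($text, $pattern) = @_;
    (my $copy = $text) =~ s/$pattern//g;
    return $copy;
}

my $text = 'oh no^? yes';

# ^ is not first in the class, so it is an ordinary character.
print strip_with($text, qr/[?^]/), "\n";

# Escaped, ^ is an ordinary character wherever it appears in the class.
print strip_with($text, qr/[\^?]/), "\n";

# ^ first and unescaped negates the class: everything except "?" is removed.
print strip_with($text, qr/[^?]/), "\n";
```

Note that this deletes only the symbols themselves. To also drop whatever follows the first symbol, as the question describes, the pattern would need a trailing .* such as s/[!\^].*//.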
I process all the files, using a hash to record which files each word appears in.
To tell the files apart, I give each file the value 2**(file index), so I get the values 2**0=1, 2**1=2, 2**2=4, and so on, and I add them up per string to show which files it belongs to. A string that exists in all files has the sum of all those values, which is 3 (2+1) with two files. So in this case 3 means the string is in both files, 1 means FileA only, and 2 means FileB only. You can check for a particular file with a bitwise AND (&).
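A sketch of the bitmask idea, under the assumption that the inputs are in-memory strings standing in for FileA and FileB; file_masks is a hypothetical name, and 1 << $i is used as the integer form of 2**$i.

```perl
use strict;
use warnings;

# In-memory stand-ins for FileA and FileB, based on the question's sample
# data plus one line that is unique to each file.
my @files = (
    "hello!world\nhelp?!3233\noh no^!!\nyes!\nonly in a\n",
    "hello\nhelp?\noh no\nyes\nonly in b!x\n",
);

# Returns a hash ref mapping each segment to a bitmask in which bit $i is
# set when input number $i contains that segment.
sub file_masks {
    my @texts = @_;
    my %mask;
    for my $i (0 .. $#texts) {
        open my $fh, '<', \$texts[$i] or die "Cannot open input $i: $!";
        while (my $line = <$fh>) {
            $line =~ s/\r?\n\z//;    # drop the line ending
            $line =~ s/[!\^].*//s;   # keep the text before the first ! or ^
            $mask{$line} |= 1 << $i; # 1 << $i is 2**$i
        }
        close $fh;
    }
    return \%mask;
}

my $masks = file_masks(@files);
my $all   = (1 << scalar @files) - 1;    # every bit set: 3 for two files

for my $segment (sort keys %$masks) {
    my $mask = $masks->{$segment};
    my @in   = grep { $mask & (1 << $_) } 0 .. $#files;    # bitwise AND test
    my $note = $mask == $all ? ' (common)' : '';
    print "$segment: mask $mask, in file(s) @in$note\n";
}
```

Using |= rather than += keeps the mask correct even when the same segment appears more than once in one file.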
Edit: added the test conditions
Output:
First we must normalize your input. The code below creates one hash for each path. For each line in a given file, remove everything beginning with the first ! or ^ character and record its presence.
Computing the intersection of two arrays is covered in the Data Manipulation section of the Perl FAQ list. Adapting the technique to your situation, we want to know the lines that are common to all inputs.
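A minimal sketch of both steps, not the answer's own code: normalize and common_to_all are hypothetical names, and the inputs are passed as references to strings (which three-argument open accepts in place of a path) so the example runs without files.

```perl
use strict;
use warnings;

# Build one hash for an input: each key is a line with everything from the
# first "!" or "^" onward removed, and the value 1 records its presence.
sub normalize {
    my ($source) = @_;    # a path, or a reference to a string
    open my $fh, '<', $source or die "Cannot open input: $!";
    my %present;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        $line =~ s/[!\^].*//s;
        $present{$line} = 1;
    }
    close $fh;
    return \%present;
}

# Intersection in the style of the Perl FAQ: count in how many hashes each
# key occurs, then keep the keys that occur in every one of them.
sub common_to_all {
    my @hashes = @_;
    my %count;
    for my $hash (@hashes) {
        $count{$_}++ for keys %$hash;
    }
    my @common = grep { $count{$_} == @hashes } keys %count;
    return sort @common;
}

my $file_a = "hello!world\nhelp?!3233\noh no^!!\nyes!\n";
my $file_b = "oh no\nhello\nhelp?\nyes\nsomething else\n";

print "$_\n" for common_to_all(normalize(\$file_a), normalize(\$file_b));
```

Because each input is reduced to a hash of unique keys first, a line repeated within one file cannot be mistaken for a line shared by all files.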
Tying it together with
gives output of