How to find common entries between two files by substring matching?
I have two files. File1 contains a list of email addresses. File2 contains a list of domains.
I want to use a Perl script to filter the email addresses, keeping only those whose domain exactly matches one of the domains.
I am using the code below, but I don't get the correct result.
```perl
#!/usr/bin/perl
#use strict;
#use warnings;
use feature 'say';
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
my $match = open(MATCH, ">matching_domain") || die;
open(my $data1, '<', $file1) or die "Could not open '$file1' $!\n";
my @wrd = <$data1>;
chomp @wrd;
# loop on the fiile to be searched
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while(my $line = <$data2>) {
chomp $line;
foreach (@wrd) {
if($line =~ /\@$_$/) {
print MATCH "$line\n";
}
}
}
```
File1:
```
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
```
File2:
```
yahoo.com
gmail.com
```
Expected output:
```
[email protected]
[email protected]
```
First off, since you seem to be on *nix, you might want to check out `grep -f`, which can take search patterns from a given file. I'm no expert in grep, but I would try the pattern file together with "match whole words" (`-w`), and this should be fairly easy.

Second: your Perl code can be improved, but it works as expected if you put the emails and domains in the files indicated by your code. It may be that you have mixed the files up.
If I run your code, fixing only the paths and keeping the domains in file1, it does create the file `matching_domain`, and that file contains your expected output. So I don't know what you think your problem is (because you did not say). Maybe you were expecting it to print output to the terminal. Either way, it does work, but there are things to fix.
Commenting out `use strict;` and `use warnings;` is a huge mistake, the biggest mistake you will ever make while coding Perl. It will not remove your errors, just hide them, and you will spend 10 times as much time fixing bugs. Uncommenting these two lines should be the first thing you do to fix this.
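To see what `use strict` buys you, here is a small illustrative sketch (not part of the original answer). It compiles the same snippet twice with a deliberate typo, `$lien` for `$line`: once with strict variable checking and once without. The variable names and the use of a string `eval` are assumptions made only for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $lien is a deliberate typo for $line (illustration only).
# Under strict 'vars' the typo is a compile-time error, so eval returns undef.
my $strict_ok = eval q{
    use strict;
    my $line = 'someone@example.com';
    $lien = 'oops';
    1;
};
my $strict_err = $@;    # save the error before the next eval resets $@
print $strict_ok ? "with strict: compiled\n" : "with strict: $strict_err";

# Without strict 'vars' the same typo quietly creates a global $lien,
# so the bug is hidden rather than reported.
my $lax_ok = eval q{
    no strict 'vars';
    my $line = 'someone@example.com';
    $lien = 'oops';
    1;
};
print $lax_ok ? "without strict: compiled, typo hidden\n" : "without strict: $@";
```

With strict enabled, Perl reports a "Global symbol ... requires explicit package name" error pointing at the typo; without it, the code runs and the mistake is silently absorbed.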
`use feature 'say';`: you never use this. You could, for example, replace `print MATCH "$line\n"` with `say MATCH $line`, which is slightly more concise.

`my $file1 = "/home/user/domain_file" or die " FIle not found\n";`: this is very incorrect. You are placing a condition on the creation of a variable. If the condition fails, does the variable exist? Don't do this. I assume this is meant to check whether the file exists, but that is not what it does. To check whether a file exists, you can use `-e`, documented under `perldoc -f -X` (various file tests).

Furthermore, a string such as `"/home/user..."` is TRUE ("truthy") as far as Perl conditions are concerned. It is only false if it is `"0"` (zero), `""` (empty) or `undef` (undefined). So your `or` clause will never be executed; for example, `"foo" or die` will never die. Lastly, this test is quite meaningless, as you will be testing this in your `open` statement later on anyway. If the file does not exist, the `open` will fail and your program will `die`.
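A short sketch of the two points above: which values Perl treats as false, and letting `open` do the existence check. The helper name `truthiness` and the path `/no/such/dir/domain_file` are made up for this example; the path is simply assumed not to exist on your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 'true' or 'false' depending on how Perl treats $value in a condition.
sub truthiness {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

# A non-empty string such as a path is true, so `"..." or die` never dies.
for my $value ("/home/user/domain_file", "0", "", undef) {
    my $label = defined $value ? qq{"$value"} : 'undef';
    printf "%-26s is %s\n", $label, truthiness($value);
}

# -e reports whether a path exists (see perldoc -f -X), but open() already
# fails on a missing file: it returns false and puts the reason in $!.
my $missing = "/no/such/dir/domain_file";    # hypothetical, assumed absent
printf "%s %s\n", $missing, (-e $missing ? 'exists' : 'does not exist');
if (open(my $fh, '<', $missing)) {
    close($fh);
}
else {
    print "open failed: $!\n";
}
```

Only the path string comes out true; `"0"`, `""` and `undef` are false. The `open ... or die` you already have is therefore the check that actually catches a missing file.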
`my $match = open(MATCH, ">matching_domain") || die;`: this is also very incorrect. First off, you never use the `$match` variable. Secondly, I bet it does not contain what you think it does (it contains a boolean that states whether `open` was successful or not; see `perldoc -f open`). Thirdly, again, don't put conditions on `my` declarations of variables; it is a bad idea. What this statement really means is that `$match` will contain either the return value of the `open` or the return value of `die`. This should probably be simply a plain `open` statement with its own `or die` check, without assigning the result to `$match`. Also, use the three-argument `open` with an explicit open MODE, and use lexical file handles, like you have done elsewhere.

And one more thing on top of all the stuff I've already badgered you with: I don't recommend hard-coding output files for small programs like this. If you want to redirect the output, use shell redirection: `perl foo.pl > output.txt`. I think this is what has prompted you to think something is wrong with your code: you don't see the output.

Other than that, your code is fine, as near as I can tell. You may want to `chomp` the lines from the domain file, but it should not matter. Also remember that indentation is a good thing: it helps you read your code. I mentioned this in a comment, but it was removed for some reason. It is important though.

Good luck!
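Putting that advice together, a reworked version might look roughly like the sketch below. It is one possible layout, not the only way. It assumes the domain file and the email file are passed on the command line (the script name `match_domains.pl` and the helpers `read_lines` and `matching_emails` are hypothetical), reads them with three-argument `open` and lexical file handles, and prints matches to STDOUT with `say` so the shell can redirect them. It also wraps each domain in `\Q...\E`, which the original code does not do, so that the dot in `yahoo.com` only matches a literal dot.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Read a file into a list of lines with the trailing newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Return every address whose domain (the part after '@') is exactly one of
# the given domains. \Q...\E makes the dots in a domain match literally.
sub matching_emails {
    my ($domains, $emails) = @_;
    my @matches;
    EMAIL: for my $email (@$emails) {
        for my $domain (@$domains) {
            if ($email =~ /\@\Q$domain\E$/) {
                push @matches, $email;
                next EMAIL;    # stop at the first matching domain
            }
        }
    }
    return @matches;
}

# Hypothetical usage: perl match_domains.pl domain_file email_address_file
if (@ARGV == 2) {
    my ($domain_file, $email_file) = @ARGV;
    my @domains = read_lines($domain_file);
    my @emails  = read_lines($email_file);
    say for matching_emails(\@domains, \@emails);
}
```

Running it as `perl match_domains.pl domain_file email_address_file > matching_domain` produces the same output file as before, without hard-coding its name in the script.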
This assumes that the lines labeled File1 are in the file pointed to by `$file1` and the lines labeled File2 are in the file pointed to by `$file2`.

You have your variables swapped. You want to use the domain in `$line` as the pattern and match the email address in `$_` against it, not the other way around.

You should un-comment the `warnings` and `strict` lines: `warnings` shows you that the `or die` checks in the file name assignment statements are not really working the way you intended. Just use plain assignments there, without `or die`. You are already doing the checks where they belong (on `open`).
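A minimal sketch of the swapped match, under the assumption stated above (email addresses in `$file1`, domains in `$file2`). To keep it self-contained it reads from in-memory strings instead of real files, and the sample addresses are made up for illustration. The `\Q...\E` around the domain is an addition that makes the dot literal; the loop structure otherwise follows the question's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data standing in for the two files.
my $emails_text  = "alice\@yahoo.com\nbob\@hotmail.com\ncarol\@gmail.com\n";
my $domains_text = "yahoo.com\ngmail.com\n";

# Opening a reference to a scalar reads from the string as if it were a file.
open(my $data1, '<', \$emails_text) or die "Could not open email data: $!\n";
chomp(my @wrd = <$data1>);    # @wrd now holds email addresses

open(my $data2, '<', \$domains_text) or die "Could not open domain data: $!\n";
my @matches;
while (my $line = <$data2>) {    # each $line is a domain
    chomp $line;
    foreach (@wrd) {
        # Swapped compared with the question: the email in $_ is tested
        # against a pattern built from the domain in $line.
        push @matches, $_ if /\@\Q$line\E$/;
    }
}
print "$_\n" for @matches;
```

Because the outer loop runs over domains, matches come out grouped by domain (all yahoo.com addresses first, then gmail.com).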