How do I match two files on a substring and find the common entries?

Posted 2025-02-08 10:05:59

I have two files. File1 contains a list of email addresses. File2 contains a list of domains.

I want to keep only the email addresses whose domain exactly matches one of the listed domains, using a Perl script.

I am using the code below, but I don't get the correct result.

#!/usr/bin/perl 
#use strict;
#use warnings;
use feature 'say';

my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";

my $match = open(MATCH, ">matching_domain") || die;

open(my $data1, '<', $file1) or die "Could not open '$file1' $!\n";
my @wrd = <$data1>;
chomp @wrd;
# loop on the fiile to be searched
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while(my $line = <$data2>) {
    chomp $line;
    foreach (@wrd) {
        if($line =~ /\@$_$/) {
            print MATCH "$line\n";
        }
    }
}

File1

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

File2

yahoo.com
gmail.com

Expected output

[email protected]
[email protected]


Comments (2)

末が日狂欢 2025-02-15 10:05:59


First off, since you seem to be on *nix, you might want to check out grep -f, which can take search patterns from a given file. I'm no expert in grep, but I would try combining -f (read patterns from a file) with -w (match whole words), and this should be fairly easy.

Second: Your Perl code can be improved, but it works as expected if you put the emails and domains in the files as indicated by your code. It may be that you have mixed the files up.

If I run your code, fixing only the paths, and keeping the domains in file1, it does create the file matching_domain and it contains your expected output:

[email protected]
[email protected]

So I don't know what you think your problem is (because you did not say). Maybe you were expecting it to print output to the terminal. Either way, it does work, but there are things to fix.

#use strict;
#use warnings;

It is a huge mistake to remove these two, the biggest mistake you can make while coding Perl. It will not remove your errors, just hide them. You will spend 10 times as much time fixing bugs. Uncommenting them is the first thing you should do to fix this.

use feature 'say';

You never use this. You could for example replace print MATCH "$line\n" with say MATCH $line, which is slightly more concise.

my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";

This is very incorrect. You are placing a condition on the creation of a variable. If the condition fails, does the variable exist? Don't do this. I assume this is meant to check whether the file exists, but that is not what it does. To check whether a file exists, you can use -e, documented in perldoc -f -X (various file tests).
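As a rough illustration of the -e file test mentioned above, here is a minimal sketch. The helper name file_exists is made up for this example, and a temporary file stands in for the paths from the question so the snippet runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original post):
# returns 1 if the path exists, 0 otherwise.
sub file_exists {
    my ($path) = @_;
    return (-e $path) ? 1 : 0;
}

# Create a temporary file so the example does not depend on /home/user/...
my ($fh, $tmp_path) = tempfile(UNLINK => 1);
close $fh;

print "exists\n"  if file_exists($tmp_path);
print "missing\n" if !file_exists("$tmp_path.does-not-exist");
```

In the script from the question this check would be redundant anyway, because open already reports a missing file through $!.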

Furthermore, a statement in the form of a string, "/home/user..." is TRUE ("truthy"), as far as Perl conditions are concerned. It is only false if it is "0" (zero), "" (empty) or undef (undefined). So your or clause will never be executed. E.g. "foo" or die will never die.
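The truthiness rule described above can be checked with a small sketch. The helper truthiness is a name invented for this illustration; it only reports how Perl treats a value in boolean context.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a value in boolean context.
sub truthiness {
    my ($value) = @_;
    return $value ? "true" : "false";
}

# A non-empty path string is always true, so an `or die` after it never fires.
print truthiness("/home/user/domain_file"), "\n";   # true
print truthiness("0"), "\n";                        # false
print truthiness(""), "\n";                         # false
print truthiness(undef), "\n";                      # false
```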

Lastly, this test is quite meaningless, as you will be testing this in your open statement later on anyway. If the file does not exist, the open will fail and your program will die.

my $match = open(MATCH, ">matching_domain") || die;

This is also very incorrect. First off, you never use the $match variable. Secondly, I bet it does not contain what you think it does. (It contains a boolean that states whether open was successful or not; see perldoc -f open.) Thirdly, again, don't put conditions on my declarations of variables; it is a bad idea.

What this statement really means is that $match will contain either the return value of the open, or the return value of die. This should probably be simply:

open my $match, ">", "matching_domain" or die "Cannot open 'matching_domain': $!";

Also, use the three argument open with explicit open MODE, and use lexical file handles, like you have done elsewhere.
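Putting the last few points together, a minimal sketch of three-argument open with a lexical filehandle and say might look like the following. The temporary directory and the address user@example.com are assumptions for illustration only; the original script writes to matching_domain in the current directory.

```perl
use strict;
use warnings;
use feature 'say';
use File::Temp qw(tempdir);

# Write into a temporary directory so the sketch does not touch real files.
my $dir      = tempdir(CLEANUP => 1);
my $out_path = "$dir/matching_domain";

# Three-argument open with a lexical filehandle; the error names the path.
open(my $out, '>', $out_path) or die "Cannot open '$out_path': $!";
say {$out} 'user@example.com';    # say appends the newline itself
close($out) or die "Cannot close '$out_path': $!";

# Read the file back to confirm what was written.
open(my $in, '<', $out_path) or die "Cannot open '$out_path': $!";
my @lines = <$in>;
close $in;
chomp @lines;
say "wrote: $_" for @lines;
```

Keeping the path in a variable lets the error message name the file rather than the filehandle.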

And one more thing on top of all the stuff I've already badgered you with: I don't recommend hard coding output files for small programs like this. If you want to redirect the output, use shell redirection: perl foo.pl > output.txt. I think this is what has prompted you to think something is wrong with your code: You don't see the output.

Other than that, your code is fine, as near as I can tell. You may want to chomp the lines from the domain file, but it should not matter. Also remember that indentation is a good thing, and it helps you read your code. I mentioned this in a comment, but it was removed for some reason. It is important though.

Good luck!

深陷 2025-02-15 10:05:59


This assumes that the lines labeled File1 are in the file pointed to by $file1 and the lines labeled File2 are in the file pointed to by $file2.

You have your variables swapped. You want to match what is in $line against $_, not the other way around:

# loop on the file to be searched
open( my $data2, '<', $file2 ) or die "Could not open '$file2' $!\n";
while ( my $line = <$data2> ) {
    chomp $line;
    foreach (@wrd) {
        if (/\@$line$/) {
            print MATCH "$_\n";
        }
    }
}
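One detail the loop above leaves open: a domain such as yahoo.com contains a dot, which is a wildcard inside a regex. The sketch below is an optional hardening, not part of the original answer. It uses \Q...\E to treat the domain literally and \z to anchor at the end of the string. The helper has_domain and the sample addresses are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $email ends in exactly "@$domain".
sub has_domain {
    my ($email, $domain) = @_;
    # \Q...\E escapes regex metacharacters in the interpolated domain.
    return $email =~ /\@\Q$domain\E\z/ ? 1 : 0;
}

print has_domain('alice@yahoo.com', 'yahoo.com'), "\n";        # 1
print has_domain('bob@yahooXcom', 'yahoo.com'), "\n";          # 0 (an unescaped "." would match "X")
print has_domain('carol@mail.yahoo.com', 'yahoo.com'), "\n";   # 0 (the "@" anchors the whole domain)
```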

You should un-comment the warnings and strict lines:

use strict;
use warnings;

Enabling warnings shows you that the or die checks in the file name assignment statements do not work the way you intended. Just use:

my $file1 = "/home/user/domain_file";
my $file2 = "/home/user/email_address_file";

You are already doing the checks where they belong (on open).
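For comparison, here is a sketch of an alternative to the nested loop: put the domains in a hash and look up each email's domain directly. This is a swapped-in technique, not what either answer proposes. The helper filter_by_domain and the sample addresses are invented, since the addresses in the original post are not recoverable.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns the emails whose domain is listed in @$domains.
sub filter_by_domain {
    my ($emails, $domains) = @_;
    # A hash gives exact-match lookup without building a regex per domain.
    my %wanted = map { ( lc($_) => 1 ) } @$domains;
    my @matches;
    for my $email (@$emails) {
        # Take everything after the last "@" as the domain; skip malformed lines.
        my ($domain) = $email =~ /\@([^\@]+)\z/ or next;
        push @matches, $email if $wanted{ lc $domain };
    }
    return @matches;
}

# Invented sample data for illustration.
my @emails  = ('a@yahoo.com', 'b@hotmail.com', 'c@gmail.com', 'd@notgmail.com');
my @domains = ('yahoo.com', 'gmail.com');

say for filter_by_domain(\@emails, \@domains);
```

In a real script the two arrays would be read from the email and domain files with chomp, and the output could go to STDOUT so that shell redirection decides where it ends up.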
