Question about string comparison and references in Perl
Here is where I begin. I am reading arrays from a database one at a time using a while loop. I want to pick out the rows that are duplicates (on certain fields) and keep only the items that are unique on those fields. Then I want to print the data I have kept in a certain format. I wrote the code I thought would do it, but it gives me everything, including items that are duplicates on those fields. I've been searching and searching and can't figure it out; as a Perl noob, I suspect I am missing something simple. The code is as follows:
my @uniques = ();
my $output;
while (my @itemArray = $sth->fetchrow_array() ) {
    my $duplicateFlag = 0;
    foreach (@uniques){
        if( ($itemArray[3] eq "$_->[3]") and ($itemArray[4] eq "$_->[4]")
            and ($itemArray[5] eq "$_->[5]" ) and ($itemArray[6] eq "$_->[6]" )
            and ($itemArray[7] eq "$_->[7]" ) and ($itemArray[8] == "$_->[8]" ) ){
            $duplicateFlag = 1;
        }
    }
    if( $duplicateflag == 0){
        $refToAdd = \@itemArray;
        push(@uniques, $refToAdd);
        $output .= "$itemArray[3]" . "\t$itemArray[8]" . "\t$itemArray[5]" . "\t$itemArray[7]\n";
    }
}
print $output
One possibility: Use hashes to determine whether or not an item has been seen before. A bit simplified from your code:
Okay, it's very simplified, but you get the idea. By using a hash keyed on the values I want to verify are unique, I can avoid the double loop and its O(n²) efficiency. (Dang! All those years in college finally paid off!)
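A minimal sketch of the seen-hash idea, assuming a hypothetical in-memory @rows in place of the database handle and checking a single field (index 0) for brevity. Each %seen lookup is an average constant-time hash access, so one pass over the rows replaces the nested loop:

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for $sth->fetchrow_array() results.
my @rows = (
    [ 'apple',  1 ],
    [ 'banana', 2 ],
    [ 'apple',  3 ],    # duplicate on field 0
);

my %seen;       # field value => how many times it has been encountered
my @uniques;
for my $row (@rows) {
    # Postfix ++ returns the old value: undef (false) the first time,
    # true on every later occurrence, so repeats are skipped.
    next if $seen{ $row->[0] }++;
    push @uniques, $row;
}

print join("\t", @$_), "\n" for @uniques;
```

Only the first row for each key value survives; later rows with the same key are skipped without rescanning @uniques.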
You'll probably want to use a more complex hash key by combining all the fields you want to search for dups on. Maybe something like this:
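A sketch of such a combined key built with join over fields 3 to 8 (the fields the question compares), assuming a tab never appears inside those fields; @rows is hypothetical sample data. The print order 3, 8, 5, 7 follows the question's output line:

```perl
use strict;
use warnings;

# Hypothetical rows; only indices 3..8 matter for duplicate detection.
my @rows = (
    [ 0, 0, 0, 'a', 'b', 'c', 'd', 'e', 10 ],
    [ 1, 1, 1, 'a', 'b', 'c', 'd', 'e', 10 ],   # duplicate of row 0 on 3..8
    [ 2, 2, 2, 'a', 'b', 'c', 'd', 'e', 11 ],   # differs in field 8
);

my %seen;
my $output = '';
for my $row (@rows) {
    # One key for all compared fields; assumes "\t" never occurs in the data.
    my $key = join "\t", @{$row}[3 .. 8];
    next if $seen{$key}++;
    $output .= join("\t", @{$row}[3, 8, 5, 7]) . "\n";
}
print $output;
```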
The main thing is avoiding looping through all unique items again and again if you can store them in a hash.
Possibly:
($itemArray[8] == "$_->[8]" )
should be:
($itemArray[8] eq "$_->[8]" )
to match all the others.
Another thing that may solve your problem is removing the quote marks around "$_->[8]". Depends what your data are.
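To see why the choice between eq and == depends on the data, here is a small sketch with a hypothetical helper, compare_both, that runs both comparisons on the same pair. For '10' and '10.0', eq is false while == is true; for 'abc' and 'xyz', both strings numify to 0, so == is true (Perl warns that they aren't numeric unless that warning is silenced, as it is here):

```perl
use strict;
use warnings;

# Hypothetical helper: returns (string-equal, numeric-equal) as 1/0.
sub compare_both {
    my ($x, $y) = @_;
    no warnings 'numeric';    # silence "isn't numeric" for this demo only
    return ( ($x eq $y ? 1 : 0), ($x == $y ? 1 : 0) );
}

for my $pair ( [ '10', '10.0' ], [ 'abc', 'xyz' ], [ '7', '7' ] ) {
    my ($str_eq, $num_eq) = compare_both(@$pair);
    printf "%-4s vs %-5s  eq:%d  ==:%d\n", @$pair, $str_eq, $num_eq;
}
```

If field 8 holds text, == treats most values as 0 and reports false matches; if it holds numbers written in different forms, eq can miss real matches.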
You are getting all the duplicates because $duplicateflag is undefined at line 13. Running a syntax test on your script with
use strict; use warnings;
turned on flags $duplicateflag as an undeclared variable. And if we scrutinize your declaration of that variable, it says:
my $duplicateFlag = 0;
That is to say, you have a capital F, which means $duplicateflag is not the same variable as $duplicateFlag. The check
undef == 0
still produces a true value, so every row passes as unique (a false positive). To avoid problems like this, always run your scripts with
use strict; use warnings;
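A sketch of the question's loop with the flag spelled consistently and strict and warnings enabled, assuming hypothetical @rows in place of $sth->fetchrow_array(). Comparing all of fields 3 to 8 with eq, the grep-based test, the early last, and storing a copy of each row are assumptions added for brevity, not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for rows returned by $sth->fetchrow_array().
my @rows = (
    [ 0, 0, 0, 'a', 'b', 'c', 'd', 'e', 10 ],
    [ 1, 1, 1, 'a', 'b', 'c', 'd', 'e', 10 ],   # duplicate of row 0
    [ 2, 2, 2, 'x', 'b', 'c', 'd', 'e', 10 ],   # differs in field 3
);

my @uniques;
my $output = '';
for my $row (@rows) {
    my @itemArray     = @$row;
    my $duplicateFlag = 0;                 # same spelling everywhere now
    for my $kept (@uniques) {
        # Duplicate if no field in 3..8 differs (string comparison assumed).
        if ( !grep { $itemArray[$_] ne $kept->[$_] } 3 .. 8 ) {
            $duplicateFlag = 1;
            last;                          # one match is enough
        }
    }
    if ( $duplicateFlag == 0 ) {
        push @uniques, [@itemArray];       # store a copy of the row
        $output .= join("\t", @itemArray[3, 8, 5, 7]) . "\n";
    }
}
print $output;
```

With the typo in place, use strict would refuse to compile this; with it fixed, the duplicate row is dropped as intended.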
In SQL, GROUP BY or SELECT DISTINCT is the database way of keeping rows unique. But if you're going to do this in Perl, I agree that hashes and keys are the way to go. However, any delimiter we could suggest might also be there in the data. That gives you the potential for an ambiguous match. One hash-based method is unequivocal and uses Perl's natural structures to delimit your fields.
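A short illustration of that ambiguity, assuming a tab as the hypothetical separator: two different rows produce the same joined key because the data itself contains tabs.

```perl
use strict;
use warnings;

# Two different rows whose field values contain the separator character.
my @row_a = ( "a\tb", 'c' );
my @row_b = ( 'a', "b\tc" );

my $key_a = join "\t", @row_a;    # "a", TAB, "b", TAB, "c"
my $key_b = join "\t", @row_b;    # the very same string
print $key_a eq $key_b ? "ambiguous: keys collide\n" : "keys differ\n";
```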
That is why I present the following.
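A minimal sketch of one nested-hash method along these lines, assuming the question's field indices 3 to 8 and hypothetical @rows. Each field is one level of %seen, so the hash structure, not a separator, keeps the fields apart, and the ++ test takes the place of $duplicateFlag. Rows 0 and 1 would collide under a tab-joined key but stay distinct here:

```perl
use strict;
use warnings;

# Hypothetical rows; fields 3..8 are the ones checked for duplicates.
my @rows = (
    [ 0, 0, 0, "a\tb", 'c',    'x', 'y', 'z', 1 ],
    [ 1, 1, 1, 'a',    "b\tc", 'x', 'y', 'z', 1 ],  # same as row 0 if tab-joined
    [ 2, 2, 2, "a\tb", 'c',    'x', 'y', 'z', 1 ],  # genuine duplicate of row 0
);

my %seen;       # one level per field: $seen{f3}{f4}{f5}{f6}{f7}{f8}
my @uniques;
for my $r (@rows) {
    # Autovivification creates the intermediate hashes; ++ returns the old
    # value, which is undef (false) the first time a combination appears.
    next if $seen{ $r->[3] }{ $r->[4] }{ $r->[5] }{ $r->[6] }{ $r->[7] }{ $r->[8] }++;
    push @uniques, $r;
}

print join("\t", @{$_}[3, 8, 5, 7]), "\n" for @uniques;
```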
That would have eliminated the temporary variable, and so eliminated the consequences of misspelling it. However, USUW works better for catching these things, where USUW = use strict; use warnings;.