
Finding duplicates (regular expressions)

Posted 2024-09-25 02:08:38

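I have a CSV containing a list of 500 members and their phone numbers. I tried diff tools, but none of them seem to find duplicates.

Can I use a regular expression to find duplicate rows by member phone number?

I'm using TextMate on a Mac.

Many thanks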


还不是爱你 2024-10-02 02:08:39


Which duplicates are you looking for: whole lines, or just the same phone number?

If it is the whole line, try this:

sort phonelist.txt | uniq -c | sort -n

The lines that occur more than once will appear at the bottom.

If it is just the phone number in some column, use this:

awk -F ';' '{print $4}' phonelist.txt | sort | uniq -c | sort -n

Replace the '4' with the number of the column that holds the phone number, and the ';' with the separator your file actually uses. The extra sort is needed because uniq only counts adjacent identical lines.

Or give us a few example lines from the file.

EDIT:

If the data format is name,mobile,phone,uniqueid,group, then run the following on the command line:

awk -F ',' '{print $3}' phonelist.txt | sort | uniq -c | sort -n
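
For anyone without awk at hand, the same count-per-phone-number idea can be sketched in Python. This is only an illustration: the function names, the 0-based column index 2 (the "phone" field of the name,mobile,phone,uniqueid,group layout) and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def count_phone_numbers(csv_text, column=2, delimiter=","):
    """Count how often each value in the given 0-based column occurs."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # Skip rows that are too short to have the requested column.
    return Counter(row[column].strip() for row in reader if len(row) > column)


def duplicated(counts):
    """Return (count, value) pairs for values seen more than once, ascending."""
    return sorted((n, value) for value, n in counts.items() if n > 1)


# Made-up sample rows in the assumed name,mobile,phone,uniqueid,group layout.
sample = (
    "Ann,555-0100,555-0199,1,a\n"
    "Bob,555-0101,555-0198,2,a\n"
    "Cy,555-0102,555-0199,3,b\n"
)

dupes = duplicated(count_phone_numbers(sample))
for n, value in dupes:
    print(n, value)
```

To run it on a real file, read the file's contents into a string and pass that in place of sample.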


攒一口袋星星 2024-10-02 02:08:39

Yes. For one way to do it, look here. But you probably don't want to do it this way.
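
To give a feel for what a regex-only approach might look like, here is a rough sketch using a backreference inside a lookahead. It is not the method from the link above. It assumes the phone number is the third comma-separated field, is followed by another field, and that no field contains a quoted comma. The pattern name and sample rows are invented for the example.

```python
import re

# Matches a line whose third field shows up again as the third field of
# some later line. Group 1 captures the phone number.
DUP_PHONE = re.compile(
    r"^(?:[^,\n]*,){2}([^,\n]+),.*$"              # current line, capture field 3
    r"(?=(?:\n.*)*?\n(?:[^,\n]*,){2}\1,)",        # a later line with the same field 3
    re.MULTILINE,
)

# Made-up sample rows: name,mobile,phone,uniqueid,group
sample = (
    "Ann,555-0100,555-0199,1,a\n"
    "Bob,555-0101,555-0198,2,a\n"
    "Cy,555-0102,555-0199,3,b\n"
)

found = [m.group(1) for m in DUP_PHONE.finditer(sample)]
print(found)
```

Each line is compared against every later line, so the work grows roughly quadratically with file size, and the pattern is hard to read and adapt. That is a fair reason to prefer the non-regex approaches for this job.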


凶凌 2024-10-02 02:08:39

You can simply parse the file and check which lines are duplicated. I think regex is the worst solution for this problem.


时常饿 2024-10-02 02:08:39

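What language are you using? In .NET, you can easily load the CSV file into a DataTable and find or remove the duplicate rows. Then write the DataTable back out to another CSV file.

Heck, you could load the file into Excel, sort by the field, and find the duplicates by hand. 500 rows is not that many.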
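
The load, de-duplicate, write-back flow described above might look roughly like this in Python instead of .NET. Treat it as a sketch under assumptions: the function name, the choice to keep the first row for each number, the column index 2 and the sample rows are all illustrative.

```python
import csv
import io


def drop_duplicate_phones(csv_text, column=2):
    """Keep the first row per phone number; return (kept_csv_text, dropped_rows)."""
    seen = set()
    kept = io.StringIO()
    writer = csv.writer(kept, lineterminator="\n")
    dropped = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Rows too short to have the column are passed through unchanged.
        key = row[column].strip() if len(row) > column else None
        if key is not None and key in seen:
            dropped.append(row)
            continue
        if key is not None:
            seen.add(key)
        writer.writerow(row)
    return kept.getvalue(), dropped


# Made-up sample rows: name,mobile,phone,uniqueid,group
sample = (
    "Ann,555-0100,555-0199,1,a\n"
    "Bob,555-0101,555-0198,2,a\n"
    "Cy,555-0102,555-0199,3,b\n"
)

kept_text, dropped_rows = drop_duplicate_phones(sample)
print(kept_text, end="")
print("dropped:", dropped_rows)
```

Returning the dropped rows alongside the cleaned text makes it easy to review what would be removed before overwriting anything.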


过去的过去 2024-10-02 02:08:39


Use Perl.

Load the CSV file into an array, pull the column you want to check for duplicates (the phone numbers) into a second array, then reduce that second array to its distinct values using:

my %seen;
# keep only the first occurrence of each value in @array2
my @unique = grep { !$seen{$_}++ } @array2;

After that, loop over the unique array (phone numbers) and, inside that loop, loop over array #1 (the lines). Compare each line's phone number with the current unique number and, if they match, output that line to another CSV file. Lines that share a number then end up next to each other in the output.
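
A variation on the nested loops above is to group the lines by phone number in a single pass and emit only the groups with more than one line. Here is a minimal Python sketch of that idea. It is not a translation of the Perl above. The function name and sample lines are hypothetical, and it uses a plain split, so it assumes no quoted commas inside fields.

```python
from collections import defaultdict


def lines_with_shared_phone(lines, column=2, delimiter=","):
    """Return the lines whose phone number appears on more than one line."""
    groups = defaultdict(list)
    for line in lines:
        text = line.rstrip("\n")
        fields = text.split(delimiter)
        if len(fields) > column:
            groups[fields[column].strip()].append(text)
    # Dicts keep insertion order, so groups come out in order of first sighting.
    return [text for group in groups.values() if len(group) > 1 for text in group]


# Made-up sample lines: name,mobile,phone,uniqueid,group
sample_lines = [
    "Ann,555-0100,555-0199,1,a\n",
    "Bob,555-0101,555-0198,2,a\n",
    "Cy,555-0102,555-0199,3,b\n",
]

shared = lines_with_shared_phone(sample_lines)
for text in shared:
    print(text)
```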
