2882. Drop Duplicate Rows

Published 2024-06-17 01:02:59

Description

DataFrame customers
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| customer_id | int    |
| name        | object |
| email       | object |
+-------------+--------+

There are some duplicate rows in the DataFrame based on the email column.

Write a solution to remove these duplicate rows and keep only the first occurrence.

The result format is in the following example.

Example 1:
Input:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 5           | Finn    | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Output:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email (Alice's row) is retained.
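
To make the explanation concrete, the sketch below rebuilds the example input and uses pandas' `Series.duplicated` to flag which rows repeat an earlier email. The variable names `is_repeat` and `repeated_ids` are illustrative only and are not part of the problem statement.

```python
import pandas as pd

# Example input from the problem statement.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
    "email": [
        "emily@example.com", "michael@example.com", "sarah@example.com",
        "john@example.com", "john@example.com", "alice@example.com",
    ],
})

# duplicated() marks every occurrence after the first as True
# (keep="first" is the default), so only Finn's row is flagged.
is_repeat = customers["email"].duplicated()
repeated_ids = customers.loc[is_repeat, "customer_id"].tolist()
print(repeated_ids)  # [5]
```

The flagged row is exactly the one missing from the expected output.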

Solutions

Solution 1

import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
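    # drop_duplicates defaults to keep="first", so the first row seen for
    # each email is retained and later rows with the same email are dropped.
    return customers.drop_duplicates(subset=['email'])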
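
As a quick check, the following sketch runs the solution on the example input. It repeats the function so the snippet is self-contained; the `renumbered` variable and the `reset_index` step are optional additions for illustration, not something the problem requires.

```python
import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    return customers.drop_duplicates(subset=["email"])


customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
    "email": [
        "emily@example.com", "michael@example.com", "sarah@example.com",
        "john@example.com", "john@example.com", "alice@example.com",
    ],
})

result = dropDuplicateEmails(customers)
print(result["customer_id"].tolist())  # [1, 2, 3, 4, 6]

# drop_duplicates returns a new DataFrame and keeps the original row labels,
# so the index has a gap where Finn's row was removed.
print(result.index.tolist())  # [0, 1, 2, 3, 5]

# Optional: renumber the rows if a contiguous index is wanted.
renumbered = result.reset_index(drop=True)
```

Because `drop_duplicates` is not called with `inplace=True`, the input DataFrame is left unchanged.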
