Replace values after groupby

Posted on 2025-01-14 16:36:10 · 781 characters · 1 view · 0 comments

I have a data frame of a grocery store record:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])

| customer | product |
| -------- | -------- |
| Tom | apple1 |
| Tom | banana35 |
| Jeff | pear0 |

I want to get all the products each customer has ever bought, so I used

product_by_customer = df.groupby('customer')['product'].unique()
product_by_customer
customer
Jeff               [pear0]
Tom     [apple1, banana35]

I want to get rid of the numbers after the product name. I tried

product_by_customer.str.replace('[0-9]', '')

but it replaced everything with NaN.

My desired output is

| customer | product |
| -------- | -------- |
| Jeff | pear |
| Tom | apple, banana |

Any help is appreciated!


Comments (3)

小苏打饼 2025-01-21 16:36:12

After the groupby, each value in the product column is a NumPy array (ndarray) rather than a string, so .str.replace has no string to work on and the replacement does not take place. Try the following code.

import re

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])
df1 = df.groupby("customer")["product"].unique().reset_index()
# Strip the digits from every product name inside each per-customer array.
df1["product"] = df1["product"].apply(lambda x: [re.sub(r"\d", "", v) for v in x])


df1
Out[52]: 
  customer          product
0     Jeff           [pear]
1      Tom  [apple, banana]

The lambda function visits each value in the array and removes the digits from it.
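To see why the original attempt came back as NaN, here is a minimal sketch that inspects what the groupby result actually holds. It assumes the same sample data as the question; the names `element_types`, `attempt` and `all_missing` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)
product_by_customer = df.groupby("customer")["product"].unique()

# Each element is a whole array of product names, not a single string.
element_types = product_by_customer.map(type)
print(element_types)

# The .str methods only work on string elements; anything else becomes NaN.
attempt = product_by_customer.str.replace("[0-9]", "", regex=True)
all_missing = attempt.isna().all()
print(all_missing)
```

So the fix has to either reach inside each array (as above) or clean the strings before grouping.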

等风也等你 2025-01-21 16:36:12
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])
df1 = df.copy()
# regex=True so that '[0-9]' is treated as a pattern, not a literal string.
df1["product"] = df1["product"].str.replace('[0-9]', '', regex=True)
product_by_customer = df1.groupby('customer')['product'].unique()
product_by_customer

Output:

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

How about making a copy of df and doing the replacement before the groupby?
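One thing to watch for with this approach: depending on the pandas version, `str.replace` may treat the pattern as a literal string unless `regex=True` is passed (as far as I know, the default changed to `regex=False` in pandas 2.0). A small sketch of the difference, using a made-up `products` Series:

```python
import pandas as pd

products = pd.Series(["apple1", "banana35", "pear0"])

# regex=True: "[0-9]" is a character class, so every digit is removed.
as_regex = products.str.replace("[0-9]", "", regex=True)

# regex=False: "[0-9]" is the literal five-character text, which never
# occurs in these names, so nothing is replaced.
as_literal = products.str.replace("[0-9]", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Passing `regex=True` explicitly keeps the behaviour the same across versions.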

莫相离 2025-01-21 16:36:11

You can first replace and then aggregate:

# Parentheses let the chain span lines; regex=True makes '[0-9]' a pattern.
product_by_customer = (
    df["product"].str.replace('[0-9]', '', regex=True)
    .groupby(df['customer']).unique()
)

print(product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

Or strip the digits during the aggregation:

import re

f = lambda x: [re.sub("[0-9]", "", v) for v in x.unique()]
product_by_customer = df.groupby('customer')['product'].agg(f)

print(product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

A similar idea, which also removes possible duplicates, uses the dict.fromkeys trick:

f = lambda x: list(dict.fromkeys(x.str.replace('[0-9]', '', regex=True)))
product_by_customer = df.groupby('customer')['product'].agg(f)

print(product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object
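
The variants above only differ once two product names collapse to the same text after the digits are removed. A minimal sketch with hypothetical data (`df2`, where Tom bought both `apple1` and `apple2`) shows the difference between stripping after `unique()` and de-duplicating after stripping:

```python
import re

import pandas as pd

# Hypothetical data: two apple variants for the same customer.
df2 = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "apple2"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)
grouped = df2.groupby("customer")["product"]

# unique() runs first, so "apple1" and "apple2" both survive and both
# turn into "apple" afterwards.
strip_after_unique = grouped.agg(
    lambda x: [re.sub("[0-9]", "", v) for v in x.unique()]
)

# Digits are stripped first, then dict.fromkeys drops repeats while
# keeping the order of first appearance.
strip_then_dedupe = grouped.agg(
    lambda x: list(dict.fromkeys(x.str.replace("[0-9]", "", regex=True)))
)

print(strip_after_unique["Tom"])
print(strip_then_dedupe["Tom"])
```

Replacing before the groupby and then calling `unique()` avoids the repeats as well, since the names are already clean when they are de-duplicated.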