Replacing values after groupby
I have a DataFrame of grocery store records:
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
columns=['customer', 'product'])
| customer | product |
| -------- | -------- |
| Tom | apple1 |
| Tom | banana35 |
| Jeff | pear0 |
I want to get all the products that a customer ever bought, so I used
product_by_customer = df.groupby('customer')['product'].unique()
product_by_customer
customer | |
---|---|
Jeff | [pear0] |
Tom | [apple1, banana35] |
I want to get rid of the numbers after the product name. I tried
product_by_customer.str.replace('[0-9]', '')
but it replaced everything with NaN.
My desired output is
|customer||
|--------|--------|
|Jeff|pear|
|Tom|apple, banana|
Any help is appreciated!
Comments (3)
The values in the product column are NumPy ndarrays, not strings, so .str.replace has nothing to operate on and returns NaN for every row. Try the following code.
It uses a lambda function to visit each value in the array and strip the digits from it.
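A minimal sketch of that lambda approach, assuming the df and product_by_customer from the question. The pattern r'\d+' and the ', '.join step are assumptions chosen to match the desired output, not necessarily the exact code this answer used.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])
product_by_customer = df.groupby('customer')['product'].unique()

# Each value is an array of product names, so visit every element,
# drop its digits, and join the cleaned names into one string.
# (r'\d+' and the ', ' separator are assumptions.)
cleaned = product_by_customer.apply(
    lambda products: ', '.join(re.sub(r'\d+', '', p) for p in products)
)
print(cleaned)
```

If you would rather keep a list per customer, replace the join with a list comprehension inside the lambda.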
How about making a copy of df and removing the digits before the groupby?
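A rough sketch of that idea, assuming the df from the question. The name df2 and the pattern r'\d+' are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])

# Work on a copy so the original records stay untouched (df2 is a hypothetical name).
df2 = df.copy()

# The column still holds plain strings at this point, so .str.replace works.
df2['product'] = df2['product'].str.replace(r'\d+', '', regex=True)

out = df2.groupby('customer')['product'].unique()
print(out)
```

Because the digits are gone before grouping, unique() also collapses names that differed only by their trailing number.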
You can first replace and then aggregate:
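One way this could look, assuming the df from the question. Grouping the cleaned Series by df['customer'] and joining with ', ' are assumptions made to reproduce the desired output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])

out = (
    df['product']
    .str.replace(r'\d+', '', regex=True)    # strip digits while values are still strings
    .groupby(df['customer'])                # group the cleaned Series by customer
    .agg(lambda s: ', '.join(s.unique()))   # unique names per customer, joined
)
print(out)
```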
Or remove the digits inside the aggregation:
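A sketch of doing the cleanup inside the aggregation function, again assuming the question's df. The lambda body is an assumption about how this might be written.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])

# Inside agg, each group s is still a Series of strings,
# so the .str accessor works before unique() is applied.
out = df.groupby('customer')['product'].agg(
    lambda s: ', '.join(s.str.replace(r'\d+', '', regex=True).unique())
)
print(out)
```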
A similar idea is to remove possible duplicates with the dict.fromkeys trick:
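A possible version of the dict.fromkeys variant. The extra row ['Tom', 'apple7'] is hypothetical and not part of the question's data; it is added only so that a duplicate appears once the digits are stripped.

```python
import re

import pandas as pd

# Question's data plus one hypothetical row ('apple7') to create a duplicate.
df_dup = pd.DataFrame(
    [['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0'], ['Tom', 'apple7']],
    columns=['customer', 'product'],
)

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so it de-duplicates the cleaned names without reordering them.
out = df_dup.groupby('customer')['product'].agg(
    lambda s: ', '.join(dict.fromkeys(re.sub(r'\d+', '', p) for p in s))
)
print(out)
```

Unlike set(), this keeps the products in the order in which each customer first bought them.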