groupby: aggregate strings and return unique values

Posted on 2025-02-11 13:46:15

How to add a new column of aggregated data

I want to create three new columns in a DataFrame.

Column 01: unique_list

A new column holding the unique values of cfop_code for each key

Column 02: unique_count

A column with the number of unique values shown in unique_list

Column 03: not_unique_count

A column with the total number of cfop_code values for each key, duplicates included

example_df

	key	product	cfop_code
0	12345678901234567890	product a	2551
1	12345678901234567890	product b	2551
2	12345678901234567890	product c	3551
3	12345678901234567895	product a	2551
4	12345678901234567897	product b	2551
5	12345678901234567897	product c	2407

Expected Result

	key	product	cfop_code	unique_list	unique_count	not_unique_count
0	12345678901234567890	product a	2551	2551, 3551	2	3
1	12345678901234567890	product b	2551	2551, 3551	2	3
2	12345678901234567890	product c	3551	2551, 3551	2	3
3	12345678901234567895	product a	2551	2551	1	1
4	12345678901234567897	product b	2551	2407, 2551	2	2
5	12345678901234567897	product c	2407	2407, 2551	2	2

What I have tried

Create a list of unique values

df.groupby('key')["cfop_code"].unique()

key
12345678901234567890    [2551, 3551]
12345678901234567895          [2551]
12345678901234567897    [2551, 2407]
Name: cfop_code, dtype: object

Getting the count of non-unique values (the group size)

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="size")}).reset_index()

key unique_values
0   12345678901234567890    3
1   12345678901234567895    1
2   12345678901234567897    2

Getting the count of unique values as a DataFrame

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="nunique")}).reset_index()

key unique_values
0   12345678901234567890    2
1   12345678901234567895    1
2   12345678901234567897    2

But adding these results as new columns fails

df['unique_list'] = df.groupby('key')["cfop_code"].unique()

df['unique_count'] = df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="nunique")}).reset_index()
df['not_unique_count'] = df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="size")}).reset_index()

伊面 2025-02-18 13:46:15

Try:

tmp = (
    df.groupby("key")["cfop_code"]
    .agg(
        unique_list = lambda s: sorted(s.unique()), 
        unique_count = "nunique", 
        not_unique_count = "size")
    .reset_index()
)
res = df.merge(tmp, on="key")

print(res)
                    key    product  cfop_code   unique_list  unique_count  not_unique_count
0  12345678901234567890  product a       2551  [2551, 3551]             2                 3
1  12345678901234567890  product b       2551  [2551, 3551]             2                 3
2  12345678901234567890  product c       3551  [2551, 3551]             2                 3
3  12345678901234567895  product a       2551        [2551]             1                 1
4  12345678901234567897  product b       2551  [2407, 2551]             2                 2
5  12345678901234567897  product c       2407  [2407, 2551]             2                 2

The problem with your attempt is that this expression:

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="nunique")}).reset_index()

returns a DataFrame with one row per key. You then try to assign this whole DataFrame to a single new column, which fails.
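To illustrate the alignment issue, and one possible way around it without a merge, here is a minimal sketch. It rebuilds the sample frame from the question with the keys stored as strings (an assumption), and the names `grouped`, `misaligned` and `res` are only illustrative. It swaps in `transform` and `map` as an alternative to `merge`; it is not the approach used in the answer above.

```python
import pandas as pd

# Sample data mirroring example_df; keys are kept as strings (assumption)
df = pd.DataFrame({
    "key": ["12345678901234567890"] * 3
           + ["12345678901234567895"]
           + ["12345678901234567897"] * 2,
    "product": ["product a", "product b", "product c",
                "product a", "product b", "product c"],
    "cfop_code": [2551, 2551, 3551, 2551, 2551, 2407],
})

grouped = df.groupby("key")["cfop_code"]

# The per-group result is indexed by key, not by the row labels 0..5,
# so a direct assignment aligns on the index and finds no matches.
misaligned = df.assign(unique_list=grouped.unique())
print(misaligned["unique_list"].isna().all())

res = df.assign(
    # map looks up each row's key in the per-group Series
    unique_list=df["key"].map(grouped.unique()),
    # transform broadcasts one value per group back onto the original rows
    unique_count=grouped.transform("nunique"),
    not_unique_count=grouped.transform("size"),
)
print(res)
```

Both `transform` and `map` return a result that shares the index of `df`, which is why they can be assigned as columns directly.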

风流物 2025-02-18 13:46:15

You can merge after grouping and aggregating, like this:

df.merge(df.groupby('key', as_index=False).agg(
    unique_list=('cfop_code', 'unique'),
    unique_count=('cfop_code', 'nunique'),
    not_unique_count=('cfop_code', 'size')
), on='key', how='left')

Output:

                    key    product  cfop_code   unique_list  unique_count  not_unique_count
0  12345678901234567890  product a       2551  [2551, 3551]             2                 3
1  12345678901234567890  product b       2551  [2551, 3551]             2                 3
2  12345678901234567890  product c       3551  [2551, 3551]             2                 3
3  12345678901234567895  product a       2551        [2551]             1                 1
4  12345678901234567897  product b       2551  [2551, 2407]             2                 2
5  12345678901234567897  product c       2407  [2551, 2407]             2                 2
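The expected result in the question shows unique_list as comma-separated text such as "2407, 2551" rather than a list. If that is the goal, something along these lines might work. This is only a sketch: the helper `unique_as_text` is a hypothetical name, and the keys are stored as strings here as an assumption.

```python
import pandas as pd

# Sample data mirroring example_df; keys are kept as strings (assumption)
df = pd.DataFrame({
    "key": ["12345678901234567890"] * 3
           + ["12345678901234567895"]
           + ["12345678901234567897"] * 2,
    "product": ["product a", "product b", "product c",
                "product a", "product b", "product c"],
    "cfop_code": [2551, 2551, 3551, 2551, 2551, 2407],
})


def unique_as_text(s: pd.Series) -> str:
    """Hypothetical helper: sort the distinct codes and join them as text."""
    return ", ".join(str(v) for v in sorted(s.unique()))


# One row per key, with the three aggregated columns
summary = df.groupby("key", as_index=False).agg(
    unique_list=("cfop_code", unique_as_text),
    unique_count=("cfop_code", "nunique"),
    not_unique_count=("cfop_code", "size"),
)

# A left merge keeps every row of df in its original order
res = df.merge(summary, on="key", how="left")
print(res)
```

Note that `'unique'` on its own keeps values in order of first appearance, which is why the output above shows [2551, 2407] for the last key, while sorting gives 2407 first.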
