当前位置：文江博客话题详情

如何在.xlsx中删除重复项并用熊猫将值移至新列

发布于 2025-02-07 18:15:58 字数 356 浏览 0 评论 0原文

我的.xlsx表有多个条目：条目

我尝试实现的目标：结果

因此，我真的被卡住了。我尝试过df.drop_duplicates（），但无济于事。如果您可以分享一些想法或在哪里看，那就太好了！ Tia

P.S.对不起，图片，但由于某种原因，它无法正确渲染。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一绘本一梦想 2025-02-14 18:15:58

我的策略是创建然后将索引设置为触发列连接2个数据范围。一个数据帧仅包含触发， to 和列的代码>类别和说明列。这里的“窍门”是使用 dataframe。 Pivot 将相应的类别和descript值中的值添加到自己的列中。

import pandas as pd
import os

filename = r'logfile.xlsx'
new_filename = r'consolidated_logfile.xlsx'

df = pd.read_excel(filename)

# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]

# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])

# Set the index to the 'Trigger' column, so another dataframe with the extra 
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')

# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]

# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')

# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)

# Save to a new spreadsheet
combined.to_excel(new_filename)

# Launch the new spreadsheet
os.startfile(new_filename)

为了使其更容易理解，以下是每个步骤之后的数据框：

df = pd.read_excel(filename)

                   Trigger   From     To String Value                    Remarks
0  incident: INC111111111    Tom  Peter          NaN                        NaN
1      request: REQ123464   John   Alex          NaN                        NaN
2   incident: INC22222222  David  Maria          NaN                        NaN
3  incident: INC111111111    NaN    NaN     Category                       FYI 
4      request: REQ123464    NaN    NaN     Category               Wrong Route 
5   incident: INC22222222    NaN    NaN     Category                       FYI 
6  incident: INC111111111    NaN    NaN  Description            Wrongly routed 
7      request: REQ123464    NaN    NaN  Description  Dispatch to another team 
8   incident: INC22222222    NaN    NaN  Description     Try more troubleshoot

# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]

                   Trigger   From     To String Value Remarks
0  incident: INC111111111    Tom  Peter          NaN     NaN
1      request: REQ123464   John   Alex          NaN     NaN
2   incident: INC22222222  David  Maria          NaN     NaN

# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])

                   Trigger   From     To
0  incident: INC111111111    Tom  Peter
1      request: REQ123464   John   Alex
2   incident: INC22222222  David  Maria

# Set the index to the 'Trigger' column, so another dataframe with the extra 
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')

                          From     To
Trigger                             
incident: INC111111111    Tom  Peter
request: REQ123464       John   Alex
incident: INC22222222   David  Maria

# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]

                   Trigger From   To String Value                    Remarks
3  incident: INC111111111  NaN  NaN     Category                       FYI 
4      request: REQ123464  NaN  NaN     Category               Wrong Route 
5   incident: INC22222222  NaN  NaN     Category                       FYI 
6  incident: INC111111111  NaN  NaN  Description            Wrongly routed 
7      request: REQ123464  NaN  NaN  Description  Dispatch to another team 
8   incident: INC22222222  NaN  NaN  Description     Try more troubleshoot

# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')

 String Value                Category                Description
Trigger                                                        
incident: INC111111111          FYI             Wrongly routed 
incident: INC22222222           FYI      Try more troubleshoot 
request: REQ123464      Wrong Route   Dispatch to another team

# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)

                          From     To      Category                Description
Trigger                                                                      
incident: INC111111111    Tom  Peter          FYI             Wrongly routed 
request: REQ123464       John   Alex  Wrong Route   Dispatch to another team 
incident: INC22222222   David  Maria          FYI      Try more troubleshoot

My strategy was to create and then join 2 dataframes with the index set to the Trigger column. One dataframe contains only the Trigger, To, and From columns, while the other dataframe has the Trigger, Category, and Description columns. The "trick" here is to use DataFrame.pivot to get the corresponding Category and Description values into their own columns.

import pandas as pd
import os

filename = r'logfile.xlsx'
new_filename = r'consolidated_logfile.xlsx'

df = pd.read_excel(filename)

# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]

# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])

# Set the index to the 'Trigger' column, so another dataframe with the extra 
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')

# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]

# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')

# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)

# Save to a new spreadsheet
combined.to_excel(new_filename)

# Launch the new spreadsheet
os.startfile(new_filename)

To make it easier to understand, here are the dataframes after each step:

df = pd.read_excel(filename)

                   Trigger   From     To String Value                    Remarks
0  incident: INC111111111    Tom  Peter          NaN                        NaN
1      request: REQ123464   John   Alex          NaN                        NaN
2   incident: INC22222222  David  Maria          NaN                        NaN
3  incident: INC111111111    NaN    NaN     Category                       FYI 
4      request: REQ123464    NaN    NaN     Category               Wrong Route 
5   incident: INC22222222    NaN    NaN     Category                       FYI 
6  incident: INC111111111    NaN    NaN  Description            Wrongly routed 
7      request: REQ123464    NaN    NaN  Description  Dispatch to another team 
8   incident: INC22222222    NaN    NaN  Description     Try more troubleshoot

# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]

                   Trigger   From     To String Value Remarks
0  incident: INC111111111    Tom  Peter          NaN     NaN
1      request: REQ123464   John   Alex          NaN     NaN
2   incident: INC22222222  David  Maria          NaN     NaN

# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])

                   Trigger   From     To
0  incident: INC111111111    Tom  Peter
1      request: REQ123464   John   Alex
2   incident: INC22222222  David  Maria

# Set the index to the 'Trigger' column, so another dataframe with the extra 
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')

                          From     To
Trigger                             
incident: INC111111111    Tom  Peter
request: REQ123464       John   Alex
incident: INC22222222   David  Maria

# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]

                   Trigger From   To String Value                    Remarks
3  incident: INC111111111  NaN  NaN     Category                       FYI 
4      request: REQ123464  NaN  NaN     Category               Wrong Route 
5   incident: INC22222222  NaN  NaN     Category                       FYI 
6  incident: INC111111111  NaN  NaN  Description            Wrongly routed 
7      request: REQ123464  NaN  NaN  Description  Dispatch to another team 
8   incident: INC22222222  NaN  NaN  Description     Try more troubleshoot

# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')

 String Value                Category                Description
Trigger                                                        
incident: INC111111111          FYI             Wrongly routed 
incident: INC22222222           FYI      Try more troubleshoot 
request: REQ123464      Wrong Route   Dispatch to another team

# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)

                          From     To      Category                Description
Trigger                                                                      
incident: INC111111111    Tom  Peter          FYI             Wrongly routed 
request: REQ123464       John   Alex  Wrong Route   Dispatch to another team 
incident: INC22222222   David  Maria          FYI      Try more troubleshoot

回复收藏 0 原文

~没有更多了~