Python: trying to remove duplicate words in a DataFrame, but getting an error

Posted 2025-02-10 06:33:05

I'm trying to remove duplicate words within a cell:

   Current        Desired
0  John and Jane  John and Jane
1  John and John  John
2  John           John
3  Jane and Jane  Jane

I have tried the following, but the column gets filled with o d i c t _ k e y s ( [ ' n a n ' ] ):

from collections import OrderedDict

df['Current'] = (df['Desired'].astype(str).str.split()
                              .apply(lambda x: OrderedDict.fromkeys(x).keys())
                              .astype(str).str.join(' '))

I have also tried this, but the Desired column gets filled with NaN:

df['Desired'] = df['Current'].str.replace(r'\b(\w+)(\s+\1)+\b', r'\1')
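The spaced-out output from the first attempt appears to come from two things: the assignment reads from df['Desired'] (presumably still empty) instead of df['Current'], and .astype(str) is applied to the odict_keys object before the join. A plain-Python sketch of what one cell goes through, assuming the Desired cell holds NaN:

```python
from collections import OrderedDict

# astype(str) turns a missing value into the literal string 'nan',
# so splitting it yields the single "word" ['nan'].
words = str(float("nan")).split()

# .keys() returns a view object, not a list of words.
keys_view = OrderedDict.fromkeys(words).keys()

# Converting that view to a string gives its repr ...
as_text = str(keys_view)
print(as_text)            # odict_keys(['nan'])

# ... and joining a string with ' ' puts a space between every character.
print(" ".join(as_text))  # o d i c t _ k e y s ( [ ' n a n ' ] )
```

Joining the keys directly, without the intermediate string conversion, avoids the character-by-character join.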

眼眸 2025-02-17 06:33:05

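Let's split on ' and ', deduplicate with set, then join back. Since set does not preserve order, the names can come out reordered, as in row 0 of the output:

df['out'] = df.Current.str.split(' and ').map(lambda x: ' and '.join(set(x)))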
df
Out[876]: 
         Current            out
0  John and Jane  Jane and John
1  John and John           John
2           John           John
3  Jane and Jane           Jane
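If the original order matters (the Desired column keeps John and Jane as is), one possible variant swaps set for dict.fromkeys, which keeps first-seen order. This is a minimal sketch rather than the answerer's method; the helper name dedupe_phrase and its sep parameter are made up for illustration, and the sample list stands in for the Current column.

```python
def dedupe_phrase(text, sep=" and "):
    """Drop repeated items from a sep-delimited string, keeping first-seen order."""
    if not isinstance(text, str):
        # Leave NaN / None untouched instead of turning them into the string 'nan'.
        return text
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return sep.join(dict.fromkeys(text.split(sep)))


# Stand-in for the values of the Current column from the question.
current = ["John and Jane", "John and John", "John", "Jane and Jane"]
desired = [dedupe_phrase(s) for s in current]
print(desired)
# ['John and Jane', 'John', 'John', 'Jane']

# With the DataFrame from the question, the same helper can be applied per cell:
# df['Desired'] = df['Current'].map(dedupe_phrase)
```

Passing non-string cells through unchanged avoids the 'nan' strings that appear when a column with missing values is forced through astype(str).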
