Pandas: converting a column of lists to a text column for data preprocessing

Posted on 2025-02-01 20:42:22

I have a data set that looks like this:

sentiment	text
positive	['chewy', 'what', 'dhepburn', 'said']
neutral	['chewy', 'plus', 'you', 've', 'added']

and I want to convert it to this:

sentiment	text
positive	chewy what dhepburn said
neutral	chewy plus you ve added

I basically want to convert the 'text' column, which is made up of lists, into a column of text.

I've done multiple versions of this code:

def joinr(words):
   return ','.join(words)

#df['text'] = df.apply(lambda row: joinr(row['text']), axis=1)
#df['text'] = df['text'].apply(lambda x: ' '.join([x]))
df['text'] = df['text'].apply(joinr)

and I keep getting something that resembles this:

sentiment	text
positive	['c h e w y', 'w h a t', 'd h e p b u r n', 's a i d']
neutral	['c h e w y', 'p l u s', 'y o u', 'v e', 'a d d e d']

This is a part of data pre-processing for an ML model. I'm working in Google Colab (similar to Jupyter Notebook).

葬﹪忆之殇 2025-02-08 20:42:22

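I believe your problem is the axis=1; you don't need that. Assuming each cell is a string that looks like a list, this works:

import pandas as pd

# Sample data: each 'text' cell is a string that looks like a list
data = {
    'sentiment' : ['positive', 'neutral'],
    'text' : ["['chewy', 'what', 'dhepburn', 'said']", "['chewy', 'plus', 'you', 've', 'added']"]
}
df = pd.DataFrame(data)
# Strip the brackets and quotes, leaving "chewy, what, dhepburn, said"
df['text'] = df['text'].apply(lambda x : x.replace('[', '')).apply(lambda x : x.replace(']', '')).apply(lambda x : x.replace("'", ''))
# Split on ', ' (comma plus space) so the words carry no leading spaces
df['text'] = df['text'].apply(lambda x : x.split(', '))
# Join each list of words with single spaces
df['text'] = df['text'].str.join(' ')
df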
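
Which fix applies depends on whether each cell is a real Python list or a string that only looks like one, so it may be worth checking first. A minimal sketch in plain Python; the helper name cell_type_counts and the sample cells are made up for illustration:

```python
from collections import Counter

def cell_type_counts(cells):
    """Count the Python type names of the values in a column."""
    return Counter(type(cell).__name__ for cell in cells)

# Hypothetical sample: one real list and one string that merely looks like a list
cells = [['chewy', 'what'], "['chewy', 'plus']"]
counts = cell_type_counts(cells)
print(counts)
```

On a DataFrame, the same check could be done with df['text'].map(type).value_counts(). If everything is str, the cells need parsing before they can be joined.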

拥抱没勇气 2025-02-08 20:42:22

Use join:

df['test'].str.join(' ')

Demonstration:

import pandas as pd

# Each cell is a real Python list, so str.join works directly
df = pd.DataFrame({'test': [['chewy', 'what', 'dhepburn', 'said']]})
df['test'].str.join(' ')

Output:

0    chewy what dhepburn said
Name: test, dtype: object

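Based on the comment, if the cells are strings that look like lists rather than real lists:

import pandas as pd

# Preparing data: tab-separated text, so each 'text' cell is a string
string = """sentiment\ttext
positive\t['chewy', 'what', 'dhepburn', 'said']
neutral\t['chewy', 'plus', 'you', 've', 'added']"""
data = [x.split('\t') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])

# Solution: turn each string into a list, then join it
# (eval executes arbitrary code, so only use it on data you trust)
df['text'].apply(lambda x: eval(x)).str.join(' ')

Alternatively, you can simply strip the brackets, quotes, and commas with a regex:

df['text'].str.replace(r"\[|\]|'|,", '', regex=True)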

Output:

0    chewy what dhepburn said
1     chewy plus you ve added
Name: text, dtype: object
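
One caveat with stripping characters: it also removes commas and apostrophes that belong to the words themselves, which parsing the string does not. A small sketch of the difference using only the standard library; the sample tokens are invented for illustration:

```python
import re
from ast import literal_eval

# Hypothetical cell whose tokens contain an apostrophe and a comma
raw = repr(["don't", 'a,b', 'ok'])

# Character stripping: drops every bracket, single quote, and comma
stripped = re.sub(r"[\[\]',]", '', raw)

# Parsing: rebuilds the list first, so the tokens stay intact
parsed = ' '.join(literal_eval(raw))

print(stripped)
print(parsed)
```

For data like the sample in the question, where the tokens are plain words, both approaches give the same text.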

你げ笑在眉眼 2025-02-08 20:42:22

If you have a string representation of a list you can use:

from ast import literal_eval

df['text'] = df['text'].apply(lambda x: ' '.join(literal_eval(x)))

If you really just want to remove the brackets, quotes, and commas, use a regex:

df['text'] = df['text'].str.replace(r"[\[\]',]", '', regex=True)

Output:

  sentiment                      text
0  positive  chewy what dhepburn said
1   neutral   chewy plus you ve added
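
If the column might contain both real lists and string representations (for example after a round trip through CSV), a single helper could cover both cases. This is only a sketch: cell_to_text and its sep parameter are hypothetical names, and a string that is not a valid Python literal will make literal_eval raise an error.

```python
from ast import literal_eval

def cell_to_text(cell, sep=' '):
    """Join the tokens of a cell, given a list or its string representation."""
    if isinstance(cell, str):
        # literal_eval only parses Python literals; it does not run code
        cell = literal_eval(cell)
    return sep.join(str(token) for token in cell)

# One real list and one string that looks like a list
rows = [['chewy', 'what', 'dhepburn', 'said'],
        "['chewy', 'plus', 'you', 've', 'added']"]
texts = [cell_to_text(row) for row in rows]
print(texts)
```

On the DataFrame this would be applied as df['text'] = df['text'].apply(cell_to_text).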
