Pandas: convert a column of lists into a text column for data preprocessing

Posted on 2025-02-01 20:42:22

I have a data set that looks like this:

sentiment  text
positive   ['chewy', 'what', 'dhepburn', 'said']
neutral    ['chewy', 'plus', 'you', 've', 'added']

and I want to convert it to this:

sentiment  text
positive   chewy what dhepburn said
neutral    chewy plus you ve added

I basically want to convert the 'text' column, which is made up of lists, into a column of text.

I've tried multiple versions of this code:

def joinr(words):
   return ','.join(words)

#df['text'] = df.apply(lambda row: joinr(row['text']), axis=1)
#df['text'] = df['text'].apply(lambda x: ' '.join([x]))
df['text'] = df['text'].apply(joinr)

and I keep getting something that resembles this:

sentiment  text
positive   ['c h e w y', 'w h a t', 'd h e p b u r n', 's a i d']
neutral    ['c h e w y', 'p l u s', 'y o u', 'v e', 'a d d e d']

This is part of the data pre-processing for an ML model. I'm working in Google Colab (similar to Jupyter Notebook).
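A likely explanation for the character-by-character output, assuming each cell holds the string representation of a list rather than a real list (common after reading the data from a CSV file): joining a string iterates over its characters. The sketch below uses plain Python and a made-up cell value to show the difference.

```python
from ast import literal_eval

# Assumption: the cell is a string that merely looks like a list.
cell = "['chewy', 'what', 'dhepburn', 'said']"

# Joining a string walks over its characters, brackets and quotes included.
joined_chars = " ".join(cell)

# Parsing the string into a real list first gives the intended result.
words = literal_eval(cell)
joined_words = " ".join(words)

print(type(cell).__name__)
print(joined_chars)
print(joined_words)
```

Checking type(df['text'].iloc[0]) on the real data would confirm whether the cells are str or list.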

Comments (3)

葬﹪忆之殇 2025-02-08 20:42:22

I believe your problem is axis=1; you don't need it.

import pandas as pd

data = {
    'sentiment': ['positive', 'neutral'],
    'text': ["['chewy', 'what', 'dhepburn', 'said']", "['chewy', 'plus', 'you', 've', 'added']"]
}
df = pd.DataFrame(data)
# Strip the brackets and quotes from the string representation of each list
df['text'] = df['text'].apply(lambda x: x.replace('[', '').replace(']', '').replace("'", ''))
# Split on ', ' (comma plus space) so the words carry no leading spaces
df['text'] = df['text'].apply(lambda x: x.split(', '))
# Join each list of words into a single space-separated string
df['text'] = df['text'].apply(' '.join)
df
拥抱没勇气 2025-02-08 20:42:22

Use join:

df['test'].str.join(' ')

Demonstration:

df = pd.DataFrame({'test': [['chewy', 'what', 'dhepburn', 'said']]})
df['test'].str.join(' ')

Output:

0    chewy what dhepburn said
Name: test, dtype: object

Based on the comment:

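import pandas as pd

# Preparing data (columns are tab-separated)
string = """sentiment\ttext
positive\t['chewy', 'what', 'dhepburn', 'said']
neutral\t['chewy', 'plus', 'you', 've', 'added']"""
data = [x.split('\t') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns=data[0])

# Solution: parse each string into a list, then join the words.
# Note that eval executes arbitrary code, so use it only on trusted data.
df['text'].apply(eval).str.join(' ')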

Alternatively, and more simply, strip the brackets, quotes, and commas with a regex:

df['text'].str.replace(r"\[|\]|'|,", '', regex=True)

Output:

0    chewy what dhepburn said
1     chewy plus you ve added
Name: text, dtype: object
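One caveat with the character-stripping approach: it also removes apostrophes and commas that belong to the words themselves. The following sketch uses plain re.sub with the same pattern on a made-up cell (not taken from the question's data) to compare it with parsing the list.

```python
import re
from ast import literal_eval

# Hypothetical cell whose first word contains an apostrophe.
cell = """["you've", 'added']"""

# Stripping characters also drops the apostrophe inside the word
# and leaves the double quotes behind.
stripped = re.sub(r"\[|\]|'|,", "", cell)

# Parsing the list keeps each word intact.
parsed = " ".join(literal_eval(cell))

print(stripped)
print(parsed)
```

For simple tokens like those in the question, both approaches give the same text.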
你げ笑在眉眼 2025-02-08 20:42:22

If you have a string representation of a list you can use:

from ast import literal_eval

df['text'] = df['text'].apply(lambda x: ' '.join(literal_eval(x)))

If you really just want to remove the brackets, quotes, and commas, use a regex:

df['text'] = df['text'].str.replace(r"[\[\]',]", '', regex=True)

Output:

  sentiment                      text
0  positive  chewy what dhepburn said
1   neutral   chewy plus you ve added
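If it is unclear whether the column holds real lists or their string representations, a small helper can handle both. This is only a sketch: to_text is a hypothetical name, and the mixed sample data is made up for illustration.

```python
from ast import literal_eval

import pandas as pd


def to_text(cell):
    """Join a list of words, parsing the cell first if it is a string."""
    if isinstance(cell, str):
        # literal_eval only parses Python literals; it does not run code.
        cell = literal_eval(cell)
    return " ".join(cell)


df = pd.DataFrame({
    "sentiment": ["positive", "neutral"],
    "text": [
        "['chewy', 'what', 'dhepburn', 'said']",  # string representation
        ["chewy", "plus", "you", "ve", "added"],  # real list
    ],
})
df["text"] = df["text"].apply(to_text)
print(df)
```

literal_eval raises an error on malformed cells, so bad rows surface early instead of passing through silently.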