Appending new data entered by the user in a loop to an existing data frame
I have a Python program that displays texts to be labeled by the user. After the user adds labels to the displayed text, the program should create a new data frame with the text presented to the user for labelling in the first column and the labels entered by the user in the second column. This new data frame will then be appended to an existing data frame. My program runs, but the new data frame shows each label entered by the user in a separate row, with the text repeated in each row. My desired output is something like:
       corpus  labels
    0  text1   x,y,z
    1  text2   a,b
    3  text3   c,d,e
But with my code I am getting:
       corpus  labels
    0  text1   x,y,z
    1  text1   x,y,z
    2  text1   x,y,z
How can I get my desired output? My code is below:
    count=1
    for i in sorted_dict:
        count+=1
        a=pool_df['corpus'][i]
        print(f'\n\nText {count}: index {i} \n\n{a}')
        question=input('Enter new label(s) for this text? type Y for yes or N for no: ')
        question.lower()
        if question == 'n':
            print('see you later')
            break
        elif question == 'y':
            print('\n\nIf you think that the label printed is associated with the corpus, type the label otherwise hit "space"\n\n')
            new_label1=input('x: ')
            new_label2=input('y: ')
            new_label3=input('z: ')
            new_label4=input('a: ')
            new_label5=input('b: ')
            new_label6=input('c: ' )
            new_label7=input('d: ')
            list_new_labels=[new_label1,new_label2,new_label3,new_label4,new_label5,new_label6,new_label7]
            list_new_labels1=[]
            for i in list_new_labels:
                if i != '':
                    list_new_labels1.append(i)
            print(f'The new labels are: {list_new_labels1}')
            df_new_labels={'corpus': a, 'zero_level_name': list_new_labels1}
            df_new_labels=pd.DataFrame(df_new_labels)
    df_new_labels
Comments (1)
There are two parts to why this is not working as intended.
Error 1: df_new_labels is created anew for every text. Instead, the new text and its labels should be appended to existing lists.
Error 2: when creating the DataFrame with df_new_labels=pd.DataFrame(df_new_labels), pandas automatically extends the corpus column to fit the length of your list of labels. To circumvent this, the labels should be a list of lists.
Given the following example inputs:
    import pandas as pd

    pool_df = pd.DataFrame(columns=['corpus'], data=['text1', 'text2', 'text3'])
    sorted_dict = [1, 0, 2]
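The row duplication described in Error 2 can be reproduced in isolation. The snippet below is a small illustration with made-up values (the names broadcast and single_row are placeholders, not part of the original code): a scalar placed next to a list is repeated for every list element, while wrapping both values in an outer list yields a single row.

```python
import pandas as pd

# A scalar next to a list: pandas repeats the scalar once per list element,
# which is what produces one row per label in the question.
broadcast = pd.DataFrame({'corpus': 'text1', 'zero_level_name': ['x', 'y', 'z']})
print(broadcast)

# Both values wrapped in an outer list: one row, and the three labels
# stay together in a single cell.
single_row = pd.DataFrame({'corpus': ['text1'], 'zero_level_name': [['x', 'y', 'z']]})
print(single_row)
```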
One way to write this code is as follows:
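The answer's own code is not reproduced here, so the following is a minimal sketch of the approach described above rather than the original solution. The function name collect_labels, the ask parameter and the CANDIDATE_LABELS list are assumptions introduced for illustration; the column names and prompts are taken from the question.

```python
import pandas as pd

# Example inputs from the answer.
pool_df = pd.DataFrame(columns=['corpus'], data=['text1', 'text2', 'text3'])
sorted_dict = [1, 0, 2]

# Prompts used in the question (hypothetical constant name).
CANDIDATE_LABELS = ['x', 'y', 'z', 'a', 'b', 'c', 'd']


def collect_labels(pool_df, order, ask=input):
    """Prompt for labels text by text and return one row per labelled text.

    `ask` defaults to the built-in input(); it is an assumed hook so the
    loop can also be driven by a script instead of the keyboard.
    """
    corpus, labels = [], []  # created once, before the loop (Error 1)
    for count, idx in enumerate(order, start=1):
        text = pool_df['corpus'][idx]
        print(f'\n\nText {count}: index {idx} \n\n{text}')
        answer = ask('Enter new label(s) for this text? type Y for yes or N for no: ')
        answer = answer.strip().lower()  # keep the lowercased value
        if answer == 'n':
            print('see you later')
            break
        if answer != 'y':
            continue  # unrecognised answer: skip this text
        # Ask once per candidate label; strip() also discards a lone space.
        replies = (ask(f'{name}: ').strip() for name in CANDIDATE_LABELS)
        new_labels = [reply for reply in replies if reply]
        print(f'The new labels are: {new_labels}')
        corpus.append(text)
        labels.append(new_labels)  # one inner list per text (Error 2)
    return pd.DataFrame({'corpus': corpus, 'zero_level_name': labels})
```

Calling collect_labels(pool_df, sorted_dict) runs the prompts interactively and returns one row per labelled text, with the labels of each text held as a list in a single cell. If a comma-separated string such as x,y,z is preferred, the column can be converted with new_rows['zero_level_name'].apply(','.join). The result can then be appended to an existing frame with pd.concat([existing_df, new_rows], ignore_index=True), where existing_df and new_rows are placeholder names.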