Appending new data entered by the user in a loop to an existing data frame

Posted on 2025-02-13 23:31:39 · 1483 characters · 4 views · 0 comments

I have a Python program that displays texts to be labeled by the user. After the user adds labels to the displayed text, the program should create a new data frame with the text presented to the user for labelling in the first column and the labels entered by the user in the second column. This new data frame will then be appended to an existing data frame. My program runs, but the new data frame shows each label entered by the user in a separate row, with the text repeated in each row. My desired output is something like:

  corpus  labels
0 text1   x,y,z
1 text2   a,b
3 text3   c,d,e

But with my code I am getting:

  corpus  labels
0 text1   x,y,z
1 text1   x,y,z
2 text1   x,y,z

How can I get my desired output? My code is below:

count=1


for i in sorted_dict:
  count+=1
  a=pool_df['corpus'][i]
  print(f'\n\nText {count}: index {i} \n\n{a}')

  question=input('Enter new label(s) for this text? type Y for yes or N for no: ')
  question.lower()
  if question == 'n':
    print('see you later')
    break
  elif question == 'y':
    print('\n\nIf you think that the label printed is associated with the corpus, type the label otherwise hit "space"\n\n')
    new_label1=input('x: ')
    new_label2=input('y: ')
    new_label3=input('z: ')
    new_label4=input('a: ')
    new_label5=input('b: ')
    new_label6=input('c: ' )
    new_label7=input('d: ')
    list_new_labels=[new_label1,new_label2,new_label3,new_label4,new_label5,new_label6,new_label7]
    list_new_labels1=[]
    for i in list_new_labels:
      if i != '':
        list_new_labels1.append(i)
    print(f'The new labels are: {list_new_labels1}')
    df_new_labels={'corpus': a, 'zero_level_name': list_new_labels1}

df_new_labels=pd.DataFrame(df_new_labels)

df_new_labels

Comments (1)

苦笑流年记忆 2025-02-20 23:31:39

There are two reasons why this does not work as intended.
Error 1: df_new_labels is created anew for every text, so only the last text survives. Instead, each new text and its labels should be appended to lists that persist across loop iterations.
Error 2: when the DataFrame is created with df_new_labels=pd.DataFrame(df_new_labels), the corpus value is a single string and the labels are a list, so pandas repeats the string to match the length of the list, giving one row per label. To avoid this, the labels column should be a list of lists, with one inner list per text.
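To make Error 2 concrete, here is a minimal sketch of the two constructions side by side. The variable names expanded and single are made up for illustration and do not appear in the question; the column names are the ones from the question's code.

```python
import pandas as pd

# A scalar string next to a list: pandas repeats the string once per label,
# which is the unwanted "one row per label" result from the question.
expanded = pd.DataFrame({'corpus': 'text1', 'zero_level_name': ['x', 'y', 'z']})

# One-element lists on both sides: a single row whose label cell holds the whole list.
single = pd.DataFrame({'corpus': ['text1'], 'zero_level_name': [['x', 'y', 'z']]})

print(expanded.shape)  # three rows, two columns
print(single.shape)    # one row, two columns
```

The second form is what the list-of-lists approach builds up, one text at a time.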

Given the following example inputs:

import pandas as pd

pool_df = pd.DataFrame(columns=['corpus'], data=['text1', 'text2', 'text3'])

sorted_dict = [1, 0, 2]

One way to write this code is as follows:

count = 0  # start at 0 so the first text is shown as "Text 1"

new_labels = {'corpus': [], 'zero_level_name': []}  # lists to store new entries, see Error 1
for i in sorted_dict:
    count += 1
    a = pool_df['corpus'][i]
    print(f'\n\nText {count}: index {i} \n\n{a}')
    question = input('Enter new label(s) for this text? type Y for yes or N for no: ')
    question = question.strip().lower()  # str.lower() returns a new string, so keep the result
    if question == 'n':
        print('see you later')
        break
    elif question == 'y':
        print('\n\nIf you think that the label printed is associated with the corpus, type the label otherwise hit "space"\n\n')
        new_label1 = input('x: ')
        new_label2 = input('y: ')
        new_label3 = input('z: ')
        new_label4 = input('a: ')
        new_label5 = input('b: ')
        new_label6 = input('c: ')
        new_label7 = input('d: ')
        list_new_labels = [new_label1, new_label2, new_label3, new_label4, new_label5, new_label6, new_label7]
        list_new_labels1 = []
        for label in list_new_labels:  # separate name so the outer loop variable i is not overwritten
            if label.strip() != '':  # skip answers that are empty or only a space
                list_new_labels1.append(label.strip())
        print(f'The new labels are: {list_new_labels1}')
        new_labels['corpus'].append(a)
        new_labels['zero_level_name'].append(list_new_labels1)  # append list of labels to create list of lists of labels, see Error 2

df_new_labels = pd.DataFrame(new_labels)
df_new_labels
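
The question also mentions appending the result to an existing data frame and shows the labels as comma-separated text. The answer above does not cover that step, so the following is only a sketch under assumptions: existing_df and its contents are hypothetical stand-ins for the existing data frame, the df_new_labels values are sample data in place of real user input, and the extra labels column is my own addition to match the layout in the question.

```python
import pandas as pd

# Hypothetical existing data frame with the same columns as df_new_labels.
existing_df = pd.DataFrame({'corpus': ['text0'], 'zero_level_name': [['q']]})

# Sample result of the labelling loop (made-up values instead of user input).
df_new_labels = pd.DataFrame({
    'corpus': ['text2', 'text1'],
    'zero_level_name': [['a', 'b'], ['x', 'y', 'z']],
})

# pd.concat stacks the rows; ignore_index=True renumbers them 0..n-1.
combined = pd.concat([existing_df, df_new_labels], ignore_index=True)

# Optional: join each list of labels into one comma-separated string.
combined['labels'] = combined['zero_level_name'].apply(','.join)

print(combined)
```

Keeping the list column alongside the joined string leaves the individual labels available for later filtering; drop whichever column is not needed.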