How do I add sentence embeddings derived from an existing column as a new column?

Posted on 2025-01-28 14:49:44


I have a dataframe nw_data with four columns: ['Qn_id', 'Qn_context', 'Qns', 'Answers']. This is what it looks like:

Qn_id  |     Qn_context       |   Qns        |     Answers
 01    | In 1962, UK gave...  | what year....| the year 1962 was.....
 02    | Major kanuti raised..| Who raised...| Kanuti akorimo raised.

I want to add a fifth column to that dataframe containing the sentence embeddings of the 'Answers' column.

I am using sentence_transformers to generate the sentence embeddings.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

I tried the following approach. First I stored the column in a variable:

# Store the column in a variable
sent = nw_data['Answers']

Then I passed it to the model to create the embeddings:

# Pass the column to the model to create the embeddings
embeddings = model.encode(sent)

Finally I tried to assign the embeddings to a new column named Embeddings:

# Assign the embeddings to a new column named Embeddings
nw_data['Embeddings'] = embeddings

But I get an error:

KeyError: 'Embeddings'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
KeyError: 'Embeddings'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
   1978         if len(placement) != len(values):
   1979             raise ValueError(
-> 1980                 f"Wrong number of items passed {len(values)}, "
   1981                 f"placement implies {len(placement)}"
   1982             )

ValueError: Wrong number of items passed 384, placement implies 1

How can I create these embeddings and add them to a new column in the same dataframe nw_data?

Is this possible at all? I was advised to try the .apply() method or lambda functions, but I am not sure how or when to use them.


尝蛊 2025-02-04 14:49:44

If I understand correctly, you'd like to insert a list (an embedding) into a cell.

Try using at:

>>> import pandas as pd
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer('all-MiniLM-L6-v2')
>>> sentences = 'Absence of sanity'
>>> embedding = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2], 'Embedding': None})
>>> df.at[0, 'Embedding'] = embedding.tolist()
>>> df.dtypes
foo           int64
Embedding    object
dtype: object
>>> df.head()
   foo                                          Embedding
0    1  [0.2954030930995941, 0.29181134700775146, 2.16...
1    2                                               None

If you have multiple sentences, pass them as a list and assign the result of tolist() to the column:

>>> import pandas as pd
>>> sentences = ['Absence of sanity', 'its a new day', 'make the best of it']
>>> embeddings = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2, 3], 'Embedding': None})
>>> df['Embedding'] = embeddings.tolist()
>>> print(df.head())
   foo                                          Embedding
0    1  [0.29540303349494934, 0.29181137681007385, 2.1...
1    2  [0.0362740121781826, -0.8035800457000732, 2.44...
2    3  [-0.4539063572883606, -0.4333038330078125, 2.2...
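
Applied to the dataframe from the question, the same idea might look like the sketch below. The ValueError in the question is consistent with model.encode returning a 2-D array holding one 384-value row per sentence, which pandas cannot place into a single column directly. To keep the example runnable without downloading a model, fake_encode is a hypothetical stand-in for model.encode that only mimics its output shape, and the sample rows are made up.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 384  # vector size reported in the question's traceback


def fake_encode(sentences):
    """Hypothetical stand-in for model.encode: one 384-dim row per sentence."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), EMBEDDING_DIM)).astype(np.float32)


# Made-up sample rows shaped like the dataframe in the question.
nw_data = pd.DataFrame({
    "Qn_id": ["01", "02"],
    "Answers": ["the year 1962 was...", "Kanuti akorimo raised..."],
})

# Encode the whole column in one call; the result is a 2-D array.
embeddings = fake_encode(nw_data["Answers"].tolist())
print(embeddings.shape)

# Convert the 2-D array into one list per row so each cell holds a full vector.
nw_data["Embeddings"] = embeddings.tolist()
print(nw_data.head())
```

With the real model, fake_encode(...) would presumably be replaced by model.encode(nw_data['Answers'].tolist()), and the rest stays the same.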


尸血腥色 2025-02-04 14:49:44

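I found another way to do this; please tell me if it works:

def embed_text(sentence):
    # Encode a single sentence into its embedding vector
    return model.encode(sentence)

# Apply the encoder to each row of the 'Answers' column
nw_data['Embeddings'] = nw_data['Answers'].apply(embed_text)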
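
One way to check the apply approach without loading a model is the sketch below. Here fake_encode_one is a hypothetical stand-in for model.encode called on a single string, assumed to return a 1-D array of 384 values, and the sample rows are made up. Each cell then holds one array, and the per-row vectors can be stacked back into a matrix when needed.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 384  # vector size reported in the question's traceback


def fake_encode_one(sentence):
    """Hypothetical stand-in for model.encode on one string: a 1-D vector."""
    rng = np.random.default_rng(len(sentence))
    return rng.normal(size=EMBEDDING_DIM).astype(np.float32)


# Made-up sample rows shaped like the 'Answers' column in the question.
nw_data = pd.DataFrame({
    "Answers": ["the year 1962 was...", "Kanuti akorimo raised..."],
})

# apply calls the encoder once per row, so each cell receives one 1-D array.
nw_data["Embeddings"] = nw_data["Answers"].apply(fake_encode_one)

# Stack the per-row vectors back into an (n_rows, 384) matrix for later use.
matrix = np.vstack(nw_data["Embeddings"].tolist())
print(matrix.shape)
```

Because apply encodes one sentence per call, encoding the whole column in a single model.encode call is likely to be faster on large dataframes, though both approaches should produce a usable column.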
