How can I add sentence embeddings derived from an existing column as a new column?

Posted 2025-01-28 14:49:44 · 1833 characters · 4 views · 0 comments

I have a dataframe nw_data with four columns: ['Qn_id', 'Qn_context', 'Qns', 'Answers']. This is what it looks like:

Qn_id  |     Qn_context       |   Qns        |     Answers
 01    | In 1962, UK gave...  | what year....| the year 1962 was.....
 02    | Major kanuti raised..| Who raised...| Kanuti akorimo raised.

I want to add a fifth column to that dataframe containing the sentence embeddings of the 'Answers' column.

I am using sentence_transformers to generate the sentence embeddings.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

I tried the following approach. First:

# Select the column to embed
sent = nw_data['Answers']

then

# Pass the column to the model to create the embeddings
embeddings = model.encode(sent)

and finally

# Try to assign the embeddings to a new column named Embeddings
nw_data['Embeddings'] = embeddings

but I get an error:

KeyError: 'Embeddings'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
KeyError: 'Embeddings'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
   1978         if len(placement) != len(values):
   1979             raise ValueError(
-> 1980                 f"Wrong number of items passed {len(values)}, "
   1981                 f"placement implies {len(placement)}"
   1982             )

ValueError: Wrong number of items passed 384, placement implies 1

How can I create these embeddings and add them to a new column in the same dataframe nw_data?

Is this possible at all? I was advised to try the .apply() method or lambda functions, but I am not sure how or when to use them.

Comments (2)

尝蛊 2025-02-04 14:49:44

If I understand correctly, you'd like to insert a list (embedding) into a cell.

Try using at:

>>> import pandas as pd
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer('all-MiniLM-L6-v2')  # same model as in the question
>>> sentences = 'Absence of sanity'
>>> embedding = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2], 'Embedding': None})
>>> df.at[0, 'Embedding'] = embedding.tolist()
>>> df.dtypes
foo           int64
Embedding    object
dtype: object
>>> df.head()
   foo                                          Embedding
0    1  [0.2954030930995941, 0.29181134700775146, 2.16...
1    2                                               None

If you have multiple sentences, just pass the list:

>>> import pandas as pd
>>> sentences = ['Absence of sanity', 'its a new day', 'make the best of it']
>>> embeddings = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2, 3], 'Embedding': None})
>>> df['Embedding'] = embeddings.tolist()
>>> print(df.head())
   foo                                          Embedding
0    1  [0.29540303349494934, 0.29181137681007385, 2.1...
1    2  [0.0362740121781826, -0.8035800457000732, 2.44...
2    3  [-0.4539063572883606, -0.4333038330078125, 2.2...
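
The ValueError in the question ("Wrong number of items passed 384, placement implies 1") is consistent with assigning a 2-D array of shape (rows, 384) to a single column. Below is a minimal sketch of the same fix applied to a dataframe shaped like nw_data. It assumes the column is named 'Answers' as in the question's code, and it uses fake_encode, a hypothetical stand-in for model.encode that is not part of sentence_transformers, so that the snippet runs without downloading a model. It also shows one way to get the 2-D matrix back out of the column.

```python
import numpy as np
import pandas as pd


def fake_encode(sentences, dim=384):
    """Hypothetical stand-in for model.encode (NOT the real model).

    Returns a float32 array of shape (len(sentences), dim), which is the
    shape a real batch encode would produce for a 384-dimensional model.
    """
    return np.array(
        [[len(s) + j for j in range(dim)] for s in sentences],
        dtype=np.float32,
    )


# Small dataframe mirroring the relevant columns of the question's nw_data.
nw_data = pd.DataFrame({
    "Qn_id": ["01", "02"],
    "Answers": ["the year 1962 was ...", "Kanuti akorimo raised ..."],
})

embeddings = fake_encode(nw_data["Answers"].tolist())  # shape (2, 384)

# .tolist() turns the 2-D array into a list of rows, so each cell of the
# new column holds one 384-element list instead of pandas seeing 384 columns.
nw_data["Embeddings"] = embeddings.tolist()

# Rebuild the 2-D matrix later, e.g. for similarity computations.
matrix = np.array(nw_data["Embeddings"].tolist())
print(nw_data.shape, matrix.shape)
```

With the real model, replacing fake_encode(...) with model.encode(...) should be the only change needed, since the assignment step only depends on the shape of the returned array.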
尸血腥色 2025-02-04 14:49:44

I found another way to do this, please tell me if it works:

def embed_text(sentence):
    # Encode a single sentence and return its embedding
    return model.encode(sentence)

nw_data['Embeddings'] = nw_data['Answers'].apply(embed_text)
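
For comparison, here is a small sketch of the .apply() approach next to a single batched call. It again uses fake_encode, a hypothetical stand-in for model.encode (not the real model), which returns a 1-D vector for a single string and a 2-D array for a list. The point is only that both routes end with one vector per row. With a real model, the .apply() version calls the encoder once per row, so a single batched call is usually the faster option on larger dataframes.

```python
import numpy as np
import pandas as pd


def fake_encode(sentences, dim=384):
    """Hypothetical stand-in for model.encode (NOT the real model).

    A single string yields a 1-D vector; a list of strings yields a
    2-D array with one row per sentence.
    """
    single = isinstance(sentences, str)
    batch = [sentences] if single else list(sentences)
    out = np.array(
        [[len(s) + j for j in range(dim)] for s in batch],
        dtype=np.float32,
    )
    return out[0] if single else out


nw_data = pd.DataFrame({
    "Answers": ["the year 1962 was ...", "Kanuti akorimo raised ..."],
})

# Route 1: one encoder call per row; each cell holds a 1-D array.
per_row = nw_data["Answers"].apply(fake_encode)

# Route 2: one encoder call for the whole column, then split into rows.
batched = fake_encode(nw_data["Answers"].tolist())
nw_data["Embeddings"] = batched.tolist()

# Both routes produce the same vectors for this stand-in encoder.
same = np.array_equal(np.stack(per_row.to_list()), batched)
print(same)
```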