python - Replace all elements in a dataframe that do not contain certain words
I have a very large dataframe and I want to substitute all elements that do not contain a specific word with NaN (while keeping the first "id" column unchanged).
For example:
index id text1 text2 ...
1 123 {'"key'": '"living_space'" '"value'": '"01.04.2022'"} ...
2 124 {'"key'": '"rooms'" '"value'": '"3'"} ...
3 125 23 {'"key'": '"rooms'" ...
4 126 45 Apartment sold ...
I want to keep all elements in the dataframe that contain the words key or value and substitute all else with nan, so I would get a dataframe like:
index id text1 text2 ...
1 123 {'"key'": '"living_space'" '"value'": '"01.04.2022'"} ...
2 124 {'"key'": '"rooms'" '"value'": '"3'"} ...
3 125 nan {'"key'": '"rooms'" ...
4 126 nan nan ...
I have tried using the following code, but it just clears the whole dataset.
l1 = ['key', 'value']
df.iloc[:,1:] = df.iloc[:,1:].applymap(lambda x: x if set(x.split()).intersection(l1) else '')
Thanks in advance.
Comments (1)
Consider the following approach to solve the problem. It consists of 2 parts. (1) The logic to decide whether to keep or to erase data is implemented in the function substring_filter: we simply check if the target string contains any word from words. (2) The actual filtering is performed with np.where, a very convenient helper function from numpy. The result is:
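The code listing itself is not shown here, so what follows is only a minimal sketch of how the two parts described above might look, not the answerer's exact code. The names substring_filter and words and the use of np.where come from the answer; the sample data, the simplified quoting of the cell strings, and the str() conversion for numeric cells are assumptions made for illustration. A likely reason the attempt in the question blanks everything is that x.split() splits on whitespace only, so tokens such as `'key':` still carry quotes and punctuation and never equal the bare word key. A substring check avoids that.

```python
import numpy as np
import pandas as pd

words = ["key", "value"]

def substring_filter(target, words):
    """Return True if target, viewed as a string, contains any of the words."""
    text = str(target)  # assumption: cells may be numbers such as 23, so convert first
    return any(word in text for word in words)

# Illustrative data loosely modelled on the question (quoting simplified).
df = pd.DataFrame({
    "id": [123, 124, 125, 126],
    "text1": [
        "{'key': 'living_space', 'value': '01.04.2022'}",
        "{'key': 'rooms', 'value': '3'}",
        23,
        45,
    ],
    "text2": [
        "{'key': 'rooms', 'value': '2'}",
        "{'key': 'rooms', 'value': '3'}",
        "{'key': 'rooms', 'value': '4'}",
        "Apartment sold",
    ],
})

# Every column except the first ("id") is filtered.
cols = df.columns[1:]

# Part 1: boolean mask, True where a cell contains one of the words.
mask = df[cols].apply(lambda col: col.map(lambda cell: substring_filter(cell, words)))

# Part 2: keep the cells where the mask is True, put NaN everywhere else.
df[cols] = np.where(mask, df[cols], np.nan)

print(df)
```

Note that this is a plain substring match, so a cell containing "monkey" would also count as containing key; if whole-word matching is needed, the check inside substring_filter would have to be stricter. With the same mask, df[cols].where(mask) should give an equivalent result without going through numpy.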