python - Replace all elements in a dataframe that do not contain certain words

Posted on 2025-01-18 09:02:35, 1176 characters, 3 views, 0 comments

I have a very large dataframe and I want to substitute all elements that do not contain a specific word with NaN (while keeping the first "id" column unchanged).

For example:

index  id    text1                        text2                        ...
1      123   {'"key'": '"living_space'"   '"value'": '"01.04.2022'"}   ...
2      124   {'"key'": '"rooms'"          '"value'": '"3'"}            ...
3      125   23                           {'"key'": '"rooms'"          ...
4      126   45                           Apartment sold               ...

I want to keep all elements in the dataframe that contain the words key or value and substitute all else with nan, so I would get a dataframe like:

index  id    text1                        text2                        ...
1      123   {'"key'": '"living_space'"   '"value'": '"01.04.2022'"}   ...
2      124   {'"key'": '"rooms'"          '"value'": '"3'"}            ...
3      125   nan                          {'"key'": '"rooms'"          ...
4      126   nan                          nan                          ...

I have tried the following code, but it just clears the whole dataset.

l1 = ['key', 'value']
df.iloc[:,1:] = df.iloc[:,1:].applymap(lambda x: x if set(x.split()).intersection(l1) else '')

Thanks in advance.
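
A likely reason the attempt clears everything, sketched under the assumption that the cells are strings like those in the example: `x.split()` cuts a cell on whitespace, so the resulting tokens still carry braces, quotes and colons and never equal the bare word `key` or `value`. The sample cell below is hypothetical and only illustrates the difference between exact token matching and a substring test.

```python
l1 = ['key', 'value']

# Hypothetical cell, modelled on the example table in the question.
cell = '{"key": "living_space"'

# Whitespace splitting yields tokens such as '{"key":' and '"living_space"',
# so the set intersection with l1 is empty and the cell gets replaced.
tokens = set(cell.split())
exact_match = bool(tokens.intersection(l1))

# A substring test finds 'key' inside the cell regardless of the punctuation.
substring_match = any(word in cell for word in l1)

print(exact_match, substring_match)
```

With this cell the script prints `False True`, which is consistent with every element being replaced by the original lambda.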

在巴黎塔顶看东京樱花 2025-01-25 09:02:35

Consider the following approach to solve the problem. It consists of two parts. (1) The logic that decides whether to keep or erase a value is implemented in the function substring_filter: it simply checks whether the target string contains any word from words. (2) The actual filtering is performed with np.where, a convenient helper function from NumPy.

import numpy as np
import pandas as pd


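def substring_filter(target, words):
    # Return True if any of the given words occurs as a substring of target.
    # target must be a string; a non-string cell would raise TypeError here.
    return any(word in target for word in words)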


if __name__ == '__main__':

    df = pd.DataFrame({
        'A': [1, 2, 3, 4],
        'B': [True, False, False, True],
        'C': ['{"key": 1}', '{"value": 2}', 'text', 'abc']})

    words_to_search = ('key', 'value')
    df.loc[:, 'C'] = np.where(
        df.loc[:, 'C'].apply(lambda x: substring_filter(x, words_to_search)),
        df.loc[:, 'C'],
        None)
    print(df)

Result is:

   A      B             C
0  1   True    {"key": 1}
1  2  False  {"value": 2}
2  3  False          None
3  4   True          None
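
The example above filters a single column, while the question asks for every column except the first "id" column. A minimal sketch of one way to extend it follows. The sample data, the column names and the helper `contains_any` are assumptions made for illustration, and `DataFrame.where` is swapped in for `np.where` so that the whole block of columns can be masked at once.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; column names are assumed.
df = pd.DataFrame({
    'id': [123, 124, 125, 126],
    'text1': ['{"key": "living_space"', '{"key": "rooms"', 23, 45],
    'text2': ['"value": "01.04.2022"}', '"value": "3"}',
              '{"key": "rooms"', 'Apartment sold'],
})

words_to_search = ('key', 'value')


def contains_any(cell, words=words_to_search):
    # str() lets numbers and NaN pass through the check without raising
    # TypeError; they simply do not match.
    return any(word in str(cell) for word in words)


text_cols = df.columns[1:]  # every column except the first one ("id")

# Build a boolean mask with the same shape as the selected columns.
keep = df[text_cols].apply(lambda col: col.map(contains_any))

# where() keeps cells whose mask is True and sets the others to NaN.
df[text_cols] = df[text_cols].where(keep)
print(df)
```

Applying `Series.map` column by column avoids `DataFrame.applymap`, which is deprecated in recent pandas releases. On a very large dataframe, a vectorized string method such as `Series.str.contains` may be faster, but that is not measured here.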

About the author

今天小雨转甜
