Iterate over rows and columns and replace values based on a condition

Posted 2025-02-11 00:55:02

How can we divide by 10 every numeric value in a pandas DataFrame that lies between 10 and 100?

Conditions:

  1. The time column, or any other non-numeric column, must be ignored.
  2. The numbers can lie in any row or column.

time   n1  n2  n3  n4
11:50  1   2   3   40
12:50  5   6   70  8
13:50  80  7   6   500

Use this code if needed:

import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }

df1 = pd.DataFrame(data=data_1)
df1

Attempt 1 (it doesn't seem to work):

j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1

Expected result:

  1. Since 80, 70, and 40 are the numbers lying between 10 and 100, each is replaced by x/10 in the same dataframe:
  • 80 --> 80/10 = 8
  • 70 --> 70/10 = 7
  • 40 --> 40/10 = 4
  2. The entire time column is ignored because it is non-numeric.


Comments (4)

半枫 2025-02-18 00:55:02

DataFrame.applymap is pretty slow on a big data set because it calls a Python function once per element, so it doesn't scale well. You should always look for a vectorized solution if possible.

In this case, you can build a boolean mask for the values between 10 and 100 and do the conditional replacement with DataFrame.mask (or DataFrame.where if you negate the condition).

# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns

# In DataFrame.mask `df` is replaced by the calling DataFrame, 
# in this case df = df1[num_cols]
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100), 
                       lambda df: df // 10)
)

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500

Setup:

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6 ,7],
          'n3': [3, 70 ,6],
          'n4': [40, 8, 500],
        }

df1 = pd.DataFrame(data = data_1)
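
The DataFrame.where variant mentioned above could be sketched roughly as follows. This is only an illustration under the same setup; the names `nums` and `in_range` are made up here and are not part of the original answer. It builds the boolean mask once, without lambdas, and negates it because where() keeps values where the condition is True.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Restrict the operation to numeric columns so "time" is never touched
num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]

# True where a value lies strictly between 10 and 100
in_range = (nums > 10) & (nums < 100)

# where() keeps a value when the condition is True and takes the
# replacement otherwise, so the mask has to be negated here
df1[num_cols] = nums.where(~in_range, nums // 10)

print(df1)
```

Floor division (`//`) is used so the replaced values stay integers; with `/` the affected values would become floats.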
盗心人 2025-02-18 00:55:02

Does this work?

df1[['n1','n2','n3','n4']].applymap(lambda x : x/10 if 10 < x < 100 else x)
    n1  n2   n3     n4
0  1.0   2  3.0    4.0
1  5.0   6  7.0    8.0
2  8.0   7  6.0  500.0
书信已泛黄 2025-02-18 00:55:02

You can select the columns which have numeric datatypes, use .applymap() to perform the division operation, and then reassign back to the original dataframe. Notably, this doesn't require hardcoding the columns you want to transform in advance:

numerics = df1.select_dtypes(include="number")
numerics = numerics.applymap(lambda x: x // 10 if 10 < x < 100 else x)
df1[numerics.columns] = numerics

This outputs:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
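
One caveat when running this on a recent pandas: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. A version-tolerant sketch of the same idea might look like the following; the helper name `scale` and the `hasattr` fallback are illustrative choices, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})


def scale(x):
    # Floor-divide values strictly between 10 and 100; leave the rest as-is
    return x // 10 if 10 < x < 100 else x


numerics = df1.select_dtypes(include="number")

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back
# to the older applymap name
elementwise = numerics.map if hasattr(numerics, "map") else numerics.applymap
df1[numerics.columns] = elementwise(scale)

print(df1)
```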
萝莉病 2025-02-18 00:55:02

Try the following:

def repl(df, cols):
    for col in cols:
        df[col] = df[col].apply(lambda x: x//10 if x >= 10 and x <= 100 else x)
    return df

new_df = repl(df1, ['n1', 'n2', 'n3', 'n4'])
new_df

Output:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
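
Whether 10 and 100 themselves should be divided is a boundary choice, and Series.between can make it explicit. The variant below is a sketch only: the `low`, `high`, and `inclusive` parameters are hypothetical additions to `repl`, and the string form of `inclusive` assumes pandas 1.3 or later. It also works on a copy, so the frame passed in is left unchanged.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})


def repl(df, cols, low=10, high=100, inclusive="neither"):
    # Work on a copy so the caller's frame is not modified
    out = df.copy()
    for col in cols:
        # inclusive="neither" means low < x < high; "both" would include the bounds
        in_range = out[col].between(low, high, inclusive=inclusive)
        out[col] = out[col].mask(in_range, out[col] // 10)
    return out


new_df = repl(df1, ["n1", "n2", "n3", "n4"])
print(new_df)
```

Passing `inclusive="both"` would reproduce the `x >= 10 and x <= 100` test used above.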