pandas loc does not update all rows in the index

Posted on 2025-02-02 03:11:09

I'm trying to use pandas' loc to change the values of some columns.

I have a main df with about 200k rows and the following structure:
[col1, col2, col3, col4, col5].

I need to change some of the values of col4 and col5 based on the number of rows whose col2 has the value val. In pseudocode it would be something like this:

for each row in dataframe:
    if col2 == value:
        then col4 and col5 change its value
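
The pseudocode above maps fairly directly onto a boolean mask passed to loc, which avoids an explicit Python loop. This is only a minimal sketch: the frame contents and the name target_value are made up for illustration and do not come from the original data.

```python
import pandas as pd

# Small illustrative frame with the same column layout as in the question
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d"],
    "col2": ["x", "y", "x", "z"],
    "col3": [1, 2, 3, 4],
    "col4": [0, 0, 0, 0],
    "col5": [0, 0, 0, 0],
})

target_value = "x"  # hypothetical value to match in col2

# True for every row whose col2 equals the target value
mask = df["col2"] == target_value

# Update both columns only on the matching rows
df.loc[mask, ["col4", "col5"]] = 1
```

Rows where the mask is False keep their previous values.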

I wrote a method that creates a smaller dataframe for each col2 value, changes col4 and col5 in it, and then concatenates the results. In each smaller dataframe I use pandas loc like this:

smaller_df.loc[range_to_change_col4, col4] = new_col4_value
smaller_df.loc[range_to_change_col5, col5] = new_col5_value
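
Splitting and concatenating may not be needed at all, because groups produced by groupby and rows returned by sample keep the index labels of the main frame. The sketch below assumes the main frame has a unique index; the class names and the per-class counts in to_add_per_class are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["cat", "cat", "cat", "cat", "dog", "dog", "dog"],
    "aug": [0] * 7,
    "iterations": [0] * 7,
})

# Hypothetical number of rows to mark in each class
to_add_per_class = {"cat": 2, "dog": 1}

picked = []
for class_name, group in df.groupby("class"):
    n = to_add_per_class[class_name]
    # Sample without replacement so every drawn label is distinct
    sampled = group.sample(n=n, replace=False, random_state=1)
    picked.extend(sampled.index)

# One update on the main frame, using the collected index labels
df.loc[picked, ["aug", "iterations"]] = 1
```

Collecting the labels first and writing once keeps the main frame untouched while the groups are being iterated.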

Data sample:

Original ->
class;id;url;aug;iterations
image_class;1;image_url;0;0
Expected ->
class;id;url;aug;iterations
image_class;1;image_url;1;1

Code sample:

# Ratio of images I need to create to images I already have
if images_to_add / df.shape[0] < 1:
    # Index labels of randomly sampled rows
    to_update = df.sample(
        n=to_add,  # number of images I need to create
        replace=True,
        random_state=1,
    ).index
    # Mark the real image as one that will be augmented
    df.loc[to_update, 'aug'] = 1
    # How many times the real image will be augmented
    df.loc[to_update, 'iterations'] = 1

My problem is that in some of the smaller dataframes not all of the selected rows get updated. I'm relatively new to pandas and I don't know what the problem is. Could it be a memory problem? Any idea how I could avoid this?

謸气贵蔟 2025-02-09 03:11:09

I would use assign, for example:

import numpy as np
# Each keyword needs a distinct name; the names and columns here are placeholders
df.assign(
    new_column_1=lambda x: x["column name"] * 1.3,
    new_column_2=lambda x: x["column name"] + x["column name"],
    new_column_3=lambda x: x["column name"].str.replace("2", "3") + x["column name"],
    new_column_4=lambda x: np.where(x["Voorraad"] > 8, 8, x["Voorraad"]),
)
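
Applied to the columns from the question, the assign approach could look roughly like the sketch below. The frame contents and the condition on class are assumptions made for illustration. Note that assign returns a new frame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "a"],
    "aug": [0, 0, 0],
    "iterations": [0, 0, 0],
})

# Overwrite aug and iterations only where the (hypothetical) condition holds
updated = df.assign(
    aug=lambda x: np.where(x["class"] == "a", 1, x["aug"]),
    iterations=lambda x: np.where(x["class"] == "a", 1, x["iterations"]),
)
```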

梦里兽 2025-02-09 03:11:09

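SOLVED: the problem was the replace parameter of df.sample. With replace=True the same row can be drawn more than once, so to_update can contain duplicate index labels and fewer than n distinct rows get updated. Setting it to False solved the issue, and now I can change every value.

The new code is: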

# Ratio of images I need to create to images I already have
if images_to_add / df.shape[0] < 1:
    # Index labels of randomly sampled rows
    to_update = df.sample(
        n=to_add,  # number of images I need to create
        replace=False,
        random_state=1,
    ).index
    # Mark the real image as one that will be augmented
    df.loc[to_update, 'aug'] = 1
    # How many times the real image will be augmented
    df.loc[to_update, 'iterations'] = 1
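
To see the difference between the two settings, the number of distinct labels returned by sample can be compared directly. This is a small sketch on a made-up 10-row frame, not the original data.

```python
import pandas as pd

df = pd.DataFrame({"aug": [0] * 10, "iterations": [0] * 10})
n = 8

# With replacement: the same label may be drawn several times
with_replacement = df.sample(n=n, replace=True, random_state=1).index
# Without replacement: every label is drawn at most once
without_replacement = df.sample(n=n, replace=False, random_state=1).index

unique_with = with_replacement.nunique()
unique_without = without_replacement.nunique()

# Only the distinct labels end up being updated
df.loc[with_replacement, "aug"] = 1
df.loc[without_replacement, "iterations"] = 1

rows_marked_with = int(df["aug"].sum())
rows_marked_without = int(df["iterations"].sum())
```

Here rows_marked_without always equals n, while rows_marked_with equals unique_with, which can be smaller than n whenever a label is drawn twice.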
