pandas loc not updating every row of the dataframe
I'm trying to use pandas' loc to change the values of some columns.
I have a main df with about 200k rows and the following structure:
[col1, col2, col3, col4, col5].
I need to change some of the values of col4 and col5 based on the number of rows with value val. In pseudocode, it would look something like this:
for each row in dataframe:
    if col2 == value:
        then change the values of col4 and col5
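The row-by-row loop in the pseudocode maps onto a boolean mask passed to .loc, which updates every matching row at once without splitting the frame. A minimal sketch, assuming a small toy frame and hypothetical replacement values 1 (col4) and 2 (col5):

```python
import pandas as pd

# Toy frame with the column names from the question (values are hypothetical)
df = pd.DataFrame({
    "col1": [10, 20, 30, 40],
    "col2": ["x", "y", "x", "z"],
    "col4": [0, 0, 0, 0],
    "col5": [0, 0, 0, 0],
})

value = "x"  # hypothetical value to match in col2
mask = df["col2"] == value  # boolean Series, one entry per row

# Vectorized equivalent of "for each row: if col2 == value: change col4 and col5"
df.loc[mask, "col4"] = 1
df.loc[mask, "col5"] = 2
print(df)
```

On 200k rows this is a single vectorized operation per column, so there is no need to build and re-concatenate per-value sub-frames.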
I wrote a method that creates a smaller dataframe for each col2 value, changes col4 and col5 in it, and then concatenates the pieces back together. In each smaller dataframe I'm using pandas loc like this:
smaller_df.loc[range_to_change_col4, col4] = new_col4_value
smaller_df.loc[range_to_change_col5, col5] = new_col5_value
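If the split-and-concatenate approach is kept, one detail worth checking is how range_to_change_col4 is built, since the question doesn't show it. .loc selects by index label, and each smaller dataframe keeps the labels of the rows it came from, so a positional range such as 0..k only lines up with the labels in whichever group happens to start at label 0. A hedged sketch using groupby and .iloc for positional updates; the toy data and the "first row of each group" range are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "col2": ["a", "b", "a", "b"],
    "col4": [0, 0, 0, 0],
    "col5": [0, 0, 0, 0],
})

pieces = []
for _, smaller_df in df.groupby("col2"):
    smaller_df = smaller_df.copy()  # explicit copy, so edits don't target a view
    # smaller_df keeps the parent's labels ([0, 2] for "a", [1, 3] for "b"),
    # so a label slice like .loc[0:0] only hits the first row of group "a".
    # .iloc works on positions, which is what a "range of rows" usually means.
    first_row = slice(0, 1)  # hypothetical range: first row of each group
    smaller_df.iloc[first_row, smaller_df.columns.get_loc("col4")] = 1
    smaller_df.iloc[first_row, smaller_df.columns.get_loc("col5")] = 2
    pieces.append(smaller_df)

# Concatenate the pieces and restore the original row order
result = pd.concat(pieces).sort_index()
print(result)
```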
Data sample:
Original ->
class;id;url;aug;iterations
image_class;1;image_url;0;0
Expected ->
class;id;url;aug;iterations
image_class;1;image_url;1;1
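The sample is semicolon-separated, so it can be loaded with read_csv(sep=";"). A minimal sketch that reproduces the Original -> Expected change on that single row, assuming the row to update is selected by its class value:

```python
import io

import pandas as pd

# The "Original" sample from above, as an in-memory CSV string
sample = "class;id;url;aug;iterations\nimage_class;1;image_url;0;0\n"
df = pd.read_csv(io.StringIO(sample), sep=";")

# Set aug and iterations to 1 for the matching row(s)
df.loc[df["class"] == "image_class", ["aug", "iterations"]] = 1

out = df.to_csv(sep=";", index=False)
print(out)
```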
Code sample:
# Ratio: number of images I need to augment /
# number of images I already have
if images_to_add / df.shape[0] < 1:
    # Random row indexes to update
    to_update = df.sample(
        n=to_add,  # number of images I need to create
        replace=True,
        random_state=1
    ).index
    # the real image will be augmented
    df.loc[to_update, 'aug'] = 1
    # how many times the real image will be augmented
    df.loc[to_update, 'iterations'] = 1
My problem is that in some of the smaller dfs, not all of the rows get their values updated. I'm relatively new to pandas and I don't know what the problem is. Could it be a memory problem? Any ideas on how I could avoid this?
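One way to test whether the sampling step is responsible: with replace=True, df.sample can return the same index label more than once, and .loc then writes the same row again instead of a new one. The number of rows actually changed is therefore the number of unique labels, which can be smaller than n. A sketch on a hypothetical 10-row frame:

```python
import pandas as pd

df = pd.DataFrame({"aug": [0] * 10, "iterations": [0] * 10})

# Sampling WITH replacement may draw the same row label several times
to_update = df.sample(n=8, replace=True, random_state=1).index
print(len(to_update), to_update.nunique())  # total draws vs. distinct rows

df.loc[to_update, "aug"] = 1
# Rows changed == distinct labels drawn, not n
print(int(df["aug"].sum()))
```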
Comments (2)
I would use assign, like:
SOLVED: in df.sample the problem was the replace parameter; setting it to False solved the issue, and now I can change every value. The new code is:
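A minimal sketch of the fix described above, not the poster's own code: the 10-row frame and to_add are hypothetical, and the question's images_to_add / to_add pair is collapsed into a single to_add. Sampling without replacement returns distinct labels, so exactly to_add rows are updated. replace=False requires n to be no larger than the number of rows, which the if guard already ensures.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["image_class"] * 10,
    "aug": [0] * 10,
    "iterations": [0] * 10,
})
to_add = 4  # hypothetical number of images to create

if to_add / df.shape[0] < 1:
    # Without replacement every sampled label is distinct
    to_update = df.sample(n=to_add, replace=False, random_state=1).index
    df.loc[to_update, ["aug", "iterations"]] = 1

print(int(df["aug"].sum()))  # 4
```

If drawing with replacement is actually intended, for instance so that one image can be augmented several times, an alternative is to keep replace=True and write the draw counts from to_update.value_counts() into iterations instead of a constant 1.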