Selecting columns within a certain range in a DataFrame
I am working with DataFrames in Python. I have an original DataFrame covering 10 days, which I have split into one DataFrame per day, and I am trying to plot each one. Some columns (here y and z) contain strange values, so I am using the between method to restrict them to the range (0, 100). The code works, but I get a warning. Can anyone help me, please?
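
    import matplotlib.pyplot as plt

    # listofDF holds one DataFrame per day
    for df in listofDF:
        if len(df) != 0:
            # Keep only rows whose ' y' and ' z' values lie in [0, 100]
            f_df = df[df[' y'].between(0, 100)]
            f_df = f_df[df[' z'].between(0, 100)]  # this line triggers the warning
            maxTemp = f_df[' y']
            minTemp = f_df[' z']
            Time = f_df['x']
            plt.plot(Time, maxTemp)
            plt.plot(Time, minTemp)
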
The warning I am getting is:

    UserWarning: Boolean Series key will be reindexed to match DataFrame index.
      f_df = f_df[df[' y'].between(0,100)]

TL;DR Solution

Change

    f_df = f_df[df[' z'].between(0, 100)]

to

    f_df = f_df[f_df[' z'].between(0, 100)]

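As an alternative to chaining two filters, both conditions can be built from the same DataFrame and combined into a single mask, so the mask and the DataFrame always share one index. The following is only a minimal sketch: the sample values are made up for illustration, and the column names ' y' and ' z' keep the leading space used in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for one day's DataFrame.
df = pd.DataFrame({
    "x": [0, 1, 2, 3, 4],
    " y": [20, 150, 35, 40, -5],
    " z": [10, 12, 300, 18, 9],
})

# Both conditions are evaluated on df and combined with &,
# so the mask has exactly the same index as df.
in_range = df[" y"].between(0, 100) & df[" z"].between(0, 100)
f_df = df[in_range]

print(f_df)
```

Because the mask is never applied to a DataFrame other than the one it was built from, no reindexing is involved.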
The warning you are getting is caused by this line:

    f_df = f_df[df[' z'].between(0, 100)]

There's an issue with this line; can you spot it?

You're using df to index f_df. The mask df[' z'].between(0, 100) marks the rows of df where column z is between 0 and 100; say those are rows 2 and 4 of df. But f_df is a different DataFrame, and its rows could be completely different: in f_df, the rows where z is between 0 and 100 might be rows 3 and 10. Because the condition is evaluated on df and the result is then used to select rows from f_df, pandas warns you that it will reindex the mask to f_df's index to decide which rows to keep, which may not be what you want.

So when the filter on df returns rows 1 and 10, pandas will choose rows 1 and 10 from f_df. Or, to be more accurate, it will choose the rows with index labels 1 and 10, matching by label rather than by position.

In your case, that is what you want, because the index labels are retained when you create f_df from df, as you can see from the index on the left when you print it.
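
The label matching described above can be checked on a small example. This is a sketch under assumptions: the sample values are invented, and the warnings are captured only to show when pandas complains (whether the warning is emitted may depend on the pandas version). It filters f_df once with a mask built from df and once with a mask built from f_df itself, then compares the results.

```python
import warnings

import pandas as pd

# Hypothetical sample data; ' y' and ' z' keep the leading space from the question.
df = pd.DataFrame({
    "x": [0, 1, 2, 3, 4],
    " y": [20, 150, 35, 40, -5],
    " z": [10, 12, 300, 18, 9],
})

# First filter: f_df keeps the original index labels of the surviving rows.
f_df = df[df[" y"].between(0, 100)]

# Mask built from df (5 rows) applied to f_df (3 rows):
# pandas aligns the mask to f_df's index labels.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    via_df = f_df[df[" z"].between(0, 100)]

# Mask built from f_df itself: mask and DataFrame share the same index.
with warnings.catch_warnings(record=True) as caught_fixed:
    warnings.simplefilter("always")
    via_f_df = f_df[f_df[" z"].between(0, 100)]

print([str(w.message) for w in caught])
print([str(w.message) for w in caught_fixed])
print(via_df.equals(via_f_df))
```

Both versions select the same rows here because f_df inherits its index labels from df. If f_df had been given a fresh index (for example with reset_index), the mask built from df would line up with different rows, which is the situation the warning is meant to flag.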