pandas: drop a column based on a header value

Posted on 2025-01-09 21:03:47

I have this DataFrame:

name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
1920-01-04,4,4.4,4.44,4.444,4.4444
1920-01-05,5,4.5,4.45,4.445,4.4445
1920-01-06,6,4.6,4.46,4.446,4.4446
1920-01-07,7,4.7,4.47,4.447,4.4447
1920-01-08,8,4.8,4.48,4.448,4.4448
1920-01-09,9,4.9,4.49,4.449,4.4449
1920-01-10,10,5,4.5,4.45,4.445
1920-01-11,11,5.1,4.51,4.451,4.4451

I read it with:

import pandas as pd

# The first four rows of the file (name, lat, long, alt) become the column levels.
dfr = pd.read_csv(f_name,
                  parse_dates=True,
                  index_col=0,
                  header=[0, 1, 2, 3],
                  infer_datetime_format=True,
                  cache_dates=True)

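I would like to drop some columns according to a threshold value in the fourth row, which is one of the header levels because I read the file with multi-level column headers.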

I would like to do something like:

for column in dfr:
    if dfr[column][2] <= 1300.:
        dfr = dfr.drop(column,axis=1)

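The problem is that I am not able to select the right header level within the multi-level header. I would also like to do it in a smart way, in other words without a loop.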

夜访吸血鬼 2025-01-16 21:03:47

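You can select the fourth level with Index.get_level_values and keep only the columns whose value is greater than 1300 (the inverse of the condition in your loop) by passing the boolean mask to DataFrame.loc (http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html):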

df = df.loc[:,df.columns.get_level_values(3).astype(int) > 1300]
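To try the line above without a file on disk, here is a minimal sketch that reads a shortened copy of the sample data from a string. The names `csv_text`, `alt` and `kept` are only illustrative, and the data is cut to the first three dates to keep the example short.

```python
import io

import pandas as pd

# Shortened copy of the sample data from the question (first three dates only).
csv_text = """name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
"""

# Four header rows give a four-level column MultiIndex; the first column is the index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[0, 1, 2, 3], parse_dates=True)

# Header values are read as strings, so convert the fourth level before comparing.
alt = df.columns.get_level_values(3).astype(int)

# Keep only the columns whose 'alt' header value is above the threshold.
kept = df.loc[:, alt > 1300]
print(kept.columns.get_level_values(0).tolist())
```

The mask is a plain boolean array with one entry per column, so no loop over the columns is needed.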

Or, if you do not want to convert to integers every time, you can convert the level once before filtering:

df = df.rename(columns=int, level=3)
print (df.columns)
MultiIndex([('01100MS', '626323', '5188431',  915),
            ('02200MS', '616720', '5181393', 1499),
            ('02500MS', '616288', '5173583', 1310),
            ('03100MS', '611860', '5165895', 1235),
            ('06400MS', '622375', '5152605',  190)],
           names=['name', 'lat', 'long', 'alt'])

df = df.loc[:,df.columns.get_level_values(3) > 1300]
print (df)
name       02200MS 02500MS
lat         616720  616288
long       5181393 5173583
alt           1499    1310
1920-01-01     4.1    4.41
1920-01-02     4.2    4.42
1920-01-03     4.3    4.43
1920-01-04     4.4    4.44
1920-01-05     4.5    4.45
1920-01-06     4.6    4.46
1920-01-07     4.7    4.47
1920-01-08     4.8    4.48
1920-01-09     4.9    4.49
1920-01-10     5.0    4.50
1920-01-11     5.1    4.51
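If it reads more naturally to phrase the filter as "drop what is too low", the same result can be sketched by building the mask for the columns to discard and inverting it. The miniature frame below is a hypothetical stand-in for the data in the question (three columns, two dates), and the level is looked up by its name `alt` instead of its position; `too_low` and `result` are illustrative names.

```python
import pandas as pd

# Hypothetical miniature frame with the same four column levels as in the question.
columns = pd.MultiIndex.from_tuples(
    [
        ("01100MS", "626323", "5188431", 915),
        ("02200MS", "616720", "5181393", 1499),
        ("02500MS", "616288", "5173583", 1310),
    ],
    names=["name", "lat", "long", "alt"],
)
df = pd.DataFrame(
    [[1, 4.1, 4.41], [2, 4.2, 4.42]],
    index=pd.to_datetime(["1920-01-01", "1920-01-02"]),
    columns=columns,
)

# Boolean mask of the columns to discard, selected by level name rather than position.
too_low = df.columns.get_level_values("alt") <= 1300

# Inverting the mask keeps everything that is not marked for removal.
result = df.loc[:, ~too_low]
print(result.columns.get_level_values("name").tolist())
```

Using the level name keeps the code readable if the order of the header rows ever changes.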
