Groupby dropping some variables

Posted 2025-02-02 03:26:21


This is the original data and I need the mean of each year of all the variables.

[Image: original data]

But when I am using groupby('year') command, it is dropping all variables except 'lnmcap' and 'epu'.

[Image: output after groupby]

Why is this happening, and what needs to be done?
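The symptom can be reproduced on a toy frame (column names other than 'lnmcap' and 'epu' are assumed here for illustration): groupby().mean() only aggregates numeric columns, so any column that was read in as strings vanishes from the result.

```python
import pandas as pd

# Toy frame: 'roa' was read in as strings, so it has object dtype, not float
ds = pd.DataFrame({
    'year':   [2019, 2019, 2020, 2020],
    'lnmcap': [1.1, 1.3, 1.2, 1.4],
    'epu':    [100.0, 110.0, 105.0, 115.0],
    'roa':    ['0.05', '0.07', '0.06', '0.08'],
})

print(ds.dtypes)  # 'roa' shows up as object

# Older pandas silently dropped object columns here; pandas >= 2.0 raises
# a TypeError instead, unless numeric_only=True is passed.
out = ds.groupby('year').mean(numeric_only=True)
print(out)  # 'roa' is missing from the result either way
```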


Comments (3)

○闲身 2025-02-09 03:26:21


Probably the other columns have object or string dtype instead of a numeric one, as a result of which only 'lnmcap' and 'epu' get an averaged column.
Use ds.dtypes or simply ds.info() to check the data types of the columns.

If they come out as object/string type, then use:

ds = ds.drop('company', axis=1)   # non-numeric column, cannot be averaged
column_names = ds.columns
for i in column_names:
    ds[i] = ds[i].astype(str).astype(float)

This should work.
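Applied to a toy frame (data and column names assumed for illustration), the conversion lets the subsequent groupby average every remaining column:

```python
import pandas as pd

# Toy data; 'company' and the other column names are illustrative
ds = pd.DataFrame({
    'company': ['A', 'B', 'A', 'B'],
    'year':    ['2019', '2019', '2020', '2020'],
    'lnmcap':  [1.1, 1.3, 1.2, 1.4],
    'roa':     ['0.05', '0.07', '0.06', '0.08'],
})

ds = ds.drop('company', axis=1)    # non-numeric, cannot be averaged
for col in ds.columns:
    ds[col] = ds[col].astype(str).astype(float)

print(ds.dtypes)                   # every column is float64 now
print(ds.groupby('year').mean())   # 'roa' appears in the result
```

Note that 'year' itself also becomes a float here (2019.0, 2020.0), which is harmless for grouping but worth knowing.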

绅刃 2025-02-09 03:26:21


You might want to convert all numeric columns to float before taking their mean, for example:

cols = list(ds.columns)

#remove irrelevant columns
cols.pop(cols.index('company'))
cols.pop(cols.index('year'))

#convert remaining relevant columns to float
for col in cols:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')
    
# after that you can apply the aggregation
# ('company' is still an object column, so restrict the mean to numeric ones)
ds.groupby('year').mean(numeric_only=True)
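Run on a toy frame with one unparseable value (data assumed for illustration), errors='coerce' turns it into NaN, which mean() then skips:

```python
import pandas as pd

# Toy frame with one 'dirty' value; names are illustrative
ds = pd.DataFrame({
    'company': ['A', 'B', 'A', 'B'],
    'year':    [2019, 2019, 2020, 2020],
    'roa':     ['0.05', 'n/a', '0.06', '0.08'],
})

cols = list(ds.columns)
cols.pop(cols.index('company'))
cols.pop(cols.index('year'))

for col in cols:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')   # 'n/a' -> NaN

# 'company' is still an object column, so exclude non-numeric columns
print(ds.groupby('year').mean(numeric_only=True))
```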
倚栏听风 2025-02-09 03:26:21


You will need to convert the numeric columns to float types. Use ds.info() to check the various data types.

for col in ds.select_dtypes(['object']).columns:
    try:
        ds[col] = ds[col].astype('float')
    except (ValueError, TypeError):
        continue

After this, use ds.info() to check again. Columns holding strings like '1.604809' will be converted to the float 1.604809.

Sometimes a column may contain some "dirty" data that cannot be converted to float. In that case, you can use the code below with errors='coerce', which turns non-numeric data into NaN:

column_names = list(ds.columns)
column_names.remove('company')
column_names.remove('year')
for col in column_names:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')    #this will convert to numeric, whereas non-numeric becomes NaN
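Before coercing, it can help to see which values are actually dirty; a small sketch on toy data (names assumed):

```python
import pandas as pd

# Toy column; the idea is to inspect the offending strings before coercing
ds = pd.DataFrame({'roa': ['0.05', 'n/a', '0.06']})

converted = pd.to_numeric(ds['roa'], errors='coerce')
# Values that failed to parse: NaN after coercion but not missing originally
dirty = ds.loc[converted.isna() & ds['roa'].notna(), 'roa']
print(dirty.tolist())  # ['n/a']
```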