Python 3.x - Aggregate pandas DF rows and columns

Posted on 2025-01-25 00:42:09


I am reading a csv file into a pandas dataframe which has duplicate values for the rows and columns that I need to aggregate (both rows and cols).

The csv file looks like this:

p/q/[val],1,1,1,2,2,2,3,3,4,4
1,85.09227753,79.70470428,14.60372257,35.94606018,38.66883087,43.14413452,62.1992569,61.9662056,47.01652908,55.35105515
1,77.67690277,72.28933716,35.8657341,10.12055206,22.38080597,35.09898376,39.91122818,48.39712524,37.95729065,42.97728348
1,71.51867676,66.13111115,38.59518433,22.38080406,11.31649399,29.02029228,43.14096069,44.00777054,41.19556427,36.96442413
2,78.38805389,73.00048828,34.14358902,35.09897995,29.02029228,13.26141262,36.20913696,52.90936279,36.04150391,41.10220718
2,87.69218445,82.30461884,62.14162445,39.91123581,43.14096451,36.20913696,15.41283798,52.42485428,53.06882477,55.80033112
2,68.89026642,63.50270844,52.83700562,45.54430771,41.67800522,48.60984421,50.78954315,13.5169096,37.65000153,36.0362854
3,71.05574036,65.66817474,37.6963768,34.8531723,41.11572266,36.43598175,55.10356522,39.11390305,11.24700832,23.63844109
3,68.75523376,63.36768723,46.03090668,43.18769836,49.4425621,45.19208527,53.05971527,39.51002502,23.63843918,11.75947094
4,71.51867676,66.13111115,38.59518433,22.38080406,11.31649399,29.02029228,43.14096069,44.00777054,41.19556427,36.96442413
4,68.89026642,63.50270844,52.83700562,45.54430771,41.67800522,48.60984421,50.78954315,13.5169096,37.65000153,36.0362854

When I read the csv file into a pandas df, it de-duplicates the repeated column names by appending numeric suffixes ('1', '1.1', '1.2', ...), so it looks like this:

p/q/[val]  1            1.1          1.2          2            2.1          2.2          3            3.1          4            4.1
1          85.09227753  79.70470428  14.60372257  35.94606018  38.66883087  43.14413452  62.1992569   61.9662056   47.01652908  55.35105515
1          77.67690277  72.28933716  35.8657341   10.12055206  22.38080597  35.09898376  39.91122818  48.39712524  37.95729065  42.97728348
1          71.51867676  66.13111115  38.59518433  22.38080406  11.31649399  29.02029228  43.14096069  44.00777054  41.19556427  36.96442413
2          78.38805389  73.00048828  34.14358902  35.09897995  29.02029228  13.26141262  36.20913696  52.90936279  36.04150391  41.10220718
2          87.69218445  82.30461884  62.14162445  39.91123581  43.14096451  36.20913696  15.41283798  52.42485428  53.06882477  55.80033112
2          68.89026642  63.50270844  52.83700562  45.54430771  41.67800522  48.60984421  50.78954315  13.5169096   37.65000153  36.0362854
3          71.05574036  65.66817474  37.6963768   34.8531723   41.11572266  36.43598175  55.10356522  39.11390305  11.24700832  23.63844109
3          68.75523376  63.36768723  46.03090668  43.18769836  49.4425621   45.19208527  53.05971527  39.51002502  23.63843918  11.75947094
4          71.51867676  66.13111115  38.59518433  22.38080406  11.31649399  29.02029228  43.14096069  44.00777054  41.19556427  36.96442413
4          68.89026642  63.50270844  52.83700562  45.54430771  41.67800522  48.60984421  50.78954315  13.5169096   37.65000153  36.0362854

I need to aggregate the rows and the columns so my final dataframe looks like this:

p/q/[val]  1            2            3            4
1          60.1641834   27.56410641  49.93709119  43.57702446
2          66.98894882  36.94157547  36.87710746  43.28319232
3          58.76235326  41.70453707  46.69680214  17.57083988
4          60.24582545  33.09162458  37.863796    37.96156883

In Excel I can do this in two steps using the following formulae:

Step1 - aggregate the rows:

[screenshot: Excel formula for aggregating the rows]

Step2 - aggregate the columns:

[screenshot: Excel formula for aggregating the columns]

I am just not sure how I can do this in Python.
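For reference, the duplicate-header mangling described above can be reproduced with a minimal assumed CSV (the values here are stand-ins, not the question's data):

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV: the header repeats the label '1'.
csv_text = "p/q/[val],1,1,2\n1,10.0,20.0,30.0\n1,12.0,22.0,32.0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # the duplicate '1' becomes '1.1'
```

This is why the frame ends up with columns like '1', '1.1', '1.2' instead of three columns named '1'.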


1 Answer

记忆消瘦 2025-02-01 00:42:09


If you really have the same indices per column/row:

(df
 .set_index('p/q/[val]')
 .groupby(level=0).mean()          # average rows that share an index label
 .groupby(level=0, axis=1).mean()  # average columns that share a label (axis=1 is deprecated in recent pandas)
 )
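As a runnable sketch on toy stand-in data (not the question's values): since `groupby(..., axis=1)` is deprecated in recent pandas, this variant transposes to take the column-wise mean instead.

```python
import pandas as pd

# Toy stand-in for the question's frame: duplicate row and column labels.
df = pd.DataFrame(
    [[1, 10.0, 20.0, 30.0],
     [1, 12.0, 22.0, 32.0],
     [2, 14.0, 24.0, 34.0]],
    columns=['p/q/[val]', '1', '1', '2'],
)

out = (df
       .set_index('p/q/[val]')
       .groupby(level=0).mean()       # mean over duplicate row labels
       .T.groupby(level=0).mean().T)  # mean over duplicate column labels, via transpose

print(out)
```

The two rows labelled 1 are averaged first, then the two columns labelled '1'.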

Alternative:

(df
 .melt(id_vars='p/q/[val]')
 .groupby(['p/q/[val]', 'variable'])['value'].mean()
 .unstack()
 )

Output:

                  1          2          3          4
p/q/[val]                                            
1          60.164183  27.564106  49.937091  43.577024
2          66.988949  36.941575  36.877107  43.283192
3          58.762353  41.704537  46.696802  17.570840
4          60.245825  33.091625  37.863796  37.961569

If the columns are of the form 1, 1.1, etc., add a rename step:

(df
 .set_index('p/q/[val]')
 .rename(columns=lambda x: x.rpartition('.')[0] or x)  # '1.1' -> '1'; `or x` keeps dot-less labels like '1' intact
 .groupby(level=0).mean()
 .groupby(level=0, axis=1).mean()  # axis=1 is deprecated in recent pandas
 )
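A runnable sketch of the rename route on assumed toy values (again using a transpose for the column-wise mean, since `groupby(axis=1)` is deprecated in recent pandas):

```python
import pandas as pd

# Columns as pandas mangles a repeated CSV header: '1', '1.1', then '2'.
df = pd.DataFrame({
    'p/q/[val]': [1, 1, 2],
    '1':   [10.0, 12.0, 14.0],
    '1.1': [20.0, 22.0, 24.0],
    '2':   [30.0, 32.0, 34.0],
})

out = (df
       .set_index('p/q/[val]')
       # strip the '.n' suffix; `or x` keeps labels that have no dot
       .rename(columns=lambda x: x.rpartition('.')[0] or x)
       .groupby(level=0).mean()       # mean over duplicate rows
       .T.groupby(level=0).mean().T)  # mean over duplicate columns
```

After the rename, '1' and '1.1' share the label '1' and collapse into one averaged column.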