Python 3.x - Aggregate pandas DF rows and columns

Posted on 2025-01-25 00:42:09


I am reading a csv file into a pandas dataframe which has duplicate values for the rows and columns that I need to aggregate (both rows and cols).

The csv file looks like this:

p/q/[val],1,1,1,2,2,2,3,3,4,4
1,85.09227753,79.70470428,14.60372257,35.94606018,38.66883087,43.14413452,62.1992569,61.9662056,47.01652908,55.35105515
1,77.67690277,72.28933716,35.8657341,10.12055206,22.38080597,35.09898376,39.91122818,48.39712524,37.95729065,42.97728348
1,71.51867676,66.13111115,38.59518433,22.38080406,11.31649399,29.02029228,43.14096069,44.00777054,41.19556427,36.96442413
2,78.38805389,73.00048828,34.14358902,35.09897995,29.02029228,13.26141262,36.20913696,52.90936279,36.04150391,41.10220718
2,87.69218445,82.30461884,62.14162445,39.91123581,43.14096451,36.20913696,15.41283798,52.42485428,53.06882477,55.80033112
2,68.89026642,63.50270844,52.83700562,45.54430771,41.67800522,48.60984421,50.78954315,13.5169096,37.65000153,36.0362854
3,71.05574036,65.66817474,37.6963768,34.8531723,41.11572266,36.43598175,55.10356522,39.11390305,11.24700832,23.63844109
3,68.75523376,63.36768723,46.03090668,43.18769836,49.4425621,45.19208527,53.05971527,39.51002502,23.63843918,11.75947094
4,71.51867676,66.13111115,38.59518433,22.38080406,11.31649399,29.02029228,43.14096069,44.00777054,41.19556427,36.96442413
4,68.89026642,63.50270844,52.83700562,45.54430771,41.67800522,48.60984421,50.78954315,13.5169096,37.65000153,36.0362854

When I read the csv file into a pandas df, it de-duplicates the repeated column names by appending numeric suffixes ('1', '1.1', '1.2', ...), so it looks like this:

p/q/[val]  1            1.1          1.2          2            2.1          2.2          3            3.1          4            4.1
1          85.09227753  79.70470428  14.60372257  35.94606018  38.66883087  43.14413452  62.1992569   61.9662056   47.01652908  55.35105515
1          77.67690277  72.28933716  35.8657341   10.12055206  22.38080597  35.09898376  39.91122818  48.39712524  37.95729065  42.97728348
1          71.51867676  66.13111115  38.59518433  22.38080406  11.31649399  29.02029228  43.14096069  44.00777054  41.19556427  36.96442413
2          78.38805389  73.00048828  34.14358902  35.09897995  29.02029228  13.26141262  36.20913696  52.90936279  36.04150391  41.10220718
2          87.69218445  82.30461884  62.14162445  39.91123581  43.14096451  36.20913696  15.41283798  52.42485428  53.06882477  55.80033112
2          68.89026642  63.50270844  52.83700562  45.54430771  41.67800522  48.60984421  50.78954315  13.5169096   37.65000153  36.0362854
3          71.05574036  65.66817474  37.6963768   34.8531723   41.11572266  36.43598175  55.10356522  39.11390305  11.24700832  23.63844109
3          68.75523376  63.36768723  46.03090668  43.18769836  49.4425621   45.19208527  53.05971527  39.51002502  23.63843918  11.75947094
4          71.51867676  66.13111115  38.59518433  22.38080406  11.31649399  29.02029228  43.14096069  44.00777054  41.19556427  36.96442413
4          68.89026642  63.50270844  52.83700562  45.54430771  41.67800522  48.60984421  50.78954315  13.5169096   37.65000153  36.0362854

I need to aggregate the rows and the columns so my final dataframe looks like this:

p/q/[val]  1            2            3            4
1          60.1641834   27.56410641  49.93709119  43.57702446
2          66.98894882  36.94157547  36.87710746  43.28319232
3          58.76235326  41.70453707  46.69680214  17.57083988
4          60.24582545  33.09162458  37.863796    37.96156883

In Excel I can do this in two steps using the following formulae:

Step1 - aggregate the rows:

[screenshot: Excel formula for aggregating the rows]

Step2 - aggregate the columns:

[screenshot: Excel formula for aggregating the columns]

I am just not sure how I can do this in Python.
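For reference, the duplicate-header mangling described above can be reproduced with a minimal assumed CSV (the values here are stand-ins, not the question's data):

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV: the header repeats the label '1'.
csv_text = "p/q/[val],1,1,2\n1,10.0,20.0,30.0\n1,12.0,22.0,32.0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # the duplicate '1' becomes '1.1'
```

This is why the frame ends up with columns like '1', '1.1', '1.2' instead of three columns named '1'.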


1 Answer

记忆消瘦 2025-02-01 00:42:09


If you really have the same indices per column/row:

(df
 .set_index('p/q/[val]')
 .groupby(level=0).mean()          # average rows that share an index label
 .groupby(level=0, axis=1).mean()  # average columns that share a label (axis=1 is deprecated in recent pandas)
 )
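As a runnable sketch on toy stand-in data (not the question's values): since `groupby(..., axis=1)` is deprecated in recent pandas, this variant transposes to take the column-wise mean instead.

```python
import pandas as pd

# Toy stand-in for the question's frame: duplicate row and column labels.
df = pd.DataFrame(
    [[1, 10.0, 20.0, 30.0],
     [1, 12.0, 22.0, 32.0],
     [2, 14.0, 24.0, 34.0]],
    columns=['p/q/[val]', '1', '1', '2'],
)

out = (df
       .set_index('p/q/[val]')
       .groupby(level=0).mean()       # mean over duplicate row labels
       .T.groupby(level=0).mean().T)  # mean over duplicate column labels, via transpose

print(out)
```

The two rows labelled 1 are averaged first, then the two columns labelled '1'.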

Alternative:

(df
 .melt(id_vars='p/q/[val]')
 .groupby(['p/q/[val]', 'variable'])['value'].mean()
 .unstack()
 )

Output:

                  1          2          3          4
p/q/[val]                                            
1          60.164183  27.564106  49.937091  43.577024
2          66.988949  36.941575  36.877107  43.283192
3          58.762353  41.704537  46.696802  17.570840
4          60.245825  33.091625  37.863796  37.961569

If the columns are of the form 1, 1.1, etc., add a rename step:

(df
 .set_index('p/q/[val]')
 .rename(columns=lambda x: x.rpartition('.')[0] or x)  # '1.1' -> '1'; `or x` keeps dot-less labels like '1' intact
 .groupby(level=0).mean()
 .groupby(level=0, axis=1).mean()  # axis=1 is deprecated in recent pandas
 )
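A runnable sketch of the rename route on assumed toy values (again using a transpose for the column-wise mean, since `groupby(axis=1)` is deprecated in recent pandas):

```python
import pandas as pd

# Columns as pandas mangles a repeated CSV header: '1', '1.1', then '2'.
df = pd.DataFrame({
    'p/q/[val]': [1, 1, 2],
    '1':   [10.0, 12.0, 14.0],
    '1.1': [20.0, 22.0, 24.0],
    '2':   [30.0, 32.0, 34.0],
})

out = (df
       .set_index('p/q/[val]')
       # strip the '.n' suffix; `or x` keeps labels that have no dot
       .rename(columns=lambda x: x.rpartition('.')[0] or x)
       .groupby(level=0).mean()       # mean over duplicate rows
       .T.groupby(level=0).mean().T)  # mean over duplicate columns
```

After the rename, '1' and '1.1' share the label '1' and collapse into one averaged column.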