- What is pivot?
- How do I pivot?
- Long format to wide format?
I've seen a lot of questions that ask about pivot tables, even if they don't know it. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go.
The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble generalizing in order to use a number of the existing good answers. However, none of the answers attempt to give a comprehensive explanation (because it's a daunting task). Look at a few examples from my Google search:
- How to pivot a dataframe in Pandas? - Good question and answer. But the answer only answers the specific question with little explanation.
- pandas pivot table to data frame - OP is concerned with the output of the pivot, namely how the columns look. OP wanted it to look like R. This isn't very helpful for pandas users.
- pandas pivoting a dataframe, duplicate rows - Another decent question but the answer focuses on one method, namely pd.DataFrame.pivot
Setup
I conspicuously named my columns and relevant column values to correspond with how I'm going to pivot in the answers below.
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
n = 20

cols = np.array(['key', 'row', 'item', 'col'])
# Small random integers as strings; the integer division limits key and item to three distinct values
arr1 = (np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str)
# np.char.add is the public name for numpy.core.defchararray.add
df = pd.DataFrame(
np.char.add(cols, arr1), columns=cols
).join(
pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')
)
print(df)
key row item col val0 val1
0 key0 row3 item1 col3 0.81 0.04
1 key1 row2 item1 col2 0.44 0.07
2 key1 row0 item1 col0 0.77 0.01
3 key0 row4 item0 col2 0.15 0.59
4 key1 row0 item2 col1 0.81 0.64
5 key1 row2 item2 col4 0.13 0.88
6 key2 row4 item1 col3 0.88 0.39
7 key1 row4 item1 col1 0.10 0.07
8 key1 row0 item2 col4 0.65 0.02
9 key1 row2 item0 col2 0.35 0.61
10 key2 row0 item2 col1 0.40 0.85
11 key2 row4 item1 col2 0.64 0.25
12 key0 row2 item2 col3 0.50 0.44
13 key0 row4 item1 col4 0.24 0.46
14 key1 row3 item2 col3 0.28 0.11
15 key0 row3 item1 col1 0.31 0.23
16 key0 row0 item2 col3 0.86 0.01
17 key0 row4 item0 col3 0.64 0.21
18 key2 row2 item2 col0 0.13 0.45
19 key0 row2 item0 col4 0.37 0.70
Questions
- Why do I get ValueError: Index contains duplicate entries, cannot reshape?
- How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?
col col0 col1 col2 col3 col4
row
row0 0.77 0.605 NaN 0.860 0.65
row2 0.13 NaN 0.395 0.500 0.25
row3 NaN 0.310 NaN 0.545 NaN
row4 NaN 0.100 0.395 0.760 0.24
- How do I make it so that missing values are 0?
col col0 col1 col2 col3 col4
row
row0 0.77 0.605 0.000 0.860 0.65
row2 0.13 0.000 0.395 0.500 0.25
row3 0.00 0.310 0.000 0.545 0.00
row4 0.00 0.100 0.395 0.760 0.24
- Can I get something other than mean, like maybe sum?
col col0 col1 col2 col3 col4
row
row0 0.77 1.21 0.00 0.86 0.65
row2 0.13 0.00 0.79 0.50 0.50
row3 0.00 0.31 0.00 1.09 0.00
row4 0.00 0.10 0.79 1.52 0.24
- Can I do more than one aggregation at a time?
sum mean
col col0 col1 col2 col3 col4 col0 col1 col2 col3 col4
row
row0 0.77 1.21 0.00 0.86 0.65 0.77 0.605 0.000 0.860 0.65
row2 0.13 0.00 0.79 0.50 0.50 0.13 0.000 0.395 0.500 0.25
row3 0.00 0.31 0.00 1.09 0.00 0.00 0.310 0.000 0.545 0.00
row4 0.00 0.10 0.79 1.52 0.24 0.00 0.100 0.395 0.760 0.24
- Can I aggregate over multiple value columns?
val0 val1
col col0 col1 col2 col3 col4 col0 col1 col2 col3 col4
row
row0 0.77 0.605 0.000 0.860 0.65 0.01 0.745 0.00 0.010 0.02
row2 0.13 0.000 0.395 0.500 0.25 0.45 0.000 0.34 0.440 0.79
row3 0.00 0.310 0.000 0.545 0.00 0.00 0.230 0.00 0.075 0.00
row4 0.00 0.100 0.395 0.760 0.24 0.00 0.070 0.42 0.300 0.46
- Can I subdivide by multiple columns?
item item0 item1 item2
col col2 col3 col4 col0 col1 col2 col3 col4 col0 col1 col3 col4
row
row0 0.00 0.00 0.00 0.77 0.00 0.00 0.00 0.00 0.00 0.605 0.86 0.65
row2 0.35 0.00 0.37 0.00 0.00 0.44 0.00 0.00 0.13 0.000 0.50 0.13
row3 0.00 0.00 0.00 0.00 0.31 0.00 0.81 0.00 0.00 0.000 0.28 0.00
row4 0.15 0.64 0.00 0.00 0.10 0.64 0.88 0.24 0.00 0.000 0.00 0.00
- Or
item item0 item1 item2
col col2 col3 col4 col0 col1 col2 col3 col4 col0 col1 col3 col4
key row
key0 row0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 0.00
row2 0.00 0.00 0.37 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.50 0.00
row3 0.00 0.00 0.00 0.00 0.31 0.00 0.81 0.00 0.00 0.00 0.00 0.00
row4 0.15 0.64 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00
key1 row0 0.00 0.00 0.00 0.77 0.00 0.00 0.00 0.00 0.00 0.81 0.00 0.65
row2 0.35 0.00 0.00 0.00 0.00 0.44 0.00 0.00 0.00 0.00 0.00 0.13
row3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.28 0.00
row4 0.00 0.00 0.00 0.00 0.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00
key2 row0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.40 0.00 0.00
row2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 0.00 0.00 0.00
row4 0.00 0.00 0.00 0.00 0.00 0.64 0.88 0.00 0.00 0.00 0.00 0.00
- Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?
col col0 col1 col2 col3 col4
row
row0 1 2 0 1 1
row2 1 0 2 1 2
row3 0 1 0 2 0
row4 0 1 2 2 1
- How do I convert a DataFrame from long to wide by pivoting on ONLY two columns? Given,
np.random.seed([3, 1415])
df2 = pd.DataFrame({'A': list('aaaabbbc'), 'B': np.random.choice(15, 8)})
df2
A B
0 a 0
1 a 11
2 a 2
3 a 11
4 b 10
5 b 10
6 b 14
7 c 7
The expected output should look something like
a b c
0 0.0 10.0 7.0
1 11.0 10.0 NaN
2 2.0 14.0 NaN
3 11.0 NaN NaN
- How do I flatten the multiple index to single index after pivot?
From
1 2
1 1 2
a 2 1 1
b 2 1 0
c 1 0 0
To
1|1 2|1 2|2
a 2 1 1
b 2 1 0
c 1 0 0
Here is a list of idioms we can use to pivot:
- pd.DataFrame.pivot_table: A glorified version of groupby with a more intuitive API. For many people, this is the preferred approach, and it is the approach intended by the developers.
- pd.DataFrame.groupby + pd.DataFrame.unstack: Group by the columns that will become row and column levels, aggregate, then unstack the levels that you want to be in the column index.
- pd.DataFrame.set_index + pd.DataFrame.unstack: Similar to the groupby paradigm, we specify all columns that will eventually be either row or column levels and set those to be the index. We then unstack the levels we want in the columns. If either the remaining index levels or column levels are not unique, this method will fail.
- pd.DataFrame.pivot: Very similar to set_index in that it shares the duplicate key limitation. The API is very limited as well. At the time of writing it only took scalar values for index, columns, values (newer pandas releases also accept lists). It is similar to the pivot_table method in that we select rows, columns, and values on which to pivot. However, we cannot aggregate, and if either rows or columns are not unique, this method will fail.
- pd.crosstab: A specialized version of pivot_table and, in its purest form, the most intuitive way to perform several tasks.
- pd.factorize + np.bincount
- pd.get_dummies + pd.DataFrame.dot
Question 1
This occurs because pandas is attempting to reindex either a columns or index object with duplicate entries. There are varying methods that can perform a pivot, and some of them are not well suited to cases where the keys being pivoted on contain duplicates. For example, consider pd.DataFrame.pivot. I know there are duplicate entries that share the row and col values, so when I pivot using row as the index and col as the columns, I get the error mentioned above. In fact, I get the same error when I try to perform the same task with set_index followed by unstack.
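The failure described above can be reproduced with a short sketch. The six rows below are hand-copied from the Setup frame so the block runs on its own; the variable names (has_dupes, pivot_error, unstack_error) are mine and purely illustrative.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# Some (row, col) pairs occur more than once.
has_dupes = df.duplicated(["row", "col"]).any()
print(has_dupes)

pivot_error = None
try:
    # pivot cannot decide which of the duplicated values to keep.
    df.pivot(index="row", columns="col", values="val0")
except ValueError as err:
    pivot_error = str(err)
    print(pivot_error)

unstack_error = None
try:
    # set_index + unstack has the same uniqueness requirement.
    df.set_index(["row", "col"])["val0"].unstack()
except ValueError as err:
    unstack_error = str(err)
    print(unstack_error)
```

Both calls need every (row, col) pair to be unique, which is why the aggregating idioms are used for the remaining questions.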
Examples
What I'm going to do for each subsequent question is to answer it using pd.DataFrame.pivot_table. Then I'll provide alternatives to perform the same task.
Questions 2 and 3
pd.DataFrame.pivot_table
- aggfunc='mean' is the default and I didn't have to set it. I included it to be explicit.
- fill_value is not set by default. I tend to set it appropriately. In this case I set it to 0.
pd.DataFrame.groupby
pd.crosstab
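A minimal sketch of the three approaches for Questions 2 and 3, assuming a six-row subset of the Setup frame (hand-copied so the block runs on its own). The result names are illustrative, not taken from the original answer.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# Question 2: mean of val0, missing combinations stay NaN.
mean_pt = df.pivot_table(values="val0", index="row", columns="col", aggfunc="mean")

# Question 3: same pivot, missing combinations filled with 0.
filled_pt = df.pivot_table(
    values="val0", index="row", columns="col", aggfunc="mean", fill_value=0
)

# groupby + unstack alternative.
filled_gb = df.groupby(["row", "col"])["val0"].mean().unstack(fill_value=0)

# crosstab alternative; it takes Series rather than column names.
filled_ct = pd.crosstab(
    index=df["row"], columns=df["col"], values=df["val0"], aggfunc="mean"
).fillna(0)

print(mean_pt)
print(filled_pt)
```

On this subset the row0/col1 cell is the mean of 0.81 and 0.40, which agrees with the 0.605 shown in the question's expected output.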
Question 4
pd.DataFrame.pivot_table
pd.DataFrame.groupby
pd.crosstab
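For Question 4, only the aggregation changes. The sketch below assumes the same hand-copied six-row subset of the Setup frame; result names are illustrative.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# pivot_table: swap the aggregation function for sum.
sum_pt = df.pivot_table(
    values="val0", index="row", columns="col", aggfunc="sum", fill_value=0
)

# groupby + unstack: call sum instead of mean.
sum_gb = df.groupby(["row", "col"])["val0"].sum().unstack(fill_value=0)

# crosstab: pass the values and the aggregation explicitly.
sum_ct = pd.crosstab(
    index=df["row"], columns=df["col"], values=df["val0"], aggfunc="sum"
).fillna(0)

print(sum_pt)
```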
Question 5
Notice that for pivot_table and crosstab I needed to pass a list of callables. On the other hand, groupby.agg is able to take strings for a limited number of special functions. groupby.agg would also have taken the same callables we passed to the others, but it is often more efficient to use the string function names.
pd.DataFrame.pivot_table
pd.DataFrame.groupby
pd.crosstab
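A sketch of Question 5 with two aggregations at once. It assumes the hand-copied six-row subset of the Setup frame and uses string function names throughout; current pandas releases accept those in pivot_table as well as in groupby.agg. Result names are illustrative.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# A list of aggregations adds an outer column level, one entry per function.
multi_pt = df.pivot_table(
    values="val0", index="row", columns="col",
    aggfunc=["sum", "mean"], fill_value=0,
)

# groupby.agg with a list yields one column per function; unstack moves col
# underneath them.
multi_gb = (
    df.groupby(["row", "col"])["val0"].agg(["sum", "mean"]).unstack(fill_value=0)
)

print(multi_pt)
```

The resulting columns are a two-level MultiIndex, so a single cell is addressed with a tuple such as ("sum", "col1").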
Question 6
pd.DataFrame.pivot_table
We pass values=['val0', 'val1'], but we could've left that off completely.
pd.DataFrame.groupby
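A sketch of Question 6, aggregating two value columns at once. It assumes the hand-copied six-row subset of the Setup frame and keeps values explicit; result names are illustrative.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# A list of value columns adds an outer column level holding the value names.
two_vals_pt = df.pivot_table(
    values=["val0", "val1"], index="row", columns="col",
    aggfunc="mean", fill_value=0,
)

# groupby + unstack: select both value columns before aggregating.
two_vals_gb = (
    df.groupby(["row", "col"])[["val0", "val1"]].mean().unstack(fill_value=0)
)

print(two_vals_pt)
```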
Question 7
pd.DataFrame.pivot_table
pd.DataFrame.groupby
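A sketch of Question 7, subdividing the columns by both item and col. It assumes the hand-copied six-row subset of the Setup frame; result names are illustrative.

```python
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# Passing a list to columns produces one column level per name.
by_item_pt = df.pivot_table(
    values="val0", index="row", columns=["item", "col"],
    aggfunc="mean", fill_value=0,
)

# groupby + unstack: group on every key, then unstack the column-side levels.
by_item_gb = (
    df.groupby(["row", "item", "col"])["val0"]
    .mean()
    .unstack(["item", "col"], fill_value=0)
    .sort_index(axis=1)
)

print(by_item_pt)
```

Passing a list to index instead (for example index=['key', 'row']) subdivides the rows in the same way.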
Question 8
pd.DataFrame.pivot_table
pd.DataFrame.groupby
pd.DataFrame.set_index (this works here because the set of keys is unique for both rows and columns)
Question 9
pd.DataFrame.pivot_table
pd.DataFrame.groupby
pd.crosstab
pd.factorize + np.bincount
pd.get_dummies
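A sketch of several of the cross tabulation approaches listed for Question 9, assuming the hand-copied six-row subset of the Setup frame. Result names are illustrative, and the factorize variant uses sort=True so that its labels come out in the same order as the others.

```python
import numpy as np
import pandas as pd

# Six rows hand-copied from the Setup frame, so the block is self-contained.
df = pd.DataFrame(
    [
        ("key1", "row0", "item1", "col0", 0.77, 0.01),
        ("key1", "row0", "item2", "col1", 0.81, 0.64),
        ("key2", "row0", "item2", "col1", 0.40, 0.85),
        ("key1", "row2", "item1", "col2", 0.44, 0.07),
        ("key1", "row2", "item0", "col2", 0.35, 0.61),
        ("key2", "row2", "item2", "col0", 0.13, 0.45),
    ],
    columns=["key", "row", "item", "col", "val0", "val1"],
)

# crosstab counts co-occurrences by default.
count_ct = pd.crosstab(df["row"], df["col"])

# pivot_table and groupby both express the count as a group size.
count_pt = df.pivot_table(index="row", columns="col", aggfunc="size", fill_value=0)
count_gb = df.groupby(["row", "col"]).size().unstack(fill_value=0)

# factorize + bincount: encode labels as integers, then count flat positions.
i, r = pd.factorize(df["row"], sort=True)
j, c = pd.factorize(df["col"], sort=True)
n, m = len(r), len(c)
counts = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
count_fb = pd.DataFrame(counts, index=r, columns=c)

print(count_ct)
```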
Question 10
DataFrame.pivot
The first step is to assign a number to each row; this number will be the row index of that value in the pivoted result. This is done using GroupBy.cumcount. The second step is to use the newly created column as the index to call DataFrame.pivot.
DataFrame.pivot_table
Whereas DataFrame.pivot only accepts columns, DataFrame.pivot_table also accepts arrays, so the GroupBy.cumcount can be passed directly as the index without creating an explicit column.
Question 11
If columns is of type object, flatten with string join; otherwise use format.
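A sketch of both flattening variants. The frame below is a hypothetical stand-in built by hand with a two-level column index; the names wide, flat_str and flat_int are mine.

```python
import pandas as pd

# Hypothetical frame with two-level columns whose labels are strings.
wide = pd.DataFrame(
    [[2, 1, 1], [2, 1, 0], [1, 0, 0]],
    index=list("abc"),
    columns=pd.MultiIndex.from_tuples([("1", "1"), ("2", "1"), ("2", "2")]),
)

# String (object) labels: join each tuple of labels with a separator.
flat_str = wide.copy()
flat_str.columns = flat_str.columns.map("|".join)

# Non-string labels (integers here): str.join would fail, so format instead.
wide_int = wide.copy()
wide_int.columns = pd.MultiIndex.from_tuples([(1, 1), (2, 1), (2, 2)])
flat_int = wide_int.copy()
flat_int.columns = flat_int.columns.map("{0[0]}|{0[1]}".format)

print(list(flat_str.columns))
print(list(flat_int.columns))
```

MultiIndex.map hands each column label to the function as a tuple, which is why a bound str.join or str.format works directly.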
To extend @piRSquared's answer, here is another version of Question 10.
Question 10.1
DataFrame:
Output:
Using df.groupby and pd.Series.tolist
Or
A much better alternative using pd.pivot_table with df.squeeze.
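A sketch of the groupby plus tolist variant only. The input frame is hypothetical, since the answer's original data is not reproduced in the text, and the names df3, lists and wide3 are mine.

```python
import pandas as pd

# Hypothetical input: one label column A and one value column B.
df3 = pd.DataFrame({"A": [1, 1, 1, 2, 2, 3], "B": list("abcaba")})

# Collect the B values of each group into a Python list.
lists = df3.groupby("A")["B"].apply(list)

# Build a frame from the ragged lists; shorter rows are padded with missing values.
wide3 = pd.DataFrame(lists.tolist(), index=lists.index)

print(wide3)
```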
To better understand how the function pivot works, you can look at the example from the pandas documentation. However, pivot will fail if you have repeating index-columns (foo-bar) combinations (like df in the second example).
In contrast to pivot, the function pivot_table supports data aggregation, using the mean function by default. For example, passing aggfunc='sum' sums the values of repeated foo-bar combinations instead of failing.
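A sketch of that contrast. The foo/bar/baz frame is illustrative data with one repeated foo-bar combination, not a copy of the answer's original example; the names pivot_failed and summed are mine.

```python
import pandas as pd

# Illustrative frame: the ("one", "A") combination appears twice.
df = pd.DataFrame(
    {
        "foo": ["one", "one", "two", "two"],
        "bar": ["A", "A", "B", "C"],
        "baz": [1, 2, 3, 4],
    }
)

# pivot refuses to reshape when an index/columns combination repeats.
try:
    df.pivot(index="foo", columns="bar", values="baz")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the repeated combination instead.
summed = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")

print(pivot_failed)
print(summed)
```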
Call reset_index() (along with add_suffix())
Oftentimes, reset_index() is needed after you call pivot_table or pivot. For example, to turn the values of one column into column names, you call pivot, add a prefix to the newly created column names, convert the index (in this case "movies") back into a column, and remove the name of the columns axis.
As the other answers mentioned, "pivot" may refer to 2 different operations:
- Unstacked aggregation, i.e. making the results of groupby.agg wider.
- Reshaping (similar to reshape in numpy or pivot_wider in R).
1. Aggregation
pivot_table or crosstab are simply unstacked results of a groupby.agg operation. In fact, the source code shows that, under the hood, the following are true:
- pivot_table = groupby + unstack (see note 1)
- crosstab = pivot_table (see note 2)
N.B. You can use a list of column names as the index, columns and values arguments.
1.1. crosstab is a special case of pivot_table, and thus of groupby + unstack
The following are equivalent:
pd.crosstab(df['colA'], df['colB'])
df.pivot_table(index='colA', columns='colB', aggfunc='size', fill_value=0)
df.groupby(['colA', 'colB']).size().unstack(fill_value=0)
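A quick way to check the equivalence is to run all three on a small frame. The data below is hypothetical, and the names ct, pt and gb are mine.

```python
import pandas as pd

# Hypothetical frame with two categorical columns.
df = pd.DataFrame({"colA": list("xxyyz"), "colB": list("ababb")})

ct = pd.crosstab(df["colA"], df["colB"])
pt = df.pivot_table(index="colA", columns="colB", aggfunc="size", fill_value=0)
gb = df.groupby(["colA", "colB"]).size().unstack(fill_value=0)

# All three hold the same counts.
print(ct.values.tolist())
print(pt.values.tolist())
print(gb.values.tolist())
```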
Note that pd.crosstab has a significantly larger overhead, so it's significantly slower than both pivot_table and groupby + unstack. In fact, pivot_table is itself slower than groupby + unstack as well.
2. Reshaping
pivot is a more limited version of pivot_table where its purpose is to reshape a long dataframe into a wide one.
2.1. Augment rows/columns as in Question 10
You can also apply the insight from Question 10 to multi-column pivot operations. There are two cases:
- "long-to-long": reshape by augmenting the indices
- "long-to-wide": reshape by augmenting the columns
The minimum case uses the set_index + unstack syntax.
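A sketch of the Question 10 idea in its long-to-wide form, using the df2 values printed in the question (typed in by hand rather than regenerated from the seed). The names pos, wide_pivot, wide_unstack and wide_pt are mine.

```python
import pandas as pd

# df2 as printed in the question.
df2 = pd.DataFrame({"A": list("aaaabbbc"), "B": [0, 11, 2, 11, 10, 10, 14, 7]})

# Step 1: number the rows within each group of A (0, 1, 2, ... per group).
pos = df2.groupby("A").cumcount()

# Step 2a: store the counter as a column and pivot on it.
wide_pivot = df2.assign(pos=pos).pivot(index="pos", columns="A", values="B")

# Step 2b: the same reshape with set_index + unstack.
wide_unstack = df2.set_index([pos, "A"])["B"].unstack()

# Step 2c: pivot_table accepts the counter directly as the index.
wide_pt = df2.pivot_table(index=pos, columns="A", values="B")

print(wide_pivot)
```

The counter makes every (position, A) pair unique, which is exactly what pivot and unstack require.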
1 pivot_table() aggregates the values and unstacks the result. Specifically, it creates a single flat list out of index and columns, calls groupby() with this list as the grouper, and aggregates using the passed aggregator methods (the default is mean). Then, after aggregation, it calls unstack() by the list of columns. So internally, pivot_table = groupby + unstack. Moreover, if fill_value is passed, fillna() is called. In other words, a pivot_table call and the corresponding groupby, agg and unstack chain produce the same result.
2 crosstab() calls pivot_table(), i.e., crosstab = pivot_table. Specifically, it builds a DataFrame out of the passed arrays of values, filters it by the common indices and calls pivot_table(). It's more limited than pivot_table() because it only allows a one-dimensional array-like as values, unlike pivot_table() that can have multiple columns as values.
The pivot function in pandas has the same functionality as the pivot operation in Excel. We can transform a dataset from a long format to a wide format.
Let's look at an example.
We want to convert the dataset into a form such that each country becomes a column and the new confirmed cases are the values corresponding to the countries. We can perform this data manipulation using the pivot function.
Pivot the dataset
We can bring the new columns to the same level as the index column Data by resetting the index.
Reset the index to modify the column levels
pivot_df = pivot_df.reset_index()
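A sketch of the whole round trip. The dataset from the answer is not reproduced in the text, so the frame, its column names (Date, Country, NewConfirmed) and its values below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical long-format data; names and numbers are placeholders.
cases = pd.DataFrame(
    {
        "Date": ["day1", "day1", "day2", "day2"],
        "Country": ["Italy", "Spain", "Italy", "Spain"],
        "NewConfirmed": [10, 20, 30, 40],
    }
)

# Long to wide: one column per country.
pivot_df = cases.pivot(index="Date", columns="Country", values="NewConfirmed")

# Move the index back into a regular column, next to the country columns.
pivot_df = pivot_df.reset_index()

# Optionally drop the leftover name of the columns axis.
pivot_df.columns.name = None

print(pivot_df)
```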