Pandas: sorting a file and binning values into groups
I'm learning pandas, but I'm having some trouble.
I imported my data as a DataFrame and want to bin the 2017 population values into four equal-size groups,
then count the number of rows in group 4.
However, the system prints:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-52-05d9f2e7ffc8> in <module>
2
3 df=pd.read_excel('C:/Users/Sam/Desktop/商業分析/Python_Jabbia1e/Chapter 2/jaggia_ba_1e_ch02_Data_Files.xlsx',sheet_name='Population')
----> 4 df=df.sort_values('2017',ascending=True)
5 df['Group'] = pd.qcut(df['2017'], q = 4, labels = range(1, 5))
6 splitData = [group for _, group in df.groupby('Group')]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
5453
5454 by = by[0]
-> 5455 k = self._get_label_or_level_values(by, axis=axis)
5456
5457 # need to rewrap column in Series to apply key function
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1682 values = self.axes[axis].get_level_values(key)._values
1683 else:
-> 1684 raise KeyError(key)
1685
1686 # Check for duplicates
KeyError: '2017'
What's wrong with it?
Thanks~
Here's the dataframe:
And I tried:
import pandas as pd

df = pd.read_excel('C:/Users/Sam/Desktop/商業分析/Python_Jabbia1e/Chapter 2/jaggia_ba_1e_ch02_Data_Files.xlsx', sheet_name='Population')
df = df.sort_values('2017', ascending=True)
df['Group'] = pd.qcut(df['2017'], q=4, labels=range(1, 5))
splitData = [group for _, group in df.groupby('Group')]
print('The number of group4 is :', splitData[3].shape[0])
Answers:
You are passing the key to df.sort_values() as a str. You can pass it either on its own or as the single element of a list.
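A minimal sketch of the two call styles, using a small hypothetical DataFrame (not the real Population sheet) whose 2017 column label really is the string "2017":

```python
import pandas as pd

# Hypothetical stand-in for the "Population" sheet; here the 2017 label
# is a string, so both call styles below can find it.
df = pd.DataFrame({"Country": ["A", "B", "C"], "2017": [30, 10, 20]})

# Passing the column label on its own...
by_label = df.sort_values("2017", ascending=True)

# ...or as the single element of a list gives the same ordering.
by_list = df.sort_values(by=["2017"], ascending=True)

print(by_label["Country"].tolist())  # ['B', 'C', 'A']
print(by_label.equals(by_list))      # True
```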
This only works if the column label exactly matches the string you pass. If the label is not a string, or if the string contains whitespace, it won't work. You can strip any trailing whitespace from the column names before sorting.
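One way to do the stripping, sketched on a hypothetical frame whose header picked up a trailing space. It uses DataFrame.rename with a callable and only touches string labels, which is an assumption about how you want mixed labels handled:

```python
import pandas as pd

# Hypothetical frame whose header was read as "2017 " (trailing space).
df = pd.DataFrame({"Country": ["A", "B"], "2017 ": [5, 3]})

# Plain df.columns.str.strip() can mangle or reject non-string labels,
# so strip only the string ones and leave others (e.g. ints) unchanged.
df = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)

sorted_df = df.sort_values("2017")
print(list(df.columns))  # ['Country', '2017']
```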
If the label is not a string at all, pass it with its actual type instead, for example the integer 2017.
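A short sketch of both options, assuming the header was read as the integer 2017 (the frame below is hypothetical). You can use the integer label directly, or convert every label to str once and keep using strings:

```python
import pandas as pd

# Hypothetical frame whose 2017 label is the int 2017, not the str "2017".
df = pd.DataFrame({"Country": ["A", "B", "C"], 2017: [30, 10, 20]})

print(2017 in df.columns, "2017" in df.columns)  # True False

# Option 1: pass the label with its real type.
sorted_int = df.sort_values(2017)

# Option 2: turn every column label into a str, then use strings everywhere.
df_str = df.rename(columns=str)
sorted_str = df_str.sort_values("2017")
```

Option 2 is convenient if the rest of your code already refers to columns as strings, such as df['2017'] in the pd.qcut call.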
Firstly, there is a problem with the sort on line 4: you tell the sort function to look for the string '2017', but the column label is an integer. Pass the integer 2017 instead, then continue with the rest of your code.
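A minimal end-to-end sketch of the corrected steps, assuming the 2017 header is read as the integer 2017. The data is hypothetical, not the real workbook, and the count uses a boolean sum instead of indexing into a list of groups:

```python
import pandas as pd

# Hypothetical data standing in for the "Population" sheet; the 2017
# header is assumed to be the integer 2017, as in the real workbook.
df = pd.DataFrame({
    "Country": list("ABCDEFGH"),
    2017: [5, 80, 12, 45, 33, 7, 60, 21],
})

# Sorting is optional: qcut assigns quartiles by value, regardless of row order.
df = df.sort_values(2017, ascending=True)

# Bin into four equal-size groups labelled 1..4.
df["Group"] = pd.qcut(df[2017], q=4, labels=[1, 2, 3, 4])

# Count rows in group 4 directly, without splitting the frame into a list.
group4_count = int((df["Group"] == 4).sum())
print("The number of group4 is :", group4_count)  # 2
```

splitData[3].shape[0] from the question would give the same number once the key is fixed. The boolean sum just avoids building a list of sub-frames to read one count.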