Creating a crosstab using percentiles in Python
I'm doing work involving stock market research, and I want to create a crosstab to run a chi-squared test on. I have stock market price change data as a DataFrame, and I want to build a crosstab of counts by percentile for two of its columns. Ideally it would look something like this:

|      | 0.25 | 0.5 | 0.75 | 1.0 |
|------|------|-----|------|-----|
| 0.25 | 12   | 45  | 13   | 12  |
| 0.5  | 2    | 27  | 9    | 15  |
| 0.75 | 14   | 11  | 89   | 23  |
| 1.0  | 10   | 52  | 11   | 7   |

For example, the (0.75, 0.5) entry is the count of data points that lie between the 0.5 and 0.75 percentiles of the first variable and between the 0.25 and 0.5 percentiles of the second variable. Obviously those numbers probably aren't actually possible, but you get the point.
All I can think of so far is doing it by brute force: compute each percentile for each variable individually, then get the counts for each combination and add them to a table manually. Is there a shorter way of doing this?
Preparing a sample dataset
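A minimal sketch of such a dataset, assuming two hypothetical columns, `change_a` and `change_b`, filled with simulated normally distributed price changes rather than real market data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 200 simulated price changes for two series.
# The fixed seed makes the example reproducible; this is not real market data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "change_a": rng.normal(0.0, 1.0, 200),
    "change_b": rng.normal(0.0, 1.0, 200),
})
print(df.head())
```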
The percentile bins can be computed with `pd.qcut`. The `4` is the number of equal-sized quantile bins (quartiles here) to split each variable into.
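A sketch of the binning step on simulated data. The column names `change_a`, `change_b`, `pct_a` and `pct_b` are hypothetical, and labelling each bin by its upper percentile (0.25, 0.5, 0.75, 1.0) is an assumption chosen to match the table in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical simulated price changes for two columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({"change_a": rng.normal(size=200),
                   "change_b": rng.normal(size=200)})

# Name each quartile bin by its upper percentile, as in the desired table.
labels = [0.25, 0.5, 0.75, 1.0]

# q=4 cuts each column at its 25th, 50th and 75th percentiles,
# giving four bins that each hold about a quarter of the rows.
df["pct_a"] = pd.qcut(df["change_a"], 4, labels=labels)
df["pct_b"] = pd.qcut(df["change_b"], 4, labels=labels)

print(df["pct_a"].value_counts().sort_index())
```

With 200 distinct values, each bin receives exactly 50 rows. If a column contains many repeated values, two percentile edges can coincide and `qcut` will refuse to build the bins as requested.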
Count the number of records in each pair of percentile bins.
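One way to count records per pair of bins is `groupby(...).size()`, sketched here with the same hypothetical simulated columns:

```python
import numpy as np
import pandas as pd

# Hypothetical simulated data, binned into labelled quartiles.
rng = np.random.default_rng(0)
df = pd.DataFrame({"change_a": rng.normal(size=200),
                   "change_b": rng.normal(size=200)})
labels = [0.25, 0.5, 0.75, 1.0]
df["pct_a"] = pd.qcut(df["change_a"], 4, labels=labels)
df["pct_b"] = pd.qcut(df["change_b"], 4, labels=labels)

# Count rows for every (pct_a, pct_b) pair. observed=False keeps all 16
# category combinations, so an empty cell appears as 0 instead of vanishing.
counts = (df.groupby(["pct_a", "pct_b"], observed=False)
            .size()
            .reset_index(name="count"))
print(counts.head())
```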
Finally, you can pivot the DataFrame into the crosstab layout.
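A sketch of the pivot step, again on simulated data with hypothetical column names. `DataFrame.pivot` turns the long table of counts into the grid from the question; `pd.crosstab` is shown as a shorter route to the same grid:

```python
import numpy as np
import pandas as pd

# Hypothetical simulated data, binned into labelled quartiles and counted.
rng = np.random.default_rng(0)
df = pd.DataFrame({"change_a": rng.normal(size=200),
                   "change_b": rng.normal(size=200)})
labels = [0.25, 0.5, 0.75, 1.0]
df["pct_a"] = pd.qcut(df["change_a"], 4, labels=labels)
df["pct_b"] = pd.qcut(df["change_b"], 4, labels=labels)
counts = (df.groupby(["pct_a", "pct_b"], observed=False)
            .size()
            .reset_index(name="count"))

# Rows: percentile bins of change_a; columns: percentile bins of change_b.
table = counts.pivot(index="pct_a", columns="pct_b", values="count")
print(table)

# pd.crosstab does the counting and pivoting in a single call.
shortcut = pd.crosstab(df["pct_a"], df["pct_b"])
print(shortcut)
```

Because each variable is split into quartiles, every row and every column of the grid sums to a quarter of the observations (50 here). A table of counts like this is the kind of input a chi-squared test of independence, such as SciPy's `scipy.stats.chi2_contingency`, expects.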