Creating a crosstab from percentiles in Python

Posted on 01-15 22:00

I'm doing stock market research and want to create a crosstab to run a chi-squared test on. I have stock price-change data in a data frame, and I want to build a crosstab of counts by percentile bin for two of its columns. Ideally it would look something like this:

	0.25	0.5	0.75	1.0
0.25	12	45	13	12
0.5	2	27	9	15
0.75	14	11	89	23
1.0	10	52	11	7

Here, for example, the (0.75, 0.5) entry is the number of data points that lie between the 0.5 and 0.75 quantiles of the first variable and between the 0.25 and 0.5 quantiles of the second variable. Obviously these particular numbers probably aren't possible, but you get the point.

All I can think of so far is brute force: compute each percentile of each variable individually, count the data points in each pair of ranges, and enter the counts into a table by hand. Is there a shorter way of doing this?


泛泛之交, 2025-01-22 22:00:36

Preparing a sample dataset

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,2), columns=['A', 'B'])

The bins can be computed with pd.qcut. The 4 is the number of equal-sized quantile bins (here, quartiles) to split each variable into.

df['A_binned'] = pd.qcut(df['A'], 4)
df['B_binned'] = pd.qcut(df['B'], 4)
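If the bins should carry the labels from the question's table (0.25, 0.5, 0.75, 1.0) rather than interval ranges, qcut accepts a labels argument. The following is a minimal sketch under that assumption; the seeded random generator and the names upper_quantiles, a_binned and bin_sizes are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded sample data so the sketch is reproducible (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["A", "B"])

# Label each quartile bin by its upper quantile, as in the question's table.
upper_quantiles = [0.25, 0.5, 0.75, 1.0]
a_binned = pd.qcut(df["A"], q=4, labels=upper_quantiles)

# Number of records that fall into each quartile bin of A.
bin_sizes = a_binned.value_counts().sort_index()
print(bin_sizes)
```

With 100 distinct values, each quartile bin ends up with 25 records, so the row and column totals of the final table are fixed and only the distribution across cells varies.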

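Count the number of records in each pair of bins. Passing observed=False keeps bin combinations that contain no records, with a count of 0.

dff = df.groupby(by=['A_binned', 'B_binned'], observed=False).size().reset_index(name='count')

Finally, pivot the counts into a table with the A bins as rows and the B bins as columns.

dff.pivot(index='A_binned', columns='B_binned', values='count')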
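As a shorter alternative to the groupby and pivot steps, pd.crosstab can build the contingency table directly from the two binned columns. The sketch below is one possible way to do it, not the original answer's method. Since the question mentions a chi-squared test, it also computes the Pearson chi-squared statistic by hand with NumPy. The seeded data and all variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Seeded sample data so the sketch is reproducible (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["A", "B"])

# Bin both columns into quartiles, labelled by their upper quantile.
labels = [0.25, 0.5, 0.75, 1.0]
a_bin = pd.qcut(df["A"], q=4, labels=labels)
b_bin = pd.qcut(df["B"], q=4, labels=labels)

# Contingency table: rows are A's quartile bins, columns are B's.
table = pd.crosstab(a_bin, b_bin)
print(table)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected,
# where expected counts come from the row and column totals.
observed = table.to_numpy(dtype=float)
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(chi2_stat, dof)
```

If SciPy is available, scipy.stats.chi2_contingency(table) should report the same statistic for a 4x4 table, together with a p-value.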


About the author: 请爱~陌生人
