Comparing different histogram2d binnings using the same edges
I have a dataset that looks like this:
    tsne_results_x  tsne_results_y  team_id
0       -22.796648      -26.514051      107
1        11.985229       40.674446      107
2       -28.231720      -49.302216      107
3        31.942875      -14.427114      107
4       -46.436501       -7.750005      107
76       24.252718      -20.551889     8071
77        2.362172       17.170067     8071
78        7.212677       -9.056982     8071
79       -5.865472      -32.999077     8071
I want to bin the tsne_results_x and tsne_results_y columns, and for that I am using the numpy function histogram2d:
grid, xe, ye = np.histogram2d(df['tsne_results_x'], df['tsne_results_y'], bins=15)
gridx = np.linspace(min(df['tsne_results_x']),max(df['tsne_results_x']),15)
gridy = np.linspace(min(df['tsne_results_y']),max(df['tsne_results_y']),15)
plt.figure()
#plt.plot(x, y, 'ro')
plt.grid(True)
#plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.colorbar()
plt.show()
However, as you can see, I have several team_ids in the data frame, and I would like to compare one team's individual bins to those of the whole data frame. For example, for one team, at one specific bin, I want to divide that team's count by the total count that includes all the teams.
So I thought that running histogram2d on a specific team's dataset, while using the same linspace as for the whole dataset, would do the trick. It does not, because histogram2d bins one_team_df differently, since that data has a different range:
one_team_df = df.loc[(df['team_id'] == str(299))]
grid_team, a, b = np.histogram2d(one_team_df['tsne_results_x'], one_team_df['tsne_results_y'], bins=15)
gridx = np.linspace(min(df['tsne_results_x']),max(df['tsne_results_x']),15)
gridy = np.linspace(min(df['tsne_results_y']),max(df['tsne_results_y']),15)
plt.figure()
#plt.plot(x, y, 'ro')
plt.grid(True)
#plt.figure()
plt.pcolormesh(gridx, gridy, grid_team)
#plt.plot(x, y, 'ro')
plt.colorbar()
plt.show()
I would like to know how to make these two representations comparable. Is it possible to run histogram2d with given xedges and yedges? That way I could bin one team using the edges of the overall binning.
Regards
Comments (1)
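From the documentation of np.histogram2d: the bins argument accepts either a number of bins or explicit bin edges, given as a single array used for both dimensions or as a pair [array, array] (x edges, y edges).
This means you can specify the bins as you want. For instance: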
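A minimal sketch of that idea, not the original answer's code: bin the whole dataset once, then pass the returned edges back in as bins for the single team. The random data and the names x_all, y_all, team_mask and share are assumptions made up for illustration; with the data frame from the question you would pass the tsne_results_x and tsne_results_y columns of df and of one_team_df instead.

```python
import numpy as np

# Synthetic stand-in for the t-SNE columns (assumption, not the real data).
rng = np.random.default_rng(0)
x_all = rng.normal(0.0, 25.0, size=500)
y_all = rng.normal(0.0, 25.0, size=500)
# Hypothetical boolean mask marking the rows of one team.
team_mask = rng.random(500) < 0.2

# 1) Bin the whole dataset once and keep the edges (16 edges for 15 bins).
grid_all, xedges, yedges = np.histogram2d(x_all, y_all, bins=15)

# 2) Bin the single team with exactly those edges instead of bins=15.
grid_team, _, _ = np.histogram2d(
    x_all[team_mask], y_all[team_mask], bins=[xedges, yedges]
)

# 3) Per-bin share of the team; bins that are empty overall are left at 0.
share = np.divide(
    grid_team, grid_all, out=np.zeros_like(grid_team), where=grid_all > 0
)
```

Because both grids now share the same edges, they can be divided element by element. For plotting, it should work to pass the edges directly, as in plt.pcolormesh(xedges, yedges, share.T): histogram2d puts x along the first axis, so the grid needs transposing, and the 16 returned edges match the 15 bins, unlike a np.linspace(..., 15) grid.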