Adding correlation coefficients with sparse data
I'm attempting to put together a PairGrid demonstrating correlations between values for various groups. However, I'm running into an issue with calculating the correlation coefficients because the data are sparse, with only some groups having entries for all values (see example df below).
group,PMTS049_X1,PMTS049_X2,PMTS049_X3,PMTS050_X1,PMTS050_X2,PMTS050_X3,PMTS051_X1,PMTS051_X2,PMTS051_X3
A-1155463_0.0045724735 uM,,,,,,,1.04971460376213,1.179179562152282,1.108186227034926
A-1155463_0.0137174204 uM,,,,,,,0.8988914257248843,1.1081975403870097,1.1212163392527827
A-1155463_0.0411522626 uM,,,,,,,0.7568287138412346,1.3430358264779123,0.6976356991070309
A-1155463_0.1234567894 uM,,,,,,,0.8722751660077679,1.2813232679016255,0.7168068340749701
A-1155463_0.3703703698 uM,,,,,,,1.0048500564303646,0.9822764937914188,0.7848035710436727
A-1155463_1.1111111109 uM,,,,,,,0.714489988829366,1.0212592818965238,0.9073404868051366
A-1155463_10.0 uM,,,,,,,0.8653477680327408,1.0148035026211837,1.078068956627115
A-1155463_3.3333333328 uM,,,,,,,0.7886497200478821,1.2509355686549108,1.102089542546691
AR00536370_0.004572474 uM,0.904860165175536,0.630159462892308,0.9489118135217488,,,,,,
AR00536370_0.013717421 uM,1.0775483229823892,1.0225981211028854,0.882809722682908,,,,,,
AR00536370_0.041152263 uM,0.651388505631196,0.6464619116451127,0.9482309449208448,,,,,,
AR00536370_0.12345679 uM,1.1258840716614356,1.086943922239876,0.9390622801310888,,,,,,
AR00536370_0.37037037 uM,0.8376593712267018,0.6137443429066174,1.1138346723430423,,,,,,
AR00536370_1.111111111 uM,0.889524913610136,0.614134774405636,1.024540177391546,,,,,,
AR00536370_10.0 uM,1.1265016447481484,1.0079550835923723,1.020468601391634,,,,,,
AR00536370_3.333333333 uM,0.7712691128712621,0.5893153808880822,0.9392664499480852,,,,,,
AZ-628_0.0045724738 uM,0.9076622425465212,1.0934450174529409,0.8998208357308275,0.5434529688654449,1.0562636134080905,0.7715563402897756,0.8276719202512567,1.186335347522918,0.8472096021168107
AZ-628_0.0137174214 uM,1.0963705071916314,0.7842348321294206,0.8315757098390226,1.1030070502236278,0.9908839527491846,0.6890801893628911,1.0576768873554188,0.6458275700418388,0.6252105837953932
AZ-628_0.0411522642 uM,0.4638659293322327,0.448229525781371,0.7834117035243882,0.6719774212414784,0.7929078280555364,0.5459489088307414,0.5319067062568421,0.7597553877201356,0.7555114458518108
AZ-628_0.1234567905 uM,0.7477873760694489,0.6876014256016453,0.7062173803657827,0.638953551272889,0.7319274702673808,0.5026056304516787,0.670610155828824,0.7502067504385757,0.728226016276163
AZ-628_0.3703703714 uM,0.6866247604514487,0.6283063570534719,0.4496522225873166,0.6832591578588455,0.5954091111794844,0.3100417445307057,0.6715651569282897,0.618884745101185,0.3682229011490131
AZ-628_1.1111111121 uM,0.5067878800206379,0.4427746455438304,0.3437829115127019,0.5443902391591303,0.4502681586077313,0.4061182601803592,0.5963543157154589,0.54950245634917,0.2611084549558647
# copy the data to the clipboard and use
import pandas as pd
df = pd.read_clipboard(sep=',')
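For anyone who would rather not go through the clipboard, the following is a minimal sketch that reads the same CSV text from a string and summarizes how sparse it is. Only three rows of the table are included to keep it short; the names `SAMPLE_CSV`, `sample`, `n_observed` and `pairwise_n` are illustrative and not part of the original code.

```python
import io

import pandas as pd

# Three rows copied from the table above (one per compound), for illustration only.
SAMPLE_CSV = """group,PMTS049_X1,PMTS049_X2,PMTS049_X3,PMTS050_X1,PMTS050_X2,PMTS050_X3,PMTS051_X1,PMTS051_X2,PMTS051_X3
A-1155463_0.0045724735 uM,,,,,,,1.04971460376213,1.179179562152282,1.108186227034926
AR00536370_0.004572474 uM,0.904860165175536,0.630159462892308,0.9489118135217488,,,,,,
AZ-628_0.0045724738 uM,0.9076622425465212,1.0934450174529409,0.8998208357308275,0.5434529688654449,1.0562636134080905,0.7715563402897756,0.8276719202512567,1.186335347522918,0.8472096021168107
"""

# Empty fields are parsed as NaN; the group label becomes the index.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="group")

# Number of non-missing values in each column.
n_observed = sample.notna().sum()

# Number of rows in which both columns of a pair are present
# (the sample size a pairwise correlation would actually be based on).
present = sample.notna().astype(int)
pairwise_n = present.T @ present

print(n_observed)
print(pairwise_n)
```

The `pairwise_n` table is worth a look before trusting any coefficient, since some column pairs may share only a handful of rows.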
So far, what I have come up with is the following:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

def corr_func(x, y, **kws):
    # Annotate the current axes with the Pearson r of x and y
    r, _ = stats.pearsonr(x, y)
    ax = plt.gca()
    ax.annotate("{:.2f}".format(r),
                xy=(.3, .45), xycoords=ax.transAxes, size=30)

def plot_corrplot(df):
    g = sns.PairGrid(df, diag_sharey=False, corner=False)
    g.map_lower(sns.scatterplot, s=10, color='black')
    g.map_diag(sns.kdeplot, color='grey')
    g.map_upper(corr_func)
    plt.show()
The scatterplots and KDE plots display as I would expect. However, I can't properly calculate the correlations because of the NaNs:
ValueError: array must not contain infs or NaNs
Essentially, what I need to do is calculate the coefficient while ignoring any unplotted entries (NaNs).
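One possible way to do this, sketched here under the assumption that "ignoring" means using only the rows where both columns of a pair have a value, is to mask out the incomplete rows before calling `stats.pearsonr`. The helper name `pairwise_pearson` and the "n/a" label for pairs with fewer than two shared rows are choices made for this sketch, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def pairwise_pearson(x, y):
    """Pearson r over the positions where both x and y are finite.

    Returns NaN when fewer than two complete pairs remain.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)  # drops NaN and inf in either input
    if keep.sum() < 2:
        return float("nan")
    r, _ = stats.pearsonr(x[keep], y[keep])
    return float(r)


def corr_func(x, y, **kws):
    """Annotate the current axes with the NaN-aware Pearson r."""
    r = pairwise_pearson(x, y)
    label = "n/a" if np.isnan(r) else "{:.2f}".format(r)
    ax = plt.gca()
    ax.annotate(label, xy=(.3, .45), xycoords=ax.transAxes, size=30)
```

With this version, `g.map_upper(corr_func)` can stay as it is. Each panel's coefficient is then based on a different subset of rows, so panels are not strictly comparable when the overlap differs a lot between column pairs.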
An example plot is shown below, generated without mapping corr_func. In this case, the Pearson coefficients should appear in the upper plots.
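To sanity-check the numbers that should end up in the upper plots, one option is to compare them against `DataFrame.corr()`, which excludes missing values pair by pair. The sketch below uses a small made-up frame (`toy`, with hypothetical columns `X1` to `X3`) rather than the real data, and drops the text column explicitly because newer pandas versions do not skip non-numeric columns by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kind of gaps as the real data.
toy = pd.DataFrame(
    {
        "group": ["a", "b", "c", "d", "e"],
        "X1": [1.0, 2.0, 3.0, 4.0, np.nan],
        "X2": [2.0, 1.0, 4.0, 3.0, 5.0],
        "X3": [np.nan, 1.0, 2.0, 4.0, 3.0],
    }
)

# Keep numeric columns only; corr() then uses the rows that are
# complete for each individual pair of columns.
corr_table = toy.select_dtypes("number").corr(method="pearson")

# The same coefficient computed by hand for one pair, for comparison.
both = toy[["X1", "X2"]].dropna()
r_manual = np.corrcoef(both["X1"], both["X2"])[0, 1]

print(corr_table.round(2))
print(round(r_manual, 2))
```

If the annotated values in the grid disagree with this table, the masking in the plotting function is the first place to look.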