Retrieving line data from multiple weighted seaborn distribution plots?

Posted on 01-22 18:26

I have the code below, which uses randomly generated dataframes, and I would like to extract the x and y values of both plotted lines. These line plots show the distribution of the Price (on the x-axis) and are Volume weighted.

For some reason, the line values for the second distribution plot cannot be stored in the variables "df_2_x" and "df_2_y". Instead, the values of "df_1_x" and "df_1_y" are written to those variables as well. Both print statements return True, so the arrays are completely equal.

If I put them in separate cells in a notebook, it does work.

I also looked at this solution: How to retrieve all data from seaborn distribution plot with mutliple distributions?

But this does not work for weighted distplots.

import pandas as pd
import random
import seaborn as sns
import matplotlib.pyplot as plt

Price_1 = [round(random.uniform(2,12), 2) for i in range(30)]
Volume_1 = [round(random.uniform(100,3000)) for i in range(30)]
Price_2 = [round(random.uniform(0,10), 2) for i in range(30)]
Volume_2 = [round(random.uniform(100,1500)) for i in range(30)]

df_1 = pd.DataFrame({'Price_1' : Price_1,
                    'Volume_1' : Volume_1})
df_2 = pd.DataFrame({'Price_2' : Price_2,
                    'Volume_2' :Volume_2})

df_1_x, df_1_y = sns.distplot(df_1.Price_1, hist_kws={"weights":list(df_1.Volume_1)}).get_lines()[0].get_data()
df_2_x, df_2_y = sns.distplot(df_2.Price_2, hist_kws={"weights":list(df_2.Volume_2)}).get_lines()[0].get_data()

print((df_1_x == df_2_x).all())
print((df_1_y == df_2_y).all())

Why does this happen, and how can I fix this?

缱绻入梦 2025-01-29 18:26:23

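Whether or not weights are used makes no difference here.
The principal problem is that both distplot calls draw onto the same ax, so df_2_x, df_2_y = sns.distplot(df_2....).get_lines()[0].get_data() extracts the first curve again. You want the second curve instead: df_2_x, df_2_y = sns.distplot(df_2....).get_lines()[1].get_data(). This also explains why it works in separate notebook cells: each cell starts a new figure, so the curve drawn there is the only line on its ax.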

Note that seaborn isn't really meant for chaining commands. Sometimes it works, but it usually adds a lot of confusion. For example, sns.distplot returns an ax (which represents a subplot), and graphical elements such as lines are added to that ax.
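The way lines pile up on a single ax can be reproduced with plain matplotlib. The following is only a minimal sketch: plot_curve is a hypothetical stand-in for a plotting function that draws on the current axes and returns them. It is not seaborn's actual implementation.

```python
import matplotlib.pyplot as plt


def plot_curve(y):
    """Hypothetical helper: draw y on the current axes and return those axes."""
    ax = plt.gca()  # reuses the current axes if one already exists
    ax.plot(range(len(y)), y)
    return ax


ax_a = plot_curve([1, 2, 3])
ax_b = plot_curve([3, 2, 1])

# Both calls return the very same axes object, which now holds two lines
same_axes = ax_a is ax_b
n_lines = len(ax_b.lines)

# Index 0 is always the first curve drawn; the second curve sits at index 1
first_y = list(ax_b.lines[0].get_ydata())
second_y = list(ax_b.lines[1].get_ydata())
```

Chaining .get_lines()[0] onto the second call would therefore return the curve from the first call, which matches the behaviour described in the question.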

Also note that sns.distplot has been deprecated and will be removed from seaborn in one of the next versions. It is replaced by sns.histplot and sns.kdeplot.

Here is what the code could look like:

import pandas as pd
import random
import seaborn as sns
import matplotlib.pyplot as plt

Price_1 = [round(random.uniform(2, 12), 2) for i in range(30)]
Volume_1 = [round(random.uniform(100, 3000)) for i in range(30)]
Price_2 = [round(random.uniform(0, 10), 2) for i in range(30)]
Volume_2 = [round(random.uniform(100, 1500)) for i in range(30)]

df_1 = pd.DataFrame({'Price_1': Price_1,
                     'Volume_1': Volume_1})
df_2 = pd.DataFrame({'Price_2': Price_2,
                     'Volume_2': Volume_2})

ax = sns.histplot(x=df_1.Price_1, weights=list(df_1.Volume_1), bins=10, kde=True, kde_kws={'cut': 3})
sns.histplot(x=df_2.Price_2, weights=list(df_2.Volume_2), bins=10, kde=True, kde_kws={'cut': 3}, ax=ax)

df_1_x, df_1_y = ax.lines[0].get_data()
df_2_x, df_2_y = ax.lines[1].get_data()

# use fill_between to demonstrate where the extracted curves lie
ax.fill_between(df_1_x, 0, df_1_y, color='b', alpha=0.2)
ax.fill_between(df_2_x, 0, df_2_y, color='r', alpha=0.2)
plt.show()