Standard deviation plot by species - python
I'm trying to create a standard deviation plot by species, but the resulting graph, in which all the bars are equal, doesn't make much sense. Could someone tell me whether this happens because of something I'm doing wrong, or because of something I failed to do beforehand?
I also don't understand why the bars reach about 14 when there are 50 rows for each species.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Replace the integer targets (0, 1, 2) with the species names
iris_df['species_id'] = iris.target
iris_df['species_id'] = iris_df['species_id'].replace([0, 1, 2], iris.target_names)
# Running row index, 0 to 149
iris_df['x_pos'] = np.arange(len(iris_df))
print(iris_df)

plt.figure(figsize=(10, 5))
ax = sns.barplot(x="species_id", y="x_pos", data=iris_df, estimator=np.std)
ax.set_xlabel("Frequency", fontsize=10)
ax.set_ylabel("Species", fontsize=10)
ax.set_title("Standard Deviation of Species", fontsize=15)
Your argument y=x_pos is the problem here: the data evaluated for setosa, for example, would be [0, 1, ..., 48, 49], which results in a standard deviation of np.std(range(50)) = 14.43. The same holds for np.std(range(50, 100)) = 14.43 and np.std(range(100, 150)) = 14.43. What you want is the standard deviation of each measurement by species. This can be done by passing one measurement column at a time as y, which results in some nice-looking plots.
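The answer's original code is not preserved here, so the following is only a minimal sketch of one way to do it, assuming a 2x2 grid of subplots with one seaborn.barplot call per measurement column. The grid layout, the axis labels and the variable names are illustrative choices, not the answerer's.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer targets (0, 1, 2) to the species names.
iris_df["species_id"] = iris.target_names[iris.target]

# One subplot per measurement; barplot takes a single y column per call.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, feature in zip(axes.flat, iris.feature_names):
    sns.barplot(x="species_id", y=feature, data=iris_df,
                estimator=np.std, ax=ax)
    ax.set_xlabel("Species")
    ax.set_ylabel("Standard deviation")
    ax.set_title(feature)
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the result.
```

With a real measurement as y, the bars should differ between species, unlike the identical bars produced by x_pos.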
Note that seaborn.barplot does not support multiple column names for the parameter y. If you wanted, you could rewrite the whole thing using pandas, where that is possible.
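The pandas version from the original answer is not preserved either. A sketch of what such a rewrite might look like, assuming a groupby on species_id followed by pandas' own bar plot, is shown below. Passing ddof=0 is a deliberate choice here to match the default of np.std; pandas on its own defaults to ddof=1, the sample standard deviation.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer targets (0, 1, 2) to the species names.
iris_df["species_id"] = iris.target_names[iris.target]

# Standard deviation of every measurement column, one row per species.
# ddof=0 gives the population standard deviation, like np.std's default.
std_by_species = iris_df.groupby("species_id").std(ddof=0)
print(std_by_species)

# One group of bars per species, one bar per measurement.
ax = std_by_species.plot.bar(figsize=(10, 5), rot=0)
ax.set_xlabel("Species")
ax.set_ylabel("Standard deviation (cm)")
ax.set_title("Standard deviation of each measurement by species")
```

Because the grouped table holds all four measurements at once, a single plot call covers what needs a loop with seaborn.barplot.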
x_pos increases by 1 for each row. The dataset is ordered by species, and there are 50 measurements per species, so you get the same standard deviation for every species. Plotting x_pos against species helps to explain why: the standard deviation of the integers from 0 to 49 is the same as that of the integers from 50 to 99, and so on.
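This is easy to check numerically. The short sketch below computes the standard deviation of each block of 50 consecutive integers and compares it with the closed form for n consecutive integers, sqrt((n^2 - 1) / 12). The variable names are illustrative.

```python
import numpy as np

# Shifting every value by a constant moves the mean but not the spread,
# so each block of 50 consecutive integers has the same standard deviation.
block_stds = []
for start in (0, 50, 100):
    block = np.arange(start, start + 50)
    block_stds.append(float(np.std(block)))
    print(start, round(block_stds[-1], 2))  # prints 14.43 for every block

# Closed form for the population standard deviation of n consecutive integers.
n = 50
closed_form = float(np.sqrt((n**2 - 1) / 12))
print(round(closed_form, 2))  # 14.43
```

This is the value of about 14 seen in the plot: it reflects the row numbering, not anything about the flowers.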
A more interesting plot would show the standard deviation of any one of the features.
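The example that followed in the original answer is missing, so here is a minimal sketch under the assumption that a single feature is plotted. The choice of petal length is arbitrary, and the axis labels are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer targets (0, 1, 2) to the species names.
iris_df["species_id"] = iris.target_names[iris.target]

feature = "petal length (cm)"  # arbitrary choice; any feature column works

fig, ax = plt.subplots(figsize=(10, 5))
# Same call as in the question, but with a real measurement as y.
sns.barplot(x="species_id", y=feature, data=iris_df, estimator=np.std, ax=ax)
ax.set_xlabel("Species", fontsize=10)
ax.set_ylabel(f"Standard deviation of {feature}", fontsize=10)
ax.set_title("Standard Deviation by Species", fontsize=15)
# Call plt.show() or fig.savefig(...) to view the result.
```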