How do I create a legend for my scatter plot that matches the colours used in the plot?

Posted on 2025-02-05 21:09:32

I've created a scatter plot (actually two similar subplots) using matplotlib.pyplot which I'm using for stylometric text analysis. The code I'm using to make the plot is as follows:

import matplotlib.pyplot as plt
import numpy as np

clusters = 4
two_d_matrix = np.array([[0.00617068, -0.53451777], [-0.01837677, -0.47131886], ...])
my_labels = [0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

fig, (plot1, plot2) = plt.subplots(1, 2, sharex=False, sharey=False, figsize=(20, 10))

plot1.axhline(0, color='#afafaf')
plot1.axvline(0, color='#afafaf')
for i in range(clusters):
    try:
        plot1.scatter(two_d_matrix[i:, 0], two_d_matrix[i:, 1], s=30, c=my_labels, cmap='viridis')
    except (KeyError, ValueError) as e:
        pass
plot1.legend(my_labels)
plot1.set_title("My First Plot")

plot2.axhline(0, color='#afafaf')
plot2.axvline(0, color='#afafaf')
for i in range(clusters):
    try:
        plot2.scatter(two_d_matrix[i:, 0], two_d_matrix[i:, 1], s=30, c=my_labels, cmap='viridis')
    except (KeyError, ValueError) as e:
        pass
plot2.legend(my_labels)
plot2.set_title("My Second Plot")

plt.show()

Because there are four distinct values in my_labels, four colours appear on the plot; these should correspond to the four clusters I expected to find.

The problem is that the legend only has three values, corresponding to the first three values in my_labels. It also appears that the legend isn't displaying a key for each colour, but for each of the axes and then for one of the colours. This means that the colours appearing in the plot are not matched to what appears in the legend, so the legend is inaccurate. I have no idea why this is happening.

Ideally, the legend should display one colour for each unique value in my_labels (the mock-up image from the original post is not reproduced here).

How can I get the legend to accurately display all the values it should be showing, i.e. one for each colour which appears in the plot?

小糖芽 2025-02-12 21:09:32

Before calling plot1.legend or plot2.legend, pass label=None to plot1.axhline and plot1.axvline (and likewise to plot2.axhline and plot2.axvline). Then call legend() with no arguments, so that only artists carrying an explicit label are listed. This keeps the two reference lines out of the legend, so they cannot take the place of the scatter entries.
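As a likely explanation of the three legend entries in the question, here is a minimal sketch that reproduces the behaviour. The random data and the names rng, matrix and failed are assumptions made for illustration; the real two_d_matrix is elided in the question. The idea is that matrix[i:] has fewer rows than my_labels for every i above 0, so those scatter calls raise a ValueError that the try/except silently swallows, and legend(my_labels) then pairs the first labels with whatever artists exist, in the order they were added.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): 40 random points and 40 labels
rng = np.random.default_rng(seed=121)
matrix = rng.uniform(size=(40, 2))
my_labels = [0, 1] + [2] * 5 + [3] * 33

fig, ax = plt.subplots()
ax.axhline(0, color="#afafaf")
ax.axvline(0, color="#afafaf")

failed = []  # loop indices whose scatter call raised
for i in range(4):
    try:
        # matrix[i:] has 40 - i rows, but c always has 40 entries
        ax.scatter(matrix[i:, 0], matrix[i:, 1], s=30, c=my_labels, cmap="viridis")
    except ValueError:
        failed.append(i)

# Positional labels are paired with the artists in the order they were added:
# the horizontal line, the vertical line, then the one scatter that succeeded
legend = ax.legend([str(v) for v in my_labels])

print("failed iterations:", failed)
print("scatter artists drawn:", len(ax.collections))
print("legend entries:", [t.get_text() for t in legend.get_texts()])
```

If this reading is right, the legend keys in the question belong to the two grey lines and a single scatter artist, which would explain why they do not match the colours of the points.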

To get a legend entry for every category, call plot1.scatter (or plot2.scatter) once per category, passing label and selecting only the rows of two_d_matrix whose positions match the positions of that label in my_labels.

You can do it as follows:

import matplotlib.pyplot as plt
import numpy as np

# Generate some (pseudo) random data which is reproducible
generator = np.random.default_rng(seed=121)
matrix = generator.uniform(size=(40, 2))
matrix = np.sort(matrix)

clusters = 4
my_labels = np.array([0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

fig, ax = plt.subplots(1, 1)

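# Plot each cluster separately so that each one gets its own legend entry
for i in range(clusters):
    mask = my_labels == i  # boolean mask selecting the points labelled i
    # No c/cmap here: each call takes the next colour from the default colour cycle
    ax.scatter(matrix[mask, 0], matrix[mask, 1], s=30, label=i)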

ax.axhline(0, color='#afafaf', label=None)
ax.axvline(0, color='#afafaf', label=None)

ax.legend()
ax.set_title("Expected output")
plt.show()

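This produces a scatter plot whose legend has one entry for each of the four labels (the original image is not reproduced here).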
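Separately from the loop above, matplotlib can also build the legend from a single colormapped scatter call through PathCollection.legend_elements(), which returns one handle and one label per unique value in c. The sketch below is only an illustration under assumptions: the data is the same kind of random stand-in, and the names points, handles, labels and the legend title "Cluster" are not from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): 40 random points and 40 labels
rng = np.random.default_rng(seed=121)
matrix = rng.uniform(size=(40, 2))
my_labels = np.array([0, 1] + [2] * 5 + [3] * 33)

fig, ax = plt.subplots()
ax.axhline(0, color="#afafaf")
ax.axvline(0, color="#afafaf")

# One scatter call: c and the data have the same length, so viridis is applied
points = ax.scatter(matrix[:, 0], matrix[:, 1], s=30, c=my_labels, cmap="viridis")

# legend_elements() derives handles from the colormapped values in c
handles, labels = points.legend_elements()
ax.legend(handles, labels, title="Cluster")
```

Because the handles are passed explicitly, the two grey lines never enter the legend. Drop the matplotlib.use("Agg") line and add plt.show() to view the figure interactively.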

Comparison of current output and expected output

Observe how the way data points are selected (inside the for loops in the code below) affects the output:

Code:

import matplotlib.pyplot as plt
import numpy as np

# Generate some (pseudo) random data which is reproducible
generator = np.random.default_rng(seed=121)
matrix = generator.uniform(size=(40, 2))
matrix = np.sort(matrix)

clusters = 4
my_labels = np.array([0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

fig, ax = plt.subplots(1, 2)

# Question plot: slicing with i: keeps rows i to 39 regardless of their label
for i in range(clusters):
    ax[0].scatter(matrix[i:, 0], matrix[i:, 1], s=30, label=i)

ax[0].axhline(0, color='#afafaf', label=None)
ax[0].axvline(0, color='#afafaf', label=None)

ax[0].legend()
ax[0].set_title("Current output (with label = None)")

# Answer plot: select only the rows whose label equals i
for i in range(clusters):
    mask = my_labels == i  # boolean mask over the 40 points
    ax[1].scatter(matrix[mask, 0], matrix[mask, 1], s=30, label=i)

ax[1].axhline(0, color='#afafaf', label=None)
ax[1].axvline(0, color='#afafaf', label=None)

ax[1].legend()
ax[1].set_title("Expected output")

plt.show()
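If the legend colours should come from the viridis colormap used in the question, rather than from matplotlib's default colour cycle, one option is to look up each cluster's colour explicitly and pass it through the color argument. This is a sketch under assumptions, not the answer author's method: the random data, the norm and mask names and the "Cluster N" label text are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Stand-in data (assumption): 40 random points and 40 labels
rng = np.random.default_rng(seed=121)
matrix = rng.uniform(size=(40, 2))
my_labels = np.array([0, 1] + [2] * 5 + [3] * 33)

cmap = plt.get_cmap("viridis")
# Map the smallest label to the start of the colormap and the largest to its end
norm = Normalize(vmin=my_labels.min(), vmax=my_labels.max())

fig, ax = plt.subplots()
ax.axhline(0, color="#afafaf")
ax.axvline(0, color="#afafaf")

for value in np.unique(my_labels):
    mask = my_labels == value  # points belonging to this cluster
    ax.scatter(matrix[mask, 0], matrix[mask, 1], s=30,
               color=cmap(norm(value)), label=f"Cluster {value}")

ax.legend()  # only the labelled scatter artists are listed
```

Each cluster is its own artist with a fixed colour, so the legend key and the points share the same RGBA value by construction. Add plt.show() (and drop the Agg line) to display the result.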
