If you do the partial dependence plot of column a and you want to interpret the y value at x = 0.0, the y-axis value represents the average probability of class 1, computed by changing the value of column a to 0.0 in every row of your dataset and averaging the model's predictions.
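That computation can be sketched in a few lines. This is a minimal illustration, not a real library call: `toy_model` is a hypothetical stand-in for any classifier that returns a class-1 probability per row, and `pdp_point` does exactly what the answer describes - force the feature to one value in all rows, then average.

```python
import math

def toy_model(row):
    """Hypothetical classifier: a logistic squashing of features 'a' and 'b'."""
    z = 0.8 * row["a"] + 0.2 * row["b"]
    return 1.0 / (1.0 + math.exp(-z))

def pdp_point(rows, feature, value, model):
    """Average model output after forcing `feature` to `value` in every row."""
    modified = [{**r, feature: value} for r in rows]
    return sum(model(r) for r in modified) / len(modified)

data = [{"a": 1.0, "b": 0.5}, {"a": -2.0, "b": 1.5}, {"a": 0.3, "b": -1.0}]
y_at_zero = pdp_point(data, "a", 0.0, toy_model)  # the PDP's y value at a = 0.0
```

With a real scikit-learn model you would do the same thing with `predict_proba` over a copy of your feature matrix, which is what `sklearn.inspection.partial_dependence` automates.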
Generally speaking, we can produce a classifier from a function, f, producing a real-valued output, plus a threshold. We call the output an 'activation'. If the activation meets the threshold condition, we say the class is detected:
is_class := ( f(x0, x1, ...) > threshold )
and
activation = f(x0, x1, ...)
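In code, that split between activation and decision looks like this. The scoring function here is a made-up example (any real-valued f would do); the point is only that the class decision is the activation compared against a threshold:

```python
import math

def activation(x0, x1):
    """A hypothetical real-valued scoring function f(x0, x1)."""
    return math.tanh(x0) + 0.5 * x1

def is_class(x0, x1, threshold=0.0):
    """The classifier: detect the class when the activation exceeds the threshold."""
    return activation(x0, x1) > threshold
```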
PDP plots simply show activation values as they change in response to changes in an input value (we ignore the threshold). That is, it might plot: f(x0, x, x2, x3, ...)
as a single input x varies. Typically, we hold the others constant, although we can also plot in 2d and 3d.
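A one-input sweep of that kind can be sketched directly. Everything here is illustrative - `f` is a hypothetical activation function, and `pdp_curve` just varies one argument over a grid while holding the rest at a fixed baseline:

```python
def f(x0, x1, x2):
    """Hypothetical activation function of three inputs."""
    return x0 ** 2 - x1 + 0.1 * x2

def pdp_curve(f, varying_index, grid, baseline):
    """Evaluate f over `grid` for one input, holding the others at `baseline`."""
    curve = []
    for x in grid:
        args = list(baseline)
        args[varying_index] = x
        curve.append((x, f(*args)))
    return curve

# Sweep x1 from -2.0 to 2.0 with x0 = 1.0 and x2 = 0.0 held fixed.
curve = pdp_curve(f, 1, [i * 0.5 for i in range(-4, 5)], [1.0, 0.0, 0.0])
```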
Sometimes we're interested in:
how a single input changes the activation
how multiple inputs independently change the activation
how multiple activations change based on different inputs, and so on.
Strictly speaking, we need not even be talking about a classifier when looking at PDP plots. Any function that produces a real-valued output (an activation) in response to one or more real-valued feature inputs that we can vary allows us to produce PDP plots.
Classifier activations need not be, and often should not be, interpreted as probabilities, as others have written. In very many cases, this is simply incorrect. Nevertheless, the analysis of the activation levels is of interest to us, independently of whether the activations represent probabilities: in PDP plots, we can see, for example, which feature values produce strong changes - more horizontal plots may imply a worthless feature.
Similarly, in ROC plots, we explicitly examine information about the true-positive and false-positive detection rates that result from varying the threshold on the activation values.
In both cases, there's no necessity that the classifier produce probabilities as its activation.
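To make that concrete, here is a bare-bones ROC computation over raw activation scores. The scores are arbitrary real numbers (note the -3.2 and 5.0 - clearly not probabilities), yet the true/false-positive rates are perfectly well defined at each threshold. This is a from-scratch sketch; `sklearn.metrics.roc_curve` does the same job in practice:

```python
def roc_points(activations, labels):
    """(FPR, TPR) at every distinct activation value used as a threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(activations)):
        tp = sum(1 for a, y in zip(activations, labels) if a > t and y == 1)
        fp = sum(1 for a, y in zip(activations, labels) if a > t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Raw scores need not be probabilities -- any real-valued activation works.
scores = [-3.2, 0.4, 1.7, 5.0]
labels = [0, 0, 1, 1]
points = roc_points(scores, labels)
```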
Interpretation of PDP plots is fraught with dangers. At a minimum, you need to be clear about what is being held constant as an input feature is varied. Were the other features set to zero (a good choice for linear models)? Did we set them to their most common values in the test set? Or the most common values for a known class in a sample? Without this information, the vertical axis may be less helpful.
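The choice of baseline genuinely changes what you see whenever features interact. A tiny made-up example: with an interaction term, sweeping feature a while the other feature is held at 0.0 gives a different slope than holding it at, say, its median of 2.0:

```python
def f(a, b):
    """Hypothetical activation with an interaction between a and b."""
    return a * b + a

grid = [0.0, 1.0, 2.0]
at_zero = [f(a, 0.0) for a in grid]    # other feature held at zero
at_median = [f(a, 2.0) for a in grid]  # other feature held at an assumed median of 2.0
```

Same model, same swept feature, visibly different curves - which is why the plot's vertical axis is hard to read without knowing the baseline.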
Knowing that an activation is a probability also doesn't seem too helpful in PDP plots -- you can't expect the area under the curve to sum to one. Perhaps the most useful thing you might find is error cases, where output probabilities are not in the range 0..1.
I may not be good at explaining, but you can read more about PDPs at https://christophm.github.io/interpretable-ml-book/pdp.html. Hope this helps :)