How to extract feature_importances from a model in SparklyR?
I would like to extract feature_importances from my model in SparklyR. So far I have the following reproducible code, which runs without errors:
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
# tibble() replaces the deprecated data_frame()
dtrain <- tibble(text = c("Chinese Beijing Chinese",
                          "Chinese Chinese Shanghai",
                          "Chinese Macao",
                          "Tokyo Japan Chinese"),
                 doc_id = 1:4,
                 class = c(1, 1, 1, 0))
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)
pipeline <- ml_pipeline(
  ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
  ft_count_vectorizer(sc, input_col = "tokens", output_col = "myvocab"),
  ml_decision_tree_classifier(sc, label_col = "class",
                              features_col = "myvocab",
                              prediction_col = "pcol",
                              probability_col = "prcol",
                              raw_prediction_col = "rpcol")
)
model <- ml_fit(pipeline, dtrain_spark)
When I run the ml_stage step below, I get a function rather than a vector of feature_importances. A prior post ("How to extract the feature importances in Sparklyr?") shows it as a vector, which is what I would like to obtain. What could be my error here? Is there another step I need to take to unwrap the function and get a vector of values?
ml_stage(model, 3)$feature_importances
Here is what the output of the ml_stage call looks like (a function instead of a vector of values):
function (...)
{
    tryCatch(.f(...), error = function(e) {
        if (!quiet)
            message("Error: ", e$message)
        otherwise
    }, interrupt = function(e) {
        stop("Terminated by user", call. = FALSE)
    })
}
<bytecode: 0x559a0d438278>
<environment: 0x559a0ce8e840>
I am not sure if this is what you want, but you could combine the vectorizer model and its vocabulary to extract the feature_importances of your model, which will result in a table with the importances of your text. You could use the following code:
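A minimal sketch of that approach, not the answerer's original code. It assumes the fitted `model` from the question is still in the session, that stage 2 is the fitted count vectorizer exposing a `vocabulary` field, and that `feature_importances` is a function wrapper in the sparklyr version shown in the question (the printed body looks like a `purrr::possibly` wrapper), so calling it should return the vector. `importance_table` is a hypothetical helper, not part of sparklyr.

```r
# Hypothetical helper (not part of sparklyr): pair each vocabulary term
# with its importance score and sort from most to least important.
importance_table <- function(vocabulary, importances) {
  stopifnot(length(vocabulary) == length(importances))
  tbl <- data.frame(
    term = unlist(vocabulary),
    importance = as.numeric(importances),
    stringsAsFactors = FALSE
  )
  tbl <- tbl[order(-tbl$importance), , drop = FALSE]
  rownames(tbl) <- NULL
  tbl
}

# Assumption: the fitted pipeline `model` from the question exists.
if (exists("model")) {
  vectorizer <- sparklyr::ml_stage(model, 2)  # fitted count vectorizer
  tree <- sparklyr::ml_stage(model, 3)        # fitted decision tree

  # In the version shown in the question, feature_importances is a function,
  # so call it; other versions may already return the vector directly.
  imp <- tree$feature_importances
  if (is.function(imp)) imp <- imp()

  print(importance_table(vectorizer$vocabulary, imp))
}
```

If `feature_importances()` returns NULL, the wrapped call failed on the Spark side, and the length check in the helper will stop with an error instead of building a misleading table.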