Why does deploying a tidymodel with vetiver throw an error when there is a variable with role as ID?

Posted on 2025-01-13 00:32:57

I'm unable to deploy a tidymodels model with vetiver and get a prediction when the recipe includes a variable with the role ID. The API returns the following error:

{
"error": "500 - Internal server error",
"message": "Error: The following required columns are missing: 'Fake_ID'.\n"
}

The code for the dummy example is below.
Do I need to remove the ID-variable from both the model and recipe to make the Plumber API work?

# Load libraries
library(recipes)
library(parsnip)
library(workflows)
library(pins)
library(plumber)
library(stringi)
library(vetiver)  # needed for vetiver_pin_write() and vetiver_api() below



# Load data
data(Sacramento, package = "modeldata")


# Create fake IDs for testing
Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)


# Train model
Sacramento_recipe <- recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID, data = Sacramento) %>% 
  update_role(Fake_ID, new_role = "ID") %>% 
  step_zv(all_predictors())

rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")

rf_fit <-
  workflow() %>%
  add_model(rf_spec) %>%
  add_recipe(Sacramento_recipe) %>%
  fit(Sacramento)


# Create vetiver object
v <- vetiver::vetiver_model(rf_fit, "sacramento_rf")
v


# Allow for model versioning and sharing
model_board <- board_temp()
model_board %>% vetiver_pin_write(v)


# Deploying model
pr() %>%
  vetiver_api(v) %>%
  pr_run(port = 8088)

Running the example of the Plumber API


撩起发的微风 2025-01-20 00:32:57

As of today, vetiver looks for the "mold" via workflows::extract_mold(rf_fit) and only gets the predictors out of it to create the ptype. But when you predict from a workflow, it requires all the variables, including non-predictors. So you do not need to remove the ID variable: if you have trained a model with non-predictors, as of today you can make the API work by passing in a custom ptype:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(parsnip)
library(workflows)
library(pins)
library(plumber)
library(stringi)

data(Sacramento, package = "modeldata")
Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)


Sacramento_recipe <- 
    recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID, 
           data = Sacramento) %>% 
    update_role(Fake_ID, new_role = "ID") %>% 
    step_zv(all_predictors())

rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")

rf_fit <-
    workflow() %>%
    add_model(rf_spec) %>%
    add_recipe(Sacramento_recipe) %>%
    fit(Sacramento)


library(vetiver)
## this is probably easiest because this model uses a simple formula
## if there is more complex preprocessing, select the variables
## from `Sacramento` via dplyr or similar
sac_ptype <- extract_recipe(rf_fit) %>% 
    bake(new_data = Sacramento, -all_outcomes()) %>% 
    vctrs::vec_ptype()

v <- vetiver_model(rf_fit, "sacramento_rf", save_ptype = sac_ptype)
v
#> 
#> ── sacramento_rf ─ <butchered_workflow> model for deployment 
#> A ranger regression modeling workflow using 6 features

pr() %>%
    vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 0 sub-routers.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/ping (GET)
#> └──/predict (POST)

Created on 2022-03-10 by the reprex package (v2.0.1)
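
The code comments above mention selecting the variables from `Sacramento` directly when the preprocessing is more involved. A minimal sketch of that variant follows, assuming the same columns as the question's formula. The sequential `Fake_ID` values are made up for illustration, and the check on the argument name is an assumption on my part: I believe newer vetiver releases call the argument `save_prototype` instead of `save_ptype`, so the sketch picks whichever one the installed version offers.

```r
library(recipes)
library(parsnip)
library(workflows)
library(dplyr)
library(vetiver)

data(Sacramento, package = "modeldata")
# Illustrative IDs only; any character identifier would do
Sacramento$Fake_ID <- sprintf("id_%04d", seq_len(nrow(Sacramento)))

Sacramento_recipe <-
    recipe(price ~ type + sqft + beds + baths + zip + Fake_ID,
           data = Sacramento) %>%
    update_role(Fake_ID, new_role = "ID") %>%
    step_zv(all_predictors())

rf_fit <-
    workflow() %>%
    add_model(rand_forest(mode = "regression") %>% set_engine("ranger")) %>%
    add_recipe(Sacramento_recipe) %>%
    fit(Sacramento)

# Zero-row prototype built from the raw input columns the workflow
# expects at prediction time: the predictors plus the ID column,
# without the outcome
input_cols <- c("type", "sqft", "beds", "baths", "zip", "Fake_ID")
sac_ptype <- Sacramento %>%
    select(all_of(input_cols)) %>%
    vctrs::vec_ptype()

# Assumption: the argument name depends on the installed vetiver version
if ("save_prototype" %in% names(formals(vetiver_model))) {
    v <- vetiver_model(rf_fit, "sacramento_rf", save_prototype = sac_ptype)
} else {
    v <- vetiver_model(rf_fit, "sacramento_rf", save_ptype = sac_ptype)
}
```

Because the prototype is taken from the raw data rather than from the baked recipe, this approach does not depend on what the recipe steps do to the columns.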

Are you training models for production with non-predictor variables? Would you mind opening an issue on GitHub to explain your use case a little more?
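
The mismatch described in this answer can be checked locally, without starting the API. The sketch below is only an illustration under the same assumptions as the question's example (the sequential `Fake_ID` values and the `new_homes` name are made up): it predicts once with the ID column present and once with it dropped. On the package versions I am aware of, the second call should fail with the same kind of "required columns are missing" error that the API returned.

```r
library(recipes)
library(parsnip)
library(workflows)

data(Sacramento, package = "modeldata")
# Illustrative IDs only
Sacramento$Fake_ID <- sprintf("id_%04d", seq_len(nrow(Sacramento)))

Sacramento_recipe <-
    recipe(price ~ type + sqft + beds + baths + zip + Fake_ID,
           data = Sacramento) %>%
    update_role(Fake_ID, new_role = "ID") %>%
    step_zv(all_predictors())

rf_fit <-
    workflow() %>%
    add_model(rand_forest(mode = "regression") %>% set_engine("ranger")) %>%
    add_recipe(Sacramento_recipe) %>%
    fit(Sacramento)

new_homes <- head(Sacramento, 3)

# New data that still carries the ID column: prediction works
with_id <- predict(rf_fit, new_data = new_homes)

# Same rows without the ID column: capture the error instead of stopping
no_id_cols <- setdiff(names(new_homes), "Fake_ID")
without_id <- tryCatch(
    predict(rf_fit, new_data = new_homes[, no_id_cols]),
    error = function(e) e
)

# Print the message of the captured error, if there was one
if (inherits(without_id, "error")) message(conditionMessage(without_id))
```

This is why the input prototype handed to vetiver has to include `Fake_ID`: the API validates incoming data against the prototype and then passes it to the workflow's `predict()`.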
