Problems using H2O with Sparklyr

Posted 2025-01-17 04:42:01

I am trying to get H2O working with Sparklyr on my Spark cluster (YARN).

spark_version(sc) = 2.4.4
My spark cluster is running V2.4.4

According to this page, the Sparkling Water version compatible with my Spark is 2.4.5, and the matching H2O release is rel-xu patch version 3. However, when I install this version I am prompted to update my H2O install to the next release (REL-ZORN). The H2O guides and the sparklyr guides are confusing and at times contradict each other.

Since this is a YARN deployment and not a local one, unfortunately I can't provide a reprex to help with troubleshooting.

url <- "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/5/sparkling-water-2.4.5.zip"

download.file(url = url,"sparkling-water-2.4.5.zip")

unzip("sparkling-water-2.4.5.zip")

# RUN THESE CMDs FROM THE TERMINAL
cd sparkling-water-2.4.5
bin/sparkling-shell --conf "spark.executor.memory=1g"

# RUN THESE FROM WITHIN RSTUDIO
install.packages("sparklyr")
library(sparklyr)

# REMOVE PRIOR INSTALLS OF H2O
detach("package:rsparkling", unload = TRUE)
if ("package:h2o" %in% search()) { detach("package:h2o", unload = TRUE) }
if (isNamespaceLoaded("h2o")){ unloadNamespace("h2o") }
remove.packages("h2o")

# INSTALLING REL-ZORN (3.36.0.3) WHICH IS REQUIRED FOR SPARKLING WATER 3.36.0.3
install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/R")

# INSTALLING FROM S3 SINCE CRAN NO LONGER SUPPORTED
install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/3.36.0.3-1-2.4/R")

# AS PER THE GUIDE
options(rsparkling.sparklingwater.version = "2.4.5")
library(rsparkling)

# SPECIFY THE CONFIGURATION
config <- sparklyr::spark_config()
config[["spark.yarn.queue"]] <- "my_data_science_queue"
config[["sparklyr.backend.timeout"]] <- 36000
config[["spark.executor.cores"]] <- 32
config[["spark.driver.cores"]] <- 32
config[["spark.executor.memory"]] <- "40g"
config[["spark.executor.instances"]] <- 8
config[["sparklyr.shell.driver-memory"]] <- "16g"
config[["spark.default.parallelism"]] <- "8"
config[["spark.rpc.message.maxSize"]] <- "256"

# MAKE A SPARK CONNECTION
sc <- sparklyr::spark_connect(
  master = "yarn",
  spark_home = "/opt/mapr/spark/spark",
  config = config,
  log = "console",
  version = "2.4.4"
)

When I try to establish an H2O context using the next chunk, I get the following error:

h2o_context(sc)

Error in h2o_context(sc) : could not find function "h2o_context"

Any pointers as to where I'm going wrong would be greatly appreciated.

Comments (1)

佼人 2025-01-24 04:42:01

Please see this tutorial. Newer versions of rsparkling use H2OContext.getOrCreate(h2oConf) instead of h2o_context(sc), which is why h2o_context cannot be found after installing rsparkling 3.36.0.3-1-2.4.
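A minimal sketch of what that could look like with the setup from the question. It assumes the h2o and rsparkling packages installed from the S3 URLs above load cleanly, and that `sc` is the connection returned by `spark_connect()` in the question. Calling `H2OConf()` with no arguments is my understanding of the recent rsparkling API; some earlier releases took the Spark connection as an argument, so check the tutorial for the release you have installed. The `options(rsparkling.sparklingwater.version = ...)` line appears to belong to the older rsparkling API and is probably not needed here, though I have not verified that on a YARN cluster.

```r
library(sparklyr)
library(rsparkling)  # load before spark_connect(), as in the question

# Check which versions are actually installed; the h2o and rsparkling
# builds should come from the same H2O release (3.36.0.3 in the question).
packageVersion("h2o")
packageVersion("rsparkling")

# `sc` is assumed to be the sparklyr connection created in the question.
# Assumption: H2OConf() takes no arguments in this rsparkling release.
h2oConf <- H2OConf()

# Replaces the old h2o_context(sc) call.
hc <- H2OContext.getOrCreate(h2oConf)
```

If `H2OContext.getOrCreate` is also reported as missing, that would suggest an older rsparkling is still the one being loaded, and the `packageVersion()` output should show it.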
