H2O 与 Sparklyr 配合使用时遇到问题
我正在尝试让 H2O 在我的 Spark 集群(纱线)上与 Sparklyr 配合使用
spark_version(sc) = 2.4.4
我的 Spark 集群正在运行 V2.4.4
根据 此页面与我的 Spark 兼容的版本是 Sparkling Water 2.4.5,H2O 版本是 rel-xu 补丁版本 3。但是,当我安装此版本时,我提示将我的 H2O 安装更新到下一个版本 (REL-ZORN)。 H2O 指南和 Sparklyr 指南之间有时非常令人困惑和矛盾。
由于这是纱线部署而不是本地部署,因此不幸的是我无法提供代表来帮助排除故障。
url <- "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/5/sparkling-water-2.4.5.zip"
download.file(url = url,"sparkling-water-2.4.5.zip")
unzip("sparkling-water-2.4.5.zip")
# RUN THESE CMDs FROM THE TERMINAL
cd sparkling-water-2.4.5
bin/sparkling-shell --conf "spark.executor.memory=1g"
# RUN THESE FROM WITHIN RSTUDIO
install.packages("sparklyr")
library(sparklyr)
# REMOVE PRIOR INSTALLS OF H2O
detach("package:rsparkling", unload = TRUE)
if ("package:h2o" %in% search()) { detach("package:h2o", unload = TRUE) }
if (isNamespaceLoaded("h2o")){ unloadNamespace("h2o") }
remove.packages("h2o")
# INSTALLING REL-ZORN (3.36.0.3) WHICH IS REQUIRED FOR SPARKLING WATER 3.36.0.3
install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/R")
# INSTALLING FROM S3 SINCE CRAN NO LONGER SUPPORTED
install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/3.36.0.3-1-2.4/R")
# AS PER THE GUIDE
options(rsparkling.sparklingwater.version = "2.4.5")
library(rsparkling)
# SPECIFY THE CONFIGURATION
config <- sparklyr::spark_config()
config[["spark.yarn.queue"]] <- "my_data_science_queue"
config[["sparklyr.backend.timeout"]] <- 36000
config[["spark.executor.cores"]] <- 32
config[["spark.driver.cores"]] <- 32
config[["spark.executor.memory"]] <- "40g"
config[["spark.executor.instances"]] <- 8
config[["sparklyr.shell.driver-memory"]] <- "16g"
config[["spark.default.parallelism"]] <- "8"
config[["spark.rpc.message.maxSize"]] <- "256"
# MAKE A SPARK CONNECTION
sc <- sparklyr::spark_connect(
master = "yarn",
spark_home = "/opt/mapr/spark/spark",
config = config,
log = "console",
version = "2.4.4"
)
当我尝试使用下一个块建立 H2O 上下文时,我收到以下错误
h2o_context(sc)
Error in h2o_context(sc) : could not find function "h2o_context"
任何有关我出错的地方的指示将不胜感激。
I am trying to get H2O working with Sparklyr on my spark cluster (yarn)
spark_version(sc) = 2.4.4
My spark cluster is running V2.4.4
According to this page the compatible version with my spark is 2.4.5 for Sparkling Water and the H2O release is rel-xu patch version 3. However when I install this version I am prompted to update my H2O install to the next release (REL-ZORN). Between the H2O guides and the sparklyr guides it's very confusing and contradictory at times.
Since this is a yarn deployment and not local, unfortunately I can't provide a repex to help with trobleshooting.
url <- "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/5/sparkling-water-2.4.5.zip"
download.file(url = url,"sparkling-water-2.4.5.zip")
unzip("sparkling-water-2.4.5.zip")
# RUN THESE CMDs FROM THE TERMINAL
cd sparkling-water-2.4.5
bin/sparkling-shell --conf "spark.executor.memory=1g"
# RUN THESE FROM WITHIN RSTUDIO
install.packages("sparklyr")
library(sparklyr)
# REMOVE PRIOR INSTALLS OF H2O
detach("package:rsparkling", unload = TRUE)
if ("package:h2o" %in% search()) { detach("package:h2o", unload = TRUE) }
if (isNamespaceLoaded("h2o")){ unloadNamespace("h2o") }
remove.packages("h2o")
# INSTALLING REL-ZORN (3.36.0.3) WHICH IS REQUIRED FOR SPARKLING WATER 3.36.0.3
install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/R")
# INSTALLING FROM S3 SINCE CRAN NO LONGER SUPPORTED
install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/3.36.0.3-1-2.4/R")
# AS PER THE GUIDE
options(rsparkling.sparklingwater.version = "2.4.5")
library(rsparkling)
# SPECIFY THE CONFIGURATION
config <- sparklyr::spark_config()
config[["spark.yarn.queue"]] <- "my_data_science_queue"
config[["sparklyr.backend.timeout"]] <- 36000
config[["spark.executor.cores"]] <- 32
config[["spark.driver.cores"]] <- 32
config[["spark.executor.memory"]] <- "40g"
config[["spark.executor.instances"]] <- 8
config[["sparklyr.shell.driver-memory"]] <- "16g"
config[["spark.default.parallelism"]] <- "8"
config[["spark.rpc.message.maxSize"]] <- "256"
# MAKE A SPARK CONNECTION
sc <- sparklyr::spark_connect(
master = "yarn",
spark_home = "/opt/mapr/spark/spark",
config = config,
log = "console",
version = "2.4.4"
)
When I try to establish a H2O context using the next chunk I get the following error
h2o_context(sc)
Error in h2o_context(sc) : could not find function "h2o_context"
Any pointers as to where I'm going wrong would be greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
请参阅本教程。较新版本的 Rsparkling 使用 {H2OContext.getOrCreate(h2oConf)} 而不是 {h2o_context(sc)}。
See this tutorial please. The newer versions of Rsparkling use {H2OContext.getOrCreate(h2oConf)} instead of {h2o_context(sc)}.