Switching between workspaces with databricks-connect
Is it possible to switch workspace with the use of databricks-connect?
I'm currently trying to switch with: spark.conf.set('spark.driver.host', cluster_config['host'])
But this gives back the following error: AnalysisException: Cannot modify the value of a Spark config: spark.driver.host
3 Answers
If you look into the documentation on setting up the client, you will see that there are three methods to configure Databricks Connect:

- A configuration file generated by databricks-connect configure - the file name is always ~/.databricks-connect
- Environment variables - DATABRICKS_ADDRESS, DATABRICKS_API_TOKEN, ...
- Spark configuration properties - spark.databricks.service.address, spark.databricks.service.token, ... But when using this method, the Spark session could already be initialized, so you may not be able to switch on the fly without restarting Spark.

But if you use different DBR versions, then it's not enough to change the configuration properties; you also need to switch to a Python environment that contains the corresponding version of the Databricks Connect distribution.
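For example, with the environment-variable method the target workspace must be selected before the first Spark session is created. A minimal sketch, assuming a classic databricks-connect setup; the address, token, and cluster ID values are placeholders, and DATABRICKS_CLUSTER_ID stands in for one of the further variables elided by the "..." above:

    import os

    # Select the target workspace BEFORE the Spark session is created;
    # all three values below are placeholders for your own workspace.
    os.environ["DATABRICKS_ADDRESS"] = "https://my-workspace.cloud.databricks.com"
    os.environ["DATABRICKS_API_TOKEN"] = "dapi-placeholder-token"
    os.environ["DATABRICKS_CLUSTER_ID"] = "0123-456789-abcdef"

    from pyspark.sql import SparkSession

    # With databricks-connect installed, this session runs against the
    # workspace selected above rather than a local Spark.
    spark = SparkSession.builder.getOrCreate()
    print(spark.range(3).count())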
For my own work I wrote the following Zsh script that allows easy switching between different setups (shards), although it allows only one shard to be in use at a time; a sketch is shown after the list. The prerequisites are:
- a conda environment is created for each shard, named <name>-shard
- databricks-connect is installed into the activated conda environment
- each shard has a ~/.databricks-connect-<name> file that will be symlinked to ~/.databricks-connect
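A minimal sketch of such a switcher under the prerequisites above (the function name dbc-shard and its error handling are illustrative assumptions, not the original script):

    # Meant to be sourced from ~/.zshrc (conda must be initialized in the shell).
    # Switch the active Databricks Connect shard: activate the matching
    # conda environment and re-point the config symlink.
    # Usage: dbc-shard <name>
    function dbc-shard() {
        local name="$1"
        local cfg="$HOME/.databricks-connect-$name"
        if [[ -z "$name" || ! -f "$cfg" ]]; then
            echo "usage: dbc-shard <name> (needs ~/.databricks-connect-<name>)" >&2
            return 1
        fi
        # Environment name follows the <name>-shard convention
        conda activate "${name}-shard" || return 1
        # Replace the symlink that databricks-connect actually reads
        ln -sf "$cfg" "$HOME/.databricks-connect"
        echo "switched to shard '$name'"
    }

Sourced from ~/.zshrc, dbc-shard dev would then activate the dev-shard environment and point ~/.databricks-connect at ~/.databricks-connect-dev.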
I created a simple python script to change the cluster_id within the .databricks-connect configuration file. To execute, ensure your virtual env has the environment variable DATABRICKS_CLUSTER configured. Obtaining the cluster ID is shown here in the official databricks-connect documentation. Set the environment variable with:
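For example (the cluster ID value is a placeholder for your own):

    export DATABRICKS_CLUSTER=0123-456789-abcdef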
Once the environment variable is set, simply use the following python script to switch clusters whenever your new virtual environment is activated.
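A sketch of such a script, assuming ~/.databricks-connect holds the JSON written by databricks-connect configure:

    import json
    import os
    from pathlib import Path

    # File written by `databricks-connect configure`
    CONFIG_PATH = Path.home() / ".databricks-connect"

    cluster_id = os.environ.get("DATABRICKS_CLUSTER")
    if not cluster_id:
        raise SystemExit("DATABRICKS_CLUSTER is not set in this environment")

    # The file is plain JSON, so only the cluster_id key is rewritten
    config = json.loads(CONFIG_PATH.read_text())
    config["cluster_id"] = cluster_id
    CONFIG_PATH.write_text(json.dumps(config, indent=2))
    print(f"cluster_id set to {cluster_id}")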
Probably it doesn't answer your question directly, but it's also possible to use the Visual Studio Code Databricks plugin, which uses databricks-connect, and from there it is very easy to switch between different environments: https://marketplace.visualstudio.com/items?itemName=paiqo.databricks-vscode