Runtime Error: Cannot set database in Spark! [dbt + spark + thrift]
Can anyone help me on this?
I'm getting the error ***Runtime Error: Cannot set database in spark!***
while running a dbt model via the Spark Thrift Server with a remote Hive metastore.
I need to transform some models in dbt using Apache Spark as the adapter. I'm currently running Spark locally on my machine, and I started the Thrift Server as below with the remote Hive metastore URI.
- Started master
./sbin/start-master.sh
- Started worker
./sbin/start-worker.sh spark://master_url:7077
- Started Thrift Server
./sbin/start-thriftserver.sh --master spark://master_url:7077
--packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1 --hiveconf hive.metastore.uris=thrift://ip:9083
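(Not in the original post.) Before pointing dbt at the Thrift Server, one way to sanity-check that it is up and can reach the metastore is to connect with Beeline, which ships with Spark. The host, port, and user below mirror the dbt profile values from the question; adjust them for your setup:

```shell
# Connect to the local Spark Thrift Server via JDBC and list databases.
# Host/port/user are taken from the dbt profile in the question.
./bin/beeline -u jdbc:hive2://localhost:10000 -n admin -e "SHOW DATABASES;"
```

If this fails, the problem is in the Spark/metastore setup rather than in dbt.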
In my DBT project,
project_name:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: localhost
      port: 10000
      user: admin
      schema: test_dbt
      threads: 4
While executing dbt run, I get the following error.
dbt run --select test -t dev
Running with dbt=1.1.0
Partial parse save file not found. Starting full parse.
Encountered an error:
Runtime Error
Cannot set database in spark!
Please note that there is not much info in dbt.log.
SOLUTION
This error was caused by the "database" field in the source YAML file.
Always schema, never database
Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands database to exist at a higher level than schema. As such, you should never use or set database as a node config or in the target profile when running dbt-spark.
https://docs.getdbt.com/reference/resource-configs/spark-configs#always-schema-never-database
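To make the fix concrete, here is a sketch of a broken source definition next to the working form. The source and table names are placeholders, not from the original post:

```yaml
# Broken: dbt-spark rejects a database key on a source
sources:
  - name: my_source          # placeholder name
    database: test_dbt       # triggers "Cannot set database in spark!"
    tables:
      - name: my_table

# Working: use schema instead of database
sources:
  - name: my_source
    schema: test_dbt
    tables:
      - name: my_table
```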
Comments (1)
The schema test_dbt does not exist in Hive.
I think you need to create the test_dbt database in Hive:
step 1. Log in to the Spark cluster, stop the Thrift Server, and run spark-sql.
step 2. Create the database test_dbt.
step 3. Restart the Thrift Server.
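The three steps above correspond roughly to the following commands, run from the Spark home directory. This is a sketch: the metastore URI, master URL, and package flags are copied from the question's startup commands and will differ in your environment:

```shell
# Step 1: stop the running Thrift Server
./sbin/stop-thriftserver.sh

# Step 2: create the database against the same remote metastore
./bin/spark-sql --hiveconf hive.metastore.uris=thrift://ip:9083 \
  -e "CREATE DATABASE IF NOT EXISTS test_dbt"

# Step 3: restart the Thrift Server with the original flags
./sbin/start-thriftserver.sh --master spark://master_url:7077 \
  --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1 \
  --hiveconf hive.metastore.uris=thrift://ip:9083
```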
OR
you can use the default schema, like below.
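The snippet the answer refers to is not included in the post. A minimal sketch of a profile output that targets Spark's built-in default schema (so no database needs to be created up front) might look like this, reusing the connection values from the question:

```yaml
project_name:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: localhost
      port: 10000
      user: admin
      schema: default   # Spark's built-in default database
      threads: 4
```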