Databricks and Informatica Delta Lake Connector Spark configuration

Posted on 2025-01-25 19:12:15


I am working with Informatica Data Integrator and trying to set up a connection with a Databricks cluster. So far everything seems to work fine, but one issue is that under Spark configuration we had to put the SAS key for the ADLS Gen 2 storage account.

The reason for this is that when Informatica tries to write to Databricks it first has to write that data into a folder in ADLS gen 2 and then Databricks essentially takes that file and writes it as a Delta Lake table.

Now one issue is that the field where we put the Spark config contains the full SAS value (URL plus token and signature). That is not really a good thing unless we only make one person an admin.
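For reference, the Spark config field ends up holding something along these lines (the storage account name is a placeholder and the exact ABFS property keys may differ from what we actually used, so treat this as an illustration, not the exact entries):

    fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net SAS
    fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider
    fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net ?sv=...&sig=<the-actual-sas-token>

Anyone who can open the cluster configuration page can read that last line in plain text.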

Did anyone work with Informatica and Databricks? Is it possible to put the Spark config as a file and then have the Informatica connector read that file? Or is it possible to add that SAS key to the Spark cluster (the interactive cluster we use) and have that cluster read the info from that file?

Thank you for any help with this.


Comments (1)

深海少女心 2025-02-01 19:12:15


You really don't need to put the SAS key value into the Spark configuration. Instead, store that value in an Azure Key Vault-backed secret scope (on Azure) or a Databricks-backed secret scope (on other clouds), and then refer to it from the Spark configuration using the syntax {{secrets/<secret-scope-name>/<secret-key>}} (see doc). In this case the SAS key value is read at cluster start, and it won't be visible to users who have access to the cluster UI.
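A minimal sketch of what the cluster's Spark config could look like with the secret reference instead of the literal key (the ABFS property names and placeholders here are illustrative assumptions, not taken from the original setup):

    fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net SAS
    fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider
    fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope-name>/<secret-key>}}

The secret itself can be created beforehand, for example with the Databricks CLI secrets commands or by linking an Azure Key Vault-backed scope, so the actual SAS token never appears anywhere in the cluster UI or in the Informatica connection definition.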
