Writing to a Delta table from an Azure Function App
I am implementing a 3-hop data pipeline, storing every layer of data as Delta tables in Azure Storage accounts. Currently I ingest data as JSON using either Data Factory or Function Apps, and do all processing in Databricks.

My question is this: is there any .NET package that enables writing the ingested JSON files from an Azure Function App directly to a Delta table in a Storage Account?
The NuGet package is this one: Microsoft.Spark.Extensions.Delta

The problem with it is that it's just a thin wrapper around this Java class, and it communicates with it via an IPC socket. In other words, it only works on a machine with a Spark engine installed.

Installing the whole dependency tree (Spark, Java, etc.) onto a .NET-based Azure Functions instance is theoretically possible, but it isn't worth the effort. It would be much easier to write that Azure Function in another language. For example, here is a Python example that pumps JSON-formatted events from an Azure Service Bus queue into a Delta Lake table; it can be adjusted to take data from anywhere else.
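A minimal sketch of that approach using the `deltalake` package (the delta-rs Python bindings), which can write Delta tables without any Spark engine. The table URI, the storage credentials, and the helper names here are illustrative assumptions, not part of the original answer; in a real Azure Function the raw messages would come from the Service Bus trigger binding.

```python
import json


def parse_events(raw_messages):
    """Parse JSON-formatted queue messages (strings) into row dicts."""
    return [json.loads(m) for m in raw_messages]


def append_to_delta(rows, table_uri, storage_options=None):
    """Append rows to a Delta table via deltalake (delta-rs).

    Imported lazily so parse_events stays dependency-free; assumes
    `pip install deltalake pyarrow`.
    """
    import pyarrow as pa
    from deltalake import write_deltalake

    write_deltalake(
        table_uri,  # e.g. "abfss://container@account.dfs.core.windows.net/bronze/events" (hypothetical path)
        pa.Table.from_pylist(rows),
        mode="append",
        # Azure credentials, e.g. {"account_name": "...", "account_key": "..."}
        storage_options=storage_options,
    )


# Sample data standing in for Service Bus messages:
rows = parse_events(['{"id": 1, "value": "a"}', '{"id": 2, "value": "b"}'])
# append_to_delta(rows, "abfss://...", storage_options={...})
```

The same `append_to_delta` helper works for any source (HTTP trigger, Event Hub, blob trigger), since it only takes a list of parsed row dicts.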