COPY INTO with partitioned Azure external storage

Posted on 2025-02-13 11:51:13

I have a container with partitioned Parquet files that I want to load with the COPY INTO command. My directories look like this:

corporate_sales ==> Container
    dimension ==> Folder
        region ==> Sub Folder
            99424-0191019-snappy.parquet
            99434-0191020-snappy.parquet
        salesperson ==> Sub Folder
            99425-0191021-snappy.parquet
            99426-0191022-snappy.parquet
    facts ==> Folder
        sales
            2022 ==> Year Sub Folder
                12 ==> Month Sub Folder
                    01 ==> Day Sub Folder
                        SALES-99499-0191022-snappy.parquet

How do I create an optimal external stage on Azure Storage?
Is this normal practice?

create stage my_azure_stage
  storage_integration = azure_int
  url = 'azure://myaccount.blob.core.windows.net/mycontainer/load/corporate_sales/'
  file_format = my_parquet_format;

How do I properly use the PATTERN parameter in the COPY INTO command to load a Snowflake (SF) table from my Blob Storage?

In the above example, I would like to take the files in the region folder and load them into the Region SF table dynamically.

Any help would be appreciated.

Comments (1)

阳光的暖冬 2025-02-20 11:51:13

If your files are under different folders and you cannot write a single pattern that selects them, you can create a separate stage for each folder, all reusing the same storage integration, with each stage URL pointing to the sub-folder that holds the files.
Snowflake then lists only the files under that path instead of the entire container, which reduces latency and cost.
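
As a minimal sketch of that idea, a stage can be scoped to the region folder from the question. The stage name region_stage is hypothetical; azure_int, my_parquet_format, and the URL prefix are taken from the question, and the sketch assumes the integration's STORAGE_ALLOWED_LOCATIONS already covers this path.

```sql
-- Hypothetical stage name; one stage per folder, all sharing the same integration.
create stage region_stage
  storage_integration = azure_int   -- integration name from the question
  url = 'azure://myaccount.blob.core.windows.net/mycontainer/load/corporate_sales/dimension/region/'
  file_format = my_parquet_format;  -- file format name from the question

-- Quick check of which files the stage can see before loading anything.
list @region_stage;
```

A similar stage could be created for the salesperson folder, so each table loads from its own narrowly scoped location.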

You can also rely on path partitioning if your files land in storage under a prefix such as day or time, and load only the prefix you need.
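
For example, with the year/month/day layout from the question, a load can be limited to one day by appending the partition path to the stage reference. This is only a sketch: the target table sales is hypothetical, and its column names are assumed to match the Parquet field names.

```sql
-- Load a single day's partition; only files under this prefix are listed.
copy into sales                                  -- hypothetical target table
  from @my_azure_stage/facts/sales/2022/12/01/   -- year/month/day path from the question
  match_by_column_name = case_insensitive;       -- map Parquet fields to columns by name
```

Without match_by_column_name (or a SELECT transformation in the FROM clause), Parquet data can only be loaded into a single VARIANT column.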

With the URL in your CREATE STAGE statement
url = 'azure://myaccount.blob.core.windows.net/mycontainer/load/corporate_sales/'
a COPY INTO from that stage, with no path or pattern, scans every folder under corporate_sales and loads the files from each of them.
So if you only want the region folder files to be loaded, create a stage pointing to the region folder; when you run COPY INTO from that stage, only the files inside the region folder are loaded into the SF table.
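
If you would rather keep the single stage from the question, the same restriction can be sketched with a path or a PATTERN in COPY INTO. The table name region is hypothetical and its columns are assumed to match the Parquet field names; PATTERN is a regular expression applied to the file path, not a glob.

```sql
-- Option 1: narrow by path, then filter by file extension.
copy into region                                  -- hypothetical target table
  from @my_azure_stage/dimension/region/
  pattern = '.*[.]parquet'
  match_by_column_name = case_insensitive;

-- Option 2: keep the stage root and select the folder in the regex.
-- This lists everything under the stage first, so it is usually slower.
copy into region
  from @my_azure_stage
  pattern = '.*dimension/region/.*[.]parquet'
  match_by_column_name = case_insensitive;
```

Option 1 is generally the better default because the path prefix limits which files are listed before the regular expression is applied.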

Hope this answers your query.
