Moving data from Azure Blob Storage to Azure Synapse (SQL dedicated pool)
I have a requirement to move Azure Blob Storage data to Azure Synapse (SQL dedicated pools).
The Azure Blob Storage container has around 850 GB of data (in the form of multiple JSON files). I have created a Synapse pipeline and used PolyBase to move the data from Blob Storage to the SQL dedicated pool. PolyBase needs a staging environment, for which I have used a staging blob container.
Azure Blob Storage -> staging container -> SQL dedicated pool (Azure Synapse)
I have not put any restrictions on DIUs or parallel processing, so the copy uses 32 DIUs and the parallel-processing count goes up to 120-130.
The first stage completed in 5 hours, moving the 850 GB of data to the staging container, but the second stage has now been running for 15 hours and is still not complete; for it I can see a DIU count of 2 and a parallel-processing count of 1.
Do I need to explicitly specify the DIU and parallel-processing settings?
Is there any better way to do this other than PolyBase?
There are 3 key points missing from your question.
As per the best approach:
You haven't mentioned whether your approach is Extract, Transform and Load (ETL) or Extract, Load and Transform (ELT), so the reason for the staging table isn't clear. If your approach is ETL, staging makes sense; otherwise it isn't required.
Secondly, staging tables should be in the SQL pool, not in Blob Storage.
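For illustration, a minimal sketch of such a staging table (table and column names are hypothetical): a round-robin heap keeps the load itself cheap, since there is no index to maintain while rows arrive.

-- Hypothetical staging table inside the dedicated SQL pool.
-- ROUND_ROBIN spreads rows evenly across all 60 distributions, and HEAP
-- avoids index-maintenance overhead during the load.
CREATE TABLE dbo.SalesRaw_Staging
(
    Id      INT            NOT NULL,
    Payload NVARCHAR(4000) NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);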
Before loading the data into the staging table, you need to define external tables in your data warehouse. PolyBase uses external tables to define and access the data in Azure Storage. An external table is similar to a database view: it contains the table schema and points to data that is stored outside the data warehouse.
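A minimal sketch of that external-table pattern follows (credential, storage-account, and object names are placeholders, and a database master key is assumed to exist). Note one assumption: PolyBase in a dedicated SQL pool reads delimited text, ORC, or Parquet rather than raw JSON, so delimited text is assumed for the staged files.

-- One-time setup: a credential and an external data source pointing at the
-- staging container (assumes CREATE MASTER KEY has already been run).
CREATE DATABASE SCOPED CREDENTIAL BlobCred
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE StagingBlob
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCred
);

-- File format: delimited text is assumed here because PolyBase does not
-- read raw JSON in a dedicated SQL pool.
CREATE EXTERNAL FILE FORMAT PipeDelimited
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);

-- The external table is schema-on-read: it stores only the schema and a
-- pointer to the files, much like a view over the storage account.
CREATE EXTERNAL TABLE dbo.SalesRaw_Ext
(
    Id      INT            NOT NULL,
    Payload NVARCHAR(4000) NULL
)
WITH (
    LOCATION = '/salesraw/',
    DATA_SOURCE = StagingBlob,
    FILE_FORMAT = PipeDelimited
);

-- Load the staging table from the external table in one parallel pass.
INSERT INTO dbo.SalesRaw_Staging
SELECT Id, Payload
FROM dbo.SalesRaw_Ext;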
You also need to take care of the resource class and the distribution method; the workload management link below covers these two concepts in more detail.
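As a hedged sketch of both knobs (user and table names are hypothetical): a larger static resource class gives the loading session more memory per query, and the final table's distribution is chosen to match downstream joins.

-- Grant the loading user a larger static resource class (the class is one of
-- the built-in staticrc10..staticrc80; the user name is hypothetical).
EXEC sp_addrolemember 'staticrc40', 'load_user';

-- Final table: HASH-distribute on a commonly joined key to avoid data
-- movement at query time; clustered columnstore suits large analytic tables.
CREATE TABLE dbo.SalesRaw
WITH (DISTRIBUTION = HASH(Id), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT Id, Payload
FROM dbo.SalesRaw_Staging;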
As there are many parameters to consider before pinning down what exactly the issue is, I suggest you first go through the important official documents and then make the appropriate changes to your architecture.
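On the question of an alternative to PolyBase: the commonly recommended option in a dedicated SQL pool is the T-SQL COPY statement, which loads straight from Blob Storage without any external-table setup. A minimal sketch, assuming CSV files and placeholder account, container, and secret values (like PolyBase, COPY reads CSV, Parquet, or ORC, so JSON would still need conversion first):

-- Minimal COPY sketch: loads directly from Blob Storage into an existing
-- table; the URL and secret below are placeholders.
COPY INTO dbo.SalesRaw_Staging
FROM 'https://<account>.blob.core.windows.net/<container>/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-key>'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A'
);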
Helpful links:
Design a PolyBase data loading strategy for dedicated SQL pool in Azure Synapse Analytics
Workload management with resource classes in Azure Synapse Analytics
Best practices for loading data into a dedicated SQL pool in Azure Synapse Analytics (https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/data-loading-best-practices)