Azure Synapse: design question, external tables or internal tables

Posted 2025-01-23 07:57:14


I'm designing a data warehouse in Azure Synapse using a SQL pool, but I'm facing some design questions.

Context: My plan is to load partitioned Parquet files into Azure Data Lake Storage (ADLS) and then create external tables in the SQL pool to query those files.

My questions are:

  • Is it better, in terms of performance, to build the solution with external tables only? That is, without creating internal tables and without using CTAS, BCP, or COPY to load the data from ADLS into the database's storage.
  • Is it possible to partition external tables? Is it enough to organize the Parquet files into folders named by date?
  • How does user concurrency differ between external tables and internal tables? Any recommendations from experience?

Thanks for your time.
Josh


Comments (1)

沫尐诺 2025-01-30 07:57:14


Is it better, in terms of performance, to build the solution with external tables only?

No. Internal tables are distributed columnstore tables with multiple levels of caching, and they typically outperform external Parquet tables. Internal tables also support batch-mode scanning, ordered columnstore, segment elimination, partition elimination, materialized views, and result set caching.
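To make "segment elimination" concrete, here is a minimal plain-Python sketch of the idea. A columnstore segment keeps min/max metadata per column, so a range filter can skip any segment whose range cannot overlap the predicate. The `Segment` class, the helper name, and the date keys are hypothetical illustrations. This is not Synapse's internal implementation. Ordering the columnstore on the filter column keeps these ranges tight and non-overlapping, which is why ordering and segment elimination work well together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one columnstore segment's min/max metadata."""
    segment_id: int
    min_value: int
    max_value: int


def segments_to_scan(segments, lo, hi):
    """Return ids of segments whose [min_value, max_value] overlaps [lo, hi].

    Segments entirely outside the predicate range are skipped without reading any rows.
    """
    return [s.segment_id for s in segments if s.max_value >= lo and s.min_value <= hi]


# Hypothetical yyyymmdd date keys stored in three ordered segments.
segs = [
    Segment(0, 20250101, 20250110),
    Segment(1, 20250111, 20250120),
    Segment(2, 20250121, 20250131),
]
print(segments_to_scan(segs, 20250115, 20250118))  # -> [1]
```

Only segment 1 is read here. If the same rows were loaded unordered, every segment might span the whole month and none could be skipped.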

Is it possible to perform partitioning in external tables?

This is not currently possible in dedicated SQL pools; see Folder Partition Elimination.
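Date-named folders only pay off when the query engine skips folders that cannot match the filter, which is what folder partition elimination means. Per the answer above, dedicated SQL pool external tables do not do this. The plain-Python sketch below only illustrates the idea. It assumes a Hive-style `year=YYYY/month=MM/day=DD` layout, and the paths and helper names are hypothetical. It is not how any Synapse pool implements pruning.

```python
from datetime import date


def folder_date(path):
    """Parse a hypothetical 'year=YYYY/month=MM/day=DD/...' path into a date, or None."""
    parts = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
    try:
        return date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
    except (KeyError, ValueError):
        return None  # not laid out by date, so it cannot be pruned by a date filter


def prune_folders(paths, start, end):
    """Keep only files whose folder date falls in [start, end]; the rest are never opened."""
    kept = []
    for p in paths:
        d = folder_date(p)
        if d is not None and start <= d <= end:
            kept.append(p)
    return kept


files = [
    "sales/year=2025/month=01/day=22/part-0.parquet",
    "sales/year=2025/month=01/day=23/part-0.parquet",
    "sales/year=2025/month=01/day=24/part-0.parquet",
]
print(prune_folders(files, date(2025, 1, 23), date(2025, 1, 23)))
```

An engine without this capability reads all three files and filters rows afterwards, so the folder layout alone does not reduce I/O.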

How does user concurrency differ between external tables and internal tables?

Concurrency is a matter of query performance: the faster your queries run, the sooner each session releases its concurrency slot. So anything that improves query performance also improves effective concurrency (the number of concurrent users you can support with reasonable query runtimes).
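The link between runtime and effective concurrency can be estimated with Little's law: average busy slots = arrival rate × runtime. The sketch below is a rough average-case model that ignores bursts and queuing. It assumes each user submits one query every `seconds_between_queries` seconds. The slot count of 32 is a placeholder, since the real number of slots in a dedicated pool depends on its service level and workload settings.

```python
import math


def users_supported(slots, avg_runtime_s, seconds_between_queries):
    """Estimate how many users can be served without sustained queuing.

    Little's law: busy_slots = arrival_rate * runtime. With
    arrival_rate = users / seconds_between_queries, keeping busy_slots <= slots
    gives users <= slots * seconds_between_queries / avg_runtime_s.
    """
    if min(slots, avg_runtime_s, seconds_between_queries) <= 0:
        raise ValueError("all arguments must be positive")
    return math.floor(slots * seconds_between_queries / avg_runtime_s)


# Placeholder: 32 slots, each user queries every 5 minutes.
for runtime_s in (60, 15):
    print(runtime_s, users_supported(32, runtime_s, 300))
```

Cutting the average runtime from 60 s to 15 s raises the estimate from 160 to 640 users with the same slots. That is the answer's point: faster internal-table queries translate directly into more effective concurrency.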

Serverless SQL Pools currently have more advanced capabilities for working with data stored as Parquet or Delta in the Data Lake.
