Data compression with Azure Data Factory Data Flow
We have an Azure Data Factory pipeline which executes a simple Data Flow that reads data from Cosmos DB and sinks it into Data Lake. As the destination Optimize logic, we are using Partition Type as Key, with a Cosmos DB identifier as the unique value partition column. The destination dataset also has the compression type set to gzip and the compression level set to Fastest.
Problem:
The data is partitioned as expected, but we do not see any compression on the files created. Is this the expected behavior, or is it a bug? Can someone please help?
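For reference, the sink dataset described above would look roughly like the following DelimitedText dataset JSON. This is a sketch: the dataset name, linked service, file system, and folder path are placeholders, and the actual sink might use a different format type; only the compression properties are relevant to the question.

    {
        "name": "DelimitedTextSink",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "AzureDataLakeGen2LinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "output",
                    "folderPath": "cosmos-export"
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true,
                "compressionCodec": "gzip",
                "compressionLevel": "Fastest"
            },
            "schema": []
        }
    }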
1 Answer
I think you should change your compression level to Optimal. That will take more time to execute, but it will guarantee that your files are compressed in the destination data source, as described in the Microsoft docs.
Check this link: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs-legacy
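Concretely, assuming the sink uses a DelimitedText dataset like the sketch in the question, the suggested change amounts to flipping the compressionLevel type property from Fastest to Optimal:

    "typeProperties": {
        "compressionCodec": "gzip",
        "compressionLevel": "Optimal"
    }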