Polybase元数据查询
所有,
Polybase 都会每 1 分钟 +/- 运行一次此查询或查询的变体。这显示了它从中获取数据的源系统。虽然通常非常快,但如果在维护期间重建任何聚集索引,则会阻止此查询。
这似乎不是外部表调用的一部分,因为我已停止运行该作业,并且我们仍然看到这一点。有趣的是,它似乎只在启用了分区的表上调用(或者至少我没有捕获对未分区的表的查询)。
问题:
- 有人对 Polybase 在这里做什么有想法吗?
- 为什么需要每 1 分钟左右运行一次这个查询?
- 有没有办法在索引维护等过程中防止/减慢调用速度?
- 终止源系统上的进程有问题吗?
感谢帮助和意见。
SELECT PARTITION_FUNCTIONS.name
,QUOTENAME(COLUMNS.name)
,cast(PARTITIONS.partition_number AS NVARCHAR(10))
FROM "<DBName>".sys.indexes AS INDEXES
,"<DBName>".sys.partitions AS PARTITIONS
,"<DBName>".sys.index_columns AS INDEX_COLUMNS
,"<DBName>".sys.columns AS COLUMNS
,"<DBName>".sys.partition_functions AS PARTITION_FUNCTIONS
,"<DBName>".sys.partition_schemes AS PARTITION_SCHEMES
WHERE INDEXES.object_id = object_id(@0)
AND INDEXES.type IN (@1,@2,@3)
AND INDEX_COLUMNS.partition_ordinal = @4
AND INDEXES.object_id = PARTITIONS.object_id
AND INDEXES.index_id = PARTITIONS.index_id
AND INDEXES.object_id = INDEX_COLUMNS.object_id
AND INDEXES.index_id = INDEX_COLUMNS.index_id
AND INDEXES.data_space_id = PARTITION_SCHEMES.data_space_id
AND PARTITION_SCHEMES.function_id = PARTITION_FUNCTIONS.function_id
AND INDEXES.object_id = COLUMNS.object_id
AND INDEX_COLUMNS.column_id = COLUMNS.column_id```
All,
Polybase keeps running this query, or variation of, every 1 minute +/-. This shows on the source system it's getting it's data from. While usually very fast, if any clustered index is rebuilt during maintenance is blocks this query.
This doesn't appear to be part of the external table call as I've stopped the job running that and we are still seeing this. Interestingly, it appears to only be called on tables with partitioning enabled (or at least I haven't captured the query on tables that aren't partitioned).
Question:
- Anyone have thoughts on what polybase is doing here?
- Why it needs to run this query every 1 minute or so?
- Is there a way to prevent/slowdown the calls during stuff like index maintenance?
- Issues with killing the process on the source system?
Appreciate the help and input.
SELECT PARTITION_FUNCTIONS.name
,QUOTENAME(COLUMNS.name)
,cast(PARTITIONS.partition_number AS NVARCHAR(10))
FROM "<DBName>".sys.indexes AS INDEXES
,"<DBName>".sys.partitions AS PARTITIONS
,"<DBName>".sys.index_columns AS INDEX_COLUMNS
,"<DBName>".sys.columns AS COLUMNS
,"<DBName>".sys.partition_functions AS PARTITION_FUNCTIONS
,"<DBName>".sys.partition_schemes AS PARTITION_SCHEMES
WHERE INDEXES.object_id = object_id(@0)
AND INDEXES.type IN (@1,@2,@3)
AND INDEX_COLUMNS.partition_ordinal = @4
AND INDEXES.object_id = PARTITIONS.object_id
AND INDEXES.index_id = PARTITIONS.index_id
AND INDEXES.object_id = INDEX_COLUMNS.object_id
AND INDEXES.index_id = INDEX_COLUMNS.index_id
AND INDEXES.data_space_id = PARTITION_SCHEMES.data_space_id
AND PARTITION_SCHEMES.function_id = PARTITION_FUNCTIONS.function_id
AND INDEXES.object_id = COLUMNS.object_id
AND INDEX_COLUMNS.column_id = COLUMNS.column_id```
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
查询外部表元数据来识别字段、数据类型和长度是很正常的,但 Polybase 只有两次这样做,一次是在外部表创建时,一次是在查询数据时。如果您每分钟都看到它有变化,请检查:
如果您无法确定在何处触发它,则应在不使用时删除外部表,以避免索引维护问题,这比创建一个进程来无限期终止这些会话更好。
您可以在我的书“Polybase 数据虚拟化实践”中找到有关 Polybase 如何工作的更多信息。
It is normal to query the external table metadata to identify the fields, data types and lengths, but there are only two times when Polybase does this, at the external table creation, and when querying the data. If you're seeing it every minute with variations, check:
If you can't identify where it is being triggered, you should drop your external table while not in use to avoid issues with the index maintenance, better than creating a process to kill these sessions indefinitely.
You can find additional information about how Polybase works in my book "Hands-on data virtualization with Polybase".