Data in Dataverse from a merged Azure Spark Delta table does not show the expected values

Posted on 2025-01-29 13:24:35


I created a simple Synapse Delta Lake table via:

CREATE TABLE IF NOT EXISTS db1.tbl1 (
id INT NOT NULL,
name STRING NOT NULL
)
USING DELTA

I've merged rows of data into it multiple times with different test values for 'name'. If I select the rows, I see my most recent merge as expected, e.g.:

+---+-------+
| id|   name|
+---+-------+
|  1|   adam|
|  2|   bob8|
|  3|charles|
+---+-------+

I also have a Synapse pipeline, built with the 'Copy Data tool', that reads from the ADLS container and folder containing the parquet files for the Synapse Delta Lake table (using a * wildcard for the files). Its sink is configured to the Dataverse table, using the 'id' field as the 'alternate key name' for the 'upsert' to Dataverse.

I've tested this multiple times, and the 'name' value in the Dataverse table seems to almost randomly take a value from one of my merges. Does anyone know what I'm doing wrong? The way I'm defining my source seems suspicious, since it just uses a wildcard for the files, but I don't know how else to do it. A PySpark SQL select on the table shows the correct, most recently merged rows, so I'm probably doing something wrong when sinking this to my Dataverse table.
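To make the wildcard suspicion concrete, here is a minimal pure-Python sketch of one possible explanation. It relies on the fact that a Delta Lake merge writes new parquet files and records the superseded ones as removed in the transaction log under _delta_log, while the old files stay in the folder until they are vacuumed. The file names, log entries, and intermediate 'name' values (bob1, bob5) below are all hypothetical, and the log is a simplified stand-in, not the real Delta log format.

```python
# Simplified, hypothetical model of a Delta table folder after three merges.
# Real _delta_log entries are JSON with many more fields; only the
# add/remove idea is reproduced here.

# Parquet files physically present in the folder, mapped to the (id, name)
# rows they contain. A merge does not delete the files it supersedes.
FILES = {
    "part-a.parquet": [(1, "adam"), (2, "bob1"), (3, "charles")],
    "part-b.parquet": [(1, "adam"), (2, "bob5"), (3, "charles")],
    "part-c.parquet": [(1, "adam"), (2, "bob8"), (3, "charles")],
}

# Ordered log actions: each merge adds a rewritten file and removes the old one.
LOG = [
    ("add", "part-a.parquet"),
    ("remove", "part-a.parquet"),
    ("add", "part-b.parquet"),
    ("remove", "part-b.parquet"),
    ("add", "part-c.parquet"),
]


def live_files(log):
    """Replay the log and return the set of files that are still current."""
    live = set()
    for action, path in log:
        if action == "add":
            live.add(path)
        else:
            live.discard(path)
    return live


def read_rows(paths):
    """Return every row from the given files, in file-name order."""
    return [row for path in sorted(paths) for row in FILES[path]]


# What a Delta-aware reader (such as a Spark SQL select) would return.
table_rows = read_rows(live_files(LOG))

# What a plain "*.parquet" wildcard read would return: every file, old and new.
wildcard_rows = read_rows(FILES.keys())

print(table_rows)
print(sorted({name for id_, name in wildcard_rows if id_ == 2}))
```

If this is what is happening in the pipeline, the upsert would receive several candidate rows for the same 'id', and which one ends up in Dataverse would depend on processing order, which could look random.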
